diff --git a/examples/simulated_data/index.html b/examples/simulated_data/index.html
index 8f5203e..3820780 100644
--- a/examples/simulated_data/index.html
+++ b/examples/simulated_data/index.html
@@ -642,30 +642,7 @@
from sklearn.model_selection import KFold, ShuffleSplit
-
-from cpm import CPMRegression
-from cpm.simulate_data import simulate_regression_data
-from cpm.edge_selection import PThreshold, UnivariateEdgeSelection
-
-
-X, y, covariates = simulate_regression_data(n_features=1225, n_informative_features=50,
- covariate_effect_size=0.2,
- feature_effect_size=100,
- noise_level=0.1)
-
-univariate_edge_selection = UnivariateEdgeSelection(edge_statistic=['pearson'],
- edge_selection=[PThreshold(threshold=[0.05],
- correction=[None])])
-cpm = CPMRegression(results_directory='./tmp/example_simulated_data2',
- cv=KFold(n_splits=5, shuffle=True, random_state=42),
- edge_selection=univariate_edge_selection,
- #cv_edge_selection=ShuffleSplit(n_splits=1, test_size=0.2, random_state=42),
- add_edge_filter=True,
- n_permutations=10)
-cpm.estimate(X=X, y=y, covariates=covariates)
-
-#cpm._calculate_permutation_results('./tmp/example_simulated_data2')
+Python
diff --git a/search/search_index.json b/search/search_index.json
index 7699575..9ec8a47 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#confound-corrected-connectome-based-predictive-modeling-cccpm","title":"Confound-Corrected Connectome-Based Predictive Modeling (CCCPM)","text":"CCCPM is a newly developed Python toolbox designed specifically for researchers in psychiatry and neuroscience to perform connectome-based predictive modeling. This package offers a comprehensive framework for building predictive models from structural and functional connectome data, with a strong focus on methodological rigor, interpretability, confound control, and statistical robustness.
"},{"location":"#background","title":"Background","text":"Network-based approaches are increasingly recognized as essential for understanding the complex relationships in brain connectivity that underlie behavior, cognition, and mental health. In psychiatry and neuroscience, analyzing structural and functional networks can reveal patterns associated with mental disorders, support individualized predictions, and improve our understanding of brain function. However, these analyses require robust tools that account for the unique challenges of connectome data, such as high dimensionality, variability, and the influence of confounding factors.
Despite the growing importance of connectome-based predictive modeling (CPM), there is currently no fully developed software package for performing these analyses. Existing options are limited to a few MATLAB scripts, which lack the flexibility, transparency, and rigor required to foster replicable research. CCCPM addresses this gap by providing a Python-based, flexible, and rigorously designed toolbox that encourages replicable analyses while allowing researchers to tailor their workflows to specific research questions.
"},{"location":"#overview","title":"Overview","text":"CCCPM was developed to address key challenges in connectome-based analyses, including optimizing model hyperparameters, controlling for confounding variables, and assessing the reliability of selected network features. This toolbox introduces novel methods, such as stability metrics for selected edges, and integrates well-established practices like nested cross-validation and permutation-based significance testing. By doing so, CCCPM provides a powerful and transparent tool for researchers aiming to explore brain networks' contributions to predictive models.
"},{"location":"#key-features","title":"Key Features","text":" - Hyperparameter Optimization: Fine-tune model parameters, such as p-thresholds for edge selection, to achieve better predictive performance.
- Confound Adjustment: Use partial correlation methods during edge selection to rigorously control for covariates and confounding variables.
- Residualization: Remove the influence of confounds from connectome strengths to ensure cleaner data inputs.
- Statistical Validation: Assess model and edge-level significance using permutation-based testing, ensuring that findings are statistically robust.
- Stability Metrics: Evaluate the reliability of selected edges across iterations, improving the interpretability and reproducibility of identified networks.
- Model Increment Analysis: Quantify the unique contribution of connectome data to predictive models, helping to clarify their added value in prediction tasks.
"},{"location":"#why-cccpm","title":"Why CCCPM?","text":"Unlike existing CPM implementations, which are limited in scope and flexibility, CCCPM is designed to foster rigorous and replicable research. Its Python-based architecture ensures accessibility and compatibility with modern data science workflows, while its features address the specific challenges of connectome-based analyses. By offering a robust and transparent framework, CCCPM enables researchers to conduct analyses that are not only flexible and customizable but also reproducible and scientifically sound.
"},{"location":"#features-in-detail","title":"Features in Detail","text":""},{"location":"#data-imputation","title":"Data Imputation","text":"CCCPM includes methods to handle missing data effectively, ensuring that datasets with incomplete connectome information can still be utilized without introducing biases.
"},{"location":"#nested-cross-validation","title":"Nested Cross-Validation","text":"A nested cross-validation scheme is implemented to separate hyperparameter tuning from model evaluation. This ensures that the reported model performance is unbiased and reflects its true generalization capability.
"},{"location":"#threshold-optimization","title":"Threshold Optimization","text":"The toolbox automates the optimization of p-thresholds, which determine which edges in the connectome are selected for model building. This allows researchers to identify thresholds that balance performance and interpretability.
"},{"location":"#confound-adjustment","title":"Confound Adjustment","text":"By implementing partial correlations, CCCPM allows researchers to account for confounding variables during edge selection, ensuring that identified networks represent genuine relationships rather than artifacts.
"},{"location":"#statistical-significance","title":"Statistical Significance","text":"Permutation-based testing is provided to evaluate the significance of both model performance and selected edges, adding rigor to findings and reducing the risk of false-positive results.
"},{"location":"#edge-stability","title":"Edge Stability","text":"CCCPM introduces a stability metric for selected edges, helping researchers evaluate the consistency of their findings across multiple iterations. This enhances the reliability of results and their potential for replication.
"},{"location":"#model-increment-analysis","title":"Model Increment Analysis","text":"Assess the added predictive value of connectome data by calculating the incremental contribution of network features to overall model performance.
"},{"location":"getting_started/","title":"Getting Started","text":"This guide will help you get started with running an analysis using the CPMRegression
class. It provides a step-by-step description of how to set up, configure, and execute an analysis, along with explanations of the inputs and parameters.
"},{"location":"getting_started/#step-1-prepare-your-data","title":"Step 1: Prepare Your Data","text":"To run an analysis, you need the following inputs:
- Connectome Data (
X
): A 2D array (numpy array or pandas DataFrame) of shape (n_samples, n_features)
containing connectome edge values for each subject. - Target Variable (
y
): A 1D array or pandas Series of shape (n_samples,)
containing the outcome variable (e.g., clinical scores, behavioral measures). - Covariates: A 2D array or pandas DataFrame of shape
(n_samples, n_covariates)
containing variables to control for (e.g., age, sex).
Ensure that all inputs have consistent sample sizes (n_samples
).
"},{"location":"getting_started/#step-2-configure-the-analysis","title":"Step 2: Configure the Analysis","text":""},{"location":"getting_started/#cross-validation","title":"Cross-Validation","text":"The CPMRegression
class uses an outer cross-validation loop for performance evaluation and an optional inner cross-validation loop for hyperparameter optimization.
- Outer CV (
cv
): Defines the cross-validation strategy (e.g., KFold
). - Inner CV (
inner_cv
): Used for optimizing hyperparameters during edge selection. Can be left as None
if not needed.
Example:
Pythonfrom sklearn.model_selection import KFold\n\nouter_cv = KFold(n_splits=10, shuffle=True, random_state=42)\n
"},{"location":"getting_started/#edge-selection","title":"Edge Selection","text":"The toolbox implements univariate edge selection, allowing users to specify the method for evaluating and selecting edges based on statistical tests.
"},{"location":"getting_started/#edge-statistics","title":"Edge Statistics","text":"Choose from the following methods for computing edge statistics:
- pearson: Pearson correlation
- pearson_partial: Pearson partial correlation (controlling for covariates)
- spearman: Spearman rank correlation
- spearman_partial: Spearman partial correlation (controlling for covariates)
"},{"location":"getting_started/#p-thresholds","title":"p-Thresholds","text":" - Set a single value (e.g., 0.05) or provide multiple values (e.g., [0.01, 0.05, 0.1]).
- If multiple thresholds are specified, the toolbox will optimize for the best p-threshold during inner cross-validation.
"},{"location":"getting_started/#fdr-correction","title":"FDR Correction","text":" - Optional FDR correction for multiple comparisons can be applied using correction='fdr_by'.
Example:
Pythonfrom cpm.edge_selection import UnivariateEdgeSelection, PThreshold\n\nedge_statistic = 'pearson'\nunivariate_edge_selection = UnivariateEdgeSelection(\n edge_statistic=[edge_statistic],\n edge_selection=[PThreshold(threshold=[0.05], correction=['fdr_by'])]\n)\n
"},{"location":"getting_started/#step-3-set-up-the-cpmregression-object","title":"Step 3: Set Up the CPMRegression Object","text":"Create an instance of the CPMRegression class with the required inputs:
Pythonfrom cpm.cpm_analysis import CPMRegression\n\ncpm = CPMRegression(\n results_directory=\"results/\",\n cv=outer_cv,\n inner_cv=inner_cv, # Optional\n edge_selection=univariate_edge_selection,\n select_stable_edges=True,\n stability_threshold=0.8,\n impute_missing_values=True,\n n_permutations=100\n)\n
"},{"location":"getting_started/#key-parameters","title":"Key Parameters","text":" - results_directory: Directory where results will be saved.
- cv: Outer cross-validation strategy.
- inner_cv: Inner cross-validation strategy for hyperparameter optimization (optional).
- edge_selection: Configuration for univariate edge selection.
- select_stable_edges: Whether to select stable edges across folds (True or False).
- stability_threshold: Minimum proportion of folds in which an edge must be selected to be considered stable.
- impute_missing_values: Whether to impute missing values (True or False).
- n_permutations: Number of permutations for permutation testing.
"},{"location":"getting_started/#step-4-run-the-analysis","title":"Step 4: Run the Analysis","text":"Call the estimate method to perform the analysis:
PythonX = ... # Load your connectome data (numpy array or pandas DataFrame)\ny = ... # Load your target variable (numpy array or pandas Series)\ncovariates = ... # Load your covariates (numpy array or pandas DataFrame)\n\ncpm.estimate(X=X, y=y, covariates=covariates)\n
This will:
- Perform edge selection based on the specified method and thresholds.
- Train and evaluate models for each cross-validation fold.
- Save results, including predictions, metrics, and permutation-based significance tests, to the results_directory.
"},{"location":"getting_started/#step-5-review-results","title":"Step 5: Review Results","text":"After the analysis, you can find the results in the results_directory, including:
- Cross-validation metrics (e.g., mean absolute error, R\u00b2).
- Model predictions for each fold.
- Edge stability and significance.
You can load and inspect these results for further analysis.
By following these steps, you can quickly set up and execute a connectome-based predictive modeling analysis using the CPMRegression class. For further customization, refer to the API documentation.
"},{"location":"installation/","title":"Installation Guide","text":"Follow these steps to install the ccCPM Python package directly from GitHub.
"},{"location":"installation/#prerequisites","title":"Prerequisites","text":" - tested with Python 3.10 or later
pip
(Python's package manager)
"},{"location":"installation/#installation-steps","title":"Installation Steps","text":"Clone the GitHub repository:
Bashgit clone https://github.com/mmll/cpm_python.git\n
Navigate to the repository directory:
Bashcd cpm_python\n
Install the package:
Bashpip install .\n
To install in development mode, use:
Bashpip install -e .\n
Verify the installation:
Pythonimport cpm\nprint(cpm.__version__)\n
You should see the package version printed without errors.
"},{"location":"api/cpm_regression/","title":"CPM Regression","text":""},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression","title":"CPMRegression
","text":"This class handles the process of performing CPM Regression with cross-validation and permutation testing.
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.__init__","title":"__init__(results_directory, cv=KFold(n_splits=10, shuffle=True, random_state=42), inner_cv=None, edge_selection=UnivariateEdgeSelection(edge_statistic=['pearson'], edge_selection=[PThreshold(threshold=[0.05], correction=[None])]), select_stable_edges=True, stability_threshold=0.8, impute_missing_values=True, n_permutations=0, atlas_labels=None)
","text":"Initialize the CPMRegression object.
Parameters:
Name Type Description Default results_directory
str
Directory to save results.
required cv
Union[BaseCrossValidator, BaseShuffleSplit]
Outer cross-validation strategy.
KFold(n_splits=10, shuffle=True, random_state=42)
inner_cv
Union[BaseCrossValidator, BaseShuffleSplit]
Inner cross-validation strategy for edge selection.
None
edge_selection
UnivariateEdgeSelection
Method for edge selection.
UnivariateEdgeSelection(edge_statistic=['pearson'], edge_selection=[PThreshold(threshold=[0.05], correction=[None])])
impute_missing_values
bool
Whether to impute missing values.
True
n_permutations
int
Number of permutations to run for permutation testing.
0
atlas_labels
str
CSV file containing atlas and regions labels.
None
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.calculate_p_values","title":"calculate_p_values(true_results, perms)
staticmethod
","text":"Calculate p-values based on true results and permutation results.
:param true_results: DataFrame with the true results. :param perms: DataFrame with the permutation results. :return: DataFrame with the calculated p-values.
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.estimate","title":"estimate(X, y, covariates)
","text":"Estimates a model using the provided data and conducts permutation testing. This method first fits the model to the actual data and subsequently performs estimation on permuted data for a specified number of permutations. Finally, it calculates permutation results.
Parameters:
Name Type Description Default X
Union[DataFrame, ndarray]
required y
Union[Series, DataFrame, ndarray]
required covariates
Union[Series, DataFrame, ndarray]
required"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.load_configuration","title":"load_configuration(results_directory, config_filename)
","text":"Load configuration from a file.
:param results_directory: Directory to set for results. :param config_filename: Path to the configuration file.
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.save_configuration","title":"save_configuration(config_filename)
","text":"Saves the current configuration settings to a file in Pickle format. All attributes related to the configuration of the object are serialized and stored in a file with the same base name as the provided filename, but with a .pkl extension.
:param config_filename: The base name of the file where the configuration will be saved. :return: None
"},{"location":"api/edge_selection/","title":"Edge Selection","text":""},{"location":"api/edge_selection/#cpm.edge_selection.PThreshold","title":"PThreshold
","text":" Bases: BaseEdgeSelector
"},{"location":"api/edge_selection/#cpm.edge_selection.PThreshold.__init__","title":"__init__(threshold=0.05, correction=None)
","text":":param threshold: :param correction: can be one of statsmodels methods bonferroni : one-step correction sidak : one-step correction holm-sidak : step down method using Sidak adjustments holm : step-down method using Bonferroni adjustments simes-hochberg : step-up method (independent) hommel : closed method based on Simes tests (non-negative) fdr_bh : Benjamini/Hochberg (non-negative) fdr_by : Benjamini/Yekutieli (negative) fdr_tsbh : two stage fdr correction (non-negative) fdr_tsbky : two stage fdr correction (non-negative)
"},{"location":"api/fold/","title":"Fold","text":""},{"location":"api/models/","title":"Predictive Models","text":""},{"location":"api/models/#cpm.models.LinearCPMModel","title":"LinearCPMModel
","text":"Linear Connectome-based Predictive Modeling (CPM) implementation.
This class implements a linear CPM model, allowing for fitting and prediction based on connectome data, covariates, and residuals.
Attributes:
Name Type Description models
ModelDict
A dictionary containing the fitted models for different networks and data types (connectome, covariates, residuals, and full model).
models_residuals
dict
A dictionary storing linear regression models used to calculate residuals for connectome data, controlling for covariates.
edges
dict
A dictionary defining the edges (features) used for each network (e.g., 'positive', 'negative').
Parameters:
Name Type Description Default edges
dict
Dictionary containing indices of edges for 'positive' and 'negative' networks.
required"},{"location":"api/models/#cpm.models.LinearCPMModel.__init__","title":"__init__(edges)
","text":"Initialize the LinearCPMModel.
Parameters:
Name Type Description Default edges
dict
Dictionary containing indices of edges for 'positive' and 'negative' networks.
required"},{"location":"api/models/#cpm.models.LinearCPMModel.fit","title":"fit(X, y, covariates)
","text":"Fit the CPM model.
This method fits multiple linear regression models for the connectome, covariates, residuals, and full model using the provided data.
Parameters:
Name Type Description Default X
ndarray
A 2D array of shape (n_samples, n_features) representing the connectome data.
required y
ndarray
A 1D array of shape (n_samples,) representing the target variable.
required covariates
ndarray
A 2D array of shape (n_samples, n_covariates) representing the covariates.
required Returns:
Type Description LinearCPMModel
The fitted CPM model instance.
"},{"location":"api/models/#cpm.models.LinearCPMModel.predict","title":"predict(X, covariates)
","text":"Predict using the fitted CPM model.
This method generates predictions for the target variable using the connectome, covariates, residuals, and full models.
Parameters:
Name Type Description Default X
ndarray
A 2D array of shape (n_samples, n_features) representing the connectome data.
required covariates
ndarray
A 2D array of shape (n_samples, n_covariates) representing the covariates.
required Returns:
Type Description ModelDict
A dictionary containing predictions for each network and model type (connectome, covariates, residuals, and full model).
"},{"location":"examples/human_connectome_project/","title":"HCP","text":"Python"},{"location":"examples/simulated_data/","title":"Simulated Data","text":"Pythonfrom sklearn.model_selection import KFold, ShuffleSplit\n\nfrom cpm import CPMRegression\nfrom cpm.simulate_data import simulate_regression_data\nfrom cpm.edge_selection import PThreshold, UnivariateEdgeSelection\n\n\nX, y, covariates = simulate_regression_data(n_features=1225, n_informative_features=50,\n covariate_effect_size=0.2,\n feature_effect_size=100,\n noise_level=0.1)\n\nunivariate_edge_selection = UnivariateEdgeSelection(edge_statistic=['pearson'],\n edge_selection=[PThreshold(threshold=[0.05],\n correction=[None])])\ncpm = CPMRegression(results_directory='./tmp/example_simulated_data2',\n cv=KFold(n_splits=5, shuffle=True, random_state=42),\n edge_selection=univariate_edge_selection,\n #cv_edge_selection=ShuffleSplit(n_splits=1, test_size=0.2, random_state=42),\n add_edge_filter=True,\n n_permutations=10)\ncpm.estimate(X=X, y=y, covariates=covariates)\n\n#cpm._calculate_permutation_results('./tmp/example_simulated_data2')\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#confound-corrected-connectome-based-predictive-modeling-cccpm","title":"Confound-Corrected Connectome-Based Predictive Modeling (CCCPM)","text":"CCCPM is a newly developed Python toolbox designed specifically for researchers in psychiatry and neuroscience to perform connectome-based predictive modeling. This package offers a comprehensive framework for building predictive models from structural and functional connectome data, with a strong focus on methodological rigor, interpretability, confound control, and statistical robustness.
"},{"location":"#background","title":"Background","text":"Network-based approaches are increasingly recognized as essential for understanding the complex relationships in brain connectivity that underlie behavior, cognition, and mental health. In psychiatry and neuroscience, analyzing structural and functional networks can reveal patterns associated with mental disorders, support individualized predictions, and improve our understanding of brain function. However, these analyses require robust tools that account for the unique challenges of connectome data, such as high dimensionality, variability, and the influence of confounding factors.
Despite the growing importance of connectome-based predictive modeling (CPM), there is currently no fully developed software package for performing these analyses. Existing options are limited to a few MATLAB scripts, which lack the flexibility, transparency, and rigor required to foster replicable research. CCCPM addresses this gap by providing a Python-based, flexible, and rigorously designed toolbox that encourages replicable analyses while allowing researchers to tailor their workflows to specific research questions.
"},{"location":"#overview","title":"Overview","text":"CCCPM was developed to address key challenges in connectome-based analyses, including optimizing model hyperparameters, controlling for confounding variables, and assessing the reliability of selected network features. This toolbox introduces novel methods, such as stability metrics for selected edges, and integrates well-established practices like nested cross-validation and permutation-based significance testing. By doing so, CCCPM provides a powerful and transparent tool for researchers aiming to explore brain networks' contributions to predictive models.
"},{"location":"#key-features","title":"Key Features","text":" - Hyperparameter Optimization: Fine-tune model parameters, such as p-thresholds for edge selection, to achieve better predictive performance.
- Confound Adjustment: Use partial correlation methods during edge selection to rigorously control for covariates and confounding variables.
- Residualization: Remove the influence of confounds from connectome strengths to ensure cleaner data inputs.
- Statistical Validation: Assess model and edge-level significance using permutation-based testing, ensuring that findings are statistically robust.
- Stability Metrics: Evaluate the reliability of selected edges across iterations, improving the interpretability and reproducibility of identified networks.
- Model Increment Analysis: Quantify the unique contribution of connectome data to predictive models, helping to clarify their added value in prediction tasks.
"},{"location":"#why-cccpm","title":"Why CCCPM?","text":"Unlike existing CPM implementations, which are limited in scope and flexibility, CCCPM is designed to foster rigorous and replicable research. Its Python-based architecture ensures accessibility and compatibility with modern data science workflows, while its features address the specific challenges of connectome-based analyses. By offering a robust and transparent framework, CCCPM enables researchers to conduct analyses that are not only flexible and customizable but also reproducible and scientifically sound.
"},{"location":"#features-in-detail","title":"Features in Detail","text":""},{"location":"#data-imputation","title":"Data Imputation","text":"CCCPM includes methods to handle missing data effectively, ensuring that datasets with incomplete connectome information can still be utilized without introducing biases.
"},{"location":"#nested-cross-validation","title":"Nested Cross-Validation","text":"A nested cross-validation scheme is implemented to separate hyperparameter tuning from model evaluation. This ensures that the reported model performance is unbiased and reflects its true generalization capability.
"},{"location":"#threshold-optimization","title":"Threshold Optimization","text":"The toolbox automates the optimization of p-thresholds, which determine which edges in the connectome are selected for model building. This allows researchers to identify thresholds that balance performance and interpretability.
"},{"location":"#confound-adjustment","title":"Confound Adjustment","text":"By implementing partial correlations, CCCPM allows researchers to account for confounding variables during edge selection, ensuring that identified networks represent genuine relationships rather than artifacts.
"},{"location":"#statistical-significance","title":"Statistical Significance","text":"Permutation-based testing is provided to evaluate the significance of both model performance and selected edges, adding rigor to findings and reducing the risk of false-positive results.
"},{"location":"#edge-stability","title":"Edge Stability","text":"CCCPM introduces a stability metric for selected edges, helping researchers evaluate the consistency of their findings across multiple iterations. This enhances the reliability of results and their potential for replication.
"},{"location":"#model-increment-analysis","title":"Model Increment Analysis","text":"Assess the added predictive value of connectome data by calculating the incremental contribution of network features to overall model performance.
"},{"location":"getting_started/","title":"Getting Started","text":"This guide will help you get started with running an analysis using the CPMRegression
class. It provides a step-by-step description of how to set up, configure, and execute an analysis, along with explanations of the inputs and parameters.
"},{"location":"getting_started/#step-1-prepare-your-data","title":"Step 1: Prepare Your Data","text":"To run an analysis, you need the following inputs:
- Connectome Data (
X
): A 2D array (numpy array or pandas DataFrame) of shape (n_samples, n_features)
containing connectome edge values for each subject. - Target Variable (
y
): A 1D array or pandas Series of shape (n_samples,)
containing the outcome variable (e.g., clinical scores, behavioral measures). - Covariates: A 2D array or pandas DataFrame of shape
(n_samples, n_covariates)
containing variables to control for (e.g., age, sex).
Ensure that all inputs have consistent sample sizes (n_samples
).
"},{"location":"getting_started/#step-2-configure-the-analysis","title":"Step 2: Configure the Analysis","text":""},{"location":"getting_started/#cross-validation","title":"Cross-Validation","text":"The CPMRegression
class uses an outer cross-validation loop for performance evaluation and an optional inner cross-validation loop for hyperparameter optimization.
- Outer CV (
cv
): Defines the cross-validation strategy (e.g., KFold
). - Inner CV (
inner_cv
): Used for optimizing hyperparameters during edge selection. Can be left as None
if not needed.
Example:
Pythonfrom sklearn.model_selection import KFold\n\nouter_cv = KFold(n_splits=10, shuffle=True, random_state=42)\n
"},{"location":"getting_started/#edge-selection","title":"Edge Selection","text":"The toolbox implements univariate edge selection, allowing users to specify the method for evaluating and selecting edges based on statistical tests.
"},{"location":"getting_started/#edge-statistics","title":"Edge Statistics","text":"Choose from the following methods for computing edge statistics:
- pearson: Pearson correlation
- pearson_partial: Pearson partial correlation (controlling for covariates)
- spearman: Spearman rank correlation
- spearman_partial: Spearman partial correlation (controlling for covariates)
"},{"location":"getting_started/#p-thresholds","title":"p-Thresholds","text":" - Set a single value (e.g., 0.05) or provide multiple values (e.g., [0.01, 0.05, 0.1]).
- If multiple thresholds are specified, the toolbox will optimize for the best p-threshold during inner cross-validation.
"},{"location":"getting_started/#fdr-correction","title":"FDR Correction","text":" - Optional FDR correction for multiple comparisons can be applied using correction='fdr_by'.
Example:
Pythonfrom cpm.edge_selection import UnivariateEdgeSelection, PThreshold\n\nedge_statistic = 'pearson'\nunivariate_edge_selection = UnivariateEdgeSelection(\n edge_statistic=[edge_statistic],\n edge_selection=[PThreshold(threshold=[0.05], correction=['fdr_by'])]\n)\n
"},{"location":"getting_started/#step-3-set-up-the-cpmregression-object","title":"Step 3: Set Up the CPMRegression Object","text":"Create an instance of the CPMRegression class with the required inputs:
Pythonfrom cpm.cpm_analysis import CPMRegression\n\ncpm = CPMRegression(\n results_directory=\"results/\",\n cv=outer_cv,\n inner_cv=inner_cv, # Optional\n edge_selection=univariate_edge_selection,\n select_stable_edges=True,\n stability_threshold=0.8,\n impute_missing_values=True,\n n_permutations=100\n)\n
"},{"location":"getting_started/#key-parameters","title":"Key Parameters","text":" - results_directory: Directory where results will be saved.
- cv: Outer cross-validation strategy.
- inner_cv: Inner cross-validation strategy for hyperparameter optimization (optional).
- edge_selection: Configuration for univariate edge selection.
- select_stable_edges: Whether to select stable edges across folds (True or False).
- stability_threshold: Minimum proportion of folds in which an edge must be selected to be considered stable.
- impute_missing_values: Whether to impute missing values (True or False).
- n_permutations: Number of permutations for permutation testing.
"},{"location":"getting_started/#step-4-run-the-analysis","title":"Step 4: Run the Analysis","text":"Call the estimate method to perform the analysis:
PythonX = ... # Load your connectome data (numpy array or pandas DataFrame)\ny = ... # Load your target variable (numpy array or pandas Series)\ncovariates = ... # Load your covariates (numpy array or pandas DataFrame)\n\ncpm.estimate(X=X, y=y, covariates=covariates)\n
This will:
- Perform edge selection based on the specified method and thresholds.
- Train and evaluate models for each cross-validation fold.
- Save results, including predictions, metrics, and permutation-based significance tests, to the results_directory.
"},{"location":"getting_started/#step-5-review-results","title":"Step 5: Review Results","text":"After the analysis, you can find the results in the results_directory, including:
- Cross-validation metrics (e.g., mean absolute error, R\u00b2).
- Model predictions for each fold.
- Edge stability and significance.
You can load and inspect these results for further analysis.
By following these steps, you can quickly set up and execute a connectome-based predictive modeling analysis using the CPMRegression class. For further customization, refer to the API documentation.
"},{"location":"installation/","title":"Installation Guide","text":"Follow these steps to install the ccCPM Python package directly from GitHub.
"},{"location":"installation/#prerequisites","title":"Prerequisites","text":" - tested with Python 3.10 or later
- pip (Python's package manager)
"},{"location":"installation/#installation-steps","title":"Installation Steps","text":"Clone the GitHub repository:
git clone https://github.com/mmll/cpm_python.git\n
Navigate to the repository directory:
cd cpm_python\n
Install the package:
pip install .\n
To install in development mode, use:
pip install -e .\n
Verify the installation:
import cpm\nprint(cpm.__version__)\n
You should see the package version printed without errors.
"},{"location":"api/cpm_regression/","title":"CPM Regression","text":""},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression","title":"CPMRegression
","text":"This class handles the process of performing CPM Regression with cross-validation and permutation testing.
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.__init__","title":"__init__(results_directory, cv=KFold(n_splits=10, shuffle=True, random_state=42), inner_cv=None, edge_selection=UnivariateEdgeSelection(edge_statistic=['pearson'], edge_selection=[PThreshold(threshold=[0.05], correction=[None])]), select_stable_edges=True, stability_threshold=0.8, impute_missing_values=True, n_permutations=0, atlas_labels=None)
","text":"Initialize the CPMRegression object.
Parameters:
- results_directory (str, required): Directory to save results.
- cv (Union[BaseCrossValidator, BaseShuffleSplit], default: KFold(n_splits=10, shuffle=True, random_state=42)): Outer cross-validation strategy.
- inner_cv (Union[BaseCrossValidator, BaseShuffleSplit], default: None): Inner cross-validation strategy for edge selection.
- edge_selection (UnivariateEdgeSelection, default: UnivariateEdgeSelection(edge_statistic=['pearson'], edge_selection=[PThreshold(threshold=[0.05], correction=[None])])): Method for edge selection.
- impute_missing_values (bool, default: True): Whether to impute missing values.
- n_permutations (int, default: 0): Number of permutations to run for permutation testing.
- atlas_labels (str, default: None): CSV file containing atlas and region labels.
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.calculate_p_values","title":"calculate_p_values(true_results, perms)
staticmethod
","text":"Calculate p-values based on true results and permutation results.
Parameters:
- true_results: DataFrame with the true results.
- perms: DataFrame with the permutation results.
Returns: DataFrame with the calculated p-values.
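As a rough illustration of how a permutation-based p-value can be derived from true and permuted results, the sketch below uses one common convention: count the permutations that perform at least as well as the true model and add one to both numerator and denominator. The function name and the exact formula are assumptions, and the package's calculate_p_values may differ in detail.

```python
# Hypothetical sketch of a permutation p-value; not the package's implementation.
def permutation_p_value(true_score, permuted_scores, higher_is_better=True):
    if higher_is_better:
        # e.g. correlation or explained variance
        n_extreme = sum(1 for s in permuted_scores if s >= true_score)
    else:
        # e.g. error metrics such as mean absolute error
        n_extreme = sum(1 for s in permuted_scores if s <= true_score)
    # The +1 terms count the observed result itself and avoid p = 0
    return (n_extreme + 1) / (len(permuted_scores) + 1)


perm_r = [0.05, -0.10, 0.31, 0.12, 0.02, -0.04, 0.20, 0.08, 0.00]
p_value = permutation_p_value(0.30, perm_r)
print(p_value)  # one of nine permutations reaches 0.30 -> (1 + 1) / (9 + 1) = 0.2
```

With this convention, the smallest attainable p-value is 1 / (n_permutations + 1), so the number of permutations bounds the resolution of the test.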
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.estimate","title":"estimate(X, y, covariates)
","text":"Estimates a model using the provided data and conducts permutation testing. This method first fits the model to the actual data and subsequently performs estimation on permuted data for a specified number of permutations. Finally, it calculates permutation results.
Parameters:
- X (Union[DataFrame, ndarray], required)
- y (Union[Series, DataFrame, ndarray], required)
- covariates (Union[Series, DataFrame, ndarray], required)"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.load_configuration","title":"load_configuration(results_directory, config_filename)
","text":"Load configuration from a file.
Parameters:
- results_directory: Directory to set for results.
- config_filename: Path to the configuration file.
"},{"location":"api/cpm_regression/#cpm.cpm_analysis.CPMRegression.save_configuration","title":"save_configuration(config_filename)
","text":"Saves the current configuration settings to a file in Pickle format. All attributes related to the configuration of the object are serialized and stored in a file with the same base name as the provided filename, but with a .pkl extension.
Parameters:
- config_filename: The base name of the file where the configuration will be saved.
Returns: None
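The naming behaviour described above (same base name, .pkl extension) can be sketched with the standard library alone. The helper names save_config and load_config and the example dictionary are hypothetical; the real methods serialize the object's own configuration attributes.

```python
import os
import pickle
import tempfile


# Hypothetical stand-ins for save_configuration / load_configuration.
def save_config(config, config_filename):
    # Keep the base name and replace any extension with .pkl
    base, _ = os.path.splitext(config_filename)
    path = base + '.pkl'
    with open(path, 'wb') as f:
        pickle.dump(config, f)
    return path


def load_config(path):
    with open(path, 'rb') as f:
        return pickle.load(f)


config = {'n_permutations': 100, 'stability_threshold': 0.8}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_config(config, os.path.join(tmp, 'config.txt'))
    restored = load_config(saved)

print(os.path.basename(saved), restored == config)  # config.pkl True
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.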
"},{"location":"api/edge_selection/","title":"Edge Selection","text":""},{"location":"api/edge_selection/#cpm.edge_selection.PThreshold","title":"PThreshold
","text":" Bases: BaseEdgeSelector
"},{"location":"api/edge_selection/#cpm.edge_selection.PThreshold.__init__","title":"__init__(threshold=0.05, correction=None)
","text":"Parameters:
- threshold: p-value threshold for edge selection (default: 0.05).
- correction: multiple-comparison correction (default: None). Can be one of the statsmodels methods:
  - bonferroni: one-step correction
  - sidak: one-step correction
  - holm-sidak: step-down method using Sidak adjustments
  - holm: step-down method using Bonferroni adjustments
  - simes-hochberg: step-up method (independent)
  - hommel: closed method based on Simes tests (non-negative)
  - fdr_bh: Benjamini/Hochberg (non-negative)
  - fdr_by: Benjamini/Yekutieli (negative)
  - fdr_tsbh: two-stage FDR correction (non-negative)
  - fdr_tsbky: two-stage FDR correction (non-negative)
"},{"location":"api/fold/","title":"Fold","text":""},{"location":"api/models/","title":"Predictive Models","text":""},{"location":"api/models/#cpm.models.LinearCPMModel","title":"LinearCPMModel
","text":"Linear Connectome-based Predictive Modeling (CPM) implementation.
This class implements a linear CPM model, allowing for fitting and prediction based on connectome data, covariates, and residuals.
Attributes:
- models (ModelDict): A dictionary containing the fitted models for different networks and data types (connectome, covariates, residuals, and full model).
- models_residuals (dict): A dictionary storing linear regression models used to calculate residuals for connectome data, controlling for covariates.
- edges (dict): A dictionary defining the edges (features) used for each network (e.g., 'positive', 'negative').
Parameters:
- edges (dict, required): Dictionary containing indices of edges for 'positive' and 'negative' networks."},{"location":"api/models/#cpm.models.LinearCPMModel.__init__","title":"__init__(edges)
","text":"Initialize the LinearCPMModel.
Parameters:
- edges (dict, required): Dictionary containing indices of edges for 'positive' and 'negative' networks."},{"location":"api/models/#cpm.models.LinearCPMModel.fit","title":"fit(X, y, covariates)
","text":"Fit the CPM model.
This method fits multiple linear regression models for the connectome, covariates, residuals, and full model using the provided data.
Parameters:
- X (ndarray, required): A 2D array of shape (n_samples, n_features) representing the connectome data.
- y (ndarray, required): A 1D array of shape (n_samples,) representing the target variable.
- covariates (ndarray, required): A 2D array of shape (n_samples, n_covariates) representing the covariates.
Returns:
- LinearCPMModel: The fitted CPM model instance.
"},{"location":"api/models/#cpm.models.LinearCPMModel.predict","title":"predict(X, covariates)
","text":"Predict using the fitted CPM model.
This method generates predictions for the target variable using the connectome, covariates, residuals, and full models.
Parameters:
- X (ndarray, required): A 2D array of shape (n_samples, n_features) representing the connectome data.
- covariates (ndarray, required): A 2D array of shape (n_samples, n_covariates) representing the covariates.
Returns:
- ModelDict: A dictionary containing predictions for each network and model type (connectome, covariates, residuals, and full model).
"},{"location":"examples/human_connectome_project/","title":"HCP","text":"Python"},{"location":"examples/simulated_data/","title":"Simulated Data","text":"Python"}]}
\ No newline at end of file