Colocalization of two proteins #29

Open
dariushghasemi opened this issue Dec 3, 2024 · 4 comments

@dariushghasemi
Contributor

Here we perform colocalization analysis using coloc v5.2.3 (2023-09-22).
The steps are:

  • we cross-tabulate the credible-set SNPs of two independent SNPs belonging to two different proteins,
  • if the two credible sets share at least one SNP, we store the corresponding conditional data of the two independent SNPs,
  • we repeat this for all combinations of independent signals from one protein against all other proteins, to obtain every pair of traits on which coloc has to be run,
  • we conduct the main colocalization analysis of the two traits using the input created in the previous steps.
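
The pairing logic in the steps above can be sketched as follows. This is only an illustration in Python (the pipeline itself is written in R), and the data layout, the function name `overlapping_signal_pairs`, and the toy seqids and rsids are all assumptions made for the example.

```python
from itertools import combinations


def overlapping_signal_pairs(credible_sets):
    """Find signal pairs from different proteins whose credible sets overlap.

    credible_sets maps (seqid, independent_snp) -> set of credible-set SNP ids.
    This layout is hypothetical; it is not the pipeline's actual data structure.
    """
    pairs = []
    for key1, key2 in combinations(sorted(credible_sets), 2):
        if key1[0] == key2[0]:
            continue  # both signals belong to the same protein, skip
        shared = credible_sets[key1] & credible_sets[key2]
        if shared:  # at least one common SNP: this pair goes on to coloc
            pairs.append((key1, key2, sorted(shared)))
    return pairs


# Toy input, invented for illustration only.
credible_sets = {
    ("seq.A", "rs1"): {"rs1", "rs2", "rs3"},
    ("seq.A", "rs9"): {"rs9"},
    ("seq.B", "rs3"): {"rs3", "rs4"},
    ("seq.C", "rs7"): {"rs7", "rs8"},
}
pairs = overlapping_signal_pairs(credible_sets)
for signal_1, signal_2, shared in pairs:
    print(signal_1, signal_2, shared)
```

Only the pair with a shared credible-set SNP is kept, so coloc is not run on signals that cannot colocalize.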
@dariushghasemi dariushghasemi self-assigned this Dec 3, 2024
@dariushghasemi dariushghasemi added the documentation Improvements or additions to documentation label Dec 3, 2024
dariushghasemi added a commit that referenced this issue Dec 5, 2024
fix definition of input and output files
@dariushghasemi
Contributor Author

The coloc analysis with the previous version of Nicola's pipeline ran successfully. Here we try to update the function computing l-ABF to the latest version, which removes NULL from the conditional dataset.
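
For readers unfamiliar with how per-SNP l-ABFs relate to the five posterior probabilities, here is a minimal sketch of the standard ABF combination used in colocalization. It is written in Python for illustration and is not the pipeline's code. The prior values are the commonly used coloc defaults and are an assumption here, since the thread does not state which priors the pipeline uses.

```python
import math


def logsum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf if the difference is not positive."""
    m = max(a, b)
    d = math.exp(a - m) - math.exp(b - m)
    return m + math.log(d) if d > 0 else -math.inf


def combine_labf(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Combine per-SNP log ABFs of two traits into PP.H0 ... PP.H4.

    l1 and l2 must cover the same SNPs in the same order.
    The priors p1, p2, p12 are assumed values, not taken from the pipeline.
    """
    lsum = [a + b for a, b in zip(l1, l2)]
    log_h = [
        0.0,                                   # H0: no causal variant
        math.log(p1) + logsum(l1),             # H1: causal variant for trait 1 only
        math.log(p2) + logsum(l2),             # H2: causal variant for trait 2 only
        math.log(p1) + math.log(p2)
        + logdiff(logsum(l1) + logsum(l2), logsum(lsum)),  # H3: distinct variants
        math.log(p12) + logsum(lsum),          # H4: shared causal variant
    ]
    denom = logsum(log_h)
    return {f"PP.H{i}.abf": math.exp(lh - denom) for i, lh in enumerate(log_h)}


# Toy l-ABFs in which both traits peak at the first SNP.
pp = combine_labf([8.0, 0.0, 0.0], [9.0, 0.0, 0.0])
print(pp)
```

Because both toy traits have their strongest evidence at the same SNP, most of the posterior mass falls on H4.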

dariushghasemi added a commit that referenced this issue Jan 8, 2025
remove NULL assumption from conditional dataset for each locus
dariushghasemi added a commit that referenced this issue Jan 8, 2025
posterior probablities are now directly computed by coloc package instead of manipulating l-ABFs using built-in functions of coloc
@dariushghasemi
Contributor Author

We now combine the coloc info tables with the R script workflow/scripts/s06_collect_info.R rather than in bash. The script:

  • scans all the files (RDS/TSV/sentinel etc.) created by rule run_cojo and stored in results/*/cojo/
  • keeps only the info tables (TSV)
  • from all COJO outputs present, selects the info tables of the input seqids (extracted from the input sentinel files)
  • reads and merges the selected coloc info tables
  • extracts the chromosomal position from the file names, which lets rule run_coloc run in parallel for each chromosome.
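
The selection and chromosome-extraction steps above can be sketched roughly as below. This is a Python illustration, not the R script itself. The file-name pattern (`..._chr<N>_..._info.tsv`) and the assumption that the directory under results/ is the seqid are both hypothetical, since the thread does not show the actual naming.

```python
import re
from pathlib import PurePosixPath

# Hypothetical naming: the chromosome appears in the file name as "chr<N>".
CHR_PATTERN = re.compile(r"chr(\d+|X|Y)")


def select_info_tables(paths, seqids):
    """Keep TSV info tables of the requested seqids and tag each with its chromosome.

    Assumes paths look like results/<seqid>/cojo/<file>, which is a guess
    based on the results/*/cojo/ layout mentioned in the comment.
    """
    selected = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) < 3 or parts[-2] != "cojo" or not parts[-1].endswith(".tsv"):
            continue  # not a TSV file inside a cojo directory
        seqid = parts[-3]
        if seqid not in seqids:
            continue  # this protein was not requested
        match = CHR_PATTERN.search(parts[-1])
        selected.append({
            "path": path,
            "seqid": seqid,
            "chr": match.group(1) if match else None,
        })
    return selected


# Toy file listing, invented for illustration only.
paths = [
    "results/seq.1/cojo/locus_chr3_1000_2000_info.tsv",
    "results/seq.1/cojo/locus_chr3_1000_2000.rds",
    "results/seq.2/cojo/locus_chr11_500_900_info.tsv",
    "results/seq.3/cojo/locus_chrX_10_20_info.tsv",
]
tables = select_info_tables(paths, seqids={"seq.1", "seq.3"})
for table in tables:
    print(table["seqid"], table["chr"], table["path"])
```

Tagging each table with its chromosome up front is what allows the downstream rule to be split into one job per chromosome.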

The input/output/parameter directives of rule master_file are changed accordingly.

dariushghasemi added a commit that referenced this issue Jan 15, 2025
no need to chr column; it is created in master coloc rule
dariushghasemi added a commit that referenced this issue Jan 15, 2025
@dariushghasemi
Contributor Author

In the next commit, we allocated resources in the SLURM configuration slurm/config.yaml for each of the three Snakemake rules that run the colocalization analysis.

@dariushghasemi
Contributor Author

In the following commit, we ran the colocalization analysis on all protein sequences with the current parameters in the config file, e.g. hole=3Mb. Some statistics on the results of this preliminary version of the coloc pipeline:

  • size: 119 Mb
  • nrows: 593,014
  • ncols: 12
  • n. of unique loci: 3367
  • n. of unique seqids: 3720

We saved the pipeline output here:
/exchange/healthds/pQTL/results/META_CHRIS_INTERVAL/qced_sumstats_digits_not_flipped_filtered/14-Jan-25_combined_colocalization_results.csv

Along with the coloc results, a README in the same directory briefly describes the method, the output column names, and the config parameters.

| Column | Description |
|---|---|
| nsnps | number of variants shared between the two input loci |
| PP.H0.abf | posterior probability of hypothesis 0: no causal variant |
| PP.H1.abf | posterior probability of hypothesis 1: causal variant for t1 only |
| PP.H2.abf | posterior probability of hypothesis 2: causal variant for t2 only |
| PP.H3.abf | posterior probability of hypothesis 3: distinct causal variants for t1 and t2 |
| PP.H4.abf | posterior probability of hypothesis 4: shared causal variant for t1 and t2 |
| t1 | trait name of protein_1 (seqid) |
| t2 | trait name of protein_2 (seqid) |
| target1 | the most significant variant at locus_1 |
| target2 | the most significant variant at locus_2 |
| locus1 | locus belonging to protein_1 |
| locus2 | locus belonging to protein_2 |
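
A quick way to sanity-check a results file with these columns is sketched below in Python. The two rows are made-up toy data, and the locus naming and the way unique seqids and loci are counted (union over the two columns) are assumptions; the statistics reported above may have been computed differently.

```python
import csv
import io

PP_COLUMNS = [f"PP.H{i}.abf" for i in range(5)]


def summarise(rows, tolerance=1e-6):
    """Count rows, unique seqids and loci, and rows whose five PPs do not sum to 1."""
    rows = list(rows)
    seqids = {r["t1"] for r in rows} | {r["t2"] for r in rows}
    loci = {r["locus1"] for r in rows} | {r["locus2"] for r in rows}
    bad = [
        r for r in rows
        if abs(sum(float(r[c]) for c in PP_COLUMNS) - 1.0) > tolerance
    ]
    return {
        "nrows": len(rows),
        "n_seqids": len(seqids),
        "n_loci": len(loci),
        "n_bad_pp": len(bad),
    }


# Toy CSV with the 12 documented columns; values are invented for illustration.
toy_csv = """nsnps,PP.H0.abf,PP.H1.abf,PP.H2.abf,PP.H3.abf,PP.H4.abf,t1,t2,target1,target2,locus1,locus2
120,0.0,0.0,0.0,0.01,0.99,seq.A,seq.B,rs1,rs3,chr1_100_200,chr1_150_250
80,0.1,0.2,0.3,0.3,0.1,seq.A,seq.C,rs1,rs7,chr1_100_200,chr2_10_20
"""
stats = summarise(csv.DictReader(io.StringIO(toy_csv)))
print(stats)
```

If the real file stores rounded probabilities, the `tolerance` argument may need to be loosened.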

NOTE: These results are preliminary and may change later.

dariushghasemi added a commit that referenced this issue Jan 15, 2025