Colocalization of two proteins #29

dariushghasemi · 2024-12-03T17:44:17Z

Here we perform colocalization analysis using coloc v5.2.3 (2023-09-22).
The analysis proceeds in four steps:

Cross-tabulate the credible-set SNPs of two independent signals belonging to two different proteins.
If the two credible sets share at least one SNP, store the corresponding conditional data for the two independent signals.
Repeat this for every combination of independent signals from one protein against all other proteins, which yields all trait pairs that need to be tested with coloc.
Run the main colocalization analysis on each pair of traits using the input created in the previous steps.
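
The overlap check in the first three steps can be sketched as follows. This is only an illustration in Python, not the pipeline's R script: the function names, the data layout (a mapping from each independent signal to its credible set), and the toy SNP identifiers are all assumptions made for the example.

```python
from itertools import combinations, product


def overlapping_signals(cs_a, cs_b):
    """Pairs of independent signals whose credible sets share at least one SNP.

    cs_a and cs_b map an independent signal (index SNP) to its credible set.
    This layout is an assumption for the sketch.
    """
    pairs = []
    for (sig_a, snps_a), (sig_b, snps_b) in product(cs_a.items(), cs_b.items()):
        shared = snps_a & snps_b
        if shared:
            pairs.append((sig_a, sig_b, sorted(shared)))
    return pairs


def coloc_candidates(credible_sets):
    """Apply the overlap check to every pair of proteins (seqids)."""
    candidates = {}
    for t1, t2 in combinations(sorted(credible_sets), 2):
        pairs = overlapping_signals(credible_sets[t1], credible_sets[t2])
        if pairs:
            candidates[(t1, t2)] = pairs
    return candidates


# Toy data, invented for illustration only.
credible_sets = {
    "seq.1000.1": {
        "chr1:100:A:G": {"chr1:100:A:G", "chr1:150:C:T"},
        "chr1:900:G:T": {"chr1:900:G:T"},
    },
    "seq.2000.2": {
        "chr1:150:C:T": {"chr1:150:C:T", "chr1:180:A:C"},
    },
}
candidates = coloc_candidates(credible_sets)
```

Only signal pairs that appear in `candidates` would go on to the colocalization step; pairs without a shared credible-set SNP are skipped.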

dariushghasemi · 2024-12-09T17:20:17Z

The coloc analysis with the previous version of Nicola's pipeline ran successfully. Here we update the function computing l-ABF to the latest version, which removes NULL from the conditional dataset.
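
For orientation, here is a rough sketch of how per-SNP l-ABFs for two traits can be combined into posterior probabilities for H0 to H4, following the approach described for coloc as we understand it. It is written in Python for illustration only; the pipeline itself calls coloc in R. The function names are hypothetical, and the prior values `p1`, `p2`, `p12` are assumed to match the coloc.abf defaults.

```python
import math


def logsum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    m = max(a, b)
    diff = math.exp(a - m) - math.exp(b - m)
    return m + math.log(diff) if diff > 0 else -math.inf


def combine_labf(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] from two l-ABF vectors.

    l1 and l2 hold the log approximate Bayes factors of the same SNPs,
    in the same order, for trait 1 and trait 2.
    """
    lsum = [a + b for a, b in zip(l1, l2)]
    lh0 = 0.0
    lh1 = math.log(p1) + logsum(l1)
    lh2 = math.log(p2) + logsum(l2)
    lh3 = math.log(p1) + math.log(p2) + logdiff(
        logsum(l1) + logsum(l2), logsum(lsum)
    )
    lh4 = math.log(p12) + logsum(lsum)
    all_h = [lh0, lh1, lh2, lh3, lh4]
    denom = logsum(all_h)
    return [math.exp(h - denom) for h in all_h]


# Both traits peak at the same SNP, so H4 should dominate.
pp_shared = combine_labf([10.0, 0.0, 0.0], [10.0, 0.0, 0.0])
# The traits peak at different SNPs, so H3 should be the most probable.
pp_distinct = combine_labf([10.0, 0.0, 0.0], [0.0, 10.0, 0.0])
```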

dariushghasemi · 2025-01-15T13:08:46Z

We now combine the coloc info tables with the R script workflow/scripts/s06_collect_info.R rather than in bash. The script:

scans all files (RDS/TSV/sentinel etc.) created by rule run_cojo and stored in results/*/cojo/
picks out the info tables (TSV)
selects, from all available COJO outputs, the info tables of the input seqids (extracted from the input sentinel files)
reads and merges the selected coloc info tables
extracts the chromosomal position from the file names, which lets rule run_coloc run in parallel for each chromosome.
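
The logic of these steps can be sketched as below. This is a Python illustration, not a port of s06_collect_info.R. It assumes a results/<seqid>/cojo/ layout and file names that contain a `chr<N>` token; both are assumptions, since the actual naming scheme is not shown here. The function name `collect_info_tables` is hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: the file name contains a token such as "chr3".
CHR_PATTERN = re.compile(r"chr(\d+|X|Y)", re.IGNORECASE)


def collect_info_tables(results_dir, seqids):
    """Merge the TSV info tables of the requested seqids into one list of rows."""
    rows = []
    for path in sorted(Path(results_dir).glob("*/cojo/*.tsv")):
        seqid = path.parent.parent.name  # assumes results/<seqid>/cojo/
        if seqid not in seqids:
            continue
        match = CHR_PATTERN.search(path.name)
        chrom = match.group(1) if match else None
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                row["seqid"] = seqid
                row["chr"] = chrom  # taken from the file name, not the table
                rows.append(row)
    return rows
```

Grouping the merged rows by `chr` then gives one work unit per chromosome, which is what allows the downstream rule to run in parallel.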

The input/output/parameter directives of rule master_file are changed accordingly.

dariushghasemi · 2025-01-15T13:32:24Z

In the next commit, we allocated resources in the SLURM configuration slurm/config.yaml for each of the three rules that run the colocalization analysis in Snakemake.

dariushghasemi · 2025-01-15T14:20:13Z

In the following commit, we ran the colocalization analysis on all protein sequences with the current parameters in the config file, e.g. hole=3Mb. Some statistics on the results of this preliminary version of the coloc pipeline:

size: 119 MB
nrows: 593,014
ncols: 12
n. of unique loci: 3367
n. of unique seqids: 3720

The pipeline output is saved at:
/exchange/healthds/pQTL/results/META_CHRIS_INTERVAL/qced_sumstats_digits_not_flipped_filtered/14-Jan-25_combined_colocalization_results.csv

Alongside the coloc results, a README in the same directory briefly describes the method, the output column names, and the config parameters.

Column	Description
nsnps	number of variants shared between the two input loci
PP.H0.abf	posterior probability for hypothesis 0: no causal variant
PP.H1.abf	posterior probability for hypothesis 1: causal variant for t1 only
PP.H2.abf	posterior probability for hypothesis 2: causal variant for t2 only
PP.H3.abf	posterior probability for hypothesis 3: distinct causal variants for t1 and t2
PP.H4.abf	posterior probability for hypothesis 4: shared causal variant for t1 and t2
t1	trait name of protein_1 (seqid)
t2	trait name of protein_2 (seqid)
target1	the most significant variant at locus_1
target2	the most significant variant at locus_2
locus1	locus belonging to protein_1
locus2	locus belonging to protein_2
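
As a usage sketch, the combined results file could be screened for protein pairs with strong evidence of a shared causal variant. The example below is illustrative Python: it assumes the file is comma-separated with the columns listed above, the 0.8 cutoff on PP.H4.abf is a commonly used convention rather than a pipeline setting, and the two rows are invented toy data.

```python
import csv
import io


def shared_signal_pairs(handle, threshold=0.8):
    """Return (t1, t2, PP.H4.abf) for rows whose PP.H4.abf meets the threshold."""
    hits = []
    for row in csv.DictReader(handle):
        pp_h4 = float(row["PP.H4.abf"])
        if pp_h4 >= threshold:
            hits.append((row["t1"], row["t2"], pp_h4))
    return hits


# Toy rows, invented for illustration; real values come from the results CSV.
toy_csv = io.StringIO(
    "nsnps,PP.H0.abf,PP.H1.abf,PP.H2.abf,PP.H3.abf,PP.H4.abf,"
    "t1,t2,target1,target2,locus1,locus2\n"
    "120,0.0,0.01,0.01,0.03,0.95,"
    "seq.1000.1,seq.2000.2,snpA,snpA,locusA,locusB\n"
    "98,0.0,0.1,0.1,0.7,0.1,"
    "seq.1000.1,seq.3000.3,snpA,snpC,locusA,locusC\n"
)
hits = shared_signal_pairs(toy_csv)
```

For the real file, the same function would be called on an opened file handle in place of the `io.StringIO` object.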

NOTE: These results are preliminary and may change.

dariushghasemi self-assigned this Dec 3, 2024

dariushghasemi added the documentation label Dec 3, 2024

dariushghasemi added a commit that referenced this issue Dec 4, 2024

adding rule to combine coloc info tables #29

3bc1149

dariushghasemi added a commit that referenced this issue Dec 5, 2024

add script to find overlapping credible set snps #29

a3ec8a4

fix definition of input and output files

dariushghasemi added a commit that referenced this issue Dec 5, 2024

defining sentinel files as output #29

88daa42

dariushghasemi added a commit that referenced this issue Dec 10, 2024

successfully run old coloc including NULL hypothesis #29

1553edf

dariushghasemi added a commit that referenced this issue Dec 11, 2024

adding script to run colocalization with coloc package #29

e384bb5

dariushghasemi added a commit that referenced this issue Jan 8, 2025

replace coloc.abf with abf_NO_PRIOR in coloc fine-mapping #29

f7da264

remove NULL assumption from conditional dataset for each locus

dariushghasemi added a commit that referenced this issue Jan 8, 2025

directly use coloc.abf for colocalization of two proteins #29

46eb095

posterior probablities are now directly computed by coloc package instead of manipulating l-ABFs using built-in functions of coloc

dariushghasemi added a commit that referenced this issue Jan 15, 2025

combine coloc info tables in R not bash #29

30fb464

dariushghasemi added a commit that referenced this issue Jan 15, 2025

read master coloc info table with headers=T #29

9d147d7

no need to chr column; it is created in master coloc rule

dariushghasemi added a commit that referenced this issue Jan 15, 2025

conda environment for coloc #29

c5455ca

dariushghasemi added a commit that referenced this issue Jan 15, 2025

reorder R script running coloc analysis #29

b10c816

dariushghasemi added a commit that referenced this issue Jan 15, 2025

reorder R script running coloc in Snakefile #29

c2b12f7

dariushghasemi added a commit that referenced this issue Jan 15, 2025

allocate resources to colocalization analysis #29

4c476e1

dariushghasemi added a commit that referenced this issue Jan 15, 2025

path to two example proteins for colocalization test #29

ef26736

dariushghasemi added a commit that referenced this issue Jan 15, 2025

run colocalization analysis on all proteins #29

21cd2a2

setting hole=3M in config
