SLALOM (suspicious loci analysis of meta-analysis summary statistics) is a summary statistics-based QC method that identifies suspicious loci for meta-analysis fine-mapping by detecting outliers in association statistics based on local LD structure. SLALOM takes only GWAS summary statistics and an ancestry-matched external LD reference (e.g., gnomAD) as input, and predicts whether each locus shows a suspicious pattern that calls into question fine-mapping accuracy. The outlier detection is built upon a simplified version of the DENTIST method.
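The core outlier test can be sketched in a few lines. This is a minimal sketch, assuming the single-lead-variant form of the DENTIST statistic, T = (z_i - r * z_lead)^2 / (1 - r^2), compared against a chi-square distribution with 1 degree of freedom, where r is the LD correlation between variant i and the lead variant. The function names and the r^2 and P-value thresholds in `flag_outliers` are hypothetical illustration values, not necessarily what `slalom.py` uses.

```python
import math
from typing import List, Sequence, Tuple


def dentist_s(z_i: float, z_lead: float, r: float) -> Tuple[float, float]:
    """Single-lead DENTIST-style statistic and its chi-square (1 df) P-value."""
    if abs(r) >= 1.0:
        # Perfect LD leaves no residual variance to test against.
        raise ValueError("|r| must be strictly less than 1")
    # Squared deviation of z_i from the value predicted by LD with the lead,
    # scaled by the residual variance (1 - r^2).
    t = (z_i - r * z_lead) ** 2 / (1.0 - r ** 2)
    # Chi-square (1 df) survival function: P(X > t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t / 2.0))
    return t, p


def flag_outliers(
    z: Sequence[float],
    r_to_lead: Sequence[float],
    lead_index: int,
    r2_min: float = 0.6,  # hypothetical threshold
    p_max: float = 1e-4,  # hypothetical threshold
) -> List[int]:
    """Indices of variants in strong LD with the lead whose z-scores disagree with it."""
    outliers = []
    for i, (z_i, r) in enumerate(zip(z, r_to_lead)):
        # Skip the lead itself and variants not in strong LD with it.
        if i == lead_index or r ** 2 <= r2_min:
            continue
        _, p = dentist_s(z_i, z[lead_index], r)
        if p < p_max:
            outliers.append(i)
    return outliers


# Variant 2 is in strong LD with the lead (r = 0.9) but has a much weaker signal.
print(flag_outliers([10.0, 9.5, 2.0, 1.0], [1.0, 0.95, 0.9, 0.1], lead_index=0))
```

Computing the chi-square tail with `math.erfc` avoids a SciPy dependency for the sketch; `scipy.stats.chi2.sf(t, 1)` gives the same value.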
Analysis and figure generation code for Kanai, M. et al. (2022) is available here. The fine-mapping pipeline is available here.
To run SLALOM locally, you need:
- Python 3.7 or later
- Hail v0.2
- numpy
- scipy
- pandas
To run our WDL pipeline on Google Cloud, you additionally need:
- Cromwell
- An active Google Cloud project
  - Note: Some of the reference files are located in a public requester-pays bucket (gs://finucane-requester-pays).
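The following command is the easiest way to install the Google Cloud Storage connector, which lets Hail read the reference files in the requester-pays bucket (replace YOUR_PROJECT_ID with your Google Cloud project ID):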
curl -sSL https://broad.io/install-gcs-connector | python3 - --gcs-requester-pays-project YOUR_PROJECT_ID
To run the WDL pipeline, please modify wdl/slalom_example.json and submit it with wdl/slalom.wdl and wdl/slalom_sub.zip.
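If you run Cromwell in single-workflow mode, the three files map onto its `run` subcommand roughly as follows. This is a sketch assuming a local Cromwell release JAR at a hypothetical path `cromwell.jar`, and Cromwell's `--inputs` and `--imports` options; running on Google Cloud additionally needs a backend configuration, so check the Cromwell documentation for your version and setup.

```python
import shlex
from typing import List


def cromwell_run_command(jar: str = "cromwell.jar") -> List[str]:
    # "jar" is a hypothetical local path to the Cromwell release JAR.
    return [
        "java", "-jar", jar, "run", "wdl/slalom.wdl",
        "--inputs", "wdl/slalom_example.json",  # the edited inputs file
        "--imports", "wdl/slalom_sub.zip",      # zip of WDL files imported by slalom.wdl
    ]


# Print a copy-pasteable shell command.
print(" ".join(shlex.quote(arg) for arg in cromwell_run_command()))
```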
Example files are available at ./example, which were created from the GBMI meta-analysis summary statistics for COPD available here.
PYSPARK_SUBMIT_ARGS="--conf spark.driver.memory=1g pyspark-shell" \
python3 slalom.py \
--snp example/example.snp \
--out example/example.slalom.txt \
--out-summary example/example.summary.txt \
--annotate-consequence \
--annotate-cups \
--annotate-gnomad-freq \
--export-r \
--lead-variant-choice "prob" \
--weighted-average-r afr=n_afr amr=n_amr eas=n_eas fin=n_fin nfe=n_nfe \
--dentist-s \
--abf \
--summary \
--case-control \
--reference-genome GRCh38
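The `--weighted-average-r afr=n_afr ...` argument pairs each gnomAD population with a sample size column in the input. A plausible reading is that SLALOM combines per-population LD r values into one r per variant pair, weighted by those sample sizes; the sketch below assumes exactly that (a sample-size-weighted mean of r), and the helper names and LD values are hypothetical.

```python
from typing import Dict, Iterable


def parse_pop_to_column(tokens: Iterable[str]) -> Dict[str, str]:
    """Turn tokens like 'afr=n_afr' into {'afr': 'n_afr'}."""
    mapping = {}
    for token in tokens:
        pop, column = token.split("=", 1)
        mapping[pop] = column
    return mapping


def weighted_average_r(r_by_pop: Dict[str, float], n_by_pop: Dict[str, float]) -> float:
    """Sample-size-weighted mean of per-population LD r (assumed weighting scheme)."""
    total = 0.0
    weight = 0.0
    for pop, n in n_by_pop.items():
        # Skip populations without an LD value or without samples.
        if pop in r_by_pop and n > 0:
            total += n * r_by_pop[pop]
            weight += n
    if weight == 0:
        raise ValueError("no population has both an LD value and a positive sample size")
    return total / weight


mapping = parse_pop_to_column(["afr=n_afr", "eas=n_eas", "nfe=n_nfe"])
row = {"n_afr": 1000, "n_eas": 3000, "n_nfe": 6000}  # sample size columns of one variant
n_by_pop = {pop: row[col] for pop, col in mapping.items()}
r_by_pop = {"afr": 0.2, "eas": 0.8, "nfe": 0.5}  # hypothetical gnomAD LD values
print(weighted_average_r(r_by_pop, n_by_pop))
```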
Required minimum columns are as follows:
- chromosome: chromosome either in GRCh37 (1, 2, 3, ...) or in GRCh38 (chr1, chr2, chr3, ...). Specify the reference genome with --reference-genome (e.g., --reference-genome GRCh38).
- position: position
- allele1: reference allele in the specified reference genome. If you are unsure about reference/alternative alleles, set --align-alleles to make them consistent with gnomAD.
- allele2: alternative allele in the specified reference genome. This allele is assumed to be the effect allele regardless of --align-alleles.
- beta: effect size
- se: standard error
- p: P-value
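Before running SLALOM on your own data, a quick pre-flight check of the column list above can save a failed run. This is a minimal sketch, assuming a whitespace-delimited .snp file with a header row; `check_snp_lines` is a hypothetical helper, and its `GRCh37` default is illustrative only. It checks for the required columns and that chromosome names follow the convention for the chosen reference genome (chr-prefixed for GRCh38, bare numbers for GRCh37).

```python
from typing import Iterable, List

REQUIRED_COLUMNS = ["chromosome", "position", "allele1", "allele2", "beta", "se", "p"]


def check_snp_lines(lines: Iterable[str], reference_genome: str = "GRCh37") -> List[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    rows = iter(lines)
    header = next(rows).split()
    problems = ["missing column: " + c for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    chrom_idx = header.index("chromosome")
    expect_chr_prefix = reference_genome == "GRCh38"
    for lineno, line in enumerate(rows, start=2):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        chrom = fields[chrom_idx]
        if chrom.startswith("chr") != expect_chr_prefix:
            problems.append(
                "line {}: chromosome {!r} does not follow {} naming".format(lineno, chrom, reference_genome)
            )
    return problems


sample = [
    "chromosome position allele1 allele2 beta se p",
    "chr1 12345 A G 0.10 0.02 5.7e-07",
]
print(check_snp_lines(sample, "GRCh38"))
print(check_snp_lines(sample, "GRCh37"))
```

An open file object can be passed directly, e.g. `check_snp_lines(open("example/example.snp"), "GRCh38")`, since trailing newlines are removed by `split()`.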
Other input column specifications are as follows:
- If --weighted-average-r is specified, the sample size columns supplied by this argument are also required, such as n_afr, n_eas, n_nfe, ...
- If a total sample size column n_samples (and n_cases for --case-control) exists in the input, additional output columns min_neff_r2 and max_neff_r2 will be added.
- Any other input columns are kept in the output, except for those overwritten by SLALOM.
To make SLALOM-compatible per-locus .snp files from genome-wide summary statistics, you can also use make_finemap_inputs.py from our fine-mapping pipeline.
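Splitting genome-wide summary statistics into loci usually means drawing a window around each significant variant and merging windows that overlap. This is a minimal sketch of that idea only, not the logic of make_finemap_inputs.py. The 5e-8 threshold and the 1.5 Mb flank are assumptions chosen for illustration, and `define_loci` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def define_loci(
    variants: Iterable[Tuple[str, int, float]],
    p_threshold: float = 5e-8,  # assumed genome-wide significance threshold
    flank: int = 1_500_000,     # assumed window half-width in bp
) -> List[Tuple[str, int, int]]:
    """Merge +/- flank windows around significant variants into (chrom, start, end) loci."""
    windows: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, pos, p in variants:
        if p < p_threshold:
            windows.setdefault(chrom, []).append((max(1, pos - flank), pos + flank))
    loci = []
    # Note: chromosome names are sorted as strings here (so "chr10" < "chr2").
    for chrom in sorted(windows):
        merged: List[Tuple[int, int]] = []
        for start, end in sorted(windows[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous window: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        loci.extend((chrom, s, e) for s, e in merged)
    return loci


variants = [
    ("chr1", 1_000_000, 1e-10),
    ("chr1", 2_000_000, 1e-9),
    ("chr1", 9_000_000, 1e-3),  # not significant, no window
    ("chr2", 5_000_000, 1e-12),
]
print(define_loci(variants))
```

Each resulting (chromosome, start, end) region would then be written out as its own per-locus .snp file.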
To use our WDL pipeline, please modify wdl/slalom_example.json. The following options should be set as follows:
- slalom.sumstats_pattern: path pattern for the summary statistics, where {PHENO} will be replaced by a phenotype name, e.g., gs://YOUR_BUCKET/{PHENO}.sumstats.txt.gz.
- slalom.phenolistfile: path to a plain text file without a header. The first column corresponds to a phenotype name ({PHENO} above). The second column corresponds to an argument for --weighted-average-r.
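The two options above can be prepared programmatically when many phenotypes are involved. This is a sketch under stated assumptions: the phenotype list is written tab-separated (an assumption, chosen because the --weighted-average-r argument itself contains spaces), the `{PHENO}` substitution is mimicked with a plain string replace, and `gs://YOUR_BUCKET/phenolist.txt` plus all helper names are hypothetical. Only the two documented keys are changed; any other keys from wdl/slalom_example.json are passed through untouched.

```python
import json
from typing import Dict, List


def phenolist_lines(weighted_r_args: Dict[str, str]) -> List[str]:
    # Column 1: phenotype name ({PHENO}); column 2: the --weighted-average-r argument.
    # Tab as the delimiter is an assumption.
    return ["{}\t{}".format(pheno, arg) for pheno, arg in weighted_r_args.items()]


def sumstats_path(pattern: str, pheno: str) -> str:
    # Mimic the {PHENO} substitution described for slalom.sumstats_pattern.
    return pattern.replace("{PHENO}", pheno)


def update_inputs(inputs: Dict[str, object], pattern: str, phenolist_path: str) -> Dict[str, object]:
    # Return a copy with the two documented options set; other keys are unchanged.
    updated = dict(inputs)
    updated["slalom.sumstats_pattern"] = pattern
    updated["slalom.phenolistfile"] = phenolist_path
    return updated


pattern = "gs://YOUR_BUCKET/{PHENO}.sumstats.txt.gz"
print(phenolist_lines({"COPD": "afr=n_afr amr=n_amr eas=n_eas fin=n_fin nfe=n_nfe"}))
print(sumstats_path(pattern, "COPD"))
print(json.dumps(update_inputs({}, pattern, "gs://YOUR_BUCKET/phenolist.txt"), indent=2))
```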
Kanai, M. et al. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genomics 2, 100210 (2022)
Masahiro Kanai ([email protected])