See https://dozmorovlab.github.io/CTCF/ and https://github.com/dozmorovlab/CTCF/ for more information
- `01_Logos.Rmd` - Logo clustering.
  - Input: `CTCF.dev/motif_databases/*.meme` motif files
  - Output: `Figure_clustered_main_PWMs.svg`
  - Input: `CTCF.dev/Homo_sapiens_2022_05_20_3_16_pm/pwms_CTCF_motifs/CIS-BP_2.00_Homo_sapiens.meme`, created from combined .txt matrices
  - Output: `Figure_CIS-BP_2.00_Homo_sapiens.svg`
  - Input: `CTCF.dev/Mus_musculus_2022_05_20_4_01_pm/pwms_CTCF_motifs/CIS-BP_2.00_Mus_musculus.meme`, created from combined .txt matrices
  - Output: `Figure_CIS-BP_2.00_Mus_musculus.svg`
  - Input: `CTCF.dev/CTCFBSDB_PWM_corrected.meme`
  - Output: `Figure_clustered_CTCFBSDB_PWMs.svg`
- `02_EDA_SCREEN.Rmd` - Download and process ENCODE SCREEN data (https://screen.encodeproject.org/). Basic stats, conversion to GRanges
  - Input: `GRCh38-CTCF.bed` and `mm10-CTCF.bed`
  - Output: `hg38.SCREEN.GRCh38_CTCF`, `mm10.SCREEN.mm10_CTCF` GRanges objects and BED files
- `03_EDA_CTCFBSDB.Rmd` - Download and process CTCFBSDB predicted data. LiftOver hg18-hg19-hg38, mm8-mm9-mm10. Experimental data not used
  - Input: `allcomp.txt.gz`, predicted data
  - Output: `hg18.CTCFBSDB.CTCF_predicted_human`, `mm8.CTCFBSDB.CTCF_predicted_mouse` GRanges objects and BED files
- `04_FIMO_processing.Rmd` - Process chromosome-specific FIMO results generated on an HPC cluster. See scripts for more details. File name convention: `<assembly>.<Database>.<original database name or label>`
  - Input: `fimo.txt.gz` files from genome-, database-, and chromosome-specific subfolders
  - Output: `<assembly>.<Database>` GRanges objects and BED files. `log_PWMs.csv` - count statistics: "Assembly", "All (p-value threshold 1e-4)", "Reduced (p-value threshold 1e-4)", "All (p-value threshold 1e-6)", "Reduced (p-value threshold 1e-6)"
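
The file name convention above can be unpacked programmatically. The following is a minimal Python sketch, not code from this repository: `TrackName` and `parse_track_name` are hypothetical names, and the sketch assumes that the assembly and database fields never contain a dot while the label may.

```python
from typing import NamedTuple


class TrackName(NamedTuple):
    """Hypothetical container for the three parts of an object name."""
    assembly: str
    database: str
    label: str


def parse_track_name(name: str) -> TrackName:
    """Split '<assembly>.<Database>.<label>' on the first two dots only.

    Assumption: the label may itself contain dots, so everything after
    the second dot is kept together.
    """
    parts = name.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <assembly>.<Database>.<label>, got {name!r}")
    return TrackName(*parts)


example = parse_track_name("hg38.PWMScan.JASPAR_CORE_2020_vertebrates.CTCF_MA01391")
print(example.assembly, example.database, example.label)
```

Splitting on at most two dots keeps labels such as `JASPAR_CORE_2020_vertebrates.CTCF_MA01391` intact.
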
- `05_EDA_liftOver.Rmd` - overlap between originally aligned and lifted-over genomes
  - Input: BED files from `CTCF.dev/CTCF_liftover`. liftOver chains obtained using `download.sh`. Processed with `convert.sh` that also outputs counts of mapped and unmapped regions to `log_liftOver.csv`
  - Output: `Figure_liftOverJaccard.svg`
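
The output figure name suggests that overlap is summarized with a Jaccard statistic. The following is a minimal Python sketch of one common definition, base pairs in the intersection divided by base pairs in the union. It is an assumption that the notebook uses this base-pair variant; the function names and toy coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open as in BED


def merge(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping or touching intervals separately on each chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def covered_bp(merged: Dict[str, List[Tuple[int, int]]]) -> int:
    """Total number of base pairs covered by merged intervals."""
    return sum(e - s for spans in merged.values() for s, e in spans)


def jaccard(a: Iterable[Interval], b: Iterable[Interval]) -> float:
    """Base-pair Jaccard: |A intersect B| / |A union B|."""
    a, b = list(a), list(b)
    union = covered_bp(merge(a + b))
    # Inclusion-exclusion gives the intersection without explicit clipping.
    intersection = covered_bp(merge(a)) + covered_bp(merge(b)) - union
    return intersection / union if union else 0.0


# Toy coordinates, not real data.
original = [("chr1", 100, 200), ("chr1", 300, 400)]
lifted = [("chr1", 150, 250), ("chr1", 300, 400)]
print(jaccard(original, lifted))
```

A region-count variant (shared regions over all regions) would be an equally plausible reading; the base-pair form is shown because it does not depend on how regions are split.
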
- `05_EDA_liftOver_mm.Rmd` - same for mm9-mm10-mm39
- `06_FIMO_EDA.Rmd` - exploratory analysis of p-value distributions for human and mouse genomes
  - Input: hg38 and mm10 FIMO-detected sites
  - Output: density plots of p-value distributions, `Figure_human_pvalues.svg`, `Figure_mouse_pvalues.svg`
- `06_CTCF_Threshold.Rmd` - Exploring MEME p-value threshold cutoff
  - Input: `GRCh38-CTCF.bed` ENCODE SCREEN CTCF cCREs as gold standard, `hg38.MA0139.1.bed` MEME CTCF sites
  - Output: `Figure_human_pvalues_threshold.svg`
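
One way to explore a p-value cutoff against a gold standard is to ask, for each candidate threshold, what fraction of the retained motif sites overlap a gold-standard region. The following is a minimal Python sketch of that idea only. The metric, the `<=` comparison, the function names, and the toy coordinates are all assumptions and are not taken from the notebook.

```python
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]       # (chrom, start, end), half-open as in BED
Site = Tuple[str, int, int, float]  # region plus the motif p-value


def overlaps(site: Site, region: Region) -> bool:
    """True if the site and the region share at least one base pair."""
    return site[0] == region[0] and site[1] < region[2] and region[1] < site[2]


def fraction_supported(
    sites: Sequence[Site], gold: Sequence[Region], thresholds: Sequence[float]
) -> List[Tuple[float, int, float]]:
    """For each cutoff, return (cutoff, sites kept, fraction overlapping gold)."""
    rows = []
    for cutoff in thresholds:
        kept = [s for s in sites if s[3] <= cutoff]  # assumed: keep p <= cutoff
        hits = sum(any(overlaps(s, g) for g in gold) for s in kept)
        rows.append((cutoff, len(kept), hits / len(kept) if kept else 0.0))
    return rows


# Toy data, not real coordinates or p-values.
gold = [("chr1", 1000, 1350)]
sites = [
    ("chr1", 1100, 1119, 1e-7),
    ("chr1", 1200, 1219, 5e-6),
    ("chr1", 5000, 5019, 5e-5),
    ("chr2", 1100, 1119, 8e-5),
]
rows = fraction_supported(sites, gold, [1e-4, 1e-5, 1e-6])
for cutoff, n_kept, frac in rows:
    print(f"{cutoff:g}\t{n_kept}\t{frac:.2f}")
```

The quadratic overlap check keeps the sketch short; for genome-scale inputs an interval index or a sorted sweep would be the practical choice.
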
- `06_CTCF_Threshold_mm.Rmd` - same for mm9-mm10-mm39
- `BED_to_BEDPE.Rmd` - Convert BED to paired BEDPE format
  - Input: PreciseTAD-predicted regions, `Avocado_preciseTAD/Maggie/GM12878/PTBR_Peakachu_outputs/`
  - Output: BEDPE files in the same folder
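
BEDPE describes a pair of regions on one line (chrom1, start1, end1, chrom2, start2, end2, followed by optional fields). The entry above does not say how regions are paired, so the following minimal Python sketch assumes one simple rule: each region is paired with the next region on the same chromosome. The pairing rule, function names, and toy coordinates are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]                # (chrom, start, end)
Pair = Tuple[str, int, int, str, int, int]   # first six BEDPE columns


def bed_to_bedpe(regions: Iterable[Region]) -> List[Pair]:
    """Pair each region with the next one on the same chromosome.

    Assumption: consecutive regions form the pairs. A chromosome with a
    single region yields no pair.
    """
    pairs = []
    for chrom, group in groupby(sorted(regions), key=lambda r: r[0]):
        group = list(group)
        for (_, s1, e1), (_, s2, e2) in zip(group, group[1:]):
            pairs.append((chrom, s1, e1, chrom, s2, e2))
    return pairs


def format_bedpe(pairs: Iterable[Pair]) -> str:
    """Render pairs as tab-separated BEDPE lines (six required columns only)."""
    return "\n".join("\t".join(map(str, p)) for p in pairs)


# Toy coordinates, not real data.
regions = [("chr1", 500, 600), ("chr1", 100, 200), ("chr1", 900, 1000), ("chr2", 50, 80)]
pairs = bed_to_bedpe(regions)
print(format_bedpe(pairs))
```
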
- `EDA_Chang_Noordermeer_2021.Rmd` - Processing `Chang_Noordermeer_2021.xlsx`
- `EDA_PWMScan.Rmd` - PWMScan analysis
- `EDA_AnnotationHub.Rmd` - explore CTCF data on AnnotationHub and ExperimentHub
- See scripts/download_PWMs.sh for data download instructions
- `UCSC_CTCF.tsv` - manually created list of hg38 CTCF experiments, from http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=1466892273_0EgVDbIuSXB31dnORsHXNKaH6gLy&c=chrX&g=encTfChipPk . Used in `07_CTCF_Threshold.Rmd`
- `PWMs` - PWMs used in the package. See the README in that folder
- hg38.PWMScan.JASPAR_CORE_2020_vertebrates.CTCF_MA01391 - 148946, 19bp
- hg38.PWMScan.Jomla2013_Human_and_Mouse_HT_SELEX.CTCF_C2H2_full_monomeric - 47794, 17bp
- hg38.PWMScan.Jomla2013_Human_and_Mouse_Complete_Set.CTCF_full - 47792, 17bp
- hg38.PWMScan.HOCOMOCO_v11_Human_TF_Collection.CTCF_HUMAN_H11MO0A - 159522, 19bp
- hg38.PWMScan.Isakova2017_SMILWseq_Human_TF_Binding.CTCF - 79761, 15bp
- hg38.PWMScan.SwissRegulon_Human_and_Mouse.CTCF_p2 - 163274, 20bp
- hg38.PWMScan.CIS_BP.CTCF_M4427_102 - 164278, 21bp
- mm10.PWMScan.JASPAR_CORE_2020_vertebrates.CTCF_MA01391 - 202655, 19bp
- mm10.PWMScan.Jomla2013_Human_and_Mouse_HT_SELEX.CTCF_C2H2_full_monomeric - 103051, 17bp
- mm10.PWMScan.Jomla2013_Human_and_Mouse_Complete_Set.CTCF_full - 103048, 17bp
- mm10.PWMScan.HOCOMOCO_v11_Mouse_TF_Collection.CTCF_MOUSE_H11MO0A - 193410, 20bp
- mm10.PWMScan.Isakova2017_SMILWseq_Mouse_TF_Binding.CTCF - 149581, 17bp
- mm10.PWMScan.SwissRegulon_Human_and_Mouse.CTCF_p2 - 235028, 20bp
- mm10.PWMScan.CIS_BP.CTCF_M6125_102 - 247201, 15bp

T2T = GCA_009914755.4