dozmorovlab/CTCF.dev

Scripts for the CTCF project

See https://dozmorovlab.github.io/CTCF/ and https://github.com/dozmorovlab/CTCF/ for more information

  • 01_Logos.Rmd - Logo clustering.

    • Input: CTCF.dev/motif_databases/*.meme motif files
    • Output: Figure_clustered_main_PWMs.svg
    • Input: CTCF.dev/Homo_sapiens_2022_05_20_3_16_pm/pwms_CTCF_motifs/CIS-BP_2.00_Homo_sapiens.meme, created from combined .txt matrices
    • Output: Figure_CIS-BP_2.00_Homo_sapiens.svg
    • Input: CTCF.dev/Mus_musculus_2022_05_20_4_01_pm/pwms_CTCF_motifs/CIS-BP_2.00_Mus_musculus.meme, created from combined .txt matrices
    • Output: Figure_CIS-BP_2.00_Mus_musculus.svg
    • Input: CTCF.dev/CTCFBSDB_PWM_corrected.meme
    • Output: Figure_clustered_CTCFBSDB_PWMs.svg
  • 02_EDA_SCREEN.Rmd - Download and process ENCODE SCREEN data (https://screen.encodeproject.org/). Basic statistics and conversion to GRanges

    • Input: GRCh38-CTCF.bed and mm10-CTCF.bed
    • Output: hg38.SCREEN.GRCh38_CTCF, mm10.SCREEN.mm10_CTCF GRanges objects and BED files
  • 03_EDA_CTCFBSDB.Rmd - Download and process CTCFBSDB predicted data. LiftOver hg18-hg19-hg38 and mm8-mm9-mm10. Experimental data are not used

    • Input: allcomp.txt.gz, predicted data
    • Output: hg18.CTCFBSDB.CTCF_predicted_human, mm8.CTCFBSDB.CTCF_predicted_mouse GRanges objects and BED files
  • 04_FIMO_processing.Rmd - Processing of chromosome-specific FIMO results generated on an HPC cluster. See scripts for more details. File name convention: <assembly>.<Database>.<original database name or label>

    • Input: fimo.txt.gz files from genome-, database-, and chromosome-specific subfolders
    • Output: <assembly>.<Database> GRanges objects and BED files. log_PWMs.csv - count statistics with columns "Assembly", "All (p-value threshold 1e-4)", "Reduced (p-value threshold 1e-4)", "All (p-value threshold 1e-6)", "Reduced (p-value threshold 1e-6)"
  • 05_EDA_liftOver.Rmd - Overlap between originally aligned and lifted-over genomes

    • Input: BED files from CTCF.dev/CTCF_liftover. liftOver chain files are obtained using download.sh. The BED files are processed with convert.sh, which also writes counts of mapped and unmapped regions to log_liftOver.csv
    • Output: Figure_liftOverJaccard.svg
  • 05_EDA_liftOver_mm.Rmd - same for mm9-mm10-mm39

  • 06_FIMO_EDA.Rmd - Exploratory analysis of p-value distributions for human and mouse genomes

    • Input: hg38 and mm10 FIMO-detected sites
    • Output: density plots of p-value distributions, Figure_human_pvalues.svg, Figure_mouse_pvalues.svg
  • 06_CTCF_Threshold.Rmd - Exploring the MEME p-value cutoff

    • Input: GRCh38-CTCF.bed (ENCODE SCREEN CTCF cCREs, used as the gold standard) and hg38.MA0139.1.bed (MEME CTCF sites)
    • Output: Figure_human_pvalues_threshold.svg
  • 06_CTCF_Threshold_mm.Rmd - same for mm9-mm10-mm39

  • BED_to_BEDPE.Rmd - Convert BED to paired BEDPE format

    • Input: PreciseTAD-predicted regions, Avocado_preciseTAD/Maggie/GM12878/PTBR_Peakachu_outputs/
    • Output: BEDPE files in the same folder
  • EDA_Chang_Noordermeer_2021.Rmd - Processing Chang_Noordermeer_2021.xlsx

  • EDA_PWMScan.Rmd - PWMScan analysis

  • EDA_AnnotationHub.Rmd - explore CTCF data on AnnotationHub and ExperimentHub
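
The liftOver comparison listed above (05_EDA_liftOver.Rmd, Figure_liftOverJaccard.svg) measures overlap between sets of genomic regions. The repository's analysis lives in R Markdown, so the following is only a minimal plain-Python sketch of one common way to compute such an overlap: a base-pair Jaccard index between two BED-like interval sets. The function names and toy coordinates are hypothetical, half-open BED coordinates are assumed, and the notebooks may compute the statistic differently.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def group_by_chrom(regions):
    """Group (chrom, start, end) records into {chrom: [(start, end), ...]}."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    return by_chrom


def jaccard_bp(regions_a, regions_b):
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    a, b = group_by_chrom(regions_a), group_by_chrom(regions_b)
    total_a = sum(covered_bp(v) for v in a.values())
    total_b = sum(covered_bp(v) for v in b.values())
    union = sum(
        covered_bp(a.get(chrom, []) + b.get(chrom, []))
        for chrom in set(a) | set(b)
    )
    # Inclusion-exclusion: |A intersect B| = |A| + |B| - |A union B|
    intersection = total_a + total_b - union
    return intersection / union if union else 0.0


# Toy example (hypothetical coordinates, not data from the repository)
original = [("chr1", 100, 200), ("chr1", 300, 400)]
lifted = [("chr1", 150, 250), ("chr2", 10, 20)]
score = jaccard_bp(original, lifted)
```

Each set is merged per chromosome first, so overlapping sites within one set are not double counted.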

data

  • hg38.PWMScan.JASPAR_CORE_2020_vertebrates.CTCF_MA01391 - 148946, 19bp

  • hg38.PWMScan.Jomla2013_Human_and_Mouse_HT_SELEX.CTCF_C2H2_full_monomeric - 47794, 17bp

  • hg38.PWMScan.Jomla2013_Human_and_Mouse_Complete_Set.CTCF_full - 47792, 17bp

  • hg38.PWMScan.HOCOMOCO_v11_Human_TF_Collection.CTCF_HUMAN_H11MO0A - 159522, 19bp

  • hg38.PWMScan.Isakova2017_SMILWseq_Human_TF_Binding.CTCF - 79761, 15bp

  • hg38.PWMScan.SwissRegulon_Human_and_Mouse.CTCF_p2 - 163274, 20bp

  • hg38.PWMScan.CIS_BP.CTCF_M4427_102 - 164278, 21bp

  • mm10.PWMScan.JASPAR_CORE_2020_vertebrates.CTCF_MA01391 - 202655, 19bp

  • mm10.PWMScan.Jomla2013_Human_and_Mouse_HT_SELEX.CTCF_C2H2_full_monomeric - 103051, 17bp

  • mm10.PWMScan.Jomla2013_Human_and_Mouse_Complete_Set.CTCF_full - 103048, 17bp

  • mm10.PWMScan.HOCOMOCO_v11_Mouse_TF_Collection.CTCF_MOUSE_H11MO0A - 193410, 20bp

  • mm10.PWMScan.Isakova2017_SMILWseq_Mouse_TF_Binding.CTCF - 149581, 17bp

  • mm10.PWMScan.SwissRegulon_Human_and_Mouse.CTCF_p2 - 235028, 20bp

  • mm10.PWMScan.CIS_BP.CTCF_M6125_102 - 247201, 15bp
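
The object names above follow the <assembly>.<Database>.<original database name or label> convention described for 04_FIMO_processing.Rmd. As a small illustration, the sketch below splits one such entry into its parts in plain Python. The `DataEntry` type and `parse_entry` function are hypothetical helpers, and reading the two trailing numbers as a site count and a motif width in bp is an assumption that the list itself does not state.

```python
import re
from typing import NamedTuple


class DataEntry(NamedTuple):
    assembly: str
    database: str
    label: str
    count: int     # assumed meaning: number of detected sites
    width_bp: int  # assumed meaning: motif width in base pairs


# Matches entries such as "<name> - 148946, 19bp"
ENTRY_RE = re.compile(r"^(?P<name>\S+) - (?P<count>\d+), (?P<width>\d+)bp$")


def parse_entry(line):
    """Parse one entry of the data list into its named components."""
    cleaned = line.strip().lstrip("•").strip()
    match = ENTRY_RE.match(cleaned)
    if match is None:
        raise ValueError(f"Unrecognized entry: {line!r}")
    # Split on the first two dots only: the label itself may contain dots.
    parts = match["name"].split(".", 2)
    if len(parts) != 3:
        raise ValueError(f"Name does not follow the convention: {match['name']!r}")
    assembly, database, label = parts
    return DataEntry(assembly, database, label,
                     int(match["count"]), int(match["width"]))


entry = parse_entry(
    "  • hg38.PWMScan.JASPAR_CORE_2020_vertebrates.CTCF_MA01391 - 148946, 19bp"
)
```

Splitting on only the first two dots keeps labels such as `JASPAR_CORE_2020_vertebrates.CTCF_MA01391` intact.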

T2T = GCA_009914755.4
