hic interactions statistics #78

Open
AlcaArctica opened this issue Dec 15, 2023 · 3 comments

Comments

@AlcaArctica

AlcaArctica commented Dec 15, 2023

I am running the Arima pipeline, followed by YaHS and then juicer pre (as described in this repository), to generate the out_JBAT.hic file required for manual curation in Juicebox. This is my first try, but I am happy with the resulting map and will probably use this workflow again in the future.
However, I am wondering how I can generate statistics about the quality of the Hi-C interactions. It seems that people who use Juicer to create their .hic file get a stats file along with their other results, with information similar to this:

Inter-chromosomal: 1,320,146 (0.51% / 0.93%)
Intra-chromosomal: 7,458,303 (2.87% / 5.27%)
Short Range (<20Kb): 4,571,216 (1.76% / 3.23%)
Long Range (>20Kb): 2,886,831 (1.11% / 2.04%)

How can I obtain a similar statistic for my data with the described workflow (arima - yahs - juicer pre)?
Thank you very much

@AlcaArctica
Author

I would also appreciate it if you could point me to any other tools suitable for assessing the quality of my Hi-C interactions. I am using the Arima 4-enzyme kit, if that is relevant.

@c-zhou
Owner

c-zhou commented Dec 18, 2023

Hello @AlcaArctica,

YaHS does report inter-chromosomal and intra-chromosomal read pair counts while it runs; you can find them in your log file. However, those counts are for contigs, i.e., before scaffolding, so they are really inter-contig and intra-contig counts.

If you need accurate numbers for these statistics, I would suggest remapping the Hi-C data to your final chromosomes and using tools such as samtools to do the counting. The 9th column of the SAM file is what you need, i.e., the TLEN field - observed Template LENgth. See section 1.4 of [this document](https://samtools.github.io/hts-specs/SAMv1.pdf).

Best,
Chenxi
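
A minimal sketch of the counting approach suggested above, assuming the Hi-C reads have already been remapped to the final scaffolds and that the mate fields (RNEXT, PNEXT, TLEN) are populated in the alignments. The function name `count_pairs`, the `min_mapq` parameter, and the 20 kb cutoff (taken from the Juicer-style report quoted in the question) are illustrative choices, not part of YaHS, samtools, or Juicer. Each pair is counted once via its first-in-pair read; a pair is treated as inter-chromosomal when RNEXT names a different reference, and otherwise split into short and long range by the absolute TLEN. The numbers will not necessarily match Juicer's own definitions exactly.

```python
from collections import Counter

# Illustrative cutoff, taken from the "Short Range (<20Kb)" line of the quoted report.
SHORT_RANGE_MAX = 20_000

def count_pairs(sam_lines, min_mapq=0):
    """Count inter/intra-chromosomal pairs from SAM text lines.

    Hypothetical helper: assumes RNEXT and TLEN are set for each pair.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        # Keep only paired reads (0x1) that are first in pair (0x40),
        # so that every pair is counted exactly once.
        if not (flag & 0x1 and flag & 0x40):
            continue
        # Drop unmapped reads (0x4), unmapped mates (0x8),
        # secondary (0x100) and supplementary (0x800) alignments.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if int(fields[4]) < min_mapq:     # MAPQ of this read only
            continue
        rname, rnext = fields[2], fields[6]
        tlen = abs(int(fields[8]))        # column 9: TLEN
        if rnext != "=" and rnext != rname:
            counts["inter"] += 1
        else:
            counts["intra"] += 1
            if tlen < SHORT_RANGE_MAX:
                counts["short_range"] += 1
            else:
                counts["long_range"] += 1
    return counts
```

One way to use it would be to pipe the output of `samtools view` on the remapped BAM file into a small script that calls `count_pairs(sys.stdin)` and prints the resulting counts. Note that the MAPQ filter here only looks at the first read of each pair, and that a TLEN of 0 on an intra-chromosomal pair (meaning the value is unavailable) would be counted as short range.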

@AlcaArctica
Author

Thank you, @c-zhou. I will investigate further!
