-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SEACR missing lots of obvious peaks #87
Comments
Hello, It's possible that the T2T assembly is going to cause there to be some differences in the global background estimation, since large pileups at difficult-to-map repeated regions might be differently represented now that there are presumably many more available reference sites at which ambiguous reads could map. My understanding is that the bowtie2 default (in contrast with -k or -a mode) is to assign the read semi-randomly to a single map location when there are multiple possible map locations with the same quality score (described in more detail here). Are you using bowtie2 here, and if so is it doing the default map position reporting? Does removing duplicates make a difference? This is something I will probably need to take a closer look at since it could be a generalized behavior with the T2T builds, which are unlike anything we've used before. Mike |
Hi Mike, Thanks for the quick reply and explanation. I did try removing duplicates from the IgG (duplication rate ~50%) but not H3K27me3 (duplication 13%) and then running SEACR with that. It doesn't seem to improve peak calling unfortunately. I did use Bowtie2 with the default setting. I'll have a look into -k and -a and see if that's appropriate for what I'm looking into. I've also considered using --non-deterministic so that identical reads don't get aligned to the same places if there are multiple equally good alignments. Thanks, |
Hi all,
I ran CUT&RUN experiments with 200.000 cells per experiment and H3K27me3 (CST #9733 antibody) being my positive control and IgG being my negative control.
These were all sequenced at 150x150bp with 10-20 million reads each.
I aligned the reads to the new CHM13-T2T genome with Bowtie2, checked the bam duplication rate and they were all <50% so I did not remove duplicates.
I generated bedgraphs from the bam files with bedtools genomecov (I did not normalise with my spike-in as it's not required for my set of experiments).
I then called peaks using SEACR with IgG as the background and in 'norm' mode.
Looking at the peaks called, it seems like SEACR fails to call peaks in many sites where there is obvious enrichment in H3K27me3.
Here's an IGV screenshot of the HOXC gene cluster.
Blue: H3K27me3 (top) and IgG (bottom) bedgraphs
Green: Peaks called with MACS2 using different p and q-value cut-offs (I used my bam files as input and -f BAMPE)
Red: Peaks called with SEACR on relaxed (top) and stringent (bottom) mode
What could possibly be causing this? I'm double checking all my scripts in case I've made a silly mistake.
Thanks for taking the time to read this.
Lillian
The text was updated successfully, but these errors were encountered: