Incorrect Model Fit #62

Open
mmilevskiy opened this issue Jun 20, 2017 · 5 comments

Comments

@mmilevskiy

Hi @AliciaSchep

I am having difficulties with the model fit for NucleoATAC. The model is being fit to all of my reads, both FR and RF. The insert length pattern of the FR reads closely matches my ATAC Tn5 banding pattern and the expected output of this technique, whilst that of the RF reads does not. I am unsure how to exclude the RF reads; as a result, the model may be over-representing the NFR regions and under-representing the nucleosome reads.

Can you comment on how the model affects the output of NucleoATAC? When I supply my own insert length file, based on the FR reads only, NucleoATAC finds fewer NFR regions and nucleosome positions.
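
For readers following along, the FR/RF distinction above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the tuple layout are hypothetical, coordinates are taken to be 0-based and half-open, and the rule used (compare the 5' ends of the forward-strand and reverse-strand mates) is one common convention, not necessarily the exact logic Picard or NucleoATAC applies.

```python
from collections import Counter

def classify_pair(start1, end1, reverse1, start2, end2, reverse2):
    """Return 'FR', 'RF' or 'TANDEM' for one aligned read pair.

    Assumes 0-based, half-open [start, end) reference coordinates.
    """
    if reverse1 == reverse2:
        # Both mates on the same strand: neither FR nor RF.
        return "TANDEM"
    # The 5' end of the forward-strand mate is its leftmost base;
    # the 5' end of the reverse-strand mate is its rightmost base.
    if reverse1:
        fwd_five_prime, rev_five_prime = start2, end1 - 1
    else:
        fwd_five_prime, rev_five_prime = start1, end2 - 1
    return "FR" if fwd_five_prime < rev_five_prime else "RF"

def orientation_counts(pairs):
    """Tally orientations for an iterable of 6-tuples (hypothetical layout)."""
    return Counter(classify_pair(*p) for p in pairs)

# Toy pairs: (start1, end1, reverse1, start2, end2, reverse2)
example_pairs = [
    (100, 175, False, 200, 275, True),   # mates point towards each other
    (200, 275, False, 100, 175, True),   # mates point away from each other
    (100, 175, False, 300, 375, False),  # same strand
]
print(orientation_counts(example_pairs))
```

Tallying orientations like this, separately per fragment length, is one way to check whether the RF pairs are concentrated at or below the read length.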

@AliciaSchep
Contributor

Can you include some plots of the fragment size to show what you mean?

@mmilevskiy
Author

mmilevskiy commented Jun 22, 2017

Hi @AliciaSchep. I have attached the insert length histogram from Picard, as well as the NucleoATAC model fit using the default settings under "nucleoatac run" and the fit when I use my own insert length text file based on the FR reads only.

ATAC_FACS_Pool_Merged_insert_size_histogram.pdf

nucleoATAC_Default.occ_fit.pdf
nucleoATAC_FR_Read_Ins_Leng.occ_fit.pdf

Could the problem be that, when I use my own insert length text file, the model built from it doesn't match the sequencing data? If so, removing the RF reads might solve the problem.

@mmilevskiy
Author

Could the RF reads be due to fragments shorter than the read length (<75 bp; 75 bp PE sequencing, NextSeq), where the mates overlap and are mapped in the wrong orientation, but which should still be included in the analysis? If so, would the model from NucleoATAC be correct?

@AliciaSchep
Contributor

I would recommend re-examining how the adapter trimming and alignment are being performed, because there should be no reason for the RF reads to have a different distribution than the FR reads. The sharp cutoff suggests an issue with the alignment, such that RF fragments longer than the read length are not getting mapped. There also seems to be a general dip in fragments around the read length for the FR reads, which suggests that the adapter trimming isn't fully effective at catching cases where only a couple of base pairs of adapter are included.
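
To make the point about short adapter remnants concrete, here is a toy 3' trimmer in Python. It is a deliberately simplified sketch (exact matching only, hypothetical function and parameter names) and does not reproduce how trim_galore or Cutadapt work internally. It shows how a minimum-overlap requirement can leave one or two adapter bases on reads from fragments just shorter than the read length. The adapter string is the Nextera transposase sequence, used here purely as an example.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from `read`, requiring at least `min_overlap`
    exactly matching adapter bases (toy model, no mismatches allowed)."""
    min_overlap = max(min_overlap, 1)
    # Case 1: the complete adapter occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only the start of the adapter fits at the end of the read.
    # Try the longest possible adapter prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter detected with the required overlap.
    return read

ADAPTER = "CTGTCTCTTATACACATCT"   # Nextera transposase sequence (example)
fragment = "ACGTACGTAC"           # made-up 10 bp fragment
read = fragment + ADAPTER[:2]     # 12 bp read running 2 bp into the adapter

print(trim_3prime_adapter(read, ADAPTER, min_overlap=3))  # remnant is kept
print(trim_3prime_adapter(read, ADAPTER, min_overlap=1))  # remnant is removed
```

A read that keeps a couple of adapter bases may align poorly or not at all, which would be consistent with a dip in fragments near the read length. Whether that is what is happening here depends on the actual trimming settings used.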

In general, the fragment length distribution from NucleoATAC might be a little bit different than what you get from Picard because only fragments within the peaks are queried.
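
As a rough illustration of why the two distributions can differ, the sketch below counts fragment lengths either genome-wide or only for fragments that overlap a peak. The names, the tuple layout and the "any overlap" criterion are assumptions made for this example; NucleoATAC's own rule for which fragments count as within a peak may differ.

```python
from collections import Counter

def overlaps(frag_start, frag_end, peak_start, peak_end):
    """True if two half-open intervals share at least one base."""
    return frag_start < peak_end and peak_start < frag_end

def fragment_size_counts(fragments, peaks=None):
    """Count fragment lengths.

    fragments: iterable of (chrom, start, end), 0-based half-open.
    peaks: optional list of (chrom, start, end); if given, only
           fragments overlapping at least one peak are counted.
    """
    counts = Counter()
    for chrom, start, end in fragments:
        if peaks is not None and not any(
            chrom == p_chrom and overlaps(start, end, p_start, p_end)
            for p_chrom, p_start, p_end in peaks
        ):
            continue
        counts[end - start] += 1
    return counts

# Made-up fragments and a single made-up peak.
toy_fragments = [
    ("chr1", 100, 150),    # 50 bp, inside the peak
    ("chr1", 120, 300),    # 180 bp, overlaps the peak
    ("chr1", 5000, 5200),  # 200 bp, outside the peak
    ("chr2", 100, 150),    # 50 bp, different chromosome
]
toy_peaks = [("chr1", 0, 200)]

print(fragment_size_counts(toy_fragments))             # all fragments
print(fragment_size_counts(toy_fragments, toy_peaks))  # peak-restricted
```

The linear scan over peaks is fine for a toy example; for real data an interval index or a sorted sweep would be the more usual choice.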

@mmilevskiy
Author

Hi Alicia. Thanks. I have been using trim_galore to do the adapter trimming. Would you recommend using the "--nextera" option or "-a" with the adapter sequence? Or is there another trimming tool you find works better?

I am getting 90-95% mapping for my libraries with Bowtie2 on default settings, apart from "-X 2000".
