
Low Mapping efficiency #713

Open
Marh32 opened this issue Nov 7, 2024 · 9 comments
@Marh32

Marh32 commented Nov 7, 2024

Hi, Felix.

I'm so sorry to bother you again. I have WGBS data from the same batch for two species (Accel Swift data) and have already trimmed it using the method you suggested, with the following command: `trim_galore -j 5 -q 20 --phred33 --fastqc --max_n 3 --stringency 3 --length 36 --paired --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10`. However, I noticed that one species has a high mapping efficiency (~70%) against a chromosome-level reference genome, while the other species has only ~20% mapping efficiency against a reference genome that consists of over 30,000 scaffolds. What steps can I take to improve the mapping efficiency in this situation? Should I change `--score_min`? Thanks in advance.

@FelixKrueger
Owner

It shouldn't really matter for the alignment step whether you have scaffolds or more polished, long chromosomes. But yes, relaxing the mapping parameters will quickly tell you whether the scaffold sequence is less than perfect. `--score_min L,0,-0.4` or `L,0,-0.6` could be a good start.
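For context, the `--score_min` function `L,a,b` defines the minimum acceptable alignment score as a linear function of read length x, i.e. f(x) = a + b·x (Bismark's end-to-end default is `L,0,-0.2`). A minimal sketch of what the suggested values imply for a hypothetical 100 bp read:

```python
# --score_min "L,a,b" sets the minimum acceptable alignment score
# as a linear function of read length x: f(x) = a + b * x.
# More negative thresholds tolerate more mismatches/indels.
def min_score(a: float, b: float, read_length: int) -> float:
    return a + b * read_length

# For a 100 bp read:
print(min_score(0, -0.2, 100))  # default L,0,-0.2  -> -20.0
print(min_score(0, -0.4, 100))  # relaxed L,0,-0.4  -> -40.0
print(min_score(0, -0.6, 100))  # relaxed L,0,-0.6  -> -60.0
```

With a maximum mismatch penalty of about 6 per mismatch, dropping the threshold from -20 to -60 roughly triples the number of mismatches a 100 bp read can carry and still align.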

@Marh32
Author

Marh32 commented Nov 7, 2024

Thank you very much for your quick response. I'll try your suggestions right away.

@Marh32
Author

Marh32 commented Nov 7, 2024

I changed `--score_min` to `L,0,-0.6`, and the mapping efficiency increased from 21.9% to 28.9%. But it's still very low; what else can I do to improve the situation? Thank you very much.

@FelixKrueger
Owner

Possibly the best thing you can do is run it in `--local` mode; that will give you an idea of the upper limit of how many reads in the library actually come from the organism in question. Also take a look at the rate of ambiguous alignments: if the genome is very repetitive, you may struggle to get a high mapping efficiency for uniquely aligned reads.
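A sketch of the two runs discussed here, with hypothetical genome-folder and file names (substitute your own trimmed FASTQ files); `--score_min` and `--local` are existing Bismark options:

```shell
# Relaxed end-to-end alignment (assumed paths; adjust to your data):
bismark --genome /path/to/genome_folder \
        --score_min L,0,-0.6 \
        -1 sample_R1_val_1.fq.gz -2 sample_R2_val_2.fq.gz

# Exploratory soft-clipped run to gauge the upper limit of mappable reads:
bismark --genome /path/to/genome_folder --local \
        -1 sample_R1_val_1.fq.gz -2 sample_R2_val_2.fq.gz
```

The `--local` run is diagnostic rather than a drop-in replacement: soft clipping can rescue reads with untrimmed adapter or divergent ends, but (as discussed below) it hands some control over to the aligner.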

@Marh32
Author

Marh32 commented Nov 7, 2024

Thank you very much. I just checked the report, and most of the reads did not align. Maybe I have to try running it with `--local`. By the way, I'm curious whether `--local` mode will significantly increase the error rate? Since I'm not very clear on the consequences of using local mode, I wanted to ask whether it would be possible to proceed to the next step of the analysis using just this small portion of mapped data? Thank you very much.
[Screenshot (2024-11-07): Bismark alignment report showing the majority of reads unaligned]

@FelixKrueger
Owner

That's a good question... For Accel Swift data it is still important to trim the data first (especially Read 2), as the reads may map, but all the high G content at the start of Read 2 would be (incorrectly) called methylated...

@Marh32
Author

Marh32 commented Nov 7, 2024

So maybe using just this small portion of mapped data for downstream analysis would be a good option, right?

@FelixKrueger
Owner

I would probably argue that this is the portion that was treated fairly conservatively, so you know what you are dealing with. Local alignments may result in an increased alignment rate, but you are not really in full control of what the aligner does... We wrote up some thoughts around this here: https://sequencing.qcfail.com/articles/soft-clipping-of-reads-may-add-potentially-unwanted-alignments-to-repetitive-regions/

@Marh32
Author

Marh32 commented Nov 7, 2024 via email
