-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support end-to-end alignments? #399
Comments
The In your example, the query is longer than the reference, so there’s necessarily soft clipping. How do you want the alignment to look? |
Thank you for the reply. I want to drop these reads in the alignment output, since they are mapping errors and would significantly overestimate the mapping rate. I can filter the AS tag manually, but it seems that |
Would you filter out negative AS values? I think you can do the equivalent here: Use |
will there be some side effects? If AS is the large, how can we ensure the resolution of -A and -B arguments. |
Is it possible to do end to end mapping using strobealign https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#end-to-end-alignment-versus-local-alignment |
It should be fine. Since scores are only ever added up, the only consequence is that AS values are shifted and that soft-clipping is avoided if possible. One limitation of soft clipping in strobealign is that the alignment library we use (StripedSmithWaterman) does not directly support an end bonus/soft clipping penalty, so we compute the alignment without end bonus/soft clipping penalty and then attempt to extend the alignment to the ends of the query to see whether that gives a better overall score when taking the end bonus into account.
Probably not as long as we use StripedSmithWaterman. We have also considered switching to ksw2, which would suppor this better. That said, my impression (which may be wrong) is that forcing end-to-end alignment is not the right thing to do here. It seems you’re attempting to use it as a kind of filter that gives you a very low AS when there would otherwise be soft clipping. Can’t you just directly filter by AS? Or directly filter out anything with soft clipping? |
Thank you very much for the explanation. I was trying to filter out these reads using the AS tag, but it seems that soft clipping does not add penalty to this tag. But I will try to filter base one the alignment length over the query read length. |
@fplaza, You have shown some interest in end-to-end alignment as well. Can you say why you would want it (what is your use case)? We have some code that we never added where we switched to using WFA2 for the alignment (see #229 and #251). WFA2 supports end-to-end alignment, which we didn’t want at the time, but if there’s a compelling reason for allowing end-to-end alignments, we could add a command-line flag that switches to WFA2. This needs to be considered carefully, however, because it would add quite some code for a relatively small feature that may not be used that often. |
Hi @marcelm , We have a legacy pipeline that maps reads (shotgun metagenomic data) against a gene catalogue. I also use end-to-end alignments when I map reads from a sample against its own assembly because I expect that reads will be fully aligned. |
I also have a use case for end-to-end alignments. We are currently experimenting with different read mappers to align spacer sequences of CRISPR arrays to phage genomes to infer the bacterial hosts of phages (if we have a good match between the spacer and the phage genome, it is likely that that phage infected the bacteria that encoded the CRISPR array). The number of mismatches across the entire length of the spacer is used as a proxy for the host assignment reliability, so the end-to-end alignment is required. Right now we are using BLAST to perform those alignments (as described in the "Host assignment" section here), but it is not really scaling to the amount of data we are processing. For now we are relying on Bowtie 1 because it allows us to have end-to-end alignments and to control the maximum number of mismatches, but moving to an actively maintained tool would be great. Tagging @simroux, who is also interested in this. |
It seems that "penalty" for soft clip, which is specified by the
-L
argument is "reward" rather than "penalty". More specicailly, given the example below:reference file is
input fastq file is
Command
strobealign -t 16 -L 0 -k 10 ref.fa input.fq
output mapping result as follow:While command
strobealign -t 16 -L 100 -k 10 ref.fa input.fq
output mapping result as follow:However, since the polyT region in the reads, which is complement to polyA should not be aligned into the internal region of the reference sequence. It means that end-to-end mapping is required for this example, and penalty for soft clipping is necessary. But the
-L
argument increases the align score rather than decrease it. Is is possible to do real "soft clip pentalty" for strobealign?The text was updated successfully, but these errors were encountered: