Output of demultiplexing mixed oriented reads #764
To add more context and a further question: I'm using Cutadapt 4.5 and Python 3.10 with the following script to demultiplex my files:
With this I get a low number of reads assigned to the different output files and a high number of reads in the unknown file, even though I followed the instructions in the user guide. Then, doing the second step in this order:
gave me another set of output files with a reasonable number of reads per file, but still with 70,000 reads on R1 and 92,000 reads on R2 left over. Since we have run into a lot of problems recently in our group, I was trying to understand what was going on... so out of curiosity I tried a different option. This gave me a good distribution of files, with just 20 and 24 reads in the respective unknown R1 and R2 files. Now, my question is: could this solution be fine? Is it still producing mixed-oriented fastq files in the output?
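For reference, the two-round procedure from the user guide can be sketched roughly like this (a sketch only: `barcodes.fasta`, the input file names and the `round1-`/`round2-` output prefixes are placeholders, and 5' barcodes given via `-g` are assumed):

```shell
# Round 1: demultiplex assuming the barcode is on R1;
# pairs without a barcode match go to the untrimmed files.
cutadapt -g file:barcodes.fasta \
    -o "round1-{name}.1.fastq.gz" -p "round1-{name}.2.fastq.gz" \
    --untrimmed-output untrimmed.1.fastq.gz \
    --untrimmed-paired-output untrimmed.2.fastq.gz \
    input.1.fastq.gz input.2.fastq.gz

# Round 2: same barcodes, but with R1 and R2 swapped,
# so pairs sequenced in the other orientation are caught.
cutadapt -g file:barcodes.fasta \
    -o "round2-{name}.2.fastq.gz" -p "round2-{name}.1.fastq.gz" \
    --untrimmed-output unknown.2.fastq.gz \
    --untrimmed-paired-output unknown.1.fastq.gz \
    untrimmed.2.fastq.gz untrimmed.1.fastq.gz
```

After round 2, only the `unknown.*` files contain reads with no recognizable barcode in either orientation.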
Yes, I forgot this in the guide: if you want one file per barcode, you need to concatenate the files, and that is indeed most easily done with `cat`. That said, you may want to instead update to Cutadapt 4.6 and use the new option there. I just updated the documentation. If you try it, can you please let me know whether it works as described? (I haven't had time to test it myself.)
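For example, assuming the two rounds wrote files prefixed `round1-` and `round2-` (the naming is an assumption; adjust it to your own), the per-barcode merge could look like this. Concatenating gzip files with `cat` yields a valid gzip stream, so no decompression is needed:

```shell
# Merge the round-1 and round-2 outputs into one file per barcode.
for f in round1-*.1.fastq.gz; do
    [ -e "$f" ] || continue            # no matches: nothing to merge
    name=${f#round1-}
    name=${name%.1.fastq.gz}
    cat "round1-${name}.1.fastq.gz" "round2-${name}.1.fastq.gz" > "${name}.1.fastq.gz"
    cat "round1-${name}.2.fastq.gz" "round2-${name}.2.fastq.gz" > "${name}.2.fastq.gz"
done
```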
Assuming these are Illumina sequences and your barcode is not much longer than 10 bases, allowing one error is probably good enough. An error rate of 0.15 probably gets you there, but note that recent Cutadapt versions allow you to specify the allowed number of errors directly.
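As a sketch (in recent Cutadapt versions, an `-e` value of 1 or larger is interpreted as an absolute number of errors rather than a rate; the file names are placeholders):

```shell
# Allow at most 1 error (mismatch or indel) per barcode match,
# instead of a fractional error rate such as -e 0.15:
cutadapt -e 1 -g file:barcodes.fasta \
    -o "{name}.1.fastq.gz" -p "{name}.2.fastq.gz" \
    input.1.fastq.gz input.2.fastq.gz
```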
What percentage of the total is that, and how many barcodes do you have?
This is the expected outcome, but not the result you want, because it comes from matches caused by chance when using `-b`.
You could start by trying to verify that the barcode is indeed where it is supposed to be. You can use Cutadapt for that.
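One possible check (a sketch; `barcodes.fasta` and the input name are placeholders): anchor the barcodes at the 5' end and look at the match count in the report:

```shell
# Keep only reads that START with one of the barcodes (the ^ before
# file: anchors every adapter in the file at the 5' end). The report's
# "Reads with adapters" count tells you how often a barcode really
# sits at the start of the read. The trimmed reads are discarded;
# only the statistics are of interest here.
cutadapt -g ^file:barcodes.fasta --discard-untrimmed \
    -o /dev/null input_R1.fastq.gz > report.txt
```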
This feature is really worth the update! Thank you for such a development. And yes, I get exactly the same numbers that I obtained going through the separate steps... I haven't done any in-depth analysis yet, just some read counts afterwards, but for now it seems to work. Do you think it would also be useful for removing the primers in paired-end mixed-oriented reads?
I have 96 barcodes, and the total number of starting reads is 1,096,025 (double-checked with FastQC).
This is completely true: I ran FastQC on the `-b` output, and the distribution, k-mer content and other statistics were completely off.
In addition, I tried the last setting that you suggested, but the only output that I got is this:

```
Processing paired-end reads on 30 cores ...
=== Summary ===
Total read pairs processed: 1,096,025
== Read fate breakdown ==
Total basepairs processed: 648,012,492 bp
```

From the read distribution it seems better divided than before... but it is still unclear whether this is the right processing or not. Do you have any idea why Cutadapt is not producing any further output? The only change I made was adding that one option.
Just a simple check before looking into this any further: Did you enclose the text in quotation marks? Otherwise, the shell will interpret the semicolon as a separator:
So the whole adapter expression needs to be enclosed in quotes.
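For illustration (`TGACGTCA` is a made-up adapter sequence; `printf` just shows what the program would receive, one argument per line):

```shell
# Unquoted, the shell would split at ';' and try to run `e=1 ...`
# as a second command. Quoted, cutadapt receives one argument:
#
#   cutadapt -g "TGACGTCA;e=1" -o out.fastq.gz input.fastq.gz
#
# Demonstration of the argument passing:
printf '%s\n' -g "TGACGTCA;e=1"
# prints two lines: -g and TGACGTCA;e=1
```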
Yes, the problem was indeed related to that command, but... another problem appears when using it.
Well, in that case I found that some reads were really too short, so to get more complete statistics, in the absence of a detailed per-barcode report, I added another option. The first feedback is related to this... not the usual output. Meanwhile, I was unaware that not all the barcodes correspond to samples; now that I have a clearer view, I know that only the first 48 are actual samples. Checking with FastQC, I found that for barcodes 49 and up each fastq file had just 1 read. Following these data, the output is quite nice (even if 11.4% of the reads are still unassigned), and for the other barcodes (not corresponding to samples) I think that 1 read each is enough to say that this combination of options is a good setting. Probably relaxing that setting a little bit would help further.
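If the issue is very short reads, one way to handle them explicitly (a sketch; the 50 bp threshold and all file names are assumptions) is to filter them out while keeping them available for inspection:

```shell
# Drop read pairs shorter than 50 bp after trimming, but write them
# to separate files instead of discarding them silently, so their
# statistics can still be examined.
cutadapt -g file:barcodes.fasta -m 50 \
    --too-short-output too_short.1.fastq.gz \
    --too-short-paired-output too_short.2.fastq.gz \
    -o "demux-{name}.1.fastq.gz" -p "demux-{name}.2.fastq.gz" \
    input.1.fastq.gz input.2.fastq.gz
```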
Hi there!
I'm following the guidelines for the mixed-oriented reads case, trying to demultiplex four different files (x_R1.fastq.gz, x_R2.fastq.gz, y_R1.fastq.gz and y_R2.fastq.gz).
After running the pipeline up to round 2, then reversing the files and processing everything again... do I have to merge the resulting files via the cat command? What procedure would you recommend? Any suggestion for the error rate? I'm currently using a pipeline with the error rate set to 0.15.