Pipeline failing with older fastq file #216
Hi @RuanSpies21, happy to work on this together. Could you please share 5 sample IDs from your dataset? This way I can test those locally.
Thanks @abhi18av! ERR038276
Hi @abhi18av, any thoughts on this yet?
Hi @RuanSpies21, apologies for the late response on this one. I have been able to reproduce this error on my side using the pipeline's default … This was NOT resolved even when I enabled … The following statistics were generated for the individual files:
|SAMPLE |AVG_INSERT_SIZE|MAPPED_PERCENTAGE|RAW_TOTAL_SEQS|AVERAGE_BASE_QUALITY|MEAN_COVERAGE|SD_COVERAGE|MEDIAN_COVERAGE|MAD_COVERAGE|PCT_EXC_ADAPTER|PCT_EXC_MAPQ|PCT_EXC_DUPE|PCT_EXC_UNPAIRED|PCT_EXC_BASEQ|PCT_EXC_OVERLAP|PCT_EXC_CAPPED|PCT_EXC_TOTAL|PCT_1X |PCT_5X |PCT_10X |PCT_30X |PCT_50X |PCT_100X|MAPPED_NTM_FRACTION_16S|MAPPED_NTM_FRACTION_16S_THRESHOLD_MET|COVERAGE_THRESHOLD_MET|BREADTH_OF_COVERAGE_THRESHOLD_MET|ALL_THRESHOLDS_MET|
|-------------------------|---------------|-----------------|--------------|--------------------|-------------|-----------|---------------|------------|---------------|------------|------------|----------------|-------------|---------------|--------------|-------------|--------|--------|--------|--------|--------|--------|-----------------------|-------------------------------------|----------------------|---------------------------------|------------------|
|MAGMA.ERX015472_ERR038276|366.5 |73.97 |17112670 |34.5 |154.672354 |71.416821 |157 |48 |0 |0.09915 |0.154413 |0 |0.02558 |0.001099 |0 |0.280241 |0.972886|0.96603 |0.961128|0.942092|0.916305|0.78371 |0.0 |1 |1 |1 |1 |
|MAGMA.ERX015473_ERR038277|384.5 |77.16 |18091946 |35.3 |176.428354 |71.317561 |185 |46 |0 |0.086099 |0.148593 |0 |0.020496 |0.000459 |0 |0.255647 |0.973516|0.966587|0.963094|0.950857|0.935023|0.859417|0.0 |1 |1 |1 |1 |
|MAGMA.ERX015474_ERR038278|407.9 |77.47 |13464688 |35.3 |134.847332 |58.306765 |142 |38 |0 |0.084936 |0.132313 |0 |0.020973 |0.000473 |0 |0.238694 |0.966827|0.959598|0.954376|0.933156|0.903887|0.750069|0.0 |1 |1 |1 |1 |
|MAGMA.ERX015475_ERR038279|427.5 |76.3 |16200744 |35.2 |155.460953 |61.334029 |165 |38 |0 |0.09051 |0.147023 |0 |0.021057 |0.000692 |0 |0.259282 |0.97162 |0.964273|0.960278|0.946518|0.928075|0.832158|0.0 |1 |1 |1 |1 |
|MAGMA.ERX015476_ERR038280|478.8 |75.39 |18525588 |35.2 |171.901534 |69.736791 |180 |42 |0 |0.096019 |0.158024 |0 |0.020307 |0.000607 |0 |0.274956 |0.973743|0.967281|0.96324 |0.949922|0.934053|0.859584|0.0 |1 |1 |1 |1 |
I was also able to reproduce the issue related to type casting in the Python script.
NOTE: I am currently working on a patch to address this issue. Thank you for bringing it to my attention!
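The per-sample statistics above are easier to compare against the zero-coverage results seen with the older runs if they are loaded programmatically. The Python below is a minimal sketch, not part of MAGMA: it assumes the summary is a pipe-delimited text table laid out like the one pasted here (a header row, a dashed separator row, then one row per sample), and the helper names `parse_value`, `load_qc_table` and `zero_coverage_samples` are hypothetical. Casting each cell explicitly (int, then float, else string) keeps a numeric check such as `MEAN_COVERAGE == 0` from silently failing on text values.

```python
"""Hypothetical helpers (not MAGMA code) for a pipe-delimited per-sample QC table."""


def parse_value(raw: str):
    """Cast a table cell to int or float when possible; otherwise keep the text."""
    text = raw.strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def load_qc_table(text: str) -> list[dict]:
    """Turn |col|col| rows into dicts keyed by the header, skipping the dashed separator."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore prose lines around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif set("".join(cells)) <= {"-"}:
            continue  # separator row such as |-----|-----|
        else:
            rows.append({key: parse_value(value) for key, value in zip(header, cells)})
    return rows


def zero_coverage_samples(rows: list[dict]) -> list[str]:
    """Return the SAMPLE names whose MEAN_COVERAGE is zero."""
    return [row["SAMPLE"] for row in rows if row.get("MEAN_COVERAGE") == 0]
```

For example, `zero_coverage_samples(load_qc_table(open(summary).read()))` would list the affected samples, where `summary` is whichever table file you are inspecting.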
@RuanSpies21, could you please try running the pipeline with the following command? I have now pushed a patch to the master branch. NOTE: Please adapt the options to your context, but the main snippet is:
    nextflow run 'https://github.com/TORCH-Consortium/MAGMA' \
      -profile singularity,bwa_k66 \
      -r master \
      -latest \
      -resume \
      -params-file params.magma.yaml
Thank you so much for the help @abhi18av! I'm so sorry, I am not quite getting it right :( When I run … If I then add -c custom.config with the file mentioned above, I get … It seems to be an issue with sample sheet validation? Here is the format of my sample sheet for reference:
I've also attached the Nextflow logs in case they are helpful. Thanks again for your help, and very sorry to keep bothering you!
Hi @RuanSpies21,
Test profile
The samplesheet looks fine to me, but let's make sure that the basics are all set.
This should make use of the
OK, it looks like it's failing with the same error on the test profile as well. I ran … Output:
Then I think the problem might be with your Java setup. Could you please confirm you're using an LTS version, as mentioned here? https://github.com/TORCH-Consortium/MAGMA?tab=readme-ov-file#nextflow
I can confirm I'm using an LTS version of Java (17). I don't seem to get the same error when using the alpha pre-release of v2.0.0; in this case the pipeline runs successfully through the samplesheet validation step.
Mmm, then the next suspect is the Nextflow version; I think running with a pinned version should fix the problem. Could you please test with the following command? 🙏
    NXF_VER=24.04.4 nextflow run 'https://github.com/TORCH-Consortium/MAGMA' -profile docker,server,test -r hotfix/bwa_k66
If this works, then I will set the minimum Nextflow version to
OK, great! The test seems to have worked. Thanks for the help. I'll give it a bash with these old sequences now; holding thumbs, and I'll let you know how it goes.
Just getting loads of failures for VALIDATE_FASTQS_WF:FASTQ_VALIDATOR.
Good, so we're past the setup issues.
I wouldn't worry too much about the samples from
So it seems likely that these samples were corrupted, either while downloading or while being moved across external disks/computers.
One file you might want to inspect is the
I'd recommend you download your samples from NCBI/ENA using
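Before re-running the pipeline on re-downloaded files, a quick structural check can separate truncated or corrupted downloads from files that are intact. The sketch below is a hypothetical helper, not the logic of MAGMA's FASTQ_VALIDATOR process: it assumes plain or gzip-compressed FASTQ with four-line records, and reports a header not starting with `@`, a separator not starting with `+`, a sequence/quality length mismatch, or a read error such as the `EOFError` that Python's `gzip` module raises for a truncated archive.

```python
import gzip
import zlib
from pathlib import Path


def open_text(path: Path):
    """Open a FASTQ file as text, decompressing it if the name ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(path: str) -> tuple[int, list[str]]:
    """Count four-line FASTQ records and collect structural problems.

    Read errors (e.g. EOFError from a truncated .gz file, or a corrupt
    compressed stream) are reported as a problem instead of raised.
    """
    problems: list[str] = []
    records = 0
    try:
        with open_text(Path(path)) as handle:
            while True:
                header = handle.readline()
                if not header:
                    break  # clean end of file
                seq = handle.readline().rstrip("\n")
                plus = handle.readline()
                qual = handle.readline().rstrip("\n")
                records += 1
                if not header.startswith("@"):
                    problems.append(f"record {records}: header does not start with '@'")
                if not plus.startswith("+"):
                    problems.append(f"record {records}: separator does not start with '+'")
                if len(seq) != len(qual):
                    problems.append(f"record {records}: sequence/quality length mismatch")
    except (EOFError, OSError, zlib.error, UnicodeDecodeError) as exc:
        problems.append(f"read error after {records} records: {exc}")
    return records, problems
```

Calling `check_fastq("ERR038264_1.fastq.gz")` (file name illustrative) returns the number of records read and a list of problems; an empty list means the file is at least structurally sound, which would point away from download corruption.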
Thanks for this @abhi18av. It's been a long journey together now 😂. It seems the pipeline really does not like these old files. I re-downloaded some of them with … Further, those that do pass have 0 coverage. As a sanity check, a batch of newer FASTQs processed successfully, so the setup is fine.
Hi @RuanSpies21,
Actually, I would need more evidence to believe that, since we've been using MAGMA to analyse all Brazilian and South African sequences from SRA produced in the last 20 years, and unless there's something wrong with the samples themselves, they get through. That is why we added the JSON file, so that we can have a better overview of the samples which failed. Could you please share that JSON
Indeed, the results here are very suspicious; I will try to run these samples on my end to see if they are at least reproducible.
Ah OK, I see. Here is the
Hi @RuanSpies21, just letting you know that I'm still tracking this; I'm just running into some resource constraints on our shared server these days.
No worries @abhi18av! Thank you so much, you have already been so accommodating.
Hi there,
I am trying to run the pipeline on some older FASTQ files (circa 2010s) using the docker profile. The reads are relatively short, at ~75 bp. Following previous advice from Abhinav, I have created a custom.config file with contents:
which I specify with the -c argument. So my full command is:
nextflow run . -params-file params/params.yaml -profile docker,server,bwa_k66 -c custom.config
However, I get the following error:
I think this is due to the samples returning 0 coverage (when I check /mnt/volume_data/ruan/walker_2013/MAGMA/magma-results/QC_statistics/per_sample/coverage, all of them have 0).
Any ideas what could be going on here, or any workarounds?
ERR038264 is an example FASTQ.
Thanks!
Ruan