keyError in get_cov_len_megahit #3
Comments
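Hello @jamesabbott, Thanks for posting this issue. I'm currently testing MetaCoAG on MEGAHIT assemblies as they have a different format than metaSPAdes. Setting n_samples=len(coverages) would not be correct as this will set the number of samples to the length of the coverages dictionary. Can you share with me the format of the abundance file you are using? Thank you! Best regards, Vijini |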
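To illustrate the point about n_samples, here is a minimal sketch of reading a tab-separated abundance file into a dictionary keyed by contig ID. It is not MetaCoAG's actual code: the function name read_abundance and its return shape are assumptions made for illustration. The number of samples is taken from the number of coverage columns, whereas len(coverages) counts contigs.

```python
import io


def read_abundance(handle):
    """Parse a TSV abundance file into {contig_id: [coverage per sample]}.

    Hypothetical helper for illustration; not part of MetaCoAG.
    """
    coverages = {}
    n_samples = None
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        contig_id, *values = line.split("\t")
        if n_samples is None:
            # Number of samples = number of coverage columns on a line.
            n_samples = len(values)
        elif len(values) != n_samples:
            raise ValueError(
                f"line {line_no}: expected {n_samples} coverage values, "
                f"got {len(values)}"
            )
        coverages[contig_id] = [float(v) for v in values]
    return coverages, (n_samples or 0)


# Three contigs from the abundance file quoted in this thread, one sample each.
example = io.StringIO(
    "k127_21251076\t9.4100\n"
    "k127_17649206\t1.4000\n"
    "k127_1080564\t3.8376\n"
)
coverages, n_samples = read_abundance(example)
print("contigs:", len(coverages), "samples:", n_samples)
```

With pooled reads there is a single coverage column, so n_samples is 1 however many contigs the file lists.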
Hi Vijini,
I thought this approach was probably wrong! My abundance file looks like:
k127_21251076 9.4100
k127_17649206 1.4000
k127_1080564 3.8376
k127_5402820 5.3981
k127_9364888 1.8202
k127_7203760 0.5843
k127_10805640 3.5836
k127_1 13.0039
Having reread the documentation this could be my problem:
“Abundance file (in .tsv format) with a contig in a line and its coverage in each sample.”
Previous binning tools I’ve used have worked only on pooled reads so I mapped the full set of reads to the megahit assembly – is the intention here that I do this separately for the reads in each sample?
Many thanks
James
|
Hello @jamesabbott, Thanks for sharing the format of your abundance file. MetaCoAG is designed to support coverages from both pooled reads as well as reads from individual samples. So it should not be an issue. I think the issue is with the format of the abundance file you are using. Can you please check if there is a space or a tab between the contig ID and the coverage value in each line. MetaCoAG expects a .tsv (tab separated) file, so the values should be separated by a tab, not by a space. If possible, can you also attach the abundance file here? Thank you! |
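The tab-versus-space check requested above can also be scripted. The following is a rough sketch, assuming the layout described in this thread (contig ID, a tab, then one or more numeric coverage values); find_bad_lines is a hypothetical helper name, not something MetaCoAG provides.

```python
import io


def find_bad_lines(handle, max_report=5):
    """Return (line_no, reason) for lines not shaped like 'id<TAB>number[...]'."""
    problems = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        reason = None
        if len(fields) < 2:
            reason = "no tab found (space-separated?)"
        elif " " in fields[0]:
            reason = "contig ID contains a space"
        else:
            try:
                for value in fields[1:]:
                    float(value)
            except ValueError:
                reason = "non-numeric coverage value"
        if reason is not None:
            problems.append((line_no, reason))
            if len(problems) >= max_report:
                break
    return problems


tab_separated = io.StringIO("k127_1\t13.0039\n")
space_separated = io.StringIO("k127_1 13.0039\n")
print(find_bad_lines(tab_separated))
print(find_bad_lines(space_separated))
```

On a real file, pass an open file handle instead of the io.StringIO objects; an empty result means every checked line was tab-delimited with numeric coverages.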
Hi Vijini,
I’ve just attached the head of the file since the total size is >400Mb. As far as I can see this is correctly tab-delimited:
M-000582:~ jabbott $ cat -vet coverage_top.txt
k127_21251076^I9.4100$
k127_17649206^I1.4000$
k127_1080564^I3.8376$
k127_5402820^I5.3981$
k127_9364888^I1.8202$
k127_7203760^I0.5843$
k127_10805640^I3.5836$
k127_1^I13.0039$
k127_21611263^I1.9070$
k127_15127896^I3.6311$
Best Regards
James
|
Hello @jamesabbott, Can I know whether you have used the original .fastg file output by MEGAHIT as the assembly graph file? Currently, MetaCoAG supports only .gfa files for the assembly graph file. I'm currently working on adding support for .fastg files as well. I have updated the documentation and if this is the case, I'm so sorry for the confusion. Also, can you share with me the first few lines of the assembly graph file as well? Thank you! |
Hi Vijini,
No problem – the top of the assembly graph is attached. I converted the megahit fastg file to gfa using Heng Li’s gfa1: https://github.com/lh3/gfa1.
I realise now that there are two different versions of gfa. Should this be gfa2 format?
Many thanks
James
|
Hello @jamesabbott, MetaCoAG currently supports assembly graphs in GFA1 format. I have also tested Heng Li’s fastg2gfa script and the assembly graph produced works fine with MetaCoAG. Please let me know how your run goes with the new GFA assembly graph. Let me know if you come across further issues. Thank you very much for using MetaCoAG and pointing out the issues! Best regards, |
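For a quick look at what a GFA1 file contains, the record type is the first tab-separated field of each line (H for header, S for segment, L for link, P for path). The sketch below only counts record types and collects segment names; summarize_gfa is a hypothetical helper, and it does not validate the file the way gfapy does.

```python
import io
from collections import Counter


def summarize_gfa(handle):
    """Count GFA1 record types and collect segment names (field 2 of S lines)."""
    record_types = Counter()
    segment_names = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        record_types[fields[0]] += 1
        if fields[0] == "S" and len(fields) > 1:
            segment_names.append(fields[1])
    return record_types, segment_names


# Tiny made-up GFA1 graph: one header, two segments, one link.
example = io.StringIO(
    "H\tVN:Z:1.0\n"
    "S\t1\tACGT\n"
    "S\t2\tGGCA\n"
    "L\t1\t+\t2\t-\t0M\n"
)
record_types, segment_names = summarize_gfa(example)
print(dict(record_types), segment_names)
```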
Hi Vijini,
The previously attached gfa file was created using fastg2gfa, so in theory it should be ok. I’ve just validated it with gfapy and it seems to be ok, and is identified as gfa1. The process I’ve gone through to generate the coverage file and gfa file from the megahit output is as follows:
bbwrap.sh ref=megahit/final.contigs.fa in=1.fq.gz in2=2.fq.gz out=aln.sam.gz \
kfilter=22 subfilter=15 maxindel=80
pileup.sh in=aln.sam.gz out=cov.txt
#extract contig id and coverage from pileup output
cat cov.txt|grep -v '^#'|awk -F"\t" '{print $1"\t"$2}' | awk -F" " '{print $1"\t"$5}' > coverage.txt
megahit_toolkit contig2fastg 127 megahit/final_contigs/.fa > final_contigs.fastg
fastg2gfa final_contigs.fastg > final_contigs.gfa
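The awk step above can be written in Python as a rough equivalent. This sketch assumes, as the awk pipeline implies, that the first tab-separated column of the pileup output holds the full contig header and the second holds the average coverage; the sample input and the function name pileup_to_abundance are made up for illustration.

```python
import io


def pileup_to_abundance(pileup_handle, out_handle):
    """Write 'contig_id<TAB>coverage' lines, skipping '#' header lines."""
    written = 0
    for line in pileup_handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only the first token of the header, dropping the
        # "flag=... multi=... len=..." annotations MEGAHIT appends.
        contig_id = fields[0].split()[0]
        out_handle.write(f"{contig_id}\t{fields[1]}\n")
        written += 1
    return written


# Illustrative input only; real pileup output has more columns.
sample = (
    "#ID\tAvg_fold\tLength\n"
    "k127_1 flag=1 multi=3.0000 len=303\t13.0039\t303\n"
)
out = io.StringIO()
count = pileup_to_abundance(io.StringIO(sample), out)
print(count, repr(out.getvalue()))
```

Taking the first whitespace-delimited token of the header, rather than a fixed field position, keeps the result stable if the number of annotations in the header changes.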
The only other thing I wasn’t sure about was the choice of kmer-size to use when creating the fastg file from the megahit final_contigs.fa, since there are a range of kmers used to generate the final contig set. I tried using the contigs from an intermediate set of contigs produced with a single kmer but that clearly didn’t work with errors occurring due to missing dict keys for particular contig ids.
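One way to narrow down missing-key errors like these is to compare the contig IDs in the contigs FASTA with those in the abundance file before running the binner. The following is a minimal sketch under the assumption that the ID is the first whitespace-delimited token of each FASTA header; the helper names and the sample data are hypothetical.

```python
import io


def fasta_ids(handle):
    """Collect the first token of every '>' header line."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def abundance_ids(handle):
    """Collect the first tab-separated field of every non-empty line."""
    return {line.split("\t", 1)[0].strip() for line in handle if line.strip()}


def compare_ids(contig_ids, cov_ids):
    """Return (contigs without coverage, coverage rows without a contig)."""
    return sorted(contig_ids - cov_ids), sorted(cov_ids - contig_ids)


# Made-up inputs: two contigs, but coverage for only one of them.
contigs = io.StringIO(
    ">k127_1 flag=1 multi=3.0000 len=303\nACGT\n"
    ">k127_2 flag=1 multi=2.0000 len=310\nGGCA\n"
)
abundance = io.StringIO("k127_1\t13.0039\n")
missing_cov, extra_cov = compare_ids(fasta_ids(contigs), abundance_ids(abundance))
print("no coverage:", missing_cov, "no contig:", extra_cov)
```

Any ID reported in either list is a candidate for a failed dictionary lookup downstream.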
Any other suggestions you can make would be most welcome!
Many thanks
James
|
Hi, I am having the same issue...
Here are the first lines of the input files (for contigs only headers are shown):
Any ideas? Thanks! PS: Actually, checking again the line references in the error message are different:
|
Hello @jamesabbott and @chassenr, I'm extremely sorry for getting back late to you. I have fixed the KeyError in the MEGAHIT version of MetaCoAG (Commit 4540c66a4b5af5108c9da1aca4bc42af56003a45). Please get the latest pull from the repo and have a try. Let me know how things go. Thank you very much for pointing out this error! |
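Hello @jamesabbott, About your question on what kmer-size to use when creating the .fastg file from the megahit final_contigs.fa, I have seen that the connectivity information changes when you change the kmer-size. It depends on how good your assembly is. k=141 can be a good choice for assemblies obtained from reads with a read length of 150 - 300bp. Also, k=77 can be a good choice for assemblies obtained from reads with a read length of 100bp. Hope this helps. Let me know if you come across any issues. I truly appreciate the input. Best regards, Vijini |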
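To compare graphs built with several k-mer sizes, the two conversion commands used earlier in this thread (megahit_toolkit contig2fastg and fastg2gfa) can be generated per k value. The sketch below only builds the command strings and does not run them; the output file naming scheme and the function name graph_commands are assumptions, and k=77 and k=141 are simply the sizes mentioned in this thread.

```python
import shlex


def graph_commands(contigs_fasta, kmer_sizes, prefix="final_contigs"):
    """Build shell command strings that create one GFA file per k-mer size."""
    commands = []
    for k in kmer_sizes:
        fastg = f"{prefix}.k{k}.fastg"  # hypothetical naming scheme
        gfa = f"{prefix}.k{k}.gfa"
        commands.append(
            f"megahit_toolkit contig2fastg {k} {shlex.quote(contigs_fasta)}"
            f" > {shlex.quote(fastg)}"
        )
        commands.append(f"fastg2gfa {shlex.quote(fastg)} > {shlex.quote(gfa)}")
    return commands


cmds = graph_commands("megahit/final.contigs.fa", [77, 141])
for cmd in cmds:
    print(cmd)
```

Printing the commands first makes it easy to inspect them before pasting them into a shell or a job script.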
Hi Vijini,
Many thanks – I’ve updated my installation and have started rerunning the assembly. I’ll try a few different kmer sizes to see how the results compare.
Best Regards
James
|
Just to let you know that this seems to have solved the problem - the assembly has now continued past the point where it previously failed. I did have a problem with hmmsearch, since the distributed version is not compatible with our RHEL7 based cluster due to a glibc incompatibility. I worked round this by installing hmmer via conda and updating |
Hello @jamesabbott, I have added fraggenescan and hmmer to the conda Thank you very much for your input. It has been very useful to improve MetaCoAG. |
Hi @Vini2 I am getting a similar error - however this is with the SPAdes assembler; any suggestions would be appreciated.
I have checked my abundance files and they appear to be properly tab-delimited.
|
Hi @AmaliT, Thanks for your interest in MetaCoAG. Can you please attach your coverage file to a comment on this issue? Thanks! |
please see attached |
Hi @AmaliT, Which version of MetaCoAG are you using? You can see the version using the command |
|
Can you please check if |
Hi @Vini2 This is what I get - could it be the _1 causing the issue in the paths? I noticed all of them seem to have _*. I had to re-create the paths file from the gfa file, as I didn't have one.
|
Hi @AmaliT, Looks like the format of the The
Can you please double-check the input files? The assembly graph file should begin with lines starting with Thanks! |
Hi @Vini2 Thanks for that. It's the paths file causing the issue - I was using I have managed to get past this point with a set for which I had a paths file from SPAdes (version 3.15.3). Thanks for the help with debugging :) Much appreciated |
Hi @AmaliT, No problem! Currently, I don't have any way to generate the paths file. I will see if I can come up with a script. |
Hi @Vini2, I am quite new to metacoag and have run it on other megahit/metaspades assemblies with no issues except for one megahit assembly. Here is my error message:
here is the start of my abundance file:
the start of my contigs file: >k141_718556 flag=1 multi=3.0000 len=303
the start of my gfa file:
Would really appreciate any help at all. |
I'm getting the following error with a megahit assembly:
Looking at the code, coverages is a dict however in my case it has no entry with a key of '0', hence it falls over. If my interpretation is correct, then if there was a key of '0' it would contain the coverage of contig 0. Changing feature_utils.py line 237 to:
allows it to proceed, although I'm not convinced the returned value is actually what was intended.
Does this sound like the correct approach?