Phage genome annotations for GenBank submission #76

Open
vdruelle opened this issue Oct 31, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@vdruelle

Hi @gbouras13,

Thanks for creating Phold, it's a great tool!
I'm writing to suggest some changes to the output of Phold (or Pharokka?) that would simplify uploading annotated phage genomes to GenBank.

To give you a bit of perspective, we isolated and characterised a new phage and want to upload it to GenBank. We sequenced it with Illumina, assembled it with Unicycler, annotated and rotated it with Pharokka, and then annotated it with Phold. We then passed the .gbk file generated by Phold to the tool https://chlorobox.mpimp-golm.mpg.de/GenBank2Sequin.html to generate the .fasta and .tbl files required for submission via BankIt.

Unfortunately, some of the generated annotations were not in a format accepted by BankIt. The main issues were:

  • the /trna qualifier of the tRNA features was not recognised. Replacing /trna with /product made it work. /product seems to be the qualifier used in the other published phage genomes I checked.
  • the /anticodon qualifier of the tRNA features is in an incorrect format. Right now it looks like /anticodon=TAA, while it is supposed to look something like /anticodon=(pos:678..680,aa:Leu,seq:taa). For now I made it work by simply removing the /anticodon qualifier.
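The two qualifier changes above can be sketched in a few lines of Python. This is only an illustration, not how Phold or Pharokka handle qualifiers internally: the function names are hypothetical, the qualifiers are assumed to be available as a plain dict, and the anticodon coordinates are assumed to be supplied by the caller (they do not appear in the .gbk feature shown in this issue).

```python
def format_anticodon(start, end, amino_acid, anticodon, strand=1):
    """Build an INSDC-style /anticodon value, e.g. (pos:678..680,aa:Leu,seq:taa)."""
    pos = f"{start}..{end}"
    if strand == -1:
        # Reverse-strand positions are wrapped in complement(...)
        pos = f"complement({pos})"
    return f"(pos:{pos},aa:{amino_acid},seq:{anticodon.lower()})"


def fix_trna_qualifiers(qualifiers, anticodon_start, anticodon_end, strand=1):
    """Return a copy of a tRNA qualifier dict with /trna renamed to /product
    and /anticodon rewritten in the structured form.

    Assumes the dict has an 'isotype' entry holding the amino acid (as in the
    feature from this issue); anticodon_start/anticodon_end are assumptions
    that the caller must provide.
    """
    fixed = {}
    for key, value in qualifiers.items():
        if key == "trna":
            fixed["product"] = value
        elif key == "anticodon":
            fixed["anticodon"] = format_anticodon(
                anticodon_start, anticodon_end, qualifiers["isotype"], value, strand
            )
        else:
            fixed[key] = value
    return fixed
```

Calling `format_anticodon(678, 680, "Leu", "TAA")` yields the string `(pos:678..680,aa:Leu,seq:taa)`, matching the format quoted above.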

My suggestion would be to fix the formatting of the tRNA features so that they are accepted for GenBank submission without manual correction. I assume this would make it simpler to use Phold to annotate phages for GenBank submissions. Additionally, Phold could write the feature table (.tbl) file needed for submission as one of its outputs, which would avoid relying on a tool like https://chlorobox.mpimp-golm.mpg.de/GenBank2Sequin.html to create the .tbl file from the .gbk file.
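For reference, a rough sketch of what writing such a feature table could look like. The tab-delimited layout follows the standard NCBI feature table format (a `>Feature` header, a `start<TAB>end<TAB>key` line per feature, then qualifier lines indented by three tabs). The function name and the dict-based feature representation are hypothetical and are not part of Phold.

```python
def feature_table(seq_id, features):
    """Render features as an NCBI-style five-column feature table (.tbl).

    Each feature is assumed to be a dict with 'start', 'end', 'type',
    'qualifiers' and an optional 'strand' (1 or -1); this structure is an
    assumption made for the example.
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        start, end = feat["start"], feat["end"]
        if feat.get("strand", 1) == -1:
            # Reverse-strand features are written with start > end
            start, end = end, start
        lines.append(f"{start}\t{end}\t{feat['type']}")
        for key, value in feat["qualifiers"].items():
            lines.append(f"\t\t\t{key}\t{value}")
    return "\n".join(lines) + "\n"


# Example with the tRNA from this issue; "phage1" is a placeholder sequence ID
trna = {
    "start": 140628,
    "end": 140713,
    "type": "tRNA",
    "qualifiers": {"product": "tRNA-Leu(TAA)"},
}
table = feature_table("phage1", [trna])
```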

I tried looking at how Pharokka does that to propose a pull request with the changes needed, but I know too little about this tool to figure out how to do that...

Let me know in case you need more details and thank you again for your work,
Valentin


  • phold version: 0.2.0
  • pharokka version: 1.7.3
  • Python version: 3.11.9
  • Operating System: Ubuntu 20.04 LTS

Problematic .gbk feature:
     tRNA            140628..140713
                     /ID="DSQLZVHB_tRNA_0001"
                     /transl_table=11
                     /trna="tRNA-Leu(TAA)"
                     /isotype="Leu"
                     /anticodon=TAA
                     /locus_tag="DSQLZVHB_tRNA_0001"
                     /source="tRNAscan-SE_2.0.12"

Corrected to be accepted by GenBank:
     tRNA            140628..140713
                     /ID="DSQLZVHB_tRNA_0001"
                     /transl_table=11
                     /product="tRNA-Leu(TAA)"
                     /isotype="Leu"
                     /locus_tag="DSQLZVHB_tRNA_0001"
                     /source="tRNAscan-SE_2.0.12"
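Until the output is fixed, the manual correction shown in the two feature blocks above could be scripted as a text-level patch. This is a minimal stopgap sketch with a hypothetical function name; it only renames `/trna=` to `/product=` and drops bare `/anticodon=XXX` lines, exactly as was done by hand, and does not attempt to build a valid structured anticodon.

```python
import re


def patch_gbk_text(gbk_text):
    """Rename /trna= to /product= and drop bare /anticodon=XXX qualifier lines."""
    patched = []
    for line in gbk_text.splitlines():
        stripped = line.strip()
        # A bare anticodon such as /anticodon=TAA is not accepted, so skip it
        if re.fullmatch(r"/anticodon=[A-Za-z]+", stripped):
            continue
        if stripped.startswith("/trna="):
            line = line.replace("/trna=", "/product=", 1)
        patched.append(line)
    return "\n".join(patched) + "\n"
```

Structured values such as `/anticodon=(pos:678..680,aa:Leu,seq:taa)` do not match the pattern and are left untouched.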

@gbouras13
Owner

Hi @vdruelle ,

Thank you for this issue - I will work on this now. I think the issue with Pharokka is in the library I use to convert GFF to GBK.

A refactor of Pharokka is needed to replace that dependency, but I do not have time for it right now (or in the foreseeable future). Therefore, I think the best thing to do is to fix the Phold GBK format so that it complies with the submission requirements.

George

@gbouras13 gbouras13 added the bug Something isn't working label Jan 9, 2025
gbouras13 added a commit that referenced this issue Jan 10, 2025