GFF included in repo doesn't include ORF1ab frameshift #110

Open
tnguyensanger opened this issue Jun 28, 2021 · 0 comments

@tnguyensanger

Is the GFF included in the repo intended for production use or is it only for unit testing?

https://github.com/connor-lab/ncov2019-artic-nf/blob/9ac3119a875d75c49de65848a3587e6fcec22d1c/typing/MN908947.3.gff

The GFF included in the repo appears to use coordinates for the ORF1ab gene that do not account for the 1 bp (-1) frameshift caused by ribosomal slippage. Both ORF1a and ORF1ab are annotated as single continuous CDS spans:

MN908947.3	ensembl	gene	266	13483	.	+	.	ID=gene:ENSSASG00005000003;Name=ORF1ab;biotype=protein_coding;description=ORF1a polyprotein%3BORF1ab polyprotein [Source:NCBI gene (formerly Entrezgene)%3BAcc:43740578];gene_id=ENSSASG00005000003;logic_name=ensembl_covid;version=1
MN908947.3	ensembl	mRNA	266	13483	.	+	.	ID=transcript:ENSSAST00005000003;Parent=gene:ENSSASG00005000003;Name=ORF1a;biotype=protein_coding;transcript_id=ENSSAST00005000003;version=1
MN908947.3	ensembl	exon	266	13483	.	+	.	Parent=transcript:ENSSAST00005000003;Name=ENSSASE00005000003;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=ENSSASE00005000003;rank=1;version=1
MN908947.3	ensembl	CDS	266	13483	.	+	0	ID=CDS:ENSSASP00005000003;Parent=transcript:ENSSAST00005000003;protein_id=ENSSASP00005000003
####
MN908947.3	ensembl	gene	266	21555	.	+	.	ID=gene:ENSSASG00005000002;Name=ORF1ab;biotype=protein_coding;description=ORF1a polyprotein%3BORF1ab polyprotein [Source:NCBI gene (formerly Entrezgene)%3BAcc:43740578];gene_id=ENSSASG00005000002;logic_name=ensembl_covid;version=1
MN908947.3	ensembl	mRNA	266	21555	.	+	.	ID=transcript:ENSSAST00005000002;Parent=gene:ENSSASG00005000002;Name=ORF1ab;biotype=protein_coding;transcript_id=ENSSAST00005000002;version=1
MN908947.3	ensembl	exon	266	21555	.	+	.	Parent=transcript:ENSSAST00005000002;Name=ENSSASE00005000002;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=ENSSASE00005000002;rank=1;version=1
MN908947.3	ensembl	CDS	266	21555	.	+	0	ID=CDS:ENSSASP00005000002;Parent=transcript:ENSSAST00005000002;protein_id=ENSSASP00005000002
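One quick way to see the problem is to check whether each CDS length is a whole number of codons. The sketch below is a minimal, hypothetical helper (not part of the pipeline) that assumes tab-separated GFF3 columns with 1-based, inclusive coordinates; the attribute column is trimmed to the `ID` for brevity. With these coordinates, the ORF1a CDS is 13,218 nt (divisible by 3), while the single-span ORF1ab CDS is 21,290 nt, leaving a remainder of 2.

```python
# Hypothetical check: sum CDS segment lengths per ID and test divisibility by 3.
# Assumes GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes.

REPO_CDS_LINES = [
    "MN908947.3\tensembl\tCDS\t266\t13483\t.\t+\t0\tID=CDS:ENSSASP00005000003",
    "MN908947.3\tensembl\tCDS\t266\t21555\t.\t+\t0\tID=CDS:ENSSASP00005000002",
]


def cds_lengths(lines):
    """Return {CDS ID: total length in nt}; GFF coordinates are 1-based and inclusive."""
    totals = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # skip non-CDS rows and separator lines such as "####"
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        cds_id = attrs.get("ID", "")
        totals[cds_id] = totals.get(cds_id, 0) + (end - start + 1)
    return totals


if __name__ == "__main__":
    for cds_id, length in cds_lengths(REPO_CDS_LINES).items():
        print(cds_id, length, "nt, remainder mod 3 =", length % 3)
```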

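The frameshift is represented in the latest NCBI RefSeq GFF, where the ORF1ab CDS is split into two segments (266-13468 and 13468-21555) that share base 13468: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.gff.gz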

NC_045512.2     RefSeq  gene    266     21555   .       +       .       ID=gene-GU280_gp01;Dbxref=GeneID:43740578;Name=ORF1ab;gbkey=Gene;gene=ORF1ab;gene_biotype=protein_coding;locus_tag=GU280_gp01
NC_045512.2     RefSeq  CDS     266     13468   .       +       0       ID=cds-YP_009724389.1;Parent=gene-GU280_gp01;Dbxref=Genbank:YP_009724389.1,GeneID:43740578;Name=YP_009724389.1;Note=pp1ab%3B translated by -1 ribosomal frameshift;exception=ribosomal slippage;gbkey=CDS;gene=ORF1ab;locus_tag=GU280_gp01;product=ORF1ab polyprotein;protein_id=YP_009724389.1
NC_045512.2     RefSeq  CDS     13468   21555   .       +       0       ID=cds-YP_009724389.1;Parent=gene-GU280_gp01;Dbxref=Genbank:YP_009724389.1,GeneID:43740578;Name=YP_009724389.1;Note=pp1ab%3B translated by -1 ribosomal frameshift;exception=ribosomal slippage;gbkey=CDS;gene=ORF1ab;locus_tag=GU280_gp01;product=ORF1ab polyprotein;protein_id=YP_009724389.1
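As a hedged illustration of how the two NCBI rows combine, the sketch below (hypothetical helper names, not from either repo) treats them as one spliced CDS. Because both segments include position 13468, that base is read twice, which is how the -1 frameshift is encoded. The combined length is 21,291 nt, i.e. 7,097 codons (7,096 amino acids plus the stop codon). The toy `splice` call on a short string only demonstrates the shared-base behaviour.

```python
# Hypothetical helpers for a multi-segment CDS given as 1-based, inclusive (start, end) pairs,
# e.g. the two NCBI ORF1ab CDS rows above.

NCBI_ORF1AB_SEGMENTS = [(266, 13468), (13468, 21555)]


def segment_length(segments):
    """Total nucleotides across 1-based, inclusive segments."""
    return sum(end - start + 1 for start, end in segments)


def splice(seq, segments):
    """Concatenate 1-based, inclusive segments of seq; a base shared by two segments is read twice."""
    return "".join(seq[start - 1:end] for start, end in segments)


if __name__ == "__main__":
    total = segment_length(NCBI_ORF1AB_SEGMENTS)
    print(total, total % 3, total // 3 - 1)  # prints: 21291 0 7096 (nt, remainder, aa without stop)
    print(splice("ABCDEFG", [(1, 4), (4, 7)]))  # toy sequence; prints: ABCDDEFG
```

Applied to the actual MN908947.3 sequence, `splice` would yield the frameshifted ORF1ab coding sequence under these coordinates.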
