Bacterial WGS training : Exercise 6

Title	Chromosome, plasmid, resistance and virulence annotation
Training dataset:
Questions:	How many genes are there in my sample? Are there virulence and/or antibiotic resistance genes? Where are the genes located? Which plasmids are present in the sample? How do I visualize the results?
Objectives:	Annotate virulence and ABR genes. Determine gene variants. Determine the plasmidome. Locate annotated genes. Interpret the results.
Time estimation:	1 h
Key points:	Comparing annotation using mapping vs assembly. Plasmid, virulence and resistance determination.

Introduction

In this exercise we are going to determine the genomic content of a multidrug-resistant (MDR) K. pneumoniae isolate. First we will use srst2 to assess the resistome. Later, we will use plasmidID to add biological and positional information to the assembled sequences and see where the genes detected with the mapping strategy are located.

Training dataset description

The sample we are going to analyse is an in silico dataset generated with wgsim from the Klebsiella pneumoniae subsp. pneumoniae HS11286 genome, which is available at NCBI.

Exercise

Mapping based annotation

To run srst2, which maps the reads against an antibiotic resistance gene database (ARGannot), execute this command:

cd
cd wgs/bacterial_wgs_training_dataset/ANALYSIS
nextflow run ../../bacterial_wgs_training/main.nf \
--reads '../REFERENCES/plasmidid_test/KPN*_R{1,2}.fastq.gz' \
--fasta ../REFERENCES/listeria_NC_021827.1_NoPhagues.fna \
-profile conda \
--gtf ../REFERENCES/listeria_NC_021827.1_NoPhagues.gff \
--srst2_resistance ../REFERENCES/ARGannot.r1.fasta \
--srst2_virulence ../REFERENCES/EcOH.fasta \
--step mapAnnotation \
--outdir 07-mapAnnotation \
-resume

Results should look like this

Sample	DB	gene	allele	coverage	depth	diffs	uncertainty	divergence	length	maxMAF	clusterid	seqid	annotation
KPN_TEST_R	ARGannot.r1	RmtB_AGly	RmtB_1580	100.0	12.09	1snp		0.132	756	0.125	309	1580	no;no;RmtB;AGly;AB263754;2843-3598;756
KPN_TEST_R	ARGannot.r1	TEM-1D_Bla	TEM-117_968	100.0	33.386	2snp		0.262	764	0.382	205	968	no;no;TEM-117;Bla;AY130282;1-764;764
KPN_TEST_R	ARGannot.r1	KPC-1_Bla	KPC-14_809	100.0	5.412	1indel		0.0	876	0.333	184	809	no;no;KPC-14;Bla;JX524191;396-1271;876
KPN_TEST_R	ARGannot.r1	AmpH_Bla	AmpH_634	100.0	11.373	14snp		1.206	1161	0.143	86	634	no;no;AmpH;Bla;CP003785;4208384-4209544;1161
KPN_TEST_R	ARGannot.r1	CTX-M-9_Bla	CTX-M-14_102	100.0	26.676	1snp		0.114	876	0.412	190	102	no;yes;CTX-M-14;Bla;AF252622;1741-2616;876
KPN_TEST_R	ARGannot.r1	StrA_AGly	StrA_1501	100.0	12.502	2snp		0.249	804	0.167	263	1501	no;no;StrA;AGly;AJ627643;3725-4528;804
KPN_TEST_R	ARGannot.r1	StrB_AGly	StrB_1614	100.0	9.545	1snp		0.119	837	0.167	227	1614	no;no;StrB;AGly;KR091911;169145-169981;837
KPN_TEST_R	ARGannot.r1	AadA_AGly	AadA2_1605	100.0	9.306	2snp		0.256	780	0.167	229	1605	yes;no;AadA2;AGly;X68227;166-945;780
KPN_TEST_R	ARGannot.r1	SHV-OKP-LEN_Bla	SHV-11_1287	100.0	9.401			0.0	861	0.143	164	1287	yes;no;SHV-11;Bla;HM751098;1-861;861
KPN_TEST_R	ARGannot.r1	TetRG_Tet	TetRG_605	96.209	6.48	10snp24holes	edge0.0	1.642	633	0.5	373	605	no;no;TetRG;Tet;S52438;113-745;633
KPN_TEST_R	ARGannot.r1	DfrA_Tmt	DfrA12_1089	99.799	8.389	1indel		0.0	498	0.143	418	1089	yes;no;DfrA12;Tmt;Z21672;310-807;498
KPN_TEST_R	ARGannot.r1	TetG_Tet	TetG_632	100.0	9.963			0.0	1176	0.25	80	632	no;no;TetG;Tet;NC_010410;3672607-3671432;1176
KPN_TEST_R	ARGannot.r1	SulII_Sul	SulII_1219	100.0	11.094	1snp		0.123	816	0.2	256	1219	no;no;SulII;Sul;KR091911;167466-168281;816

This table is the full report of all the antibiotic resistance genes (ARGs) found, together with their mapping statistics.
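To get a quick overview of such a report, the tab-separated columns can be filtered with awk. The sketch below is only an illustration: it writes three rows copied from the table above to a temporary file (the real report path inside `07-mapAnnotation` is not stated here, so `report` is a stand-in), counts the reported genes, and lists the alleles with 100% coverage and an empty `diffs` column, i.e. the hits that match the database allele exactly.

```shell
# Stand-in for the real srst2 report: three rows copied from the table above.
report=$(mktemp)
printf 'Sample\tDB\tgene\tallele\tcoverage\tdepth\tdiffs\tuncertainty\tdivergence\tlength\tmaxMAF\tclusterid\tseqid\tannotation\n' > "$report"
printf 'KPN_TEST_R\tARGannot.r1\tRmtB_AGly\tRmtB_1580\t100.0\t12.09\t1snp\t\t0.132\t756\t0.125\t309\t1580\tno;no;RmtB;AGly;AB263754;2843-3598;756\n' >> "$report"
printf 'KPN_TEST_R\tARGannot.r1\tSHV-OKP-LEN_Bla\tSHV-11_1287\t100.0\t9.401\t\t\t0.0\t861\t0.143\t164\t1287\tyes;no;SHV-11;Bla;HM751098;1-861;861\n' >> "$report"
printf 'KPN_TEST_R\tARGannot.r1\tTetRG_Tet\tTetRG_605\t96.209\t6.48\t10snp24holes\tedge0.0\t1.642\t633\t0.5\t373\t605\tno;no;TetRG;Tet;S52438;113-745;633\n' >> "$report"

# Number of reported genes (every line except the header).
total=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$report")

# Alleles with full coverage (column 5) and no differences (column 7 empty).
exact=$(awk -F'\t' 'NR > 1 && $5 == 100 && $7 == "" { print $4 }' "$report")

echo "genes reported: $total"
echo "exact matches: $exact"
rm -f "$report"
```

Setting the field separator to a tab matters here: with the default whitespace splitting, the empty `diffs` and `uncertainty` cells would collapse and shift the remaining columns.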

Assembly based annotation

Now, using the contigs assembled from those same reads, we can determine the exact location of those ARGs. ARGs can be located on the chromosome, but they are mostly found on plasmids. In this case, we are going to focus on plasmid-borne ARGs using the annotation feature of plasmidID. To prepare the analysis, first copy the test dataset with these commands:

cd
cd wgs/bacterial_wgs_training_dataset/REFERENCES/
cp -r /mnt/ngs_course_shared/bacterial_wgs_training_dataset/REFERENCES/plasmidid_test .

Now we can run the Nextflow pipeline:

cd ../ANALYSIS/
nextflow run ../../bacterial_wgs_training/main.nf \
--reads '../REFERENCES/plasmidid_test/KPN*_R{1,2}.fastq.gz' \
--fasta ../REFERENCES/listeria_NC_021827.1_NoPhagues.fna \
-profile conda \
--gtf ../REFERENCES/listeria_NC_021827.1_NoPhagues.gff \
--plasmidid_database ../REFERENCES/plasmidid_test/plasmids_TEST_database.fasta \
--plasmidid_config ../REFERENCES/plasmidid_test/plasmidid_config.txt \
--step plasmidID \
--outdir 08-plasmidID \
-resume

Results should look like this

NC_016838.1	NC_016839.1	NC_016840.1

NC_016841.1	NC_016846.1	NC_016847.1

These are the 6 plasmids found in this isolate. Have a look at the pictures and find out whether the genes are the same alleles as those reported by srst2.

Are all the genes detected with srst2 located on plasmids?
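As a starting point for this comparison, it can help to list each srst2 allele next to the database accession and coordinates stored in its `annotation` column. The sketch below assumes, judging only from the rows shown in the table above, that the fifth and sixth `;`-separated items of that column are the source accession and its coordinates. The `report` file is again a stand-in built from two rows of that table. Note that these values describe the sequence the database allele was taken from, not the position of the gene in this isolate, so the plasmidID pictures remain the evidence for location.

```shell
# Stand-in for the real srst2 report: two rows copied from the table above.
report=$(mktemp)
printf 'Sample\tDB\tgene\tallele\tcoverage\tdepth\tdiffs\tuncertainty\tdivergence\tlength\tmaxMAF\tclusterid\tseqid\tannotation\n' > "$report"
printf 'KPN_TEST_R\tARGannot.r1\tAmpH_Bla\tAmpH_634\t100.0\t11.373\t14snp\t\t1.206\t1161\t0.143\t86\t634\tno;no;AmpH;Bla;CP003785;4208384-4209544;1161\n' >> "$report"
printf 'KPN_TEST_R\tARGannot.r1\tSulII_Sul\tSulII_1219\t100.0\t11.094\t1snp\t\t0.123\t816\t0.2\t256\t1219\tno;no;SulII;Sul;KR091911;167466-168281;816\n' >> "$report"

# Split the annotation column (14) on ";" and print:
# allele, source accession (assumed item 5), source coordinates (assumed item 6).
locations=$(awk -F'\t' 'NR > 1 { split($14, a, ";"); print $4 "\t" a[5] "\t" a[6] }' "$report")

echo "$locations"
rm -f "$report"
```

The resulting allele list can then be checked by hand against the genes drawn on each of the six plasmid maps.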
