
A project devoted to estimating bacteriophage diversity in metagenomes.


omashkova/bacteriophages


This project aimed to develop a pipeline for estimating the alpha diversity of bacteriophages in metagenomes using hidden Markov models (HMMs) of individual viral genes. Metagenomic sequencing, including sequencing of viral metagenomes, makes it possible to annotate viral sequences, many of which belong to the so-called ‘dark microbial matter’. The species diversity of bacteria in microbial communities is usually evaluated using the 16S rRNA gene sequence. Estimating the diversity of viruses is harder, because viruses lack universal marker genes.

The files All_pVOG_capsids.hmm, All_terminase.hmm and All_portal_hmms.hmm contain hidden Markov models of capsid, terminase and portal proteins extracted from the pVOGs database (http://dmk-brain.ecn.uiowa.edu/pVOGs/); these models were used in this research. Proteins were predicted from metagenomic assemblies with Prodigal 2.6.3, and the predicted proteins were then searched against the above-mentioned hidden Markov models with hmmsearch 3.3. Finally, the number of contigs carrying capsid, terminase and portal genes was calculated. The mean of these three counts is proposed as an estimate of phage diversity.
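The two external steps described above could be driven from Python roughly as follows. This is a minimal sketch, not the commands actually used in this project: the function names, the output file names and the choice of Prodigal's metagenomic mode (`-p meta`) are assumptions, and no E-value or score cutoff is set because none is stated here.

```python
import subprocess
from pathlib import Path


def prodigal_cmd(assembly, proteins_out, genes_out):
    """Build a Prodigal command that predicts proteins from an assembly.

    -i: input contigs, -a: protein translations, -o: gene coordinates.
    '-p meta' (metagenomic mode) is an assumption of this sketch.
    """
    return ["prodigal", "-i", str(assembly), "-a", str(proteins_out),
            "-o", str(genes_out), "-p", "meta"]


def hmmsearch_cmd(hmm_file, proteins, tblout):
    """Build an hmmsearch command that writes a per-target hit table."""
    return ["hmmsearch", "--tblout", str(tblout), str(hmm_file), str(proteins)]


def run_pipeline(assembly, hmm_files, workdir):
    """Run Prodigal once, then hmmsearch once per HMM file.

    Returns a dict mapping each HMM file name to its hit table path.
    Requires the 'prodigal' and 'hmmsearch' executables on PATH.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    proteins = workdir / "proteins.faa"   # hypothetical file name
    genes = workdir / "genes.gbk"         # hypothetical file name
    subprocess.run(prodigal_cmd(assembly, proteins, genes), check=True)

    tables = {}
    for hmm in hmm_files:
        tblout = workdir / (Path(hmm).stem + ".tbl")
        # The human-readable report on stdout is discarded; only the table is kept.
        subprocess.run(hmmsearch_cmd(hmm, proteins, tblout),
                       check=True, stdout=subprocess.DEVNULL)
        tables[Path(hmm).name] = tblout
    return tables
```

With both tools installed, a call such as `run_pipeline("assembly.fasta", ["All_pVOG_capsids.hmm", "All_terminase.hmm", "All_portal_hmms.hmm"], "out")` would produce one hit table per gene family; `assembly.fasta` and `out` are placeholder names.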

The Notebook contains a detailed description of the analysis methods used.

The file find_contigs.py contains a Python 3 script that calculates the number of contigs with capsid, terminase and portal genes, along with some other statistics.
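The counting step can be illustrated with the sketch below. It is not the code in find_contigs.py: the function names are hypothetical, and it assumes hit tables in the hmmsearch `--tblout` format (comment lines start with `#`, the first column is the target protein name) and Prodigal-style protein names of the form `<contig>_<gene index>`.

```python
from statistics import mean


def contig_of(protein_id):
    """Strip the trailing gene index that Prodigal appends to the contig name."""
    return protein_id.rsplit("_", 1)[0]


def contigs_with_hits(tblout_lines):
    """Return the set of contigs that have at least one protein hit."""
    contigs = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and table comments
        target_name = line.split()[0]
        contigs.add(contig_of(target_name))
    return contigs


def diversity_estimate(tables):
    """Count contigs per gene family and average the counts.

    'tables' maps a gene family name to an iterable of tblout lines.
    Returns (per-family contig counts, mean of those counts).
    """
    counts = {gene: len(contigs_with_hits(lines)) for gene, lines in tables.items()}
    return counts, mean(counts.values())
```

To use it on real output, each table can be passed as an open file handle, for example `diversity_estimate({"capsid": open("capsid.tbl"), "terminase": open("terminase.tbl"), "portal": open("portal.tbl")})`, where the file names are placeholders. A contig with several hits to the same gene family is counted once, which matches counting contigs rather than genes.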

As a case study, the number of contigs in each gene group (capsid, terminase and portal) was calculated for human gut viromes from females of different age groups. The per-sample distribution of observed OTUs showed that the average number of contigs increases with age. Furthermore, inter-individual variation in diversity was smaller among the ‘younger’ samples than among the ‘older’ ones, and the same held for the per-sample variation in contig counts of each gene type. The 'report' folder contains the per-sample distribution of the number of contigs with terminase, capsid and portal genes, and boxplots of contig numbers for each gene type.
