Torsten Seemann committed Mar 1, 2015
2 parents 129061e + 24ca67f commit 817140b
Showing 1 changed file with 36 additions and 2 deletions.
38 changes: 36 additions & 2 deletions README.md
@@ -1,8 +1,12 @@
#Snippy
Rapid haploid variant calling by Torsten Seemann
Rapid haploid variant calling and core SNP phylogeny

##Author
Torsten Seemann (@torstenseemann)

##Synopsis
Snippy finds SNPs between a haploid reference genome and your NGS sequence reads. It will find both substitutions (snps) and insertions/deletions (indels). It will use as many CPUs as you can give it on a single computer (tested to 64 cores). It is designed with speed in mind, and produces a consistent set of output files in a single folder.
It can then take a set of Snippy results using the same reference and generate a core SNP alignment and tree.

##Quick Start
```
@@ -24,7 +28,15 @@ chr 45722 ins ATT ATTT ATTT:43 ATT:1 CDS - E
chr 100541 del CAAA CAA CAA:38 CAAA:1 CDS + ECO_0179 hypothetical protein
plas 619 complex GATC AATA GATC:28 AATA:0
plas 3221 mnp GA CT CT:39 CT:0 CDS + ECO_p012 rep hypothetical protein
% snippy-core --prefix core mysnps1 mysnps2 mysnps3 mysnps4
Loaded 4 SNP tables.
Found 2814 core SNPs from 96615 SNPs.
% ls core.*
core.aln core.tab core.tree core.tree.eps core.tree.svg
```
#Calling SNPs

##Input Requirements
* a reference genome in FASTA or GENBANK format (can be in multiple contigs)
@@ -89,19 +101,41 @@ The variant calling is done by [Freebayes](https://github.com/ekg/freebayes). Ho

By default Snippy uses ```--mincov 10 --minfrac 0.9``` which is reasonable for most cases, but for very high coverage data you may get mixed populations such as (REF:310 ALT:28). Snippy may use a more statistical approach in future versions like [Nesoni](https://github.com/Victorian-Bioinformatics-Consortium/nesoni) does.
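
To make the effect of these two thresholds concrete, here is a minimal Python sketch. It is not Snippy's code: it assumes that `--mincov` is a minimum total read depth at the site and `--minfrac` is the minimum fraction of those reads supporting the alternate allele. The function name `passes_filter` is hypothetical.

```python
def passes_filter(ref_count, alt_count, mincov=10, minfrac=0.9):
    """Return True if a variant would be kept under the ASSUMED rule:
    total depth >= mincov and ALT read fraction >= minfrac.
    This is an illustration, not Snippy's actual implementation."""
    depth = ref_count + alt_count
    if depth < mincov:
        # Too few reads to trust the call (also avoids dividing by zero).
        return False
    return alt_count / depth >= minfrac


# Mixed population from the text above: REF:310 ALT:28
print(passes_filter(310, 28))  # False (28 / 338 is far below 0.9)

# Insertion from the Quick Start table: ATTT:43 ATT:1
print(passes_filter(1, 43))    # True (43 / 44 is above 0.9)
```

Under this assumed rule the high-coverage mixed site is rejected because only about 8% of the reads carry the alternate allele, even though 28 reads would be ample coverage on their own.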

#Core SNP phylogeny

If you call SNPs for multiple isolates from the same reference, you can produce an alignment of "core SNPs" which can be used to build a high-resolution phylogeny (ignoring possible recombination). A "core site" is a genomic position that is present in _all_ the samples. A core site can have the same nucleotide in every sample ("monomorphic") or differ in some samples ("polymorphic" or "variant"). If we ignore the complications of the "ins", "del" and "complex" variant types and use only the "snp" and "mnp" variant sites, these make up the "core SNP genome".
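
The definitions above can be sketched in a few lines of Python. This is an illustration only: it assumes each sample is already available as a sequence in reference coordinates with `-` marking positions that are absent in that sample, which is not necessarily how `snippy-core` works internally. The function names and the toy sequences are hypothetical.

```python
def core_sites(aligned, gap="-"):
    """Return (core, variant) lists of 0-based positions.

    aligned: dict mapping sample name -> sequence in reference coordinates.
    A core site has no gap in any sample; a variant (polymorphic) core
    site additionally has more than one distinct nucleotide.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    core = [i for i in range(length) if all(s[i] != gap for s in seqs)]
    variant = [i for i in core if len({s[i] for s in seqs}) > 1]
    return core, variant


def core_snp_alignment(aligned, variant):
    """Keep only the variant core columns for every sample."""
    return {name: "".join(seq[i] for i in variant)
            for name, seq in aligned.items()}


# Toy data (hypothetical): position 2 is missing in mysnps3.
samples = {
    "mysnps1": "ACGTAC",
    "mysnps2": "ACGAAC",
    "mysnps3": "AC-TAT",
}
core, variant = core_sites(samples)
print(core)     # [0, 1, 3, 4, 5]
print(variant)  # [3, 5]
print(core_snp_alignment(samples, variant))
# {'mysnps1': 'TC', 'mysnps2': 'AC', 'mysnps3': 'TT'}
```

Position 2 is dropped because it is not present in every sample, and positions 0, 1 and 4 are core but monomorphic, so only columns 3 and 5 end up in the core SNP alignment.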

##Input Requirements
* a set of Snippy folders which used the same ``--ref`` sequence.

##Output Files

Extension | Description
----------|--------------
.aln | A core SNP alignment in the ```--aformat``` format (default FASTA)
.tab | Tab-separated columnar list of core SNP sites with alleles and annotations
.tree | A phylogenetic tree in the ```--tformat``` format (default NEWICK)
.tree.eps | An EPS image of the .tree file
.tree.svg | An SVG image of the .tree file
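
As a usage sketch for the `.aln` output, the following Python reads a FASTA alignment and counts pairwise SNP differences between samples. It assumes the default FASTA format and that each record is named after its sample; the record names and sequences below are made up for illustration, and the helper names are hypothetical.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse FASTA text into a dict of record name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def pairwise_snp_distances(seqs):
    """Count differing columns for every pair of aligned sequences."""
    return {
        (a, b): sum(x != y for x, y in zip(seqs[a], seqs[b]))
        for a, b in combinations(sorted(seqs), 2)
    }


# Hypothetical alignment text; in practice it could be read with
# open("core.aln").read()
example = ">mysnps1\nTC\n>mysnps2\nAC\n>mysnps3\nTT\n"
for pair, dist in pairwise_snp_distances(parse_fasta(example)).items():
    print(pair, dist)
```

A pairwise distance table like this is a quick sanity check on the alignment before looking at the tree.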

#Information

##Etymology
The name Snippy is a combination of [SNP](http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism) (pronounced "snip"), [snappy](http://www.thefreedictionary.com/snappy) (meaning "quick") and [Skippy the Bush Kangaroo](http://en.wikipedia.org/wiki/Skippy_the_Bush_Kangaroo) (to represent its Australian origin).

##License
Snippy is free software, released under the GPL (version 3).

##Issues
Please submit suggestions and bug reports here: https://github.com/Victorian-Bioinformatics-Consortium/snippy/issues

##Requirements
* Perl >= 5.6
* BioPerl >= 1.6
* bwa mem >= 0.7.12
* samtools >= 1.1
* freebayes >= 0.9.20
* GNU parallel > 2013xxxx
* freebayes scripts (freebayes-parallel, fasta_generate_regions.py)
* vcflib (vcffilter, vcfstreamsort, vcfuniq, vcffirstheader)
* vcftools (vcf-consensus)