Disambiguation algorithm for reads aligned to human and mouse genomes using Tophat or BWA mem

bioinfomagician/disambiguate

disambiguate
============

Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) with TopHat, HISAT2, STAR or BWA-MEM. Both Python and C++ implementations are provided. The Python implementation depends on the Pysam module; the C++ implementation depends on zlib and the BamTools C++ API. For STAR alignments, include the NM tag in the output when aligning (in STAR this is controlled by the --outSAMattributes option). This is strongly recommended for the Python version and required for the C++ version.
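The per-read decision is easiest to picture for a single read. The sketch below is a deliberately simplified illustration, not disambiguate's actual scoring: it assumes each read's alignment quality in each species is summarised by its NM edit distance alone (lower is better). The function name assign_read and its four outcome labels are hypothetical. The real tool applies aligner-specific rules; see the publication under Citing.

```python
from typing import Optional


def assign_read(nm_a: Optional[int], nm_b: Optional[int]) -> str:
    """Toy species call for one read (hypothetical, NM-only rule).

    nm_a / nm_b: edit distance (NM tag) of the read's alignment to species A / B,
    or None if the read did not align to that genome.
    Returns "A", "B", "ambiguous" or "unaligned".
    """
    if nm_a is None and nm_b is None:
        return "unaligned"   # no alignment to either genome
    if nm_b is None:
        return "A"           # aligned to A only
    if nm_a is None:
        return "B"           # aligned to B only
    if nm_a < nm_b:
        return "A"           # fewer mismatches against A
    if nm_b < nm_a:
        return "B"           # fewer mismatches against B
    return "ambiguous"       # equally good in both genomes


if __name__ == "__main__":
    print(assign_read(1, 4))     # A
    print(assign_read(2, 2))     # ambiguous
    print(assign_read(None, 0))  # B
```

Ties are where a rule like this has to give up, which is why an "ambiguous" outcome exists at all; a richer score (several tags compared in turn) reduces, but cannot eliminate, such ties.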

Differences between the Python and C++ versions:

  1. The Python version can perform the required natural name sorting of the reads internally. The C++ version does not support internal sorting, so its input BAM files must already be natural name sorted.
  2. The -s flag (sample name prefix) is mandatory for the C++ binary.
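Natural name sorting orders embedded numbers by value, so read2 comes before read10 (a plain string sort would put read10 first). A likely reason the step is required is that two identically sorted inputs can be walked in lockstep, one read name at a time. The sketch below illustrates both ideas on plain (name, value) tuples standing in for BAM records; natural_key and paired_walk are hypothetical helpers, not functions from disambiguate.py.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # (read name, payload): stand-in for a BAM record


def natural_key(name: str) -> List[object]:
    """Split a name into alternating text/number chunks: "r10a" -> ["r", 10, "a"]."""
    # re.split with a capture group puts text at even indices and digit runs at
    # odd indices, so str and int chunks always line up when keys are compared.
    return [int(tok) if i % 2 else tok for i, tok in enumerate(re.split(r"(\d+)", name))]


def _grouped(records: Iterable[Record]) -> Iterator[Tuple[str, List[int]]]:
    """Collapse consecutive records sharing a name (e.g. mates, multi-mappers)."""
    for name, grp in groupby(records, key=lambda r: r[0]):
        yield name, [payload for _, payload in grp]


def paired_walk(a: Iterable[Record], b: Iterable[Record]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Walk two natural-name-sorted streams in lockstep, yielding (name, a_items, b_items)."""
    ga, gb = _grouped(a), _grouped(b)
    na, nb = next(ga, None), next(gb, None)
    while na is not None or nb is not None:
        if nb is None or (na is not None and natural_key(na[0]) < natural_key(nb[0])):
            yield na[0], na[1], []          # name present only in stream a
            na = next(ga, None)
        elif na is None or natural_key(nb[0]) < natural_key(na[0]):
            yield nb[0], [], nb[1]          # name present only in stream b
            nb = next(gb, None)
        else:
            yield na[0], na[1], nb[1]       # same read seen in both alignments
            na, nb = next(ga, None), next(gb, None)


if __name__ == "__main__":
    print(sorted(["read10", "read2", "read1"], key=natural_key))
    # ['read1', 'read2', 'read10']
    a = [("r1", 0), ("r2", 1), ("r2", 2), ("r10", 0)]
    b = [("r2", 3), ("r3", 0), ("r10", 1)]
    for row in paired_walk(a, b):
        print(row)
```

Whatever sorter is used upstream, its notion of natural order has to match the comparison used while walking the files; the key above is only one plausible definition.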

For usage help, run disambiguate.py without any arguments.

To compile the C++ program, run the following command from the directory containing the source code:

c++ -I /path/to/bamtools_c_api/include/ -I./ -L /path/to/bamtools_c_api/lib/ -o disambiguate dismain.cpp -lz -lbamtools

A pre-compiled binary is also available from Bioconda as the ngs-disambiguate package: http://bioconda.github.io/recipes/ngs-disambiguate/README.html

Citing

Ahdesmäki MJ, Gray SR, Johnson JH and Lai Z. Disambiguate: An open-source application for disambiguating two species in next generation sequencing data from grafted samples. F1000Research 2016, 5:2741, DOI:10.12688/f1000research.10082.1
