Disambiguate reads that were mapped to multiple references.
Install with the Conda or Mamba package manager after setting your Bioconda channels:
❯ conda install neodisambiguate
Alignment disambiguation is commonly performed on sequencing data from transduction, transfection, transgenic, or xenograft (including patient-derived xenograft) experiments. This tool compares alignment metrics for each template across all of the references it was aligned to and determines which reference is the most likely source. Disambiguation is performed per template, and information from primary, secondary, and supplementary alignments is used as evidence.
Templates that are positively assigned to a single source reference are written to a reference-specific output BAM file. Templates with an ambiguous reference assignment are written to an input-specific ambiguous output BAM file.
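The per-template decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (template name plus a numeric alignment score) and the rule of summing scores per reference are hypothetical and are not taken from neodisambiguate, which may weigh different metrics.

```python
from collections import defaultdict

# Hypothetical, simplified input: for each reference, a list of
# (template name, alignment score) pairs covering primary, secondary,
# and supplementary alignments. Summing scores is an assumed rule used
# only for illustration.

def assign_templates(alignments_by_reference):
    """Map each template to its best reference, or None when ambiguous."""
    totals = defaultdict(dict)
    for reference, alignments in alignments_by_reference.items():
        for template, score in alignments:
            per_ref = totals[template]
            per_ref[reference] = per_ref.get(reference, 0) + score

    assignment = {}
    for template, per_ref in totals.items():
        best = max(per_ref.values())
        winners = [ref for ref, total in per_ref.items() if total == best]
        # A tie between references means the template stays ambiguous.
        assignment[template] = winners[0] if len(winners) == 1 else None
    return assignment


example = {
    "hg38": [("t1", 100), ("t1", 98), ("t2", 60), ("t3", 75)],
    "mm10": [("t1", 40), ("t2", 60), ("t3", 90), ("t3", 20)],
}
result = assign_templates(example)
```

In this toy input, a template whose evidence is tied between the two references maps to None, which corresponds to the templates that would be routed to the ambiguous output.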
Only BAMs produced by the Burrows-Wheeler Aligner (bwa) or STAR are currently supported. Input BAMs of any sort order are accepted; however, the tool sorts them internally by queryname unless they are already in queryname sort order. All output BAM files are written in the same sort order as the input BAM files. Although paired-end reads give the most discriminatory power when disambiguating short-read sequencing data, this tool accepts paired, single-end (fragment), and mixed pairing input data.
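To anticipate whether an input will trigger the internal sort, you can inspect the SO field on the @HD line of its SAM header. The sketch below parses header text that you have already extracted (for example with a SAM/BAM header viewer); the function names are hypothetical, and whether neodisambiguate itself decides this from the header in the same way is an assumption.

```python
def sort_order(header_text):
    """Return the SO value from the @HD header line, or "unknown" if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


def needs_queryname_sort(header_text):
    """True when the header does not declare queryname sort order."""
    return sort_order(header_text) != "queryname"


coordinate_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422"
queryname_header = "@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:chr1\tLN:248956422"
```

Here `needs_queryname_sort(coordinate_header)` is true and `needs_queryname_sort(queryname_header)` is false; a header with no @HD line is treated as needing a sort.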
- Accepts SAM/BAM sources of any sort order
- Will disambiguate an arbitrary number of BAMs, all aligned to different references
- Writes the ambiguous alignments to a separate directory
- Extensible implementation which supports alternative disambiguation strategies
- Benchmarks show high accuracy
❯ neodisambiguate -i infile1.bam infile2.bam -o out/disambiguated
To disambiguate templates for sample dna00001 that are aligned to human (A) and mouse (B):
❯ neodisambiguate -i dna00001.A.bam dna00001.B.bam -o out/dna00001 -n hg38 mm10
❯ tree out/
out/
├── ambiguous-alignments/
│ ├── dna00001.A.ambiguous.bai
│ ├── dna00001.A.ambiguous.bam
│ ├── dna00001.B.ambiguous.bai
│ └── dna00001.B.ambiguous.bam
├── dna00001.hg38.bai
├── dna00001.hg38.bam
├── dna00001.mm10.bai
└── dna00001.mm10.bam
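If a downstream script needs to locate these files, the naming pattern in the listing above can be reproduced as follows. This is a sketch inferred only from that one example: the helper name is hypothetical, and the assumption that reference-specific files use the output prefix plus the reference name while ambiguous files use the input file name may not hold for every invocation.

```python
from pathlib import Path


def expected_outputs(inputs, prefix, names):
    """Guess output BAM paths from the input BAMs, output prefix, and names."""
    prefix = Path(prefix)
    out_dir = prefix.parent
    # Reference-specific outputs: <prefix>.<name>.bam
    resolved = [out_dir / f"{prefix.name}.{name}.bam" for name in names]
    # Ambiguous outputs: ambiguous-alignments/<input basename>.ambiguous.bam
    ambiguous = [
        out_dir / "ambiguous-alignments" / f"{Path(i).stem}.ambiguous.bam"
        for i in inputs
    ]
    return resolved, ambiguous


resolved, ambiguous = expected_outputs(
    ["dna00001.A.bam", "dna00001.B.bam"], "out/dna00001", ["hg38", "mm10"]
)
```

Each index file in the listing sits next to its BAM with the .bam suffix replaced by .bai, which `Path.with_suffix(".bai")` reproduces.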
Bootstrap compilation and build the executable with:
./mill neodisambiguate.executable
./bin/neodisambiguate --help
This project was inspired by AstraZeneca's disambiguate.