Find-pBGCs

A pipeline of scripts to extract and analyze phage-encoded biosynthetic gene clusters (pBGCs).

Now published in Current Biology: https://www.cell.com/current-biology/fulltext/S0960-9822(21)00744-2

Depends on

ncbi-genome-download: https://github.com/kblin/ncbi-genome-download
ProphET: https://github.com/jaumlrc/ProphET
AntiSMASH: https://github.com/antismash/antismash
genbank_to_fasta.py: https://github.com/Coaxecva/GenBank-to-FASTA
bioawk
blast
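Before a multi-day run it can help to confirm that the tools above are actually installed. The following is a minimal POSIX shell sketch and is not part of Find-pBGCs. The helper name `check_deps` is hypothetical, and the executable names are assumptions about a typical install: `ProphET_standalone.pl` is assumed as ProphET's entry point, and `blastn` stands in for "blast" because which BLAST programs the scripts call is not stated here.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of Find-pBGCs): print every command
# from the argument list that is not on PATH, and fail if any are missing.
check_deps() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  # Exit status 0 only when every command was found.
  [ "$missing" -eq 0 ]
}

# ASSUMED executable names; adjust them to match your installation.
REQUIRED_CMDS="ncbi-genome-download ProphET_standalone.pl antismash genbank_to_fasta.py bioawk blastn"
```

For example, `check_deps $REQUIRED_CMDS || exit 1` at the top of a wrapper script stops early if anything is missing. The variable is left unquoted on purpose so that it splits into separate command names.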

Note that the pipeline is slow: it takes about 2 days on a 64-core machine with 512 GB of RAM and uses ~500 GB of disk space. On a desktop it will likely take weeks, if it completes at all.
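Given the ~500 GB footprint, a quick free-space check before starting may prevent a failed run. This is a minimal sketch, assuming a POSIX `df`. The helper `has_free_space` is hypothetical and treats GB as GiB (1024^3 bytes), which makes it slightly conservative.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the filesystem holding DIR has at least
# MIN_GB GiB available. Usage: has_free_space DIR MIN_GB
has_free_space() {
  dir=$1
  min_gb=$2
  # `df -P -k` prints one header line, then one line per filesystem in
  # 1024-byte blocks; column 4 of the data line is the available space.
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  # Convert the GiB threshold to KiB before comparing.
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

For example, `has_free_space . 500 || echo "less than 500 GiB free here"` run from the directory where the download will go.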

WORKFLOW

ncbi-genome-download --parallel 64 --format fasta,gff --assembly-level complete bacteria

Find-pBGCs.sh refseq/bacteria/
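`Find-pBGCs.sh` is pointed at `refseq/bacteria/`, which `ncbi-genome-download` fills with one directory per assembly. A partially failed download can leave a directory without one of the two requested formats, so a check like the one below can list such directories before the long run starts. It is a hypothetical sketch: the helper name is invented, the file patterns (`*_genomic.fna.gz`, `*_genomic.gff.gz`) assume the default NCBI file names, and whether Find-pBGCs.sh itself tolerates missing files is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every assembly directory under ROOT
# that lacks a *_genomic.fna.gz or a *_genomic.gff.gz file.
# Usage: find_incomplete_assemblies ROOT
find_incomplete_assemblies() {
  root=$1
  for dir in "$root"/*/; do
    # Skip the unexpanded glob when ROOT has no subdirectories.
    [ -d "$dir" ] || continue
    has_fna=0
    has_gff=0
    for f in "$dir"*_genomic.fna.gz; do [ -e "$f" ] && has_fna=1; done
    for f in "$dir"*_genomic.gff.gz; do [ -e "$f" ] && has_gff=1; done
    if [ "$has_fna" -eq 0 ] || [ "$has_gff" -eq 0 ]; then
      # basename also strips the trailing slash from the glob match.
      basename "$dir"
    fi
  done
}
```

For example, `[ -z "$(find_incomplete_assemblies refseq/bacteria)" ] && Find-pBGCs.sh refseq/bacteria/` only starts the pipeline when every assembly directory has both files.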