Nextflow workflow to run poppunk on a set of fasta files

erinyoung/needles

Introduction

erinyoung/needles is a bioinformatics pipeline that downloads the pre-built poppunk database matching a given taxid and uses it to assign input fasta files to clusters.

The steps are as follows:

  1. Download the poppunk database for taxid (https://www.bacpop.org/poppunk/)
  2. Assign fasta files to clusters
  3. Visualize the results (microreact output is the default)

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fasta
sample1,sample1.fasta

Each row represents a fasta file.
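
For many files, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline: the function name make_samplesheet is hypothetical, and it assumes every input file ends in .fasta and that the sample name should be the file name without that extension.

```shell
# Hypothetical helper (not shipped with erinyoung/needles).
# Usage: make_samplesheet <fasta_dir> <output_csv>
make_samplesheet() {
    local fasta_dir="$1" out="$2" f

    # Header row in the format shown above
    echo "sample,fasta" > "$out"

    # One row per FASTA file; sample name = file name without ".fasta"
    for f in "$fasta_dir"/*.fasta; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        echo "$(basename "$f" .fasta),$f" >> "$out"
    done
}

# Example call (assumes a directory named "fastas" exists):
# make_samplesheet fastas samplesheet.csv
```

Files with other extensions (for example .fa or .fna) would need the glob and the basename suffix adjusted.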

The next step is to select a taxid. The list of available poppunk databases is in json format at assets/poppunk_db.json (./assets/poppunk_db.json). The default taxid is 1314 for Streptococcus pyogenes.

Current options for taxid:

  • "470" : "Acinetobacter baumannii"
  • "520" : "Bordetella pertussis"
  • "197" : "Campylobacter jejuni"
  • "5476" : "Candida albicans"
  • "1351" : "Enterococcus faecalis"
  • "1352" : "Enterococcus faecium"
  • "562" : "Escherichia coli"
  • "727" : "Haemophilus influenzae"
  • "210" : "Helicobacter pylori"
  • "197911" : "Influenza virus"
  • "573" : "Klebsiella pneumoniae"
  • "446" : "Legionella pneumophila"
  • "1639" : "Listeria monocytogenes"
  • "36809" : "Mycobacterium abscessus"
  • "1773" : "Mycobacterium tuberculosis"
  • "485" : "Neisseria gonorrhoeae"
  • "487_2" : "Neisseria meningitidis" from "https://doi.org/10.12688/wellcomeopenres.14826.1"
  • "487" : "Neisseria meningitidis" from "bacpop/PopPUNK#267"
  • "287" : "Pseudomonas aeruginosa"
  • "4932" : "Saccharomyces cerevisiae"
  • "590" : "Salmonella sp."
  • "1280" : "Staphylococcus aureus"
  • "40324" : "Stenotrophomonas maltophilia"
  • "1311" : "Streptococcus agalactiae"
  • "1334" : "Streptococcus dysgalactiae subspecies equisimilis"
  • "28037" : "Streptococcus mitis"
  • "1313" : "Streptococcus pneumoniae"
  • "1314" : "Streptococcus pyogenes"
  • "1307" : "Streptococcus suis"
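
Because an unsupported taxid would only surface once the pipeline is running, a quick check beforehand can save time. The sketch below is an illustration only: the function is_supported_taxid is hypothetical, and its list is copied by hand from the options above, so it may lag behind the json file, which remains the authoritative source.

```shell
# Hypothetical pre-flight check (not shipped with erinyoung/needles).
# The taxid list is copied from the README and may be out of date.
is_supported_taxid() {
    case "$1" in
        470|520|197|5476|1351|1352|562|727|210|197911|573|446|1639|36809|1773|485|487_2|487|287|4932|590|1280|40324|1311|1334|28037|1313|1314|1307)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example:
# is_supported_taxid 1314 && echo "taxid is in the list"
```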

Now, you can run the pipeline using:

nextflow run erinyoung/needles \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --taxid <TAXID> \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
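
As an illustration of the -params-file route, the sketch below writes the three parameters from the command above into a YAML file. The file name params.yaml and the values (the default taxid and an output directory called results) are placeholders chosen for this example.

```shell
# Write the pipeline parameters to a YAML file (placeholder values).
# taxid is left unquoted here, mirroring --taxid 1314 on the CLI.
cat > params.yaml <<'EOF'
input: samplesheet.csv
taxid: 1314
outdir: results
EOF

# Then pass the file to Nextflow, for example:
# nextflow run erinyoung/needles -profile docker -params-file params.yaml
```

Keeping parameters in a file makes a run easier to repeat and to share than a long command line.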

Credits

erinyoung/needles was originally written by Erin Young and is mostly a wrapper around PopPUNK.

PopPUNK can be cited with the following paper:

Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Research 29:304-316 (2019). doi:10.1101/gr.241455.118

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
