Add cg/wgMLST query pipeline #2

Open
apetkau opened this issue Aug 19, 2023 · 1 comment
Labels
pipeline An issue describing a pipeline

apetkau commented Aug 19, 2023

1. Purpose

The purpose of this pipeline is to query for genomes that fall within a certain distance threshold of a collection of genomes, based on their cg/wgMLST allele profiles.

2. Input

2.1. Query profiles

The input will consist of cg/wgMLST profiles for the queries, plus the reference profiles that define the selection/scope of each query. This will be passed via the --input parameter as a query sheet that looks like the following:

querysheet.csv:

identifier  query_allele_profiles           reference_allele_profiles
query1      /path/to/query_allele_profiles  /path/to/reference_allele_profiles
query2      /path/to/query_allele_profiles  /path/to/reference_allele_profiles
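
As a rough illustration of how such a query sheet could be checked before use, here is a minimal Python sketch. It assumes the file is comma-delimited with the three column names shown above; the `QueryEntry` class and `parse_querysheet` function are hypothetical names and are not part of the pipeline.

```python
import csv
from dataclasses import dataclass

# Column names taken from the querysheet.csv example above.
REQUIRED_COLUMNS = (
    "identifier",
    "query_allele_profiles",
    "reference_allele_profiles",
)


@dataclass
class QueryEntry:
    """One row of the query sheet (hypothetical helper type)."""
    identifier: str
    query_allele_profiles: str
    reference_allele_profiles: str


def parse_querysheet(handle):
    """Parse an open, comma-delimited query sheet into QueryEntry objects."""
    reader = csv.DictReader(handle)  # assumption: comma delimiter
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"querysheet is missing columns: {', '.join(missing)}")
    return [
        QueryEntry(*(row[column] for column in REQUIRED_COLUMNS))
        for row in reader
    ]
```

The function takes an open file handle, for example `parse_querysheet(open("querysheet.csv", newline=""))`, and fails early if a required column is absent.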

2.1.1. Allele profiles (CSV)

Allele profiles in CSV format will follow the example below (both uncompressed and gzipped files will be supported).

id       loci1  loci2  ...   lociN
SampleA  be76   af5d   ce78  d877a
ID10     af5d   be76   ?     d877a
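
To make the "uncompressed and gzipped" requirement concrete, here is a minimal Python sketch of a reader for this format. It assumes a comma delimiter, that the first column holds the sample id, and that gzipped files can be recognised by a `.gz` suffix; `read_allele_profiles` is a hypothetical name, not a function from the pipeline.

```python
import csv
import gzip
from pathlib import Path


def read_allele_profiles(path):
    """Read an allele profile table into {sample_id: {locus: allele}}."""
    path = Path(path)
    # Assumption: gzipped files are recognised by a ".gz" suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle)  # assumption: comma delimiter
        id_column = reader.fieldnames[0]  # first column holds the sample id
        return {
            row[id_column]: {
                locus: allele
                for locus, allele in row.items()
                if locus != id_column
            }
            for row in reader
        }
```

Alleles are kept as strings, so hash-style values such as `be76` and the `?` entry pass through unchanged.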

3. Steps

3.1. Perform query

For each query listed in the query sheet, this will search the reference profiles for genomes within a particular distance threshold of the query. This will use https://github.com/phac-nml/profile_dists.
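
The idea behind a threshold query can be sketched in plain Python. This is only an illustration of the concept and does not use or reproduce profile_dists. It assumes a Hamming-style distance (the number of loci with different alleles), that `?` marks a missing allele, and that loci with a missing allele on either side are skipped; all names below are hypothetical.

```python
MISSING = "?"  # assumption: "?" marks a missing allele call


def hamming_distance(profile_a, profile_b):
    """Count loci whose alleles differ, skipping loci with missing calls."""
    distance = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus, MISSING)
        if MISSING in (allele_a, allele_b):
            continue  # a missing call on either side is not counted
        if allele_a != allele_b:
            distance += 1
    return distance


def query_within_threshold(query_profile, reference_profiles, threshold):
    """Return {reference_id: distance} for references within the threshold."""
    hits = {}
    for reference_id, reference_profile in reference_profiles.items():
        distance = hamming_distance(query_profile, reference_profile)
        if distance <= threshold:
            hits[reference_id] = distance
    return hits


# Toy data with hypothetical locus names l1..l3.
query = {"l1": "be76", "l2": "af5d", "l3": "ce78"}
references = {
    "refA": {"l1": "be76", "l2": "af5d", "l3": "?"},     # distance 0
    "refB": {"l1": "af5d", "l2": "be76", "l3": "ce78"},  # distance 2
}
print(query_within_threshold(query, references, threshold=1))
# prints {'refA': 0}
```

How missing alleles are counted is a design choice that changes which genomes fall inside the threshold, so the skipping rule used here should be treated as one option among several.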

4. Output

The pipeline's output will be described by an output.json file with the following top-level structure:

{
    "files": { ... },
    "sample_metadata": { ... },
    "execution_metadata": { ... }
}
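
For illustration, a file with this top-level shape could be written from Python as sketched below. The contents of the three sections are not specified in this issue, so the sketch leaves them as empty objects by default; `write_output_json` is a hypothetical helper name.

```python
import json
from pathlib import Path


def write_output_json(path, files=None, sample_metadata=None,
                      execution_metadata=None):
    """Write an output.json with the three top-level sections."""
    document = {
        # The inner structure of each section is not defined here,
        # so missing sections default to empty objects.
        "files": files or {},
        "sample_metadata": sample_metadata or {},
        "execution_metadata": execution_metadata or {},
    }
    Path(path).write_text(json.dumps(document, indent=4) + "\n")
    return document
```

Writing through `json.dumps` guarantees valid JSON, which avoids comma mistakes that are easy to make when the file is assembled by hand.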
@apetkau apetkau added the pipeline An issue describing a pipeline label Aug 19, 2023

apetkau commented Aug 20, 2023

An example implementation is at https://github.com/apetkau/nf-core-queryprofiles

If you have Nextflow and Docker installed, it can be run directly from GitHub with:

nextflow run https://github.com/apetkau/nf-core-queryprofiles -profile docker,test -r dev --outdir results
