RJBeng/coinfinder

 
 

Coinfinder

A tool for the identification of coincident (associating and dissociating) genes in pangenomes.

Written in collaboration with Martin Rusilowicz.

What is it?

Coinfinder is an algorithm and software tool that detects genes which associate or dissociate with other genes more often than expected by chance in pangenomes. Coinfinder is written primarily in C++ and is a command-line tool that generates text, GEXF, and PDF outputs for the user.

Coinfinder uses a Bonferroni-corrected binomial exact test on the expected and observed rates of gene-gene association to evaluate whether a given gene pair is coincident.
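
To make the idea of the test concrete, here is a minimal Python sketch of a one-sided binomial exact test with a Bonferroni-adjusted threshold. It is an illustration only, not Coinfinder's C++ implementation: the function names are hypothetical, and the sketch assumes that the expected co-occurrence rate of two genes is the product of their individual frequencies.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def association_p_value(n_genomes, n_a, n_b, n_both):
    """One-sided p-value that genes A and B co-occur more often than expected.

    Assumption (not taken from the README): the expected co-occurrence rate
    is the product of the two genes' individual frequencies.
    """
    expected_rate = (n_a / n_genomes) * (n_b / n_genomes)
    return binomial_upper_tail(n_both, n_genomes, expected_rate)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical example: 10 genomes, each gene present in 5, both present in 5.
p_value = association_p_value(10, 5, 5, 5)
is_significant = p_value < bonferroni_threshold(0.05, n_tests=3)
```

A dissociation test would use the lower tail instead (observing as few or fewer co-occurrences than expected).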

When and why should I use it?

Coinfinder is designed to take as input a dataset of pangenomes and their genes. Ideally, genes will be clustered into homologous gene clusters using a pangenomic tool such as Roary, PIRATE, or Pandora. Coinfinder should be used to identify coincident gene sets within a given pangenomic dataset. Coinfinder was written to identify coincident genes among strains of a prokaryote species (i.e. a species pangenome) but can be extended to other pangenomic datasets.

Dependencies:

Quick installation instructions:

cmake -DCMAKE_BUILD_TYPE=Release .
cmake --build .
./coinfinder

Usage:

coinfinder -i <gene information> [-I] -p <phylogeny> -o <output prefix> [--associate|--dissociate]

Coinfinder requires gene information and a phylogeny as input. The gene information can be provided in one of two formats: (a) as the gene_presence_absence.csv output from Roary; (b) as a tab-delimited list of genes present in each strain. An example of a tab-delimited list of genes:

gene_1	genome_1
gene_1	genome_2
gene_1	genome_3
gene_2	genome_2
gene_2	genome_3
gene_3	genome_1
gene_3	genome_2

The phylogeny should be Newick-formatted with no zero-length branches. We suggest that this phylogeny be constructed using the core gene information (for example, as suggested in the Roary pipeline https://sanger-pathogens.github.io/Roary/).
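
One way to check the zero-length-branch requirement before running Coinfinder is to scan the Newick string for branch lengths equal to zero. The sketch below is a rough, regex-based check with a hypothetical function name. It assumes branch lengths are non-negative numbers written after a colon, and that node labels do not themselves contain colons (quoted labels are not handled).

```python
import re

# Matches a branch length after ':', e.g. ':0.1', ':0', ':1e-3'.
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def zero_length_branches(newick):
    """Return the number of branches whose length parses to exactly zero."""
    lengths = [float(text) for text in _BRANCH_LENGTH.findall(newick)]
    return sum(1 for length in lengths if length == 0.0)


# Hypothetical tree with one zero-length branch (leaf B).
tree = "(A:0.1,B:0.0,(C:0.2,D:0.3):0.05);"
n_zero = zero_length_branches(tree)
```

If the count is non-zero, the tree needs to be adjusted before it is passed to Coinfinder.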

Lastly, the user must decide between running Coinfinder to find associations (gene pairs present together) or dissociations (gene pairs which are present apart, or avoid each other).
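
For scripted runs, the command line from the Usage section can be assembled programmatically. This Python sketch uses only the flags shown in the usage line above (the optional `-I` flag is left out); the function name and the file names are hypothetical placeholders.

```python
def build_coinfinder_command(gene_info, phylogeny, output_prefix,
                             mode="associate", executable="coinfinder"):
    """Build the argument list for one Coinfinder run.

    mode must be 'associate' or 'dissociate', matching the two
    mutually exclusive options in the usage line.
    """
    if mode not in ("associate", "dissociate"):
        raise ValueError("mode must be 'associate' or 'dissociate'")
    return [executable, "-i", gene_info, "-p", phylogeny,
            "-o", output_prefix, "--" + mode]


# Placeholder file names, for illustration only.
command = build_coinfinder_command(
    "genes.tab", "core_genes.nwk", "results", mode="dissociate"
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the `coinfinder` executable has been built.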

Example output:

example-output

An example association network in which each gene (node) is connected to another gene by a line (edge) if and only if the two genes statistically co-occur. Nodes are weighted by lineage-independence in the phylogeny (i.e. the larger the node, the more phylogenetically independent the gene). Nodes are coloured by connected component, that is, the set of genes with associative relationships with each other. This data can also be shown as a presence/absence heatmap in relation to the phylogeny (note: this heatmap is a subset of all results; in particular, the large wine-coloured gene set has been removed for ease of visibility).
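
The grouping of genes into connected components, as used for the node colours, can be illustrated with a short Python sketch. It takes a list of significant gene pairs (edges) and returns the sets of genes that are linked directly or indirectly. The function name and the gene pairs are hypothetical; this does not reproduce how Coinfinder itself builds or draws the network.

```python
from collections import defaultdict


def connected_components(edges):
    """Group genes into connected components given (gene_a, gene_b) edges."""
    neighbours = defaultdict(set)
    for gene_a, gene_b in edges:
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)

    seen = set()
    components = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited gene.
        stack = [start]
        component = set()
        while stack:
            gene = stack.pop()
            if gene in seen:
                continue
            seen.add(gene)
            component.add(gene)
            stack.extend(neighbours[gene] - seen)
        components.append(component)
    return components


# Hypothetical significant pairs: two separate gene sets.
pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
components = connected_components(pairs)
```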

Example usage:

Coming soon...

What if I find a bug or have an issue running coinfinder?

If you run into any issues with coinfinder, we want to hear about it! Please don't be shy, and log an Issue including as much of the following as possible:

  • The exact command that you used to call coinfinder (helps us identify where in the code the bug might be).
  • A reproducible example of the issue with a small dataset that you can share (helps us identify whether the issue is specific to a particular computer, operating system, and/or dataset).
