PaCPaC

PaCPaC - Paratope and Clonotype Probing and Clustering

🛠️ Installation and usage examples (Docker or Conda)

🐳 Docker

Installation with Docker

You must have Docker & Docker Compose installed.

git clone https://github.com/aretasg/pacpac.git
cd pacpac

💻 Example usage with Docker

Move your CSV data set to the /data folder:

docker-compose run pacpac cluster <csv_dataset> <vh_amino_acid_sequence_column_name>
docker-compose run pacpac probe <probe_vh_amino_acid_sequence> <csv_dataset> <vh_amino_acid_sequence_column_name>

Check the /data folder for the output.

🐍 Conda

Installation with Conda

You must have Conda installed.

git clone https://github.com/aretasg/pacpac.git
cd pacpac
conda env create -f environment.yml
conda activate pacpac
pip install .

📜 Example usage within Python

import pandas as pd
from pacpac import pacpac

df = pd.read_csv(<my_data_set.csv>)

df = pacpac.cluster(df, <vh_amino_acid_sequence_column_name>)
df = pacpac.probe(<probe_vh_amino_acid_sequence>, df, <vh_amino_acid_sequence_column_name>)

Alternatively, cluster and/or probe using both VH and VL sequences:

df = pacpac.cluster(df, <vh_amino_acid_sequence_column_name>, <vl_amino_acid_sequence_column_name>)
df = pacpac.probe(
  <probe_vh_amino_acid_sequence>,
  df,
  <vh_amino_acid_sequence_column_name>,
  <vl_amino_acid_sequence_column_name>,
  <probe_vl_amino_acid_sequence>
)

💻 Example usage in CLI

pacpac cluster <path_to_csv_dataset> <vh_amino_acid_sequence_column_name>
pacpac probe <probe_vh_amino_acid_sequence> <path_to_csv_dataset> <vh_amino_acid_sequence_column_name>

❓ Probing and clustering arguments

within Python

help(pacpac.cluster)
help(pacpac.probe)

in CLI

pacpac cluster --help
pacpac probe --help

💎 Features

Sequence annotation by ANARCI (Dunbar and Deane, 2015).
Deep learning model Parapred for paratope predictions (Liberis et al., 2018).
Clustering uses a greedy approach.
Determinism is achieved by sorting the input data set in descending order before clustering: by CDR lengths (clonotype clustering) or paratope length (paratope clustering), and then by amino acid sequence.
Each cluster has a representative sequence, indicated by the keyword seed.
Clonotyping is done on the amino acid sequence level. Silent mutations on the nucleotide sequence level due to SHM are not taken into account.
Paratope probing and clustering provide several clustering options.
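
The greedy, sort-first approach described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not PaCPaC's actual implementation: the function names `identity` and `greedy_cluster`, the 0.8 identity threshold, and the short example strings are all hypothetical, and real clonotyping would also take annotation results into account.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float = 0.8) -> Dict[str, List[str]]:
    """Greedily assign each sequence to the first seed it matches.

    Returns a mapping of seed sequence -> cluster members (seed included).
    """
    # Deterministic order: longest first, ties broken by sequence, both descending.
    ordered = sorted(set(seqs), key=lambda s: (len(s), s), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for seq in ordered:
        for seed in clusters:
            if identity(seed, seq) >= threshold:
                clusters[seed].append(seq)
                break
        else:
            # No existing seed is similar enough, so this sequence seeds a new cluster.
            clusters[seq] = [seq]
    return clusters


# Hypothetical toy input; the strings are placeholders, not real data.
cdr3s = ["ARDYW", "ARDYF", "ARGGW", "ARDYYGMDV", "ARDYYGLDV"]
result = greedy_cluster(cdr3s, threshold=0.8)

# The same input in a different order gives the same clusters and seeds.
shuffled = greedy_cluster(list(reversed(cdr3s)), threshold=0.8)
```

Because the input is sorted before the greedy pass, the seed of each cluster does not depend on the order of rows in the input, which is the point of the determinism note above.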

Probing & Clustering options

If structural_equivalence is set to False, only paratopes of equal CDR lengths are matched, on the assumption that CDRs of the same length always have deletions at the same positions (Richardson et al., 2021). This is useful for fast detection of similar paratopes.
When set to True (default), structural equivalence as assigned by the numbering scheme is used (i.e. numbered residue positions are used for residue matching, allowing comparison at structurally equivalent positions), on the assumption that CDRs of different lengths can have similar paratopes. This is useful for detecting similar binding modes.
Sequence residues can be tokenized (tokenize=True) based on residue type groupings as described by Wong et al., 2021.
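
The difference between these options can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: a paratope is modelled as a mapping from numbering position label to residue, the position labels and residues are made up, the similarity score (matches divided by the larger paratope size) is one possible choice, and the residue grouping is a generic one that is not claimed to be the grouping from Wong et al., 2021. It is not PaCPaC's code.

```python
from typing import Dict

# Numbering position label -> residue (one-letter code). Hypothetical model.
Paratope = Dict[str, str]

# Illustrative residue-type groups only; NOT the grouping from Wong et al., 2021.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
    "g": "GP",      # glycine / proline
}
TOKEN = {aa: token for token, members in GROUPS.items() for aa in members}


def tokenize(paratope: Paratope) -> Paratope:
    """Replace each residue with the token of its residue-type group."""
    return {pos: TOKEN[aa] for pos, aa in paratope.items()}


def similarity(a: Paratope, b: Paratope) -> float:
    """Matches at identical numbering positions over the larger paratope size."""
    if not a or not b:
        return 0.0
    matches = sum(1 for pos, aa in a.items() if b.get(pos) == aa)
    return matches / max(len(a), len(b))


def compare(a: Paratope, b: Paratope, cdr_len_a: int, cdr_len_b: int,
            structural_equivalence: bool = True, tokenized: bool = False) -> float:
    """Compare two paratopes under the options described above (sketch only)."""
    # Without structural equivalence, CDRs of different lengths never match.
    if not structural_equivalence and cdr_len_a != cdr_len_b:
        return 0.0
    if tokenized:
        a, b = tokenize(a), tokenize(b)
    return similarity(a, b)


# Made-up paratopes from CDRs of different lengths (10 vs 12 residues).
p1 = {"107": "D", "108": "Y", "109": "W", "114": "F"}
p2 = {"107": "E", "108": "Y", "109": "W", "115": "F"}

by_position = compare(p1, p2, 10, 12, structural_equivalence=True)
by_position_tokens = compare(p1, p2, 10, 12, structural_equivalence=True, tokenized=True)
equal_length_only = compare(p1, p2, 10, 12, structural_equivalence=False)
```

In this toy case the two paratopes cannot match at all when equal CDR lengths are required, they match partially when compared by numbering position, and tokenization raises the score further because D and E fall into the same group.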

🏁 Benchmarks with 10K VH sequences with 4 conventional CPU cores

Task	Time (s)	Notes
Annotations using ANARCI	378	parallel execution
Paratope prediction using Parapred	207	batch execution without CPU/GPU speed up for TensorFlow
Clonotype clustering	13	on amino acid level
Paratope clustering	13	`structural_equivalence=False`
Paratope clustering	130	`structural_equivalence=True`
Probing	<0.1	clonotype & paratope

ANARCI and Parapred can be sped up with more cores and/or CPU/GPU acceleration instructions for TensorFlow.
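
To read the table in per-sequence terms, the timings can be converted to throughput with plain arithmetic. The snippet below only restates the numbers from the table for the 10K benchmark; it does not assume that clustering time scales linearly with data set size, so the rates should not be extrapolated to much larger inputs.

```python
# Timings in seconds copied from the benchmark table (10,000 VH sequences, 4 CPU cores).
N_SEQUENCES = 10_000
EQUAL_LENGTH = "Paratope clustering (structural_equivalence=False)"
BY_POSITION = "Paratope clustering (structural_equivalence=True)"

timings = {
    "Annotations using ANARCI": 378,
    "Paratope prediction using Parapred": 207,
    "Clonotype clustering": 13,
    EQUAL_LENGTH: 13,
    BY_POSITION: 130,
}

# Sequences processed per second for each task at this data set size.
throughput = {task: N_SEQUENCES / seconds for task, seconds in timings.items()}
for task, rate in throughput.items():
    print(f"{task}: {rate:.1f} sequences/s")

# Relative cost of matching at structurally equivalent positions.
slowdown = timings[BY_POSITION] / timings[EQUAL_LENGTH]
print(f"structural_equivalence=True is {slowdown:.0f}x slower than False")
```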

✏️ Authors

Written by Aretas Gaspariunas. Have a question? You can always ask and I can always ignore.

References

Dunbar and Deane, 2015
Liberis et al., 2018
Richardson et al., 2021
Wong et al., 2021

🍎 Citing

If you found PaCPaC useful for your work, please acknowledge it by citing this repository.

@software{aretas_gaspariunas_2021_4470165,
  author       = {Aretas Gaspariunas},
  title        = {{aretasg/pacpac: PaCPaC - Python package to probe and cluster antibody VH sequence paratopes and clonotypes}},
  month        = jan,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v0.1},
  doi          = {10.5281/zenodo.4470165},
  url          = {https://doi.org/10.5281/zenodo.4470165}
}

License

BSD license.
