Update paper.md #47

Open
wants to merge 1 commit into
base: paper

quantumjot
Collaborator

Small edits:

  • Fixed some of the formatting
  • Changed some wording

Comments:

  • What is a signal peptide and why do we want to cleave it? Is this a common operation?
  • Is there a way to specify just a biological assembly, or is the entire file downloaded?
  • I'm not sure about the references used in the visual proteomics sentence. In particular, particle pickers usually don't require a structure; they work from the raw image data.
  • The number of use cases is probably larger than stated. What about applications that involve large-scale analysis of protein structure, as in many bioinformatics problems?
  • It would be worth mentioning that profet is already a dependency of parakeet

Small changes
@@ -38,7 +38,7 @@ bibliography: paper.bib

# Summary

-The *profet* (**pro**tein structure **fet**cher) python library provides a simple and streamlined interface that makes it easy to download protein structures from various online databases. Since its founding in 1971, over 1 million experimentally determined macromolecular structures have been deposited and made freely available to all in the Protein Data Bank (PDB) archive [@pdb]. The availability of this wealth of experimental data has been pivotal in the development of new software in the field. Recently, the AlphaFold2 [@alphafold] team released over 200 million predicted macromolecular structures on their online platform. Being able to easily access these incredible open repositories of experimental and simulated data is crucial for accelerating scientific software development in structural biology. However, in practice, doing this can be cumbersome, as each database has their own manual download system, or individual python package.
+The *profet* (**pro**tein structure **fet**cher) python library provides a simple and streamlined interface that makes it easy to download protein structures from various online databases. Since its founding in 1971, over 1 million experimentally determined macromolecular structures have been deposited and made freely available to all in the Protein Data Bank (PDB) archive [@pdb]. The availability of this wealth of experimental data has been pivotal in the development of new software in the field. Recently, the AlphaFold2 [@alphafold] team released over 200 million predicted macromolecular structures on their online platform. Being able to easily access these large open repositories of experimental and simulated data is crucial for accelerating scientific software development in structural biology. However, in practice, doing this can be cumbersome, and we lack a unified framework to download structural data in a format compatible with modern machine learning problems.

With *profet*, users can conveniently download individual structures directly using python by simply specifying their Uniprot ID [@uniprot]. Users can specify which database they would like to use by default and if the structure is available from that source it will be downloaded. If the structure is not available from that source, *profet* will seek to download it from an alternative database. When a structure file is downloaded, it is cached to a local directory; if the same structure is requested again, either during the same session or a later session, then the cached structure file will be used to avoid having to download the file multiple times. Typical applications that require the ability to download many structures on demand are protein matching algorithms for visual proteomics, such as [@cryolo] [@affinity], large scale models in molecular dynamics simulations [@mcguffee] [@bigsim], and electron microscopy simulations [@parakeet]. As well as having a simple python API, profet also provides a simple command line interface, enabling the user to utilise profet either as part of a script or as a standalone program.
Collaborator Author

Typical applications that require the ability to download many structures on demand are protein matching algorithms for visual proteomics, such as [@cryolo] [@affinity], large scale models in molecular dynamics simulations [@mcguffee] [@bigsim], and electron microscopy simulations [@parakeet].

I think this sentence could be split up and expanded upon. Also, please see my other comment about the particle pickers.
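
For reference while rewording, the behaviour the paragraph describes (try the preferred database, fall back to an alternative, and reuse a cached file on repeat requests) can be sketched roughly as below. This is only an illustration under stated assumptions, not profet's actual API: `fetch_structure`, the `Fetcher` type, the stand-in fetchers, the identifier, and the `.cif` file naming are all hypothetical, and no network access is performed.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical type: a fetcher takes an identifier and returns the file
# contents, or None when that database has no entry for it.
Fetcher = Callable[[str], Optional[str]]


def fetch_structure(
    identifier: str,
    fetchers: Dict[str, Fetcher],
    default_db: str,
    cache_dir: Path,
) -> Path:
    """Return a local path for `identifier`, downloading only if not cached."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{identifier}.cif"  # assumed naming scheme
    if cached.exists():
        return cached  # cache hit: no database is queried

    # Preferred database first, then the remaining ones in insertion order.
    order = [default_db] + [name for name in fetchers if name != default_db]
    for name in order:
        contents = fetchers[name](identifier)
        if contents is not None:
            cached.write_text(contents)
            return cached
    raise LookupError(f"{identifier} not found in any of: {order}")


# Stand-in fetchers that record which database was queried.
calls: List[str] = []


def fake_pdb(identifier: str) -> Optional[str]:
    calls.append("pdb")
    return None  # pretend the structure is not available here


def fake_alphafold(identifier: str) -> Optional[str]:
    calls.append("alphafold")
    return f"data_{identifier}\n"


fetchers = {"pdb": fake_pdb, "alphafold": fake_alphafold}
cache = Path(tempfile.mkdtemp())

first = fetch_structure("P12345", fetchers, "pdb", cache)   # falls back
second = fetch_structure("P12345", fetchers, "pdb", cache)  # served from cache
```

In this sketch the second call returns the cached path without querying either stand-in database, which is the property the paper's caching sentence is claiming.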

1 participant