Proteomics


Alex V. Kotlar edited this page Sep 5, 2023 · 16 revisions

Protein Abundance Search Design

The user has a combined protein abundance-like file (combined_protein.tsv; see https://fragpipe.nesvilab.org/docs/tutorial_fragpipe_outputs.html#combined_proteintsv).

PK: This may not be the exact output. Leo, who made FragPipe, sent us a related but different version of this output.
CT: This is the format we should expect (except that we will request that the row names be gene, not gene|uniprot_id):

We create a Python CLI tool/API library function that submits a batch request to filter the protein abundance file using a Bystro annotation.

This requires the user to first authenticate using our existing Python API for authentication.
The batch submission requires:
- the user id
- the path to the protein abundance file, or a multipart file upload stream of the protein abundance file if it is local (for v1, it is OK to support only one of these)
- the job id
- the basename for the output
- the query string
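
As a rough illustration of the fields listed above, the submission could be modeled as a small validated record. This is only a sketch: the class name, field names, and JSON shape are hypothetical and are not taken from the existing Bystro API, and it assumes v1 supports the server-side path option rather than the upload stream.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProteinAbundanceFilterRequest:
    """Hypothetical batch submission body; names are illustrative only."""

    user_id: str
    protein_abundance_path: str  # assumption: v1 takes a path, not an upload stream
    job_id: str
    output_basename: str
    query: str  # query string to run against the annotation index

    def __post_init__(self) -> None:
        # Reject empty fields early so a malformed job never reaches the queue.
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"{name} must not be empty")

    def to_json(self) -> str:
        # Serialize to the JSON body that would be sent to the API server.
        return json.dumps(asdict(self))
```

Validating on construction keeps the CLI and the library function on the same code path: both build the record first, then send `to_json()` with whatever authenticated client is already in use.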

The job is processed through our API server, which submits it to our cluster via beanstalkd. The filtering of the protein abundance file happens on the cluster, which writes a new protein abundance file to disk and returns the path of that file to the user.

We re-use the code (duplication is OK for v1) from bystro/search/save/handler.py for scrolling through the annotation index.
We pull down the annotation index as in bystro/search/save/handler.py, and add the protein abundance values.
We persist these values on EFS and return a path to the results.
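
The filtering step could look roughly like the sketch below. It assumes the scroll over the annotation index has already produced a collection of gene names matching the query, and that the first column of the abundance file holds the gene name (as requested in the CT note above). The function name and signature are hypothetical and are not part of bystro/search/save/handler.py.

```python
import csv
from typing import Iterable, TextIO


def filter_protein_abundance(source: TextIO, dest: TextIO, genes: Iterable[str]) -> int:
    """Copy the header and every row whose first column is in `genes`.

    Returns the number of data rows written to `dest`.
    """
    wanted = set(genes)
    reader = csv.reader(source, delimiter="\t")
    writer = csv.writer(dest, delimiter="\t", lineterminator="\n")

    header = next(reader, None)
    if header is None:
        # Empty input: nothing to write.
        return 0
    writer.writerow(header)

    kept = 0
    for row in reader:
        # Skip blank lines; keep rows whose gene was matched by the query.
        if row and row[0] in wanted:
            writer.writerow(row)
            kept += 1
    return kept
```

Because it streams row by row, the same function works with files opened on EFS or with in-memory buffers such as `io.StringIO` in tests.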

We create a Python CLI tool/API library function that lets the user pull down the filtered protein abundance results using the user id and the job basename.
