
Emulating the Unique Peptide Finder

Last edited by Pieter Verschaffelt on Mar 7, 2024.

Until late 2023, Unipept provided a tool called the Unique Peptide Finder at https://unipept.ugent.be. Because of outdated dependencies and the need to redo the whole project, we decided to deprecate this tool until further notice.

If you would still like to find the unique peptides for a specific taxon (or for a list of proteins), you can emulate its behaviour with the Unipept CLI.

  1. First, retrieve the list of proteins whose unique peptides you want to determine.

    To get the unique peptides for a taxon: download all associated protein sequences from UniProtKB. The following endpoint returns a FASTA file with all proteins for the taxon with NCBI taxonomy ID <XXXX>: https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28%28taxonomy_id%3A<XXXX>%29%29 (replace <XXXX> with the ID of the taxon you're interested in).

  2. Next, perform an in-silico tryptic digest of these protein sequences with the Unipept CLI, using the file you downloaded in the previous step: cat <FILENAME> | prot2pept > digested.txt. This stores all tryptic peptides present in the FASTA file in digested.txt.

  3. From here on, the procedure depends on whether you want the unique peptides for a taxon or for a list of proteins:

    Unique peptides for a taxon: feed digested.txt to unipept pept2taxa and remove every peptide that is reported more than once, i.e. that maps to more than one taxon. Such peptides are not unique; keep only the peptides that are reported exactly once.

    Unique peptides for a list of proteins: feed digested.txt to unipept pept2prot and again remove every peptide that is reported more than once, i.e. that matches more than one UniProtKB protein.

  4. The remaining peptides should be the unique peptides for the input you provided.
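The taxon variant of the steps above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not an official Unipept tool: it assumes curl and the Unipept CLI (prot2pept, unipept) are installed, that unipept pept2taxa writes CSV with a single header row and the peptide in the first column, and that FASTA header lines (starting with >) may be carried over into the digest. All function names (uniprot_taxon_url, unique_input_peptides, keep_single_hits, find_unique_peptides_for_taxon) are hypothetical. The script also strips header lines and deduplicates the digest before querying Unipept, on the assumption that a peptide occurring in several proteins of the taxon would otherwise be reported several times and wrongly discarded.

```shell
#!/usr/bin/env bash
# Sketch: emulate the Unique Peptide Finder for a single taxon.
# Assumptions (not confirmed by Unipept): curl and the Unipept CLI are installed,
# and `unipept pept2taxa` prints CSV with one header row and the peptide in column 1.

# Step 1 (hypothetical helper): build the UniProtKB stream URL for a taxon ID.
# printf turns each %% into a literal %, so the URL stays percent-encoded.
uniprot_taxon_url() {
  printf 'https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%%28%%28taxonomy_id%%3A%s%%29%%29\n' "$1"
}

# Step 2 (hypothetical helper): drop FASTA header lines and empty lines, then
# deduplicate so that every peptide is sent to Unipept only once.
unique_input_peptides() {
  grep -v -e '^>' -e '^$' | sort -u
}

# Step 3 (hypothetical helper): skip the CSV header row, count how many rows
# each peptide has, and print only the peptides reported exactly once.
keep_single_hits() {
  tail -n +2 | cut -d, -f1 | sort | uniq -c | awk '$1 == 1 { print $2 }'
}

# Full pipeline for a taxon (needs network access; not run by default).
find_unique_peptides_for_taxon() {
  local taxon_id="$1"
  curl -fsS "$(uniprot_taxon_url "$taxon_id")" \
    | prot2pept \
    | unique_input_peptides \
    | unipept pept2taxa \
    | keep_single_hits
}
```

For example, find_unique_peptides_for_taxon 1234 > unique.txt, with 1234 standing in for your taxon ID. For a list of proteins, the same helpers should apply if you start from a FASTA file of those proteins instead of the curl download and replace unipept pept2taxa with unipept pept2prot.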