Intermediary files
This page lists all intermediary files that are generated by the Unipept database construction script. Most of these files are compressed TSV-files that represent a specific step in the construction of peptide databases.
Every line in this file corresponds to a mapping between a peptide and a JSON-object containing all functional annotations associated with this peptide (assuming that the amino acids I and L are equal).
1 {"num":{"all":17,"EC":17,"GO":17,"IPR":17},"data":{"GO:0004477":14,"GO:0004488":14,"GO:0000105":14,"GO:0009086":14,"GO:0006164":14,"GO:0035999":14,"EC:1.5.1.5":14,"EC:3.5.4.9":14,"IPR:IPR046346":14,"IPR:IPR036291":14,"IPR:IPR000672":14,"IPR:IPR020630":14,"IPR:IPR020867":14,"IPR:IPR020631":14,"GO:0033201":1,"GO:0004373":1,"GO:0009011":1,"GO:0005978":1,"EC:2.4.1.21":1,"IPR:IPR001296":1,"IPR:IPR011835":1,"IPR:IPR013534":1,"GO:0044219":1,"GO:0039617":1,"GO:0003677":2,"GO:0005525":1,"GO:0003723":1,"GO:0005198":1,"GO:0046740":1,"EC:":2,"IPR:IPR003181":1,"IPR:IPR003182":1,"IPR:IPR029053":1,"GO:0000428":1,"GO:0001216":1,"GO:0016779":1,"GO:0016987":1,"GO:0006352":1,"GO:0009399":1,"IPR:IPR000394":1,"IPR:IPR007046":1,"IPR:IPR007634":1,"IPR:IPR038709":1}}
2 {"num":{"all":2,"EC":2,"GO":2,"IPR":2},"data":{"GO:0004477":2,"GO:0004488":2,"GO:0000105":2,"GO:0009086":2,"GO:0006164":2,"GO:0035999":2,"EC:1.5.1.5":2,"EC:3.5.4.9":2,"IPR:IPR046346":2,"IPR:IPR036291":2,"IPR:IPR000672":2,"IPR:IPR020630":2,"IPR:IPR020867":2,"IPR:IPR020631":2}}
3 {"num":{"all":1,"EC":1,"GO":1,"IPR":1},"data":{"GO:0005737":1,"GO:1990904":1,"GO:0005840":1,"GO:0003735":1,"GO:0006412":1,"EC:":1,"IPR:IPR000307":1,"IPR:IPR020592":1,"IPR:IPR023803":1}}
- peptide_id: The ID of the peptide that this aggregation of functional annotations is associated with.
- functional_annotations: A JSON-object that describes all functional annotations that this peptide is associated with, including a count value. This count reports how many proteins (in which the original peptide occurs) are associated with the specific function.
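A line in this format can be read with a few lines of Python. The sketch below is only an illustration: the helper names `parse_fa_line` and `annotations_with_prefix` are hypothetical, the sample line is a shortened version of the third example above, and the separator is assumed to be a tab (the code splits on any whitespace to stay tolerant).

```python
import json

def parse_fa_line(line):
    """Split one line into (peptide_id, annotations dict)."""
    # Split only once, on the first run of whitespace, so the JSON stays intact.
    peptide_id, payload = line.rstrip("\n").split(None, 1)
    return int(peptide_id), json.loads(payload)

def annotations_with_prefix(annotations, prefix):
    """Return the counts of all annotations in one namespace, e.g. 'GO:'."""
    return {key: count
            for key, count in annotations["data"].items()
            if key.startswith(prefix)}

# Shortened sample line (assumption: columns are tab-separated).
line = '3\t{"num":{"all":1,"EC":1,"GO":1,"IPR":1},"data":{"GO:0005737":1,"EC:":1,"IPR:IPR000307":1}}'
peptide_id, fa = parse_fa_line(line)
go_counts = annotations_with_prefix(fa, "GO:")
print(peptide_id, go_counts)
```

Keeping the namespace prefix (`GO:`, `EC:`, `IPR:`) in the key makes it possible to filter on annotation type without a separate lookup.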
Every line in this file corresponds to a mapping between a peptide and a JSON-object containing all functional annotations associated with this peptide (assuming that the amino acids I and L are not equal).
1 {"num":{"all":17,"EC":17,"GO":17,"IPR":17},"data":{"GO:0004477":14,"GO:0004488":14,"GO:0000105":14,"GO:0009086":14,"GO:0006164":14,"GO:0035999":14,"EC:1.5.1.5":14,"EC:3.5.4.9":14,"IPR:IPR046346":14,"IPR:IPR036291":14,"IPR:IPR000672":14,"IPR:IPR020630":14,"IPR:IPR020867":14,"IPR:IPR020631":14,"GO:0033201":1,"GO:0004373":1,"GO:0009011":1,"GO:0005978":1,"EC:2.4.1.21":1,"IPR:IPR001296":1,"IPR:IPR011835":1,"IPR:IPR013534":1,"GO:0044219":1,"GO:0039617":1,"GO:0003677":2,"GO:0005525":1,"GO:0003723":1,"GO:0005198":1,"GO:0046740":1,"EC:":2,"IPR:IPR003181":1,"IPR:IPR003182":1,"IPR:IPR029053":1,"GO:0000428":1,"GO:0001216":1,"GO:0016779":1,"GO:0016987":1,"GO:0006352":1,"GO:0009399":1,"IPR:IPR000394":1,"IPR:IPR007046":1,"IPR:IPR007634":1,"IPR:IPR038709":1}}
2 {"num":{"all":2,"EC":2,"GO":2,"IPR":2},"data":{"GO:0004477":2,"GO:0004488":2,"GO:0000105":2,"GO:0009086":2,"GO:0006164":2,"GO:0035999":2,"EC:1.5.1.5":2,"EC:3.5.4.9":2,"IPR:IPR046346":2,"IPR:IPR036291":2,"IPR:IPR000672":2,"IPR:IPR020630":2,"IPR:IPR020867":2,"IPR:IPR020631":2}}
- peptide_id: The ID of the peptide that this aggregation of functional annotations is associated with.
- functional_annotations: A JSON-object that describes all functional annotations that this peptide is associated with, including a count value. This count reports how many proteins (in which the original peptide occurs) are associated with the specific function.
Every line in this file maps a peptide to its lowest common ancestor (LCA). The taxa of all proteins in which the peptide occurs are collected, after which the lowest common ancestor of these taxa is calculated and linked to the peptide. The amino acids I and L are assumed to be equal for the peptides in this file.
1 1
2 87882
3 272568
5 7227
6 9606
7 502779
- peptide_id: The ID of the peptide that this LCA is associated with.
- lca: The NCBI taxon ID for the lowest common ancestor of this peptide.
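The idea of a lowest common ancestor can be sketched as follows. This is a minimal illustration and not necessarily how the construction script computes it: the taxonomy is a toy tree with hypothetical taxon IDs, the `parent` mapping and both function names are assumptions, and the root is modelled as its own parent.

```python
def lineage(taxon, parent):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while parent[path[-1]] != path[-1]:  # the root is its own parent
        path.append(parent[path[-1]])
    return path[::-1]

def lowest_common_ancestor(taxa, parent):
    """Return the deepest taxon shared by the lineages of all given taxa."""
    lineages = [lineage(taxon, parent) for taxon in taxa]
    lca = lineages[0][0]  # every lineage starts at the root
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break  # the lineages diverge at this depth
        lca = level[0]
    return lca

# Toy taxonomy with hypothetical IDs; 1 is the root.
PARENT = {1: 1, 10: 1, 20: 1, 11: 10, 12: 10, 21: 20}

print(lowest_common_ancestor([11, 12], PARENT))
print(lowest_common_ancestor([11, 21], PARENT))
```

A peptide that occurs in proteins from distant organisms ends up with an LCA close to the root, which is consistent with the first sample line above mapping to taxon 1.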
Every line in this file maps a peptide to its lowest common ancestor (LCA). The taxa of all proteins in which the peptide occurs are collected, after which the lowest common ancestor of these taxa is calculated and linked to the peptide. The amino acids I and L are assumed not to be equal for the peptides in this file.
1 1
2 87882
3 272568
4 7227
6 9606
- peptide_id: The ID of the peptide that this LCA is associated with.
- lca: The NCBI taxon ID for the lowest common ancestor of this peptide.
Each line in this file corresponds to a mapping between a tryptic peptide sequence and the ID of the UniProt entry from which it originated. One peptide is typically linked to multiple UniProt entries (since a tryptic peptide is typically found in more than one protein). This file assumes that the amino acids I and L are equal (thus a peptide AALI will also be matched with a protein that contains AALL or AAII).
AAAAA 1063
AAAAA 243160
AAAAA 271848
AAAAA 272560
AAAAA 31716
AAAAA 320372
AAAAA 320373
- peptide_sequence: The tryptic peptide sequence as previously digested from a protein.
- uniprot_entry_id: ID of the UniProt entry from which the tryptic peptide sequence originated.
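Treating I and L as equal amounts to replacing every I by L before comparing sequences. The sketch below illustrates this with the AALI example from the description; the function names `equalize` and `group_by_equalized` are hypothetical.

```python
from collections import defaultdict

def equalize(sequence):
    """Replace every isoleucine (I) by leucine (L)."""
    return sequence.replace("I", "L")

def group_by_equalized(sequences):
    """Group original sequences by their equalized form."""
    groups = defaultdict(set)
    for sequence in sequences:
        groups[equalize(sequence)].add(sequence)
    return dict(groups)

groups = group_by_equalized(["AALI", "AALL", "AAII", "AAAAA"])
print(groups)
```

All three of AALI, AALL and AAII collapse onto the same equalized sequence AALL, so they share one set of UniProt entries in this file.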
Each line in this file corresponds to a mapping between a tryptic peptide sequence and the ID of the UniProt entry from which it originated. One peptide is typically linked to multiple UniProt entries (since a tryptic peptide is typically found in more than one protein). This file assumes that the amino acids I and L are not equal.
AAAAA 1063
AAAAA 243160
AAAAA 271848
AAAAA 272560
AAAAA 31716
AAAAA 320372
- peptide_sequence: The tryptic peptide sequence as previously digested from a protein.
- uniprot_entry_id: ID of the UniProt entry from which the tryptic peptide sequence originated.
This file contains all tryptic peptides that result from an in-silico tryptic digest of all input proteins. Both the original and the equalized (in which all I's are replaced by L) sequences are included.
1 GGLSVPGPMGPSGPR GGISVPGPMGPSGPR 1 GO:0005615;EC:;IPR:IPR008160
2 GLPGPPGPGPQGFQGPPGEPGEPGSSGPMGPR GLPGPPGPGPQGFQGPPGEPGEPGSSGPMGPR 1 GO:0005615;EC:;IPR:IPR008160
3 GPPGPPGK GPPGPPGK 1 GO:0005615;EC:;IPR:IPR008160
4 NGDDGEAGKPGRPGER NGDDGEAGKPGRPGER 1 GO:0005615;EC:;IPR:IPR008160
5 GPPGPQGAR GPPGPQGAR 1 GO:0005615;EC:;IPR:IPR008160
- id: A temporary ID used to identify each of these lines.
- equalized_peptide_sequence: A version of the peptide sequence in which all I's are replaced by L.
- original_peptide_sequence: The original tryptic peptide sequence (no replacements have been made here).
- uniprot_entry_id: ID of the UniProt entry from which this digested tryptic peptide sequence originates.
- functional_annotations: A list of all functional annotations of the protein from which this tryptic peptide originates (this is thus restricted to one protein here!).
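The five columns listed above can be read into a small record type. This is a sketch under assumptions: the columns are taken to be tab-separated, the annotations are taken to be separated by semicolons (as in the samples), and `PeptideRow` and `parse_peptide_row` are hypothetical names.

```python
from typing import List, NamedTuple

class PeptideRow(NamedTuple):
    id: int
    equalized_peptide_sequence: str
    original_peptide_sequence: str
    uniprot_entry_id: int
    functional_annotations: List[str]

def parse_peptide_row(line, sep="\t"):
    """Parse one line into a PeptideRow (assumes exactly five columns)."""
    id_, equalized, original, entry_id, annotations = line.rstrip("\n").split(sep)
    return PeptideRow(
        int(id_),
        equalized,
        original,
        int(entry_id),
        annotations.split(";") if annotations else [],
    )

# First sample line from above, with tabs as the assumed separator.
row = parse_peptide_row(
    "1\tGGLSVPGPMGPSGPR\tGGISVPGPMGPSGPR\t1\tGO:0005615;EC:;IPR:IPR008160"
)
print(row.equalized_peptide_sequence, row.functional_annotations)
```

Note that an annotation such as `EC:` without a value is kept as-is, because entries like this also appear in the sample data.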
11161754 1 AAAAA 470096 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
11161773 1 AAAAA 470097 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
11161792 1 AAAAA 470098 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
11161811 1 AAAAA 470099 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
- id: A temporary ID used to identify each of these lines.
- equalized_sequence_id: ID of this peptide sequence (where I and L are considered to be equal), as used in the sequences.tsv.gz file.
- original_sequence: Original tryptic peptide sequence (where I and L are considered to be different).
- uniprot_entry_id: ID of the UniProt entry from which this peptide originates.
- functional_annotations: List of functional annotations associated with this peptide, for this UniProt entry.
11161754 1 1 470096 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
11161773 1 1 470097 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
11161792 1 1 470098 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
11161811 1 1 470099 GO:0004477;GO:0004488;GO:0000105;GO:0009086;GO:0006164;GO:0035999;EC:1.5.1.5;EC:3.5.4.9;IPR:IPR046346;IPR:IPR036291;IPR:IPR000672;IPR:IPR020630;IPR:IPR020867;IPR:IPR020631
- id: A temporary ID used to identify each of these lines.
- equalized_sequence_id: ID of this peptide sequence (where I and L are considered to be equal), as used in the sequences.tsv.gz file.
- original_sequence_id: ID of this peptide sequence (where I and L are considered not to be equal), as used in the sequences.tsv.gz file.
- uniprot_entry_id: ID of the UniProt entry from which this peptide originates.
- functional_annotations: List of functional annotations associated with this peptide, for this UniProt entry.
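Rows of this kind carry the annotations of one protein per line, whereas the functional annotation files described at the top of this page report, per peptide, how many proteins carry each annotation. One possible way to get from the former to the latter is sketched below. It is an illustration only: `aggregate_annotations` is a hypothetical name, the input is simplified to (sequence ID, annotation string) pairs, and only the per-annotation counts are reproduced, not the `num` summary.

```python
from collections import Counter, defaultdict

def aggregate_annotations(rows):
    """Count, per sequence ID, how many rows (proteins) carry each annotation.

    `rows` is an iterable of (sequence_id, annotation_string) pairs, where the
    annotation string is a semicolon-separated list.
    """
    counts = defaultdict(Counter)
    for sequence_id, annotation_string in rows:
        # Use a set so an annotation repeated within one protein counts once;
        # filter(None, ...) drops empty strings from stray separators.
        counts[sequence_id].update(set(filter(None, annotation_string.split(";"))))
    return {sequence_id: dict(counter) for sequence_id, counter in counts.items()}

# Made-up rows for illustration.
rows = [
    (1, "GO:0004477;EC:1.5.1.5"),
    (1, "GO:0004477;EC:3.5.4.9"),
    (2, "GO:0005737;EC:"),
]
result = aggregate_annotations(rows)
print(result)
```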
This file contains a list of all tryptic peptides that are the result of in-silico tryptic digest of the provided input databases. A unique identifier is generated for each of these sequences and will be referred to by other files.
1 AAAAA
2 AAAAAA
3 AAAAAAAAA
4 AAAAAAAAAAAAAAAAAAAAQAQATSSYPSAISPGSK
5 AAAAAAAAAAAAAAAAAAAAQAQATSSYPSALSPGSK
6 AAAAAAAAAAAAAAAAGATCLER
7 AAAAAAAAAAAAAAAAGVGGMGELGVNGEK
- sequence_id: A unique identifier for this tryptic peptide sequence.
- peptide_sequence: The tryptic peptide sequence itself.
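An in-silico tryptic digest followed by ID assignment can be sketched as below. Several parts are assumptions rather than documented behaviour of the construction script: the cleavage rule (cut after K or R unless the next residue is P, a commonly used trypsin rule), the length limits of 5 and 50, and numbering the unique sequences in sorted order (which matches the samples above but is not stated). Both function names are hypothetical.

```python
import re

def tryptic_digest(protein, min_length=5, max_length=50):
    """Cleave after K or R, except when followed by P (assumed rule)."""
    # Zero-width split: requires Python 3.7 or later.
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return [f for f in fragments if min_length <= len(f) <= max_length]

def assign_sequence_ids(peptides):
    """Give each unique peptide an ID, numbering them in sorted order."""
    return {seq: i for i, seq in enumerate(sorted(set(peptides)), start=1)}

# Made-up protein sequence for illustration.
peptides = tryptic_digest("MKAAPRGGKPLLR", min_length=1)
ids = assign_sequence_ids(["AAAAAA", "AAAAA", "AAAAA"])
print(peptides)
print(ids)
```

The K in `GGKPLLR` is not a cleavage site because it is followed by P, so that fragment stays in one piece.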