Unipept Database Index
Pieter Verschaffelt edited this page Mar 21, 2023 · 2 revisions
When building a targeted protein reference database, the construction process must extract only the proteins associated with a specific list of organisms. The Unipept Database Index is designed to make this extraction efficient for the requested taxa.
A Unipept Database Index is a folder containing a set of zipped TSV files, referred to below as chunks. The proteins are sorted by taxon ID and split across chunks of approximately equal size.
Every chunk has the same structure and contains one protein per line, in the format shown in the following example:
P19871 MTNRLQGKVALVTGGASGVGLEVVKLLLGEGAKVAFSDINEAAGQQLAAELGERSMFVRHDVSSEADWTLVMAAVQRRLGTLNVLVNNAGILLPGDMETGRLEDFSRLLKINTESVFIGCQQGIAAMKETGGSIINMASVSSWLPIEQYAGYSASKAAVSALTRAAALSCRKQGYAIRVNSIHPDGIYTPMMQASLPKGVSKEMVLHDPKLNRAGRAYMPERIAQLVLFLASDESSVMSGSELHADNSILGMGL 3-beta-hydroxysteroid dehydrogenase 121 1.1.1.51 GO:0035410;GO:0047045;GO:0047035;GO:0008202 IPR036291;IPR020904;IPR002347 swissprot 285
Q06191 MTINATVKEAGFRPASRISSIGVSEILKIGARAAAMKREGKPVIILGAGEPDFDTPDHVKQAASDAIHRGETKYTALDGTPELKKAIREKFQRENGLAYELDEITVATGAKQILFNAMMASLDPGDEVVIPTPYWTSYSDIVQICEGKPILIACDASSGFRLTAQKLEAAITPRTRWVLLNSPSNPSGAAYSAADYRPLLDVLLKHPHVWLLVDDMYEHIVYDAFRFVTPARLEPGLKDRTLTVNGVSKAYAMTGWRIGYAGGPRALIKAMAVVQSQATSCPSSVSQAASVAALNGPQDFLKERTESFQRRRNLVVNGLNAIEGLDCRVPEGAFYTFSGCAGVARRVTPSGKRIESDTDFCAYLLEDSHVAVVPGSAFGLSPYFRISYATSEAELKEALERISAACKRLS Aspartate aminotransferase 96 2.6.1.1 GO:0005737;GO:0004069;GO:0030170;GO:0009058 IPR004839;IPR004838;IPR015424;IPR015421;IPR015422 swissprot 382
P84887 MRWLDKFGESLSRSVAHKTSRRSVLRSVGKLMVGSAFVLPVLPVARAAGGGGSSSGADHISLNPDLANEDEVNSCDYWRHCAVDGFLCSCCGGTTTTCPPGSTPSPISWIGTCHNPHDGKDYLISYHDCCGKTACGRCQCNTQTRERPGYEFFLHNDVNWCMANENSTFHCTTSVLVGLAKN Aralkylamine dehydrogenase light chain 77 1.4.9.2 GO:0042597;GO:0030058;GO:0030059;GO:0009308 IPR016008;IPR036560;IPR013504;IPR006311 swissprot 511
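To illustrate how a chunk could be consumed, the following Python sketch parses lines of the format above and keeps only the proteins that belong to a requested set of taxa. It is not the Unipept implementation. The column meanings are assumptions inferred from the example lines: the page does not name the columns, the field names `ProteinRecord`, `parse_line` and `filter_by_taxa` are hypothetical, the fourth column is treated as an opaque version-like value, and the last column is taken to be the taxon ID because it is the one that appears in sorted order in the example. The sketch also assumes that fields are tab-separated and that taxon IDs are sorted in ascending order within a chunk.

```python
from typing import Iterable, Iterator, NamedTuple


class ProteinRecord(NamedTuple):
    """One line of a chunk. Field meanings are assumptions, see the lead-in."""
    accession: str
    sequence: str
    name: str
    version: str                 # assumed meaning of the 4th column
    ec_numbers: str
    go_terms: list[str]
    interpro_entries: list[str]
    source_db: str               # e.g. "swissprot"
    taxon_id: int                # assumed to be the last column


def parse_line(line: str) -> ProteinRecord:
    """Split one tab-separated line into a ProteinRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    acc, seq, name, version, ec, go, ipr, db, taxon = fields
    return ProteinRecord(
        accession=acc,
        sequence=seq,
        name=name,
        version=version,
        ec_numbers=ec,
        go_terms=go.split(";") if go else [],
        interpro_entries=ipr.split(";") if ipr else [],
        source_db=db,
        taxon_id=int(taxon),
    )


def filter_by_taxa(lines: Iterable[str], taxa: set[int]) -> Iterator[ProteinRecord]:
    """Yield only the records whose taxon ID is in `taxa`."""
    if not taxa:
        return
    highest = max(taxa)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_line(line)
        if record.taxon_id > highest:
            # Assumes ascending taxon order: no later line can match.
            break
        if record.taxon_id in taxa:
            yield record
```

Because `filter_by_taxa` accepts any iterable of text lines, it can be fed an already opened and decompressed chunk, whatever compression format the chunks use. The early exit shows why sorting by taxon ID helps: once the IDs in a chunk exceed the largest requested taxon, the remainder of that chunk can be skipped.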