
Unipept Database Index

Pieter Verschaffelt edited this page Mar 21, 2023 · 2 revisions

When a targeted protein reference database is constructed, only the proteins associated with a specific list of organisms need to be extracted. To extract the proteins of the requested taxa efficiently, we designed the Unipept Database Index.

A Unipept Database Index is a folder containing a set of zipped TSV files, referred to below as chunks. The proteins in these chunks are sorted by taxon ID and split over files of approximately equal size.
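Because all proteins are sorted by taxon ID before they are split, every chunk covers a contiguous range of taxon IDs, so a reader only has to open the chunks whose range contains at least one requested taxon. The following is a minimal sketch of that idea, not Unipept's actual implementation. It assumes that "zipped" means gzip compression, invents a hypothetical file naming scheme (`chunk_0000.tsv.gz`, ...), balances chunks by byte size, and simply keeps each chunk's (path, smallest taxon ID, largest taxon ID) in memory; the page does not say how these ranges are stored or discovered. The function names `write_index` and `chunks_for_taxa` are hypothetical.

```python
import bisect
import gzip
import os
from typing import Iterable, List, Tuple

# (path, smallest taxon ID, largest taxon ID) of one chunk
Chunk = Tuple[str, int, int]


def write_index(records: Iterable[Tuple[int, str]], folder: str,
                n_chunks: int) -> List[Chunk]:
    """Write (taxon_id, tsv_line) records as gzip-compressed TSV chunks.

    Records are sorted by taxon ID first, then cut into at most n_chunks
    pieces of roughly equal byte size (sequence lengths vary, so equal line
    counts would not give equally large files)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    ordered = sorted(records, key=lambda r: r[0])
    if not ordered:
        return []
    total = sum(len(line) + 1 for _, line in ordered)  # +1 for the newline
    target = total / n_chunks

    pieces, current, size = [], [], 0
    for record in ordered:
        current.append(record)
        size += len(record[1]) + 1
        # Close the current piece once it reaches the target size; the last
        # piece stays open for whatever remains.
        if size >= target and len(pieces) < n_chunks - 1:
            pieces.append(current)
            current, size = [], 0
    if current:
        pieces.append(current)

    os.makedirs(folder, exist_ok=True)
    chunks = []
    for i, piece in enumerate(pieces):
        path = os.path.join(folder, f"chunk_{i:04d}.tsv.gz")  # hypothetical naming
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.writelines(line + "\n" for _, line in piece)
        # Input is sorted, so the first and last record bound the taxon range.
        chunks.append((path, piece[0][0], piece[-1][0]))
    return chunks


def chunks_for_taxa(chunks: List[Chunk], taxa: Iterable[int]) -> List[str]:
    """Return only the chunk paths whose taxon range contains a requested taxon."""
    wanted = sorted(set(taxa))
    selected = []
    for path, lo, hi in chunks:
        i = bisect.bisect_left(wanted, lo)  # smallest requested taxon >= lo
        if i < len(wanted) and wanted[i] <= hi:
            selected.append(path)
    return selected
```

Note that one taxon can straddle the boundary between two neighbouring chunks; the range check then simply selects both.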

Chunk structure

Every chunk has the same structure: each line holds one protein as a tab-separated record, as in the following example:

P19871  MTNRLQGKVALVTGGASGVGLEVVKLLLGEGAKVAFSDINEAAGQQLAAELGERSMFVRHDVSSEADWTLVMAAVQRRLGTLNVLVNNAGILLPGDMETGRLEDFSRLLKINTESVFIGCQQGIAAMKETGGSIINMASVSSWLPIEQYAGYSASKAAVSALTRAAALSCRKQGYAIRVNSIHPDGIYTPMMQASLPKGVSKEMVLHDPKLNRAGRAYMPERIAQLVLFLASDESSVMSGSELHADNSILGMGL   3-beta-hydroxysteroid dehydrogenase     121      1.1.1.51        GO:0035410;GO:0047045;GO:0047035;GO:0008202     IPR036291;IPR020904;IPR002347   swissprot       285
Q06191  MTINATVKEAGFRPASRISSIGVSEILKIGARAAAMKREGKPVIILGAGEPDFDTPDHVKQAASDAIHRGETKYTALDGTPELKKAIREKFQRENGLAYELDEITVATGAKQILFNAMMASLDPGDEVVIPTPYWTSYSDIVQICEGKPILIACDASSGFRLTAQKLEAAITPRTRWVLLNSPSNPSGAAYSAADYRPLLDVLLKHPHVWLLVDDMYEHIVYDAFRFVTPARLEPGLKDRTLTVNGVSKAYAMTGWRIGYAGGPRALIKAMAVVQSQATSCPSSVSQAASVAALNGPQDFLKERTESFQRRRNLVVNGLNAIEGLDCRVPEGAFYTFSGCAGVARRVTPSGKRIESDTDFCAYLLEDSHVAVVPGSAFGLSPYFRISYATSEAELKEALERISAACKRLS        Aspartate aminotransferase      96       2.6.1.1 GO:0005737;GO:0004069;GO:0030170;GO:0009058     IPR004839;IPR004838;IPR015424;IPR015421;IPR015422       swissprot       382
P84887  MRWLDKFGESLSRSVAHKTSRRSVLRSVGKLMVGSAFVLPVLPVARAAGGGGSSSGADHISLNPDLANEDEVNSCDYWRHCAVDGFLCSCCGGTTTTCPPGSTPSPISWIGTCHNPHDGKDYLISYHDCCGKTACGRCQCNTQTRERPGYEFFLHNDVNWCMANENSTFHCTTSVLVGLAKN   Aralkylamine dehydrogenase light chain  77      1.4.9.2 GO:0042597;GO:0030058;GO:0030059;GO:0009308     IPR016008;IPR036560;IPR013504;IPR006311  swissprot       511
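The column meanings below are inferred from the example lines rather than documented on this page, so treat them as assumptions: UniProt accession, sequence, protein name, a numeric column whose meaning is not stated (kept as the raw string `field4`, a hypothetical name), EC numbers, GO terms, InterPro entries, source database, and finally what appears to be the taxon ID (285, 382 and 511 ascend across the rows, which matches the sorting by taxon ID). Multi-valued columns are assumed to be `;`-separated and chunks are assumed to be gzip files. This minimal sketch parses such lines and streams one chunk, keeping only the requested taxa; since a chunk is sorted by taxon ID, it can stop reading as soon as it passes the largest requested ID.

```python
import gzip
from dataclasses import dataclass
from typing import Iterator, List, Set


@dataclass
class ChunkEntry:
    accession: str          # UniProt accession, e.g. P19871
    sequence: str           # amino-acid sequence
    name: str               # protein name
    field4: str             # numeric column; meaning not documented (hypothetical name)
    ec_numbers: List[str]
    go_terms: List[str]
    interpro_entries: List[str]
    db_type: str            # e.g. "swissprot"
    taxon_id: int           # assumed: last column holds the taxon ID


def _split(field: str) -> List[str]:
    # Assumed ";" separator; an empty column yields an empty list.
    return [value for value in field.split(";") if value]


def parse_line(line: str) -> ChunkEntry:
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    return ChunkEntry(cols[0], cols[1], cols[2], cols[3], _split(cols[4]),
                      _split(cols[5]), _split(cols[6]), cols[7], int(cols[8]))


def read_chunk(path: str, taxa: Set[int]) -> Iterator[ChunkEntry]:
    """Stream one chunk, yielding only entries whose taxon ID is requested."""
    upper = max(taxa, default=-1)
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            entry = parse_line(line)
            if entry.taxon_id > upper:
                break  # sorted by taxon ID: no requested taxa can follow
            if entry.taxon_id in taxa:
                yield entry
```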