Input files

Pieter Verschaffelt edited this page Mar 21, 2023 · 4 revisions

This page lists the input files (and their formats) used by the different steps of the Unipept database construction process.

NCBI

The files in this section are all provided by NCBI and can be retrieved from their FTP server: https://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip. This ZIP file contains more files than the ones listed below, but those are not of interest to our project.
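As a rough illustration of how the two files of interest could be pulled out of the archive, here is a minimal Python sketch. The helper names `download_taxdmp` and `read_wanted_members` are hypothetical and are not part of Unipept's tooling, and decoding the members as UTF-8 is an assumption.

```python
import io
import zipfile
from urllib.request import urlopen

# URL taken from this page; the helper names below are hypothetical.
TAXDMP_URL = "https://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip"
WANTED = ("names.dmp", "nodes.dmp")


def download_taxdmp(url=TAXDMP_URL):
    """Download the archive and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def read_wanted_members(zip_bytes, wanted=WANTED):
    """Return {member name: text} for the members we care about.

    All other members of the archive are ignored.
    Assumption: the members can be decoded as UTF-8.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name).decode("utf-8") for name in wanted}
```

Calling `read_wanted_members(download_taxdmp())` would then give the contents of `names.dmp` and `nodes.dmp` as strings, without unpacking the rest of the archive to disk.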

names.dmp

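This file contains all NCBI taxa and their associated names. A taxon can have several names; each one is tagged with a name class such as "scientific name" or "synonym".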

Example

1	|	all	|		|	synonym	|
1	|	root	|		|	scientific name	|
2	|	Bacteria	|	Bacteria <bacteria>	|	scientific name	|
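The rows above can be read with a few lines of Python. The sketch below is only an illustration, not Unipept's actual parser; it assumes the four columns are the taxon ID, the name, a unique name variant and the name class, and that names never contain a `|` character. The function names are hypothetical.

```python
def parse_names_line(line):
    """Split one names.dmp row into (tax_id, name, unique_name, name_class).

    Splitting on "|" and stripping whitespace tolerates small spacing
    differences around the separator.
    Assumption: no field contains a "|" character itself.
    """
    fields = [field.strip() for field in line.split("|")]
    tax_id, name, unique_name, name_class = fields[:4]
    return int(tax_id), name, unique_name, name_class


def scientific_names(lines):
    """Map each taxon ID to its scientific name, skipping other name classes."""
    names = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        tax_id, name, _unique_name, name_class = parse_names_line(line)
        if name_class == "scientific name":
            names[tax_id] = name
    return names
```

Keeping only the "scientific name" rows yields exactly one name per taxon, which is usually what is needed for display purposes.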

nodes.dmp

This file contains all NCBI taxa with their rank and their parent taxon, from which the lineage of each taxon (as described by the NCBI taxonomy) can be reconstructed.

Example

1	|	1	|	no rank	|		|	8	|	0	|	1	|	0	|	0	|	0	|	0	|	0|		|
2	|	131567	|	superkingdom	|		|	0	|	0	|	11	|	0	|	0	|	0	|	0	|0	|		|
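To make the link between this file and a lineage concrete, here is a small Python sketch. It assumes that the first three columns are the taxon ID, the parent taxon ID and the rank, and that the root is the only node that is its own parent (as in the first example row). All other columns are ignored, and the function names are hypothetical.

```python
def parse_nodes(lines):
    """Return {tax_id: (parent_tax_id, rank)} from nodes.dmp rows.

    Assumption: columns 1-3 are taxon ID, parent taxon ID and rank.
    """
    nodes = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split("|")]
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(nodes, tax_id):
    """Walk parent pointers up to the root; return [(tax_id, rank), ...] root first."""
    path = []
    seen = set()
    current = tax_id
    while current in nodes and current not in seen:
        seen.add(current)  # guards against accidental cycles in the input
        parent_id, rank = nodes[current]
        path.append((current, rank))
        if parent_id == current:
            break  # the root node points to itself
        current = parent_id
    return path[::-1]
```

The walk stops early if a parent is missing from the table, so a partial file gives a partial lineage instead of raising an error.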

EC

The files in this section are provided by the Enzyme Commission and can be retrieved from the Expasy FTP-server: https://ftp.expasy.org/databases/enzyme.

enzclass.txt

Available at https://ftp.expasy.org/databases/enzyme/enzclass.txt. This file contains the EC numbers (and their associated names) on the class, subclass and sub-subclass level. The information about EC numbers on the deepest level is available in the [enzyme.dat](#enzymedat) file.

Example

----------------------------------------------------------------------------
        ENZYME nomenclature database
        SIB Swiss Institute of Bioinformatics; Geneva, Switzerland
----------------------------------------------------------------------------

Description: Definition of enzyme classes, subclasses and sub-subclasses
Name:        enzclass.txt
Release:     22-Feb-2023

----------------------------------------------------------------------------

1. -. -.-  Oxidoreductases.
1. 1. -.-   Acting on the CH-OH group of donors.
1. 1. 1.-    With NAD(+) or NADP(+) as acceptor.
1. 1. 2.-    With a cytochrome as acceptor.
1. 1. 3.-    With oxygen as acceptor.
1. 1. 4.-    With a disulfide as acceptor.
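The class lines in this file pad the EC number with spaces, so a regular expression is a convenient way to normalise them. The following Python sketch is an illustration only; the pattern is inferred from the example above rather than from a format specification, and `parse_enzclass` is a hypothetical name.

```python
import re

# Matches lines such as "1. 1. 1.-    With NAD(+) or NADP(+) as acceptor."
# Inferred from the example on this page, not from an official format spec.
CLASS_LINE = re.compile(r"^\s*(\d+)\.\s*([\d-]+)\.\s*([\d-]+)\.\s*-\s+(.+?)\s*$")


def parse_enzclass(lines):
    """Return {normalised EC number: name}; header and blank lines are skipped."""
    classes = {}
    for line in lines:
        match = CLASS_LINE.match(line)
        if match is None:
            continue  # not a class line (banner, metadata, blank line)
        level1, level2, level3, name = match.groups()
        ec_number = f"{level1}.{level2}.{level3}.-"
        classes[ec_number] = name.rstrip(".")  # drop the trailing period
    return classes
```

With this normalisation, "1. 1. -.-" becomes "1.1.-.-", which matches the notation commonly used for EC numbers elsewhere.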

enzyme.dat

This file is available at https://ftp.expasy.org/databases/enzyme/enzyme.dat and contains all EC numbers on the deepest level of the EC ontology (including their names).

Example

CC   -----------------------------------------------------------------------
CC
CC   ENZYME nomenclature database
CC
CC   -----------------------------------------------------------------------
CC   Release of 22-Feb-2023
CC   -----------------------------------------------------------------------
CC
CC   Alan Bridge and Kristian Axelsen
CC   SIB Swiss Institute of Bioinformatics
CC   Centre Medical Universitaire (CMU)
CC   1, rue Michel Servet
CC   1211 Geneva 4
CC   Switzerland
CC
CC   Email: [email protected]
CC
CC   WWW server: http://enzyme.expasy.org/
CC
CC   -----------------------------------------------------------------------
CC   Copyrighted by the SIB Swiss Institute of Bioinformatics and
CC   distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   -----------------------------------------------------------------------
//
ID   1.1.1.1
DE   alcohol dehydrogenase.
AN   aldehyde reductase.
CA   (1) a primary alcohol + NAD(+) = an aldehyde + H(+) + NADH.
CA   (2) a secondary alcohol + NAD(+) = a ketone + H(+) + NADH.
CC   -!- Acts on primary or secondary alcohols or hemi-acetals with very broad
CC       specificity; however the enzyme oxidizes methanol much more poorly
CC       than ethanol.
CC   -!- The animal, but not the yeast, enzyme acts also on cyclic secondary
CC       alcohols.
DR   P07327, ADH1A_HUMAN;  P28469, ADH1A_MACMU;  Q5RBP7, ADH1A_PONAB;
DR   P25405, ADH1A_SAAHA;  P25406, ADH1B_SAAHA;  P00327, ADH1E_HORSE;
DR   P00326, ADH1G_HUMAN;  O97959, ADH1G_PAPHA;  P00328, ADH1S_HORSE;
DR   P80222, ADH1_ALLMI ;  P30350, ADH1_ANAPL ;  P49645, ADH1_APTAU ;
DR   P06525, ADH1_ARATH ;  P41747, ADH1_ASPFN ;  Q17334, ADH1_CAEEL ;
DR   P43067, ADH1_CANAX ;  P85440, ADH1_CATRO ;  P14219, ADH1_CENAM ;
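Since only the EC number and its name are of interest here, a reader for this file can restrict itself to the `ID` and `DE` lines. The Python sketch below is a minimal illustration under these assumptions: each line starts with a two-letter code followed by three spaces, entries are terminated by `//`, and a `DE` value may continue on further `DE` lines. The function name `parse_enzyme_dat` is hypothetical.

```python
def parse_enzyme_dat(lines):
    """Return {EC number: name} built from the ID and DE lines of each entry."""
    entries = {}
    ec_number = None
    name_parts = []

    def flush():
        # Store the entry collected so far, if any.
        if ec_number is not None:
            entries[ec_number] = " ".join(name_parts).rstrip(".")

    for line in lines:
        code, content = line[:2], line[5:].strip()
        if code == "ID":
            ec_number, name_parts = content, []
        elif code == "DE" and ec_number is not None:
            name_parts.append(content)  # DE may span several lines
        elif code == "//":
            flush()
            ec_number, name_parts = None, []
    flush()  # handle input that ends without a closing "//"
    return entries
```

The `CC` banner at the top of the file is ignored automatically, because it appears before the first `ID` line.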