Home
Welcome to the Unipept Database wiki. This repository contains all code that orchestrates the construction of the Unipept Database and defines its structure. The Unipept Database is a peptide-centric database derived from the UniProtKB resource, and it ultimately powers the Unipept metaproteomics analysis platform (see https://unipept.ugent.be).
The Unipept Database is constructed by invoking the build_database.sh script.
This script resides in the scripts folder of this repository and can be started from the command line on a server, on your local machine, or in a Docker container.
Example implementations for such a Docker container can be found in this repository.
On completion, the build_database.sh script produces a set of compressed TSV files containing all data that must subsequently be loaded into a relational database management system (such as MySQL or PostgreSQL).
This wiki serves as an extensive reference for the output format of each of these files and for how the database construction process works internally.
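As a rough illustration of how one of the produced files could be sanity-checked before it is loaded into a database, the sketch below reports the distinct column counts found in the first rows of a compressed TSV file. It assumes gzip compression and uses a hypothetical file name (example.tsv.gz) with made-up sample rows; this page states neither the actual compression format nor the actual file names, and the function name tsv_column_counts is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the distinct
# column counts seen in the first N rows of a gzip-compressed TSV file.
# gzip compression is an assumption; adapt the decompression command
# if the real files use another format.
tsv_column_counts() {
    file="$1"
    rows="${2:-1000}"
    gzip -dc "$file" | head -n "$rows" | awk -F '\t' '{ print NF }' | sort -un
}

# Self-contained demo on a tiny made-up sample (not real Unipept output).
tmp="$(mktemp -d)"
printf '1\tAAAK\t9606\n2\tLLER\t562\n' | gzip > "$tmp/example.tsv.gz"
tsv_column_counts "$tmp/example.tsv.gz"
rm -rf "$tmp"
```

A single number in the output means every inspected row has the same number of tab-separated fields; more than one number suggests malformed rows that would likely fail during a bulk load.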
The construction process for the Unipept Database requires and produces many different files, which fall into two categories.
Since build_database.sh is a shell script with a very complex task, we have developed a set of helper scripts (mostly written in Java or JavaScript) that are invoked by the main script and that each have a very specific function.
Below is a list of all helper scripts (which reside in the scripts/helper_scripts folder), along with the input they require and the output they produce.
- LineagesSequencesTaxons2LCAs.jar
- NamesNodes2TaxonsLineages.jar
- TaxonsUniprots2Tables.jar
- XmlToTabConverter.jar
- FunctionalAnalysisPeptides.js
- TaxaByChunk.js
- WriteToChunk.js
- filter_taxa.sh
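One way to confirm that a checkout contains every helper listed above is a small presence check, sketched below. The file names and the scripts/helper_scripts folder come from this page; the function name missing_helpers is hypothetical, and the check only tests that the files exist, not that they are built or runnable.

```shell
#!/bin/sh
# Helper names as listed on this page.
HELPERS="LineagesSequencesTaxons2LCAs.jar NamesNodes2TaxonsLineages.jar TaxonsUniprots2Tables.jar XmlToTabConverter.jar FunctionalAnalysisPeptides.js TaxaByChunk.js WriteToChunk.js filter_taxa.sh"

# Hypothetical function (not part of the repository): print the name of
# every helper missing from directory $1 and return non-zero if any is
# missing.
missing_helpers() {
    dir="$1"
    status=0
    for name in $HELPERS; do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Run from the repository root; the folder is the one named in the text.
missing_helpers "scripts/helper_scripts" || echo "some helper scripts are missing" >&2
```

The function prints only the missing names, so an empty output together with a zero exit status means all eight helpers are in place.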
The most important script in this repository, build_database.sh, which orchestrates the complete database construction process, consists of a series of complex steps, all of which are described in detail below.