Pieter Verschaffelt edited this page Mar 20, 2023 · 16 revisions

Welcome to the Unipept Database wiki. This repository contains all code that defines the structure of the Unipept Database and orchestrates its construction. The Unipept Database is a peptide-centric database derived from the UniProtKB resource, and it powers the Unipept metaproteomics analysis platform (see https://unipept.ugent.be).

Database construction

The construction of the Unipept Database is performed by invoking the build_database.sh script. This script resides in the scripts folder of this repository and can be started from the command line on a server, on your local machine or in a Docker container. Example implementations for such a Docker container can be found in this repository.

On completion, the build_database.sh script produces a set of compressed TSV files that contain all data that subsequently needs to be loaded into a relational database management system (such as MySQL or PostgreSQL). This wiki serves as an extensive reference for the output format of each of these files and for how the database construction process works internally.
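As a rough illustration of how such an output file could be inspected before it is loaded into a database, here is a minimal Python sketch. It rests on assumptions that this page does not confirm: the compression is taken to be gzip, the files are taken to have no header row, and the file name `example.tsv.gz`, the sample rows, and the function `read_compressed_tsv` are hypothetical.

```python
import csv
import gzip
import tempfile
from pathlib import Path
from typing import Iterator, List


def read_compressed_tsv(path: Path) -> Iterator[List[str]]:
    """Yield each row of a compressed TSV file as a list of column values.

    Assumption: the file is gzip-compressed. The wiki only says "compressed",
    so swap in another decompressor if the real output uses a different format.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary data, so fields are
        # split on tabs only.
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


if __name__ == "__main__":
    # Build a tiny stand-in file; the name and contents are made up.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "example.tsv.gz"
        with gzip.open(sample, mode="wt", encoding="utf-8", newline="") as out:
            out.write("1\tAAAK\t2\n2\tLLER\t\n")

        for row in read_compressed_tsv(sample):
            print(row)
```

Streaming rows one at a time keeps memory use flat, which matters if the real files are large. For the actual import, the bulk-loading facility of the chosen database system is likely a better fit than row-by-row inserts.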

File formats

The construction process for the Unipept Database requires and produces many different files, which fall into two categories:

Helper scripts

Since build_database.sh is a shell script that performs a very complex task, we have developed a set of helper scripts (written in either Java or JavaScript) that are invoked by the main script and that each have a very specific function. Below you can find a list of all helper scripts (which reside in the scripts/helper_scripts folder), together with the input each one requires and the output it produces.

Overview of build_database.sh

The most important script in this repository, build_database.sh, orchestrates the complete database construction process. It consists of a series of complex steps, each of which is described in detail below.
