
Building the suffix array

Tibo Vande Moortele edited this page Jun 24, 2024 · 18 revisions

Build using the CLI

Build using Docker

Build using the HPC

Index storage setup

Important

This section needs to be executed on a Unipept server!

Navigate to the data share

cd /mnt/data

Create the folder structure for the new index version

sudo mkdir -p uniprot-2024-03/{index,suffix-array,tables}

Set the right permissions

sudo chmod -R 777 uniprot-2024-03
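
The same layout can be created for any release with a small parameterised script. This is only a sketch: DATA_ROOT and VERSION are hypothetical variable names, and DATA_ROOT defaults to a scratch directory so the script can be tried without touching /mnt/data.

```shell
# Hypothetical variables: set DATA_ROOT=/mnt/data on a Unipept server.
DATA_ROOT="${DATA_ROOT:-$(mktemp -d)}"
VERSION="${VERSION:-uniprot-2024-03}"

# Create the three subfolders used in the rest of this page.
for sub in index suffix-array tables; do
    mkdir -p "$DATA_ROOT/$VERSION/$sub"
done

# Same permissions as above (prefix with sudo on the server if needed).
chmod -R 777 "$DATA_ROOT/$VERSION"

# List the result to confirm the layout.
ls "$DATA_ROOT/$VERSION"
```

Using a loop instead of brace expansion keeps the script usable in shells other than bash.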

Creating the input files

Important

This section needs to be executed on a Unipept server!

Clone the unipept-database repository

git clone https://github.com/unipept/unipept-database

Build all the binaries:

./unipept-database/scripts/build_binaries.sh

Run the build_database script:

sudo ./unipept-database/scripts/build_database.sh -i /mnt/data/uniprot-2024-03/index -d /mnt/data/tmp -m 2g suffix-array-index swissprot,trembl https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz,https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz /mnt/data/uniprot-2024-03/tables
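
Because the command above is long, it can help to assemble it from variables and review it before running. The sketch below only prints the command. The variable names are hypothetical, and the flags and positional arguments are copied unchanged from the command above.

```shell
# Hypothetical variable names; the values are the ones used above.
INDEX_DIR=/mnt/data/uniprot-2024-03/index
TABLES_DIR=/mnt/data/uniprot-2024-03/tables
TMP_DIR=/mnt/data/tmp
BASE_URL=https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete

# Comma-separated source URLs, in the same order as the names swissprot,trembl.
SOURCES="$BASE_URL/uniprot_sprot.dat.gz,$BASE_URL/uniprot_trembl.dat.gz"

# Print the command for review; remove the leading echo to actually run it.
echo sudo ./unipept-database/scripts/build_database.sh \
    -i "$INDEX_DIR" -d "$TMP_DIR" -m 2g \
    suffix-array-index swissprot,trembl "$SOURCES" "$TABLES_DIR"
```

For a new release, only the version part of INDEX_DIR and TABLES_DIR should need to change.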

Moving the input files to the HPC VO

Important

This section needs to be executed on a Unipept server!

Copy the compressed tables to the suffix-array folder of the VO on the HPC

scp /mnt/data/uniprot-2024-03/tables/uniprot_entries.tsv.lz4 hpc-tibo:/kyukon/data/gent/vo/000/gvo00038/suffix-array
scp /mnt/data/uniprot-2024-03/tables/taxons.tsv.lz4 hpc-tibo:/kyukon/data/gent/vo/000/gvo00038/suffix-array

Preparing the input files on the HPC

Navigate to the virtual organisation (VO) where the files are stored

cd /kyukon/data/gent/vo/000/gvo00038/suffix-array/

Load the required modules on the HPC

module load lz4/1.9.4-GCCcore-12.3.0

Decompress and extract the required data for the Suffix Array Builder

lz4cat uniprot_entries.tsv.lz4 | cut -f2,4,7,8 > proteins.tsv
lz4cat taxons.tsv.lz4 > taxons.tsv
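
To see what the cut step keeps, the field selection can be tried on a made-up row. This is a minimal sketch: the eight values c1 to c8 are placeholders, not the real column contents of uniprot_entries.tsv.

```shell
# Hypothetical 8-column row; c1..c8 stand in for the real field values.
sample_row=$(printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8')

# cut splits on tabs by default, so -f2,4,7,8 keeps four of the eight fields.
selected=$(printf '%s\n' "$sample_row" | cut -f2,4,7,8)

# Show the tabs as "|" to make the result easier to read.
printf '%s\n' "$selected" | tr '\t' '|'   # prints c2|c4|c7|c8
```

The resulting proteins.tsv therefore has four tab-separated columns per line, in the original column order.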

Running the PBS job

Clone the unipept-index repository

git clone https://github.com/unipept/unipept-index

Go to the root of the repository and update the submodules

cd unipept-index
git submodule update --init --recursive

Swap to the high-memory gallade cluster

module swap cluster/gallade

Submit the PBS script to start the process

VSC_DATA_VO=/kyukon/data/gent/vo/000/gvo00038 qsub sa-builder/build.pbs

VSC_DATA_VO must be set to the path of the virtual organisation (VO) directory; the prefix form above sets it for the qsub command only.
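
Writing the assignment in front of the command sets the variable for that one command only. The sketch below demonstrates this with a child shell instead of qsub. The path /tmp/example-vo is a placeholder, and how build.pbs reads the variable is not shown here.

```shell
# /tmp/example-vo is a placeholder path, not the real VO.
# The assignment before the command is visible to that command only.
seen_by_child=$(VSC_DATA_VO=/tmp/example-vo sh -c 'printf "%s" "$VSC_DATA_VO"')
printf 'child process saw: %s\n' "$seen_by_child"

# The calling shell itself is unchanged by the prefix assignment.
printf 'calling shell has: %s\n' "${VSC_DATA_VO:-<unset>}"
```

If the variable is already exported in the login environment, the second line prints that existing value instead of the placeholder.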

Troubleshooting

Error: attribute name space is experimental

error[E0658]: `#[diagnostic]` attribute name space is experimental
   --> /user/gent/437/vsc43736/.cargo/registry/src/index.crates.io-6f17d22bba15001f/axum-0.7.5/src/handler/mod.rs:130:5
    |
130 |     diagnostic::on_unimplemented(
    |     ^^^^^^^^^^
    |
    = note: see issue #111996 <https://github.com/rust-lang/rust/issues/111996> for more information
    = help: add `#![feature(diagnostic_namespace)]` to the crate attributes to enable

For more information about this error, try `rustc --explain E0658`.
error: could not compile `axum` (lib) due to previous error

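Solution: the installed Rust toolchain does not yet support a feature used by the failing package (axum 0.7.5 in the log above). Downgrade that package to a version that compiles with the available toolchain.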
