eifunannot configure

Gemy George Kaithakottil edited this page Jun 19, 2023 · 2 revisions

We need to configure the inputs before running the pipeline.

Configure

Run eifunannot configure to generate the config files:

/path/to/workdir
$ source eifunannot-1.4.0
$ eifunannot configure
...
...
Running eifunannot configure..
Copying file cluster_config.json to /path/to/workdir
Copying file run_config.yaml to /path/to/workdir
Copying file ahrd_example_input_go_prediction.generic.yaml to /path/to/workdir
Configure complete.

You may need to edit the files above to match your input.

IMPORTANT NOTE:

Configure run_config.yaml paths to use the /ei/public/.. location

By default, the paths in the config files point to the /ei/cb/.. area, and access to /ei/cb/.. is limited.
Use the sed command below to update run_config.yaml so that it uses the /ei/public/.. location instead.

/path/to/workdir
$ sed -i.bkp -e 's:/ei/cb/common/Databases/ahrd/3.3.3/src/AHRD-3.3.3:/ei/public/databases/eifunannot/AHRD-3.3.3:g' -e 's:/ei/cb/common/Databases/ahrd/3.3.3/27Nov2018:/ei/public/databases/eifunannot/reference/27Nov2018:g' run_config.yaml

The above sed command will create a backup file with the original configuration (run_config.yaml.bkp) and a new file (run_config.yaml) which you can use downstream.
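
As a quick sanity check after running sed, you could list any lines in run_config.yaml that still point at the /ei/cb/.. area. The sed command only rewrites the two AHRD paths it names, so other entries (for example the fasta and databases paths) may still need editing by hand. This is a minimal sketch; list_restricted_paths is a hypothetical helper name, not part of eifunannot.

```shell
# Sketch: report uncommented lines in a config file that still reference /ei/cb/.
# list_restricted_paths is a hypothetical helper, not an eifunannot command.
list_restricted_paths() {
    local config="${1:-run_config.yaml}"
    # grep -n prefixes each match with its line number;
    # the second grep drops lines that are YAML comments.
    grep -n '/ei/cb/' "$config" | grep -v '^[0-9]*:[[:space:]]*#'
}

# Usage (from /path/to/workdir):
#   list_restricted_paths run_config.yaml || echo "No /ei/cb paths left"
```

The function returns a non-zero status when nothing is found, so it can be chained with || as in the usage comment.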
Then continue with the steps below.

- run_config.yaml

The main file you need to configure is run_config.yaml, where:

  • fasta: provide path to protein fasta file
  • databases:
    • reference: provide path to curated protein fasta file. You can use the TAIR protein fasta file - /ei/public/databases/eifunannot/reference/TAIR10_pep_20101214_updated.fasta. The fasta header of this file is already configured to run with AHRD, which is used within the pipeline.
    • swissprot: provide path to the downloaded UniProt Swiss-Prot protein database
    • trembl: provide path to the downloaded UniProt TrEMBL protein database
###########################################################
# Input parameters required to drive AHRD snakemake suite
###########################################################

# provide path to PROTEIN fasta file
fasta: /ei/cb/common/Scripts/eifunannot/0.2/tests/test.protein.fa

# output folder name, NOT path
output: ./output

# number of protein to process in a chunk
chunk_size: 500

# provide protein databases
## CONFIGURATION ##
# below reference protein header is formatted to have the functional description parsable by AHRD config file (ahrd_config)
# So, when using this pipeline, please change the reference protein functional description line

# if no reference is available please remove the 'reference:' line under 'databases:' from this run_config.yaml file and also remove the 'tair:' section (lines 28 to 37) under 'blast_dbs:' from the AHRD config file - ahrd_example_input_go_prediction.generic.yaml
databases:
    reference: /ei/cb/common/References/Protein/Uniprot/30Jan2019/TAIR10_pep_20101214_updated.fasta
    swissprot: /ei/cb/common/References/Protein/Uniprot/30Jan2019/UniProt_swissprot_Viridiplantae_33090_40216_2019_12_11.fasta
    trembl: /ei/cb/common/References/Protein/Uniprot/30Jan2019/UniProt_trembl_Viridiplantae_33090_9314135_2019_12_11.fasta
## END CONFIGURATION ##
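
Once the paths are edited, it can help to confirm that each file actually exists before starting the pipeline. The sketch below assumes the four keys described above (fasta, reference, swissprot, trembl) each sit on their own line with a path that contains no spaces. It uses a simple line match rather than a real YAML parser, and check_config_paths is a hypothetical helper name, not part of eifunannot.

```shell
# Sketch: confirm that the input files named in run_config.yaml are readable.
# check_config_paths is a hypothetical helper; it assumes one "key: path" per line
# and paths without spaces. Commented-out lines are ignored.
check_config_paths() {
    local config="${1:-run_config.yaml}"
    local status=0 key path
    while read -r key path; do
        [ -n "$key" ] || continue          # skip the empty line when nothing matched
        if [ -r "$path" ]; then
            echo "OK       $key $path"
        else
            echo "MISSING  $key $path"
            status=1
        fi
    done <<EOF
$(sed -n -E 's/^[[:space:]]*(fasta|reference|swissprot|trembl):[[:space:]]*([^[:space:]#]+).*$/\1 \2/p' "$config")
EOF
    return "$status"
}

# Usage (from /path/to/workdir):
#   check_config_paths run_config.yaml
```

A non-zero return status means at least one listed file could not be read from the current host.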

- ahrd_example_input_go_prediction.generic.yaml

This is the default AHRD config file used in eifunannot. You can configure it to your needs.

- cluster_config.json

This is the HPC cluster configuration file. If you need to exclude certain HPC hosts or increase the memory for certain rules, update this file.

Usage eifunannot configure

$ eifunannot configure --help
usage: eifunannot [-h] [-o OUTPUT] [-f]

EI FunAnnot version 1.4.0 - configure

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output directory (default: /path/to/workdir)
  -f, --force           Force overwrite if configuration files exist (default: False)

Example configure command:
eifunannot configure

Contact:Gemy Kaithakottil (kaithakg)([email protected])
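
Because -f/--force overwrites existing configuration files, you may want to keep copies of any edited files before regenerating them. The following is a minimal sketch under that assumption. backup_configs and the timestamped .bkp naming are hypothetical, and the file names are the three that eifunannot configure reports copying.

```shell
# Sketch: copy existing config files aside before running configure with --force.
# backup_configs is a hypothetical helper, not part of eifunannot.
backup_configs() {
    local workdir="${1:-.}"
    local stamp f
    stamp=$(date +%Y%m%d_%H%M%S)
    for f in cluster_config.json run_config.yaml ahrd_example_input_go_prediction.generic.yaml; do
        if [ -f "$workdir/$f" ]; then
            # -p preserves timestamps and permissions of the original file
            cp -p "$workdir/$f" "$workdir/$f.$stamp.bkp"
            echo "Backed up $f"
        fi
    done
}

# Usage:
#   backup_configs /path/to/workdir
#   eifunannot configure -o /path/to/workdir -f
```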