eifunannot configure
The input must be configured before running the pipeline. Run eifunannot configure to generate the config files:
/path/to/workdir $ source eifunannot-1.4.0
/path/to/workdir $ eifunannot configure
...
...
Running eifunannot configure..
Copying file cluster_config.json to /path/to/workdir
Copying file run_config.yaml to /path/to/workdir
Copying file ahrd_example_input_go_prediction.generic.yaml to /path/to/workdir
Configure complete.
You might need to configure the above files based on your input
❗ IMPORTANT NOTE:
The default paths in the config files point to the /ei/cb/.. area, and access to /ei/cb/.. is limited. Please use the sed command below to update run_config.yaml to use the /ei/public/.. location.
/path/to/workdir $ sed -i.bkp -e 's:/ei/cb/common/Databases/ahrd/3.3.3/src/AHRD-3.3.3:/ei/public/databases/eifunannot/AHRD-3.3.3:g' -e 's:/ei/cb/common/Databases/ahrd/3.3.3/27Nov2018:/ei/public/databases/eifunannot/reference/27Nov2018:g' run_config.yaml
The above sed command keeps a backup of the original configuration (run_config.yaml.bkp) and updates run_config.yaml in place; use the updated run_config.yaml downstream.
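The sed command above only rewrites the two AHRD-related /ei/cb/ paths. To list any lines of run_config.yaml that still point to the restricted /ei/cb/ area (for example the default fasta and databases entries, which you will replace with your own inputs anyway), a quick grep is enough. This is a minimal sketch assuming a POSIX shell; check_public_paths is a hypothetical helper, not part of eifunannot.

```shell
# Hypothetical helper (not part of eifunannot): print every line of a config
# file that still references the restricted /ei/cb/ area.
# Returns 0 if none remain, 1 if some remain, 2 if the file cannot be read.
check_public_paths() {
    config_file="${1:-run_config.yaml}"
    if [ ! -r "$config_file" ]; then
        echo "ERROR: cannot read $config_file" >&2
        return 2
    fi
    # grep -n prints matching lines with their line numbers; exits 0 on a match
    if grep -n '/ei/cb/' "$config_file"; then
        echo "WARNING: $config_file still references /ei/cb/ paths (listed above)" >&2
        return 1
    fi
    echo "OK: no /ei/cb/ paths left in $config_file"
}
```

Usage, from the working directory: check_public_paths run_config.yaml. The same helper can be pointed at any of the other generated files.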
Then continue with the steps below.
The main file you need to configure is run_config.yaml, where:
- fasta: path to the protein FASTA file
- databases:
  - reference: path to a curated protein FASTA file. You can use the TAIR protein FASTA file /ei/public/databases/eifunannot/reference/TAIR10_pep_20101214_updated.fasta; its FASTA headers are already formatted for AHRD, which is used within the pipeline.
  - swissprot: path to the downloaded UniProt Swiss-Prot protein database
  - trembl: path to the downloaded UniProt TrEMBL protein database
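Before launching the pipeline, it can be worth confirming that every path you set actually exists, rather than finding out from a failed cluster job. This is a minimal sketch assuming a POSIX shell and the simple one-line "key: /path" layout of the default file (indented keys under databases: are matched too); check_inputs is a hypothetical helper, not part of eifunannot.

```shell
# Hypothetical helper (not part of eifunannot): check that each input path set in
# run_config.yaml exists and is readable. Assumes simple "key: /path" lines,
# as in the default file. 'reference' is optional, so an unset key is skipped.
check_inputs() {
    config_file="${1:-run_config.yaml}"
    if [ ! -r "$config_file" ]; then
        echo "ERROR: cannot read $config_file" >&2
        return 2
    fi
    rc=0
    for key in fasta reference swissprot trembl; do
        # First "key: value" line (leading spaces allowed); keep only the value token
        value=$(sed -n "s/^[[:space:]]*$key:[[:space:]]*\([^[:space:]]*\).*/\1/p" "$config_file" | head -n 1)
        if [ -z "$value" ]; then
            echo "SKIP:    $key is not set in $config_file"
        elif [ -r "$value" ]; then
            echo "OK:      $key -> $value"
        else
            echo "MISSING: $key -> $value" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Run it as check_inputs run_config.yaml; it returns non-zero if any configured path is missing or unreadable.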
Example run_config.yaml:
###########################################################
# Input parameters required to drive AHRD snakemake suite
###########################################################
# provide path to PROTEIN fasta file
fasta: /ei/cb/common/Scripts/eifunannot/0.2/tests/test.protein.fa
# output folder name, NOT path
output: ./output
# number of protein to process in a chunk
chunk_size: 500
# provide protein databases
## CONFIGURATION ##
# below reference protein header is formatted to have the functional description parsable by AHRD config file (ahrd_config)
# So, when using this pipeline, please change the reference protein functional description line
# if no reference is available please remove the 'reference:' line under 'databases:' from this run_config.yaml file and also remove the 'tair:' section (lines 28 to 37) under 'blast_dbs:' from the AHRD config file - ahrd_example_input_go_prediction.generic.yaml
databases:
  reference: /ei/cb/common/References/Protein/Uniprot/30Jan2019/TAIR10_pep_20101214_updated.fasta
  swissprot: /ei/cb/common/References/Protein/Uniprot/30Jan2019/UniProt_swissprot_Viridiplantae_33090_40216_2019_12_11.fasta
  trembl: /ei/cb/common/References/Protein/Uniprot/30Jan2019/UniProt_trembl_Viridiplantae_33090_9314135_2019_12_11.fasta
## END CONFIGURATION ##
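If no reference proteome is available, the comment in the file above asks you to remove the reference: line under databases:. One way to do that from the shell is sketched below, assuming GNU or BSD sed (both accept -i with an attached backup suffix); remove_reference is a hypothetical helper, not part of eifunannot. It only edits run_config.yaml: the tair: section of ahrd_example_input_go_prediction.generic.yaml still has to be removed by hand, as that comment describes.

```shell
# Hypothetical helper (not part of eifunannot): delete the 'reference:' entry
# from run_config.yaml when no curated reference proteome is available.
# The original is kept as <file>.noref.bkp, so the earlier .bkp is not overwritten.
remove_reference() {
    config_file="${1:-run_config.yaml}"
    # Anchored pattern: matches only the key line, not comments mentioning 'reference:'
    sed -i.noref.bkp -e '/^[[:space:]]*reference:/d' "$config_file"
}
```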
ahrd_example_input_go_prediction.generic.yaml is the default AHRD config file used in eifunannot. You can adapt it to your needs.
cluster_config.json is the HPC cluster configuration file. If you need to exclude certain HPC hosts or increase the memory for certain rules, update this file.
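Hand-editing JSON makes it easy to leave a stray trailing comma, which stops the file from being parsed at all. A minimal sketch of a syntax check after editing cluster_config.json, assuming python3 is on PATH (it uses the standard json.tool module); check_json is a hypothetical helper, not part of eifunannot, and it only checks syntax, not whether the host names or memory values are valid for your cluster.

```shell
# Hypothetical helper (not part of eifunannot): confirm that a JSON file still
# parses after manual edits. Assumes python3 is on PATH.
check_json() {
    json_file="${1:-cluster_config.json}"
    # json.tool exits non-zero and prints the parse error if the file is invalid
    if python3 -m json.tool "$json_file" > /dev/null; then
        echo "OK: $json_file is valid JSON"
    else
        echo "ERROR: $json_file is not valid JSON" >&2
        return 1
    fi
}
```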
$ eifunannot configure --help
usage: eifunannot [-h] [-o OUTPUT] [-f]

EI FunAnnot version 1.4.0 - configure

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output directory (default: /path/to/workdir)
  -f, --force           Force overwrite if configuration files exist (default: False)

Example configure command:
eifunannot configure
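Because -f/--force overwrites existing configuration files, re-running configure in a directory where you have already edited run_config.yaml would discard those edits. A minimal sketch that takes a timestamped copy first, assuming a POSIX shell; reconfigure_with_backup is a hypothetical helper, and it calls eifunannot configure only with the -o and -f options listed in the help text above.

```shell
# Hypothetical helper (not part of eifunannot): back up any existing config files
# in the target directory, then regenerate them with --force.
reconfigure_with_backup() {
    workdir="${1:-.}"
    stamp=$(date +%Y%m%d%H%M%S)
    for f in cluster_config.json run_config.yaml ahrd_example_input_go_prediction.generic.yaml; do
        if [ -e "$workdir/$f" ]; then
            # Keep a timestamped copy (with original permissions and times)
            cp -p "$workdir/$f" "$workdir/$f.$stamp.bkp" || return 1
            echo "Backed up $workdir/$f -> $workdir/$f.$stamp.bkp"
        fi
    done
    eifunannot configure -o "$workdir" -f
}
```

For example, reconfigure_with_backup /path/to/workdir regenerates the three files there while keeping your edited versions as *.bkp copies for comparison.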
Contact: Gemy Kaithakottil (kaithakg)