Skip to content

5. Evaluation

Abdurrahman Abul-Basher edited this page Jun 6, 2021 · 23 revisions

Overview

leADS can be evaluated using a pre-trained model (see Training). A pre-trained model, ("leADS.pkl") trained on Enzyme Commission (EC) number indices with embedding (biocyc21_Xe.pkl) and the pathway indices (biocyc21_y.pkl) data is made available to users in the Download files section of this wiki.

Note: Make sure to put the source code leADS/ (Installing leADS) into the leADS_materials/ directory as explained in the Download files section. Additionally, create a log/ and result/ (if you have not already created one during pathway prediction) folder in the same leADS_materials/ directory. The final structure should look like this:

leADS_materials/
	├── objectset/
        │       └── ...
	├── model/
        │       └── ...
	├── dataset/
        │       └── ...
	├── result/
        │       └── ...
	└── leADS/
                └── ...

For all experiments, using a terminal (On Linux and macOS) or an Anaconda command prompt (On Windows) navigate to the src/ folder in the leADS/ directory and then run the commands as shown in the Examples section.

To display leADS' running options use: python main.py --help. It should be self-contained.

Input:

Two matrix files namely [DATANAME]_X*.pkl and the [DATANAME]_y.pkl must be provided for evaluation of a leADS model.

Note: Data files such as "[DATANAME]_Xe.pkl", "[DATANAME]_Xa.pkl", "[DATANAME]_X.pkl" can be used for evaluation, provided leADS was trained using these corresponding files.

Command:

The basic command is represented below. Do not use this to run the evaluation step. This command is only a representation of all the flags used. See the Examples section below on how to run Evaluation.

python main.py \
--evaluate \
--pred-labels \
--soft-voting \
--X-name "[DATANAME]_X*.pkl" \
--y-name "[DATANAME]_y.pkl" \
--file-name "[save file name]" \
--dspath "[absolute path to the dataset directory (e.g. dataset)]" \
--rspath "[absolute path to the result directory (e.g. result)]" \
--batch 50 \
--num-jobs 2

Argument descriptions:

The table below summarizes all the command-line arguments that are specific to this framework:

Argument name Description Value
--evaluate To evaluate the performance of leADS on the input dataset False
--pred-labels Predicting labels in input False
--soft-voting Boolean variable indicating whether to predict labels based on the calibrated sums of the predicted probabilities from an ensemble False
--X-name The input file name corresponding to EC number indices or any other feature files (see Advancedusage) [DATANAME]_X*.pkl
--y-name The input file name corresponding to pathway indices [DATANAME]_y.pkl
--file-name The names of input preprocessed files (without extension) [input (or save) file name]
--dspath The path to the datasets Outside source code
--rspath The path to store results Outside source code
--batch Batch size 50
--num-jobs The number of parallel workers 2

Output:

The output file generated after running the command is:

File Description
[DATANAME]_scores.txt A text file containing model performance scores for all samples used

Examples

Example 1:

To evaluate the performance of leADS on the golden dataset (golden_Xe.pkl and golden_y.pkl), run the following command:

Note: The flag --dsname must include the name of the dataset which is "golden" in this case.

python main.py --evaluate --pred-labels --soft-voting --X-name "golden_Xe.pkl" --y-name "golden_y.pkl" --dsname "golden" --file-name "leADS_golden" --model-name "leADS" --num-jobs 2

After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The tree structure for the folder with the output will look like this:

leADS_materials/
	├── objectset/
        │       └── ...
	├── model/
        │       ├── leADS.pkl
        │       └── ...
	├── dataset/
        │       └── ...
	├── result/
        |       ├── leADS_golden_scores.txt
        │       └── ...
	└── leADS/
                └── ...

Example 2:

To evaluate the performance of leADS on the cami dataset (cami_Xe.pkl and cami_y.pkl), run the following command:

Note: The flag --dsname must include the name of the dataset which is "cami" in this case.

python main.py --evaluate --pred-labels --soft-voting --X-name "cami_Xe.pkl" --y-name "cami_y.pkl" --dsname "cami" --file-name "leADS_cami" --model-name "leADS" --num-jobs 2

After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The tree structure for the folder with the output will look like this:

leADS_materials/
	├── objectset/
        │       └── ...
	├── model/
        │       ├── leADS.pkl 
        │       └── ...
	├── dataset/
        │       └── ...
	├── result/
        |       ├── leADS_cami_scores.txt
        │       └── ...
	└── leADS/
                └── ...
Clone this wiki locally