Commit

updated faq note; training faq section changes
Lilferrit committed Oct 29, 2024
1 parent f7e66db commit 3042ef7
Showing 2 changed files with 7 additions and 7 deletions.
12 changes: 6 additions & 6 deletions docs/faq.md
@@ -103,18 +103,18 @@ Training, validation, and test splits for the non-enzymatic dataset are availabl

**How do I know which model to use after training Casanovo?**

When running model validation, Casanovo will use the validation data to compute performance measures (training loss, validation loss, amino acid precision, and peptide precision) and print this information to the console and log file.
At the end of each validation run and at the end of each training epoch (one complete run over the training data), Casanovo will take a snapshot of the current model weights.
After the training job is finished, the validation snapshot that achieved the lowest **validation loss** will be saved to the output directory as `<output_root>.best.ckpt`.
Additionally, a snapshot of the model weights at the end of each **training** epoch will be saved to the output directory as `epoch=<epoch>-step=<step>.ckpt`.
Snapshots from previous training epochs will be overwritten with the latest training snapshot at the end of each training epoch.
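The per-epoch snapshot name follows the pattern `epoch=<epoch>-step=<step>.ckpt` described above. As a minimal sketch, assuming only that naming pattern, the epoch and step can be read back from a file name as follows. The helper `parse_epoch_checkpoint` and the example file name are hypothetical and not part of Casanovo.

```python
import re
from typing import Optional, Tuple

# Pattern taken from the FAQ text: epoch=<epoch>-step=<step>.ckpt
CKPT_PATTERN = re.compile(r"^epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_epoch_checkpoint(filename: str) -> Optional[Tuple[int, int]]:
    """Return (epoch, step) for a per-epoch snapshot name, else None.

    Hypothetical helper for illustration; not part of Casanovo.
    """
    match = CKPT_PATTERN.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Example with a made-up file name that follows the documented pattern.
print(parse_epoch_checkpoint("epoch=3-step=150000.ckpt"))
# A validation snapshot such as best.ckpt does not match the per-epoch pattern.
print(parse_epoch_checkpoint("best.ckpt"))
```

This only distinguishes per-epoch snapshots from other checkpoint files by name; it does not inspect the checkpoint contents.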

By default, Casanovo runs model validation every 50,000 training steps.
Note that the number of samples that are processed during a single training step depends on the batch size.
Therefore, the default training batch size of 32 corresponds to saving a model snapshot after every 1.6 million training samples.
You can optionally modify the validation run frequency in the [config file](https://github.com/Noble-Lab/casanovo/blob/main/casanovo/config.yaml) (parameter `val_check_interval`), depending on your dataset size.
Note that running model validation very frequently will result in slower training time because Casanovo will evaluate its performance on the validation data for every validation check.
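The relationship between validation frequency and batch size can be checked with a few lines of arithmetic. The sketch below reproduces the 1.6 million figure from the defaults stated above (50,000 steps, batch size 32) and shows how one might estimate the number of training steps in an epoch when choosing `val_check_interval`. Both function names and the dataset size are hypothetical and used for illustration only.

```python
import math


def samples_per_validation(val_check_interval: int, batch_size: int) -> int:
    """Training samples processed between two consecutive validation runs."""
    return val_check_interval * batch_size


def steps_per_epoch(num_training_samples: int, batch_size: int) -> int:
    """Training steps needed for one complete pass over the training data."""
    return math.ceil(num_training_samples / batch_size)


# Defaults stated in the FAQ: validate every 50,000 steps, batch size 32.
print(f"{samples_per_validation(50_000, 32):,}")  # 1,600,000

# Hypothetical dataset of 400,000 spectra: one epoch is 12,500 steps, so the
# default interval of 50,000 steps would span four epochs.
print(steps_per_epoch(400_000, 32))  # 12500
```

For a small dataset like the hypothetical one here, lowering `val_check_interval` toward the number of steps per epoch gives more frequent validation snapshots, at the cost of the slower training noted above.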

When running model validation, Casanovo will use the validation data to compute performance measures (training loss, validation loss, amino acid precision, and peptide precision) and print this information to the console and log file.
At the end of each validation run and training epoch (one complete run over the training data), Casanovo will take a snapshot of the current model weights.
After the training job is finished, the validation snapshot that achieved the lowest **validation loss** will ba saved to the output directory as `best.ckpt` if no custom output prefix is specified.
Additionally, a snapshot of the model weights at the end of each **training** epoch will be saved to the output directory as `epoch=<epoch>-step=<step>.ckpt`.
Snapshots from previous training epochs will be overwritten with the latest training snapshot at the end of each training epoch.

**Even though I added new post-translational modifications to the configuration file, Casanovo didn't identify those peptides.**

Casanovo can only make predictions using post-translational modifications (PTMs) that were included when training the model.
2 changes: 1 addition & 1 deletion docs/getting_started.md
@@ -144,7 +144,7 @@ casanovo sequence [PATH_TO]/sample_preprocessed_spectra.mgf
```

```{note}
If you want to store the output mzTab file in a different location than the current working directory, specify an alternative output location using the `--output` parameter.
If you want to store the output mzTab file in a different location than the current working directory, specify an alternative output location using the `--output_dir` parameter.
```

This job should complete in < 1 minute.