fix documentation of --output #241

wsnoble · 2023-09-12T17:37:55Z

In the command line documentation, the --output option is only for the sequencing phase and doesn't reflect what outputs are produced during training.

│    --output               -o  FILE                    The mzTab file to      │
│                                                       which results will be  │
│                                                       written.               │


wsnoble · 2023-11-06T22:49:12Z

Actually, this documentation is also inaccurate. If you use -o foo, then the file foo.mztab gets produced. So it's really asking for the root of the output filename.

I think that when we output checkpoint files, they should also include this root (if it's provided) as a prefix.

bittremieux · 2023-12-25T11:34:01Z

> I think that when we output checkpoint files, they should also include this root (if it's provided) as a prefix.

I'm not sure about this. There's a separate argument to specify the checkpoint file name during training, namely --model (as well as to load an existing model file). How would you resolve these two values then?


wsnoble · 2023-12-26T18:38:22Z

Are you sure that this is the behavior? I think I usually just see output ckpt files with names like epoch=2-step=150000.ckpt.

In general, it seems like a bad idea to use the same option to specify both the input and the output filename.

I suggest that if the user provides --root foo, then the above would be foo.epoch=2-step=150000.ckpt. Currently, if the user specifies foo.ckpt, how do the different output files get named? It seems like you'd have to have some logic to look for and strip off the ".ckpt" and then add the epoch and step number to the name, plus error handling if the user doesn't provide ".ckpt" at the end of the model name.
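To make the concern concrete, here is a minimal sketch of the stripping logic described above. The helper name checkpoint_name_from_model is hypothetical and is not part of Casanovo; it only illustrates what would be needed if the model name doubled as the output checkpoint name. It returns a bare file name and ignores any directory component.

```python
from pathlib import Path


def checkpoint_name_from_model(model_name: str, epoch: int, step: int) -> str:
    """Hypothetical helper: map 'foo.ckpt' to 'foo.epoch=2-step=150000.ckpt'."""
    path = Path(model_name)
    # Error handling for a model name that lacks the expected extension.
    if path.suffix != ".ckpt":
        raise ValueError(f"Model name must end in '.ckpt': {model_name!r}")
    # Strip the extension, then append the epoch and step counters.
    return f"{path.stem}.epoch={epoch}-step={step}.ckpt"
```

With a separate root option, none of this parsing is required, which is part of the argument for keeping input and output names apart.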

bittremieux · 2023-12-26T20:10:55Z

Good catch. The checkpoint files are actually yet another factor. They're saved to the directory model_save_folder_path in the config, but their file names can't currently be specified; they just use the standard epoch and step counters.

So:

output specifies the root file name for the mzTab (depending on mode) and log (always) files.
model specifies the model used for inference, or, when training, the state from which to continue training.
During training, models are saved as checkpoints (default: 5 best). Only the directory for checkpoints can be specified, not the file name.
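As an illustration of the first point, the sketch below derives output file names from a root. The function name output_paths is hypothetical, and the ".log" extension is an assumption: the thread only confirms that -o foo produces foo.mztab and that the log file shares the same root.

```python
from pathlib import Path


def output_paths(root: str) -> dict[str, Path]:
    """Hypothetical helper: derive output file paths from an output root."""
    base = Path(root)
    # Append the extension to the full name rather than using with_suffix(),
    # so that a root such as 'run.v2' keeps its '.v2' part.
    return {
        "mztab": base.with_name(base.name + ".mztab"),
        "log": base.with_name(base.name + ".log"),  # '.log' is an assumption
    }
```

Under these assumptions, output_paths("foo") would yield foo.mztab and foo.log.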

So what do we want to happen?

The instructions for output should clarify that this is the root file name for the mzTab and log files. Done in Update output command-line description #276.
Current model behavior seems to be correct for inference.
During training, what do we want model to signify? The output name of the newly obtained model weights? What if training resumes? How do we differentiate the initial model from the newly trained model?

Did I get this right?

wsnoble · 2023-12-26T20:29:47Z

I would say that root should specify the root of the file name for all output files, including checkpoint files.

Good point about model being overloaded for input and output models. If I were doing this from scratch, I would probably use two distinct options, like model-in and model-out. But for backward compatibility maybe we just use model for input and use the root option to specify the output model name.

wsnoble · 2024-06-26T14:56:15Z

Note that there is relevant discussion here.

Lilferrit · 2024-06-26T22:04:05Z

I added a --root_ckpt_name option to the train sub-command that will add a root name to the checkpoint files, e.g. if the option is set as --root_ckpt_name foobar, then the checkpoint filenames will have the format foobar.epoch=2-step=150000.ckpt. Otherwise the checkpoint files will have the original default format. The checkpoint files will still be saved to the model_save_folder_path set in the config file.
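The naming rule described here can be sketched as follows. This is not the code on the branch; checkpoint_filename is a hypothetical function that only reproduces the filename format stated in the comment.

```python
from typing import Optional


def checkpoint_filename(
    epoch: int, step: int, root_ckpt_name: Optional[str] = None
) -> str:
    """Hypothetical helper: build a checkpoint file name with an optional root."""
    # Default format when no root name is given.
    default = f"epoch={epoch}-step={step}.ckpt"
    # Prefix the root name, separated by a dot, when one is provided.
    return f"{root_ckpt_name}.{default}" if root_ckpt_name else default
```

For example, checkpoint_filename(2, 150000, "foobar") gives foobar.epoch=2-step=150000.ckpt, and omitting the root gives epoch=2-step=150000.ckpt.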

Lilferrit · 2024-06-26T23:08:50Z

I also just added an --overwrite_output boolean flag that is false by default. If this flag isn't set, the Casanovo CLI will raise an error if one of the output files already exists; if it is set, existing files will be overwritten. All of these changes are on the model-out branch.
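A rough sketch of the check described above, assuming the output paths are known before anything is written. The function name check_outputs and the choice of FileExistsError are assumptions for illustration; the actual implementation may differ.

```python
from pathlib import Path
from typing import Iterable


def check_outputs(paths: Iterable[Path], overwrite_output: bool = False) -> None:
    """Hypothetical helper: refuse to clobber existing output files."""
    # Collect every output file that is already present on disk.
    existing = [p for p in paths if Path(p).exists()]
    # Without the overwrite flag, any pre-existing output is an error.
    if existing and not overwrite_output:
        names = ", ".join(str(p) for p in existing)
        raise FileExistsError(f"Output file(s) already exist: {names}")
```

Running the check up front, before training or sequencing starts, avoids failing only at the end of a long job.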

wsnoble added the good first issue Good for newcomers label Sep 21, 2023

bittremieux added the documentation Improvements or additions to documentation label Oct 23, 2023

bittremieux added a commit that referenced this issue Dec 25, 2023

Update output command-line description

dc757b9

Fixes #241.

bittremieux mentioned this issue Dec 25, 2023

Update output command-line description #276

Closed

bittremieux added this to the Casanovo v5.0.0 milestone May 14, 2024

wsnoble assigned Lilferrit Jun 26, 2024

Lilferrit linked a pull request Jun 27, 2024 that will close this issue

Fix documentation of --output and updated output settings. #348

Closed

Lilferrit mentioned this issue Jun 27, 2024

Fix documentation of --output and updated output settings. #348

Closed

Lilferrit linked a pull request Sep 17, 2024 that will close this issue

File IO command line options revision #372

Merged

Lilferrit closed this as completed Sep 17, 2024
