This document details some changes introduced in OpenNMT-tf 2.0 and actions required by the user.
Python 3.5 (or above) is now required by OpenNMT-tf. See python3statement.org for more context about this decision.
OpenNMT-tf has been completely redesigned for TensorFlow 2.0, which is now the minimum required TensorFlow version.
The correct TensorFlow version is declared as a dependency of OpenNMT-tf and will be automatically installed as part of the pip installation:

```bash
pip install --upgrade pip
pip install OpenNMT-tf
```
TensorFlow 2.0 introduced a new way to save checkpoints: variables are no longer matched by their name but by where they are stored relative to the root model object. Consequently, OpenNMT-tf V1 checkpoints are no longer compatible without conversion.
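To see why old checkpoints break, consider how TensorFlow 2.0 tracks variables. The following minimal sketch (plain TensorFlow, not OpenNMT-tf code) shows that the saved key is derived from the attribute path under the root object, not from the variable name:

```python
import tensorflow as tf

class Model(tf.Module):
    def __init__(self):
        # The variable name "kernel" is NOT what identifies it in the checkpoint.
        self.kernel = tf.Variable(tf.zeros([4, 4]), name="kernel")

model = Model()
checkpoint = tf.train.Checkpoint(model=model)
# The variable is saved under the object path "model/kernel",
# i.e. where it is stored relative to the root checkpoint object.
checkpoint.write("/tmp/ckpt")
```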
To smooth this transition, V1 checkpoints of the following models are automatically upgraded on load:
- NMTBigV1
- NMTMediumV1
- NMTSmallV1
- Transformer
- TransformerBig
The command line parser has been improved to better manage task-specific options, which are now located after the run type:

```bash
onmt-main <general options> train <train options>
```
Also, some options have changed:

- the run type `train_and_eval` has been replaced by `train --with_eval`
- the main script now includes the `average_checkpoints` and `update_vocab` tasks
- distributed training options are currently missing in 2.0
- `--session_config` has been removed, as it is no longer applicable in TensorFlow 2.0

See `onmt-main -h` for more details.
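For example, a typical V1 training command maps to V2 like this (a sketch reusing flags that appear elsewhere in this document):

```bash
# V1: the run type came first and all options followed it.
onmt-main train_and_eval --model_type Transformer --config data.yml --auto_config

# V2: general options come first, then the run type and its own options.
onmt-main --model_type Transformer --config data.yml --auto_config train --with_eval
```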
In OpenNMT-tf V1, models were responsible for declaring the name of the vocabulary to look for, e.g. the following inputter:

```python
# V1 model inputter.
source_inputter = onmt.inputters.WordEmbedder(
    vocabulary_file_key="source_words_vocabulary",
    embedding_size=512)
```
required the user to configure the vocabulary like this:

```yaml
# V1 vocabulary configuration.
data:
  source_words_vocabulary: src_vocab.txt
```
This is no longer the case in V2, where the vocabulary configuration follows a general pattern, the same one currently used for the "embedding" and "tokenization" configurations:
- Single vocabulary (e.g. language model):

  ```yaml
  data:
    vocabulary: vocab.txt
  ```

- Source and target vocabularies (e.g. sequence to sequence, tagging, etc.):

  ```yaml
  data:
    source_vocabulary: src_vocab.txt
    target_vocabulary: tgt_vocab.txt
  ```

- Multi-source and target vocabularies:

  ```yaml
  data:
    source_1_vocabulary: src_1_vocab.txt
    source_2_vocabulary: src_2_vocab.txt
    target_vocabulary: tgt_vocab.txt
  ```

- Nested inputs:

  ```yaml
  data:
    source_1_1_vocabulary: src_1_1_vocab.txt
    source_1_2_vocabulary: src_1_2_vocab.txt
    source_2_vocabulary: src_2_vocab.txt
  ```
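For comparison with the V1 snippet above, a V2 inputter no longer declares a vocabulary key; the vocabulary is resolved automatically from the `data` block following the patterns listed above (a minimal sketch, assuming the 2.x `opennmt` API):

```python
import opennmt as onmt

# V2 model inputter: the vocabulary is resolved from the "data"
# configuration, so no vocabulary_file_key argument is needed.
source_inputter = onmt.inputters.WordEmbedder(embedding_size=512)
```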
Predefined models do not require a model definition file and can be selected directly with the `--model_type` command line argument. Some of them have changed for clarity:
| V1 | V2 | Comment |
| --- | --- | --- |
| `NMTBig` | `NMTBigV1` | |
| `NMTMedium` | `NMTMediumV1` | |
| `NMTSmall` | `NMTSmallV1` | |
| `SeqTagger` | `LstmCnnCrfTagger` | |
| `TransformerAAN` | | Not considered useful compared to the standard `Transformer` |
| `TransformerBigFP16` | | Use `TransformerBig` with the `--mixed_precision` flag on the command line |
| `TransformerFP16` | | Use `Transformer` with the `--mixed_precision` flag on the command line |
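For instance, runs that previously used `SeqTagger` or `TransformerFP16` would now be launched like this (illustrative commands built from the flags shown in this document):

```bash
# V1 SeqTagger is now LstmCnnCrfTagger:
onmt-main --model_type LstmCnnCrfTagger --config data.yml --auto_config train

# V1 TransformerFP16 is the standard Transformer run in mixed precision:
onmt-main --model_type Transformer --config data.yml --auto_config --mixed_precision train
```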
Some parameters in the YAML configuration have been renamed or removed:
| V1 | V2 | Comment |
| --- | --- | --- |
| `*/bucket_width` | `*/length_bucket_width` | |
| `*/num_threads` | | Automatic value |
| `*/prefetch_buffer_size` | | Automatic value |
| `eval/eval_delay` | `eval/steps` | Use steps instead of seconds to set the evaluation frequency |
| `eval/exporters` | | Not implemented |
| `params/clip_gradients` | | Set `clipnorm` or `clipvalue` in `params/optimizer_params/` |
| `params/freeze_variables` | `params/freeze_layers` | Use layer names instead of variable regexps |
| `params/gradients_accum` | | Use `train/effective_batch_size` instead |
| `params/horovod` | | Not implemented |
| `params/loss_scale` | | Dynamic loss scaling by default |
| `params/maximum_iterations` | `params/maximum_decoding_length` | |
| `params/maximum_learning_rate` | | Not implemented |
| `params/param_init` | | Not implemented |
| `params/weight_decay` | `params/optimizer_params/weight_decay` | |
| `train/save_checkpoints_secs` | | Not implemented |
| `train/train_steps` | `train/max_step` | |
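As an illustration, here is how a small V1 configuration using some of these parameters could be migrated (the values are hypothetical and only demonstrate the mapping):

```yaml
# V1 (no longer accepted):
params:
  clip_gradients: 5.0
  gradients_accum: 8
eval:
  eval_delay: 3600  # every hour

# V2 equivalent:
params:
  optimizer_params:
    clipnorm: 5.0
eval:
  steps: 5000  # every 5000 training steps
train:
  effective_batch_size: 25000
```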
Parameters that take a reference to a Python class should also be revised when upgrading to V2, as the class has likely changed in the process. This concerns:

- `params/optimizer` and `params/optimizer_params`
- `params/decay_type` and `params/decay_params`
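For example, in V2 `params/optimizer` takes a `tf.keras.optimizers` class name and `params/decay_type` a schedule class name from `opennmt.schedules`. A sketch with illustrative values (the exact settings depend on your model; `--auto_config` produces sensible defaults):

```yaml
params:
  optimizer: Adam
  decay_type: NoamDecay
  decay_params:
    model_dim: 512
    warmup_steps: 8000
```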
OpenNMT-tf 2.0 uses the newly introduced graph rewriter to automatically convert parts of the graph from `float32` to `float16`.

Variables are cast on the fly and checkpoints no longer need to be converted for inference or to continue training in `float32`. This means mixed precision is no longer a property of the model but should be enabled on the command line instead, e.g.:

```bash
onmt-main --model_type Transformer --config data.yml --auto_config --mixed_precision train
```
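Under the hood, the TF 2.0 graph rewriter is enabled by wrapping the optimizer. A minimal sketch in plain TensorFlow (not OpenNMT-tf code; this experimental API existed in early 2.x releases and was later deprecated in favor of Keras mixed precision):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
# The rewriter inserts float32 <-> float16 casts into the graph and
# enables dynamic loss scaling by default.
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)
```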