Releases: NVIDIA/NeMo
NVIDIA Neural Modules 1.8.2
Known Issues
- Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.
TTS
- Fastpitch Tutorial fix by @subhankar-ghosh :: PR: #4044
NVIDIA Neural Modules 1.8.1
Known Issues
- Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.
TTS
- Restore_buffer bug fix and update NeMo checkpoint URL by @subhankar-ghosh :: PR: #4041
Hugging Face Hub Integration
Bug Fixes
NVIDIA Neural Modules 1.8.0
Known Issues
- Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.
- The pytest tests for Vietnamese inverse text normalization are failing. This is fixed in main.
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.03
ASR
Changelog
- ASR SSL Update by @sam1373 :: PR: #3714
- Polylang asr by @bmwshop :: PR: #3721
- Test grad accumulation for RNNT loss by @titu1994 :: PR: #3731
- Add readme files describing model execution flow for ASR tasks by @titu1994 :: PR: #3812
- add fr asr ckpt to doc by @yzhang123 :: PR: #3809
- Fix asr tests in 22.02 by @titu1994 :: PR: #3823
- Add new pretrained Spanish ASR models by @erastorgueva-nv :: PR: #3830
- Documentation updates for ASR by @titu1994 :: PR: #3846
- Offline VAD+ASR tutorial by @fayejf :: PR: #3828
- Added Hindi and Marathi Models in Nemo pretrained ASR_CTC_BPE models … by @meghmak13 :: PR: #3856
- Add a missing line to ASR_with_NeMo.ipynb by @lifefeel :: PR: #3908
- Multilang asr models by @bmwshop :: PR: #3907
- added stt_en_conformer_transducer_large_ls to NGC by @VahidooX :: PR: #3920
- Fix DALI test on 22.03 by @titu1994 :: PR: #3911
- Adding RNN encoder for LSTM-Transducer and LSTM-CTC models by @VahidooX :: PR: #3886
- Fix issue with Segfault in ASR models by @titu1994 :: PR: #3956
- Added Mandarin pretrained Conformer-Transducer-Large model trained on AISHELL2. by @VahidooX :: PR: #3970
TTS
Changelog
- Bump TTS deprecation version to 1.9 by @blisc :: PR: #3955
- Add pinned pynini and scipy installs to TTS training tutorial by @redoctopus :: PR: #3967
- Compatibility override to load_state_dict for old TTS checkpoints by @redoctopus :: PR: #3978
NLP / NMT
Changelog
- Use worker processes for data preprocessing by @crcrpar :: PR: #3665
- Set find_unused_parameters to False in GPT example script by @ericharper :: PR: #3837
- GPT multinode eval by @ericharper :: PR: #3821
- Fix MegatronPretrainingRandomSampler by taking into account by @crcrpar :: PR: #3826
- Add slot filling into DST Generative model by @Zhilin123 :: PR: #3695
- Disable nvfuser for gpt by @ericharper :: PR: #3845
- Multi-Label Joint Intent Slot Classification by @chenrichard10 :: PR: #3742
- fix bug in intent/slot model reloading by @carolmanderson :: PR: #3874
- Make test_gpt_eval unit test less strict by @yidong72 :: PR: #3898
- Comment gpt resume ci test by @MaximumEntropy :: PR: #3901
- Neural Machine Translation with Megatron Transformer Models (Tensor Parallel and Tarred Datasets Only) by @MaximumEntropy :: PR: #3861
- Megatron support by @ramanathan831 :: PR: #3893
- Populate the GPT/BERT with uploaded models by @yidong72 :: PR: #3885
- Megatron BART by @michalivne :: PR: #3666
- Additional Japanese processor for NMT that uses MeCab segmentation. Fix for BLEU in one-many NMT by @MaximumEntropy :: PR: #3889
- NMT GRPC server URL fix by @MaximumEntropy :: PR: #3918
- Megatron legacy conversion support by @ramanathan831 :: PR: #3919
- Update max_epochs on megatron configs by @ericharper :: PR: #3958
- Fix NMT variable passing bug by @aklife97 :: PR: #3985
- Fix nemo megatron restore with artifacts by @ericharper :: PR: #3997
- Fix megatron notebook by @ramanathan831 :: PR: #4004
- Megatron work-arounds by @borisfom :: PR: #3998
- Add T5 model P-tuning support by @yidong72 :: PR: #3768
- Make index mappings dir configurable by @ericharper :: PR: #3868
- T5 pipeline parallel by @MaximumEntropy :: PR: #3750
Text Normalization / Inverse Text Normalization
Changelog
Export
Changelog
Bugfixes
General Improvements
Changelog
- Pynini pip by @yzhang123 :: PR: #3726
- upgrade PTL trainer flags by @nithinraok :: PR: #3589
- Updated Speech Data Explorer by @vsl9 :: PR: #3710
- Fix spelling error in num_workers parameter to actually set number of dataset workers specified in yaml configs by @themikem :: PR: #3800
- Support for Camembert Huggingface bert-like models by @itzsimpl :: PR: #3799
- Update to 22.02 by @ericharper :: PR: #3771
- Fixing the defaults of conformer models in the config files by @VahidooX :: PR: #3836
- Fix T5 Encoder Mask while decoding by @MaximumEntropy :: PR: #3838
- fix: multilingual transcribe does not require lang id param by @bmwshop :: PR: #3833
- Misc improvements by @titu1994 :: PR: #3843
- Change container by @MaximumEntropy :: PR: #3844
- Making gender assignment random for cardinals, fractions, and decimal… by @bonham79 :: PR: #3759
- Jenkinsfile test changes by @chenrichard10 :: PR: #3879
- Adding a RegEx tokenizers by @michalivne :: PR: #3839
- enable bias+dropout+add fusion with nvfuser at inference by @erhoo82 :: PR: #3869
- Add text_generation_util to support TopK, TopP sampling + Tabular Data Generation. by @yidong72 :: PR: #3834
- Ptl requirements bound by @MaximumEntropy :: PR: #3903
- doc links update by @ekmb :: PR: #3891
- add citations by @yzhang123 :: PR: #3902
- Update NeMo CI to 22.03 by @MaximumEntropy :: PR: #3900
- Add domain groups to changelog builder by @titu1994 :: PR: #3904
- add input threshold by @yzhang123 :: PR: #3913
- improvements to commonvoice data script by @bmwshop :: PR: #3892
- fixes to the cleanup flag by @bmwshop :: PR: #3921
- Upgrade to PTL 1.6.0 by @ericharper :: PR: #3890
- JSON output from diarization now includes sentences. Optimized senten… by @demsarjure :: PR: #3897
- Stateless timer fix for PTL 1.6 by @MaximumEntropy :: PR: #3925
- fix save_best missing chpt bug, update for setup_tokenizer() changes by @ekmb :: PR: #3932
- Fix tarred sentence dataset length by @MaximumEntropy :: PR: #3941
- remove old doc by @ekmb :: PR: #3946
- Fix issues with librosa deprecations by @titu1994 :: PR: #3950
- Fix notebook bugs for branch r1.8.0 by @yidong72 :: PR: #3948
- Fix global batch fit loop by @ericharper :: PR: #3936
- Refactor restorefrom by @ramanathan831 :: PR: #3927
- Fix variable name and move models to CPU in Change partition by @aklife97 :: PR: #3972
- Fix notebook error by @yidong72 :: PR: #3975
- Notebook Bug Fixes for r1.8.0 by @vadam5 :: PR: #3989
- Fix compat override for TalkNet Aligner by @redoctopus :: PR: #3993
- docs fixes by @ekmb :: PR: #3987
- Fixes val_check_interval, skip loading train data during eval by @MaximumEntropy :: PR: #3968
- LogProb calculation performance fix by @yidong72 :: PR: #3984
- Fix P-Tune T5 model by @yidong72 :: PR: #4001
- Fix the broadcast shape mismatch by @yidong72 :: PR: #4017
- Add known issues to notebook by @ericharper :: PR: #4024
NVIDIA Neural Modules 1.7.2
GPT Bugfixes
- GPT dataloader improvements and fixes by @crcrpar :: PRs: #3826, #3665
- Disable nvfuser by @ericharper :: PR: #3845
- Set find_unused_parameters to False by @ericharper :: PR: #3837
T5 XNLI Example
NVIDIA Neural Modules 1.7.1
Known Issues
- find_unused_parameters should be False when training GPT: #3837
Bugfixes
- revert changes by @yzhang123 :: PR: #3785
- Fixed soft prompt eval loading bug by @vadam5 :: PR: #3805
- mT5 whole word masking and T5 finetuning config fixes by @MaximumEntropy :: PR: #3776
- Raise error if FP16 training is tried with O2 recipe. by @ericharper :: PR: #3806
NVIDIA Neural Modules 1.7.0
Known Issues
- Megatron GPT training with O2 and FP16 is broken. FP16 with O1 still works.
- find_unused_parameters should be False when training GPT: #3837
- FastPitch training may result in stalled GPUs. Users will have to manually kill their runs and continue training from the latest checkpoint.
- mT5 issue with whole word masking, see #3776
- T5 finetuning config issue, see #3776
Container
NOTE: From NeMo 1.7.0 onwards, NeMo containers will follow the YY.MM convention for naming, where the YY.MM value is based on the base container. For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.01
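The YY.MM naming described above can be illustrated with a short sketch. The helper `parse_nemo_tag` and the `ContainerTag` type are hypothetical names introduced here for illustration and are not part of NeMo. The sketch assumes that the tag is exactly two digits, a dot, and two digits, and that YY refers to a year in the 2000s.

```python
import re
from typing import NamedTuple


class ContainerTag(NamedTuple):
    year: int   # four-digit year, assuming YY means 20YY
    month: int  # calendar month, 1 to 12


def parse_nemo_tag(image: str) -> ContainerTag:
    """Split the YY.MM tag of an image reference into year and month."""
    # Everything after the last colon is treated as the tag.
    _, sep, tag = image.rpartition(":")
    if not sep:
        raise ValueError(f"no tag in image reference: {image!r}")
    match = re.fullmatch(r"(\d{2})\.(\d{2})", tag)
    if match is None:
        raise ValueError(f"tag does not look like YY.MM: {tag!r}")
    year = 2000 + int(match.group(1))
    month = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in tag: {tag!r}")
    return ContainerTag(year, month)


print(parse_nemo_tag("nvcr.io/nvidia/nemo:22.01"))
```

A tag such as `latest` is rejected with a `ValueError`, which makes it easy to spot image references that do not follow the YY.MM scheme.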
ASR
- Wav2vec by @tbartley94 :: PR: #3297
- Fix bug in multi-checkpoint loading by @sam1373 :: PR: #3536
- Add HuggingFace Datasets to NeMo ASR Dataset script by @titu1994 :: PR: #3513
- Add support for Gradient Clipping (clamp) in RNNT Numba loss by @titu1994 :: PR: #3550
- Enable Tarred Dataset Support for NVIDIA DALI by @titu1994 :: PR: #3485
- Add initial support for Buffered RNNT Scripts by @titu1994 :: PR: #3602
- Significantly speed up RNNT loss on CUDA by @titu1994 :: PR: #3653
- Fixing the bug in the stateful rnnt decoder. by @VahidooX :: PR: #3673
- Add Buffered RNNT with LCS Merge algorithm by @titu1994 :: PR: #3669
- Asr noise data scripts by @jbalam-nv :: PR: #3660
- ASR SSL update by @sam1373 :: PR: #3746
- Add randomized bucketing by @VahidooX :: PR: #3445
- Self-supervised tutorial & update by @sam1373 :: PR: #3344
- Updated conformer models. by @VahidooX :: PR: #3741
- Added speaker identification script with cosine and neural classifier… by @nithinraok :: PR: #3672
- Fix in clustering diarizer by @nithinraok :: PR: #3701
- Add a function that writes cluster label in diarization pipeline by @tango4j :: PR: #3643
TTS
- port UnivNet to NeMo TTS collection by @L0SG :: PR: #3186
- E2E TTS fixes by @redoctopus :: PR: #3508
- New structure for TTS datasets in scripts/dataset_processing, VocoderDataset, update TTSDataset by @Oktai15 :: PR: #3484
- Deprecate some TTS models and TTS datasets by @Oktai15 :: PR: #3576
- Fix bugs in HiFi-GAN (scheduler, optimizers) and add input_example() in Mixer-TTS/Mixer-TTS-X by @Oktai15 :: PR: #3564
- Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
- Fix typo in FastPitch config (pitch_avg -> pitch_mean) by @eyentei :: PR: #3593
- Fix incorrect usage of TTSDataset in some files and fix one-line bug in NVIDIA's CMUDict by @Oktai15 :: PR: #3594
- Convert entry from UTF-16 to UTF-8 by @redoctopus :: PR: #3597
- remove CheckInstall by @blisc :: PR: #3577
- Fix UnivNet LibriTTS pretrained location by @m-toman :: PR: #3615
- FastPitch training tutorial by @subhankar-ghosh :: PR: #3631
- Update Aligner, add new methods to AlignmentEncoder by @Oktai15 :: PR: #3641
- Add Mixed Representation Training by @blisc :: PR: #3473
- Add speakerID to libritts/get_data.py by @subhankar-ghosh :: PR: #3662
- Update TTS tutorials, Simplification of testing Mixer-TTS and FastPitch by @Oktai15 :: PR: #3680
- Clean FastPitch_Finetuning.ipynb notebook by @Oktai15 :: PR: #3698
- Add cache_size to BetaBinomialInterpolator, fix bugs in TTS tutorials and FastPitch by @Oktai15 :: PR: #3706
- Fix bugs in VocoderDataset and TTSDataset by @Oktai15 :: PR: #3713
- Fix bugs in E2E TTS, Mixer-TTS and FastPitch by @Oktai15 :: PR: #3740
NLP / NMT
- NLPDDPPlugin find_unused_parameters is configurable by @mlgill :: PR: #3478
- Megatron encoder-decoder refactor by @michalivne :: PR: #3542
- Finetuning NeMo Megatron T5 Models on GLUE by @MaximumEntropy :: PR: #3408
- Pipeline parallelism for GPT by @ericharper :: PR: #3388
- Generalized the P-tuning method to support various NLP tasks by @yidong72 :: PR: #3623
- Megatron_LM checkpoint to NeMo checkpoint support by @yidong72 :: PR: #3692
- Bugfix for GPT eval by @ericharper :: PR: #3744
- Yuya/megatron t5 glue eval by @yaoyu-33 :: PR: #3751
- Enforce legacy tokenizer for sentencepiece to add special tokens for T5 by @MaximumEntropy :: PR: #3457
- Added P-Tuning method by @yidong72 :: PR: #3488
- O2 style mixed precision training for T5 by @MaximumEntropy :: PR: #3664
- LM adapted T5 dataset by @MaximumEntropy :: PR: #3654
- Fix consumed samples calculation + PTune Model bugs by @yidong72 :: PR: #3738
- Add pipeline support to eval methods by @ericharper :: PR: #3684
- XNli benchmark by @yidong72 :: PR: #3693
- Refactor dialogue state tracking for modelling/dataset interoperability by @Zhilin123 :: PR: #3526
- Changes to support mean n-gram size masking for T5 by @MaximumEntropy :: PR: #3646
- Dialogue state tracking refactor by @Zhilin123 :: PR: #3667
- Parallel prompt tuning by @vadam5 :: PR: #3670
- GEGLU activation for T5 by @MaximumEntropy :: PR: #3694
Text Normalization / Inverse Text Normalization
- Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
- ITN bug fixes, ip address, card num support, whitelist clean up by @ekmb :: PR: #3574
- Fix tn bugs by @yzhang123 :: PR: #3580
- add serial number to itn by @yzhang123 :: PR: #3584
- ITN: SH bug fixes for telephone by @ekmb :: PR: #3592
- Tn bug 1.7.0 by @yzhang123 :: PR: #3730
- TN docs update by @ekmb :: PR: #3735
Export
- Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
- Conformer onnx fix by @borisfom :: PR: #3524
- Add onnx support for speaker models by @nithinraok :: PR: #3650
- Jasper mask/export fix by @borisfom :: PR: #3691
Bugfixes
- Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
- Dialogue state tracking refactor/ SGDGEN patch 2 by @Zhilin123 :: PR: #3674
- lower bound PTL to 1.5.10 and remove last ckpt patch fix by @nithinraok :: PR: #3690
Improvements
- Wfst tutorial by @tbartley94 :: PR: #3479
- Update CMUdict with ADLR version pronunciations by @redoctopus :: PR: #3446
- Fix docs by @yzhang123 :: PR: #3523
- Add docstring to UnivNetModel by @L0SG :: PR: #3529
- Increase lower bound due to security vulnerability by @ericharper :: PR: #3537
- Add Change Log builder to NeMo by @titu1994 :: PR: #3527
- Bugfix, need to freeze the model by @yidong72 :: PR: #3540
- Bucketing quick fix by @tbartley94 :: PR: #3543
- More fixes to SentencePiece for T5 by @MaximumEntropy :: PR: #3515
- Update CONTRIBUTING.md by @Oktai15 :: PR: #3569
- Update pr template and re-add Changelog builder by @titu1994 :: PR: #3575
- Apex quick fix by @ekmb :: PR: #3591
- Upgrade to 22.01 container by @ericharper :: PR: #3571
- Fix typo and update minimal version of scipy by @Oktai15 :: PR: #3604
- Add env variable to force transformers to run offline during CI by @ericharper :: PR: #3607
- Correctly install NeMo wheel by @titu1994 :: PR: #3599
- Fix wheel build by @titu1994 :: PR: #3610
- Fixed EH and error reporting in restore_from by @borisfom :: PR: #3583
- Clarifying documentation by @itzsimpl :: PR: #3616
- Improve docs for finetuning by @titu1994 :: PR: #3622
- Add NeMo version to all new .nemo files by @titu1994 :: PR: #3605
- Update numba if NVIDIA_PYTORCH_VERSION not correct by @itzsimpl :: PR: #3614
- Remove @experimental decorator in diarization related files. by @tango4j :: PR: #3625
- Remove compression from .nemo files by @okuchaiev :: PR: #3626
- Update adobe analytics by @ericharper :: PR: #3645
- Add ssl tutorial to tutorial docs page by @sam1373 :: PR: #3649
- Fix number of channels>1 issue by @ekmb :: PR: #3652
- Fixed the bug in bucketing. by @VahidooX :: PR: #3663
- Adding guard by @yzhang123 :: PR: #3655
- Add tutorial paths by @titu1994 :: PR: #3651
- Folder name update by @ekmb :: PR: #3671
- Test HF online for SGD-GEN only by @MaximumEntropy :: PR: #3681
- Update Librosa support to 0.9 by @titu1994 :: PR: #3682
- Comment out numba in 22.01 release by @titu1994 :: PR: #3685
- Fix failing tests inside of the 22.01 container in PR 3571 by @fayejf :: PR: #3609
- Fixed Apex guard when imported classes are used for default values by @michalivne :: PR: #3700
- Update citrinet_512.yaml by @Jorjeous :: PR: #3642
- update torchaudio in Dockerfile to match torch version by @GNroy :: PR: #3637
- Enforce import tests on the three domains by @titu1994 :: PR: #3702
- Audio based norm speed up by @ekmb :: PR: #3703
- Fix device on notebook by @titu1994 :: PR: #3732
- pynini pip by @yzhang123 :: PR: #3729
- Removed fp16 converting in complete method by @dimapihtar :: PR: #3709
- Mirror AN4 while CMU servers are down by @titu1994 :: PR: #3743
- Fix SSL configs for 1.7 by @sam1373 :: PR: #3748
- Punct process bug fix by @ekmb :: PR: #3747
- Specify gpus in SSL notebook by @sam1373 :: PR: #3753
- Duplex model inference fix, money encoder fix by @ekmb :: PR: #3754
- Update decoding strategy docs and override general value for tutorials by @titu1994 :: PR: #3755
- Fix directories in ssl notebook by @sam1373 :: PR: #3758
- Update Tacotron2_Training.ipynb by @blisc :: PR: #3769
- Fix dockerfile by @yzhang123 :: PR: #3778
- Prompt-Tuning-Documentation by @vadam5 :: PR: #3777
- Prompt tuning bug fix by @vadam5 :: PR: #3780
NVIDIA Neural Modules 1.6.2
Bug fix
- Changed the Apex-not-found error to a warning so that NLP models that do not depend on Apex can be used when Apex isn't installed.
NVIDIA Neural Modules 1.6.1
NVIDIA Neural Modules 1.6.0
ASR
- Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
- Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
- Move vocabs from asr to common by @Oktai15 :: PR: #3084
- Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
- Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
- adding german conformer ctc and rnnt by @yzhang123 :: PR: #3242
- Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
- Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
- Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
- CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
- Updates on ASR with diarization util files by @tango4j :: PR: #3359
- Asr fr by @tbartley94 :: PR: #3404
- Refactor ASR Examples Directory by @titu1994 :: PR: #3392
- Asr patches by @titu1994 :: PR: #3443
- Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487
TTS
- MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
- ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
- Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
- Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
- Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
- Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
- Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
- Minor Updates to TTS Finetuning by @blisc :: PR: #3455
NLP / NMT
- NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
- Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
- NMT checkpoint averaging by @michalivne :: PR: #3096
- NMT validation examples with inputs by @michalivne :: PR: #3194
- Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
- Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
- NLP text augmentation by @michalivne :: PR: #3291
- Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
- Added Script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
- Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
- Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
- T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
- NMT MIM mean variance fix by @michalivne :: PR: #3385
- NMT Shared Embeddings Weights by @michalivne :: PR: #3340
- Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
- Byte-level Multilingual NMT by @aklife97 :: PR: #3368
- BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
- NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
- (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grade scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259
Text Normalization / Inverse Text Normalization
- Tn clean upsample by @yzhang123 :: PR: #3024
- Tn add nn wfst and doc by @yzhang123 :: PR: #3135
- Update english tn ckpt by @yzhang123 :: PR: #3143
- WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
- German TN wfst by @yzhang123 :: PR: #3174
- Add ITN Vietnamese by @binh234 :: PR: #3217
- WFST TN updates by @ekmb :: PR: #3235
- Itn german refactor by @yzhang123 :: PR: #3262
- Tn german deterministic by @yzhang123 :: PR: #3308
- TN updates by @ekmb :: PR: #3285
- Added double digits to EN ITN by @yzhang123 :: PR: #3321
- TN_non_deterministic optimized by @ekmb :: PR: #3343
- Missing init for TN German by @ekmb :: PR: #3355
- Ru TN by @ekmb :: PR: #3390
- Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440
NeMo Tools
- CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
- Updated NumPy SDE requirement by @vsl9 :: PR: #3442
Export
- ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
Documentation
- Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
- Tn add nn wfst and doc by @yzhang123 :: PR: #3135
- Add apex into by @PeganovAnton :: PR: #3214
- Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
- Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
- Doc link fixes by @nithinraok :: PR: #3264
- French ASR Doc updates by @tbartley94 :: PR: #3322
- german asr doc page update by @yzhang123 :: PR: #3325
- update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
- Asr fr by @tbartley94 :: PR: #3404
- Update copyright to 2022 by @ericharper :: PR: #3426
- Update Speech Classification - VAD doc by @fayejf :: PR: #3430
- Update speaker diarization docs by @tango4j :: PR: #3419
- NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
- Add verification helper function and update docs by @nithinraok :: PR: #3514
- Prompt tuning documentation by @vadam5 :: PR: #3541
Bugfixes
- Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
- Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
- Fix README by @ericharper :: PR: #3070
- Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
- Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
- Attribute is not working in . by @PeganovAnton :: PR: #3099
- Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
- A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
- Fixed two typos by @bene-ges :: PR: #3157
- Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
- LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
- Add apex into by @PeganovAnton :: PR: #3214
- Patch omegaconf for cfg by @fayejf :: PR: #3224
- Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
- Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
- Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
- Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
- Doc link fixes by @nithinraok :: PR: #3264
- Escape chars fix by @ekmb :: PR: #3253
- Fix asr output - eval mode by @nithinraok :: PR: #3274
- Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
- Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
- Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
- Tn en money fix by @yzhang123 :: PR: #3290
- Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
- Adaptiv fixed positional embeddings by @michalivne :: PR: #3263
- Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
- Fix for Stalled ASR training/eval on Pytorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
- Fix bucketing list bug. by @VahidooX :: PR: #3315
- Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
- Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
- Fix readme to show cmd by @yzhang123 :: PR: #3345
- Fix speaker label models training convergence by @nithinraok :: PR: #3354
- Tqdm get datasets by @bmwshop :: PR: #3358
- Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
- Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
- Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
- Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
- fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
- TalkNet Fix by @stasbel :: PR: #3092
- Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
- Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
- Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
- Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
- NMT MIM mean variance fix by @michalivne :: PR: #3385
- Fix bug for missing variable by @MaximumEntropy :: PR: #3437
- Asr patches by @titu1994 :: PR: #3443
- Prompt tuning loss mask fix by @vadam5 :: PR: #3438
- BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
- Fix hysteresis loading by @MaximumEntropy :: PR: #3460
- Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
- Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
- WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
- Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
- file name fix - Segmentation tutorial by @ekmb :: PR: #3474
- Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
- Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
- Fix description by @PeganovAnton :: PR: #3482
- typo fix in diarization notebooks by @nithinraok :: PR: #3480
- Fix check...
NVIDIA Neural Modules 1.5.1
Features
Known Issues
- Training of speaker models converges very slowly due to a bug (fixed in main: #3354)
- ASR training does not reach adequate WER due to a bug in Numba Spec Augment (fixed in main: #3299). For details refer to #3288 (comment). As a temporary workaround, disable Numba Spec Augment by setting the option defined at https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/modules/audio_preprocessing.py#L471 to False in the SpecAugment section of the yaml config. The fix will be part of 1.6.0.
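The workaround above amounts to overriding one boolean in the SpecAugment part of the model config. The following is a minimal sketch on a plain dictionary, not NeMo code. The flag name `use_numba_spec_augment` and the section name `spec_augment` are assumptions (the option name is not stated in these notes), and the helper `disable_numba_spec_augment` is hypothetical. Check the linked source line for the actual option in your NeMo version.

```python
import copy

# Assumption: this flag name is inferred, not taken from the release notes.
ASSUMED_FLAG = "use_numba_spec_augment"


def disable_numba_spec_augment(model_cfg: dict, section: str = "spec_augment") -> dict:
    """Return a copy of model_cfg with the assumed Numba flag set to False."""
    if section not in model_cfg:
        raise KeyError(f"config has no {section!r} section")
    # Deep-copy so the caller's config is left untouched.
    patched = copy.deepcopy(model_cfg)
    patched[section][ASSUMED_FLAG] = False
    return patched


# Hypothetical config fragment, as it might look after loading the yaml file.
model_cfg = {"spec_augment": {"freq_masks": 2, "time_masks": 5}}
patched_cfg = disable_numba_spec_augment(model_cfg)
print(patched_cfg["spec_augment"])
```

Returning a patched copy rather than mutating in place keeps the original config available for comparison once the fix in 1.6.0 makes the override unnecessary.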