Releases: NVIDIA/NeMo
NVIDIA Neural Modules 1.18.0
Highlights
Models
- GPT-2B-001, trained on 1.1T tokens with 4K sequence length.
- STT En Fast Conformer-CTC Large
- STT En Fast Conformer-Transducer Large
- STT En Fast Conformer-Transducer Large LibriSpeech
- STT En FastConformer Hybrid Transducer-CTC Large P&C
- STT De FastConformer Hybrid Transducer-CTC Large P&C
- STT Es FastConformer Hybrid Transducer-CTC Large P&C
- STT It FastConformer Hybrid Transducer-CTC Large P&C
- STT Pl FastConformer Hybrid Transducer-CTC Large P&C
- STT Ua FastConformer Hybrid Transducer-CTC Large P&C
- STT Hr FastConformer Hybrid Transducer-CTC Large P&C
- STT By Conformer-RNNT Large
NeMo ASR
- Hybrid Autoregressive Transducer (HAT) #6260
- Apple MPS Support for ASR Inference #6289
- InterCTC Support for Hybrid ASR Models #6215
- RNNT N-Gram Fusion with mAES algo #6118
- ASR + Apple M2 CPU/GPU MPS #6289
NeMo TTS
- TTS directory structure refactor
- User-set symbol vocabulary #6172
NeMo Megatron
- Model parallelism from Megatron Core #6393
- Continued training for P-tuning #6273
- SFT for GPT-3 #6210
- Tensor and pipeline model parallel conversion #6218
- Megatron NMT Export to Riva
NeMo Core
Detailed Changelogs
ASR
Changelog
- minor cleanup by @messiaen :: PR: #6311
- docs on the use of heterogeneous test / val manifests by @bmwshop :: PR: #6352
- [WIP] add buffered chunked streaming for nemo force aligner by @Slyne :: PR: #6185
- Word boosting for Flashlight decoder by @trias702 :: PR: #6367
- Add installation and ASR inference instructions for Mac by @artbataev :: PR: #6377
- specaug speedup by @1-800-BAD-CODE :: PR: #6347
- updated lr for FC configs by @bmwshop :: PR: #6379
- Make it possible to control tqdm progress bar in ASR models by @SN4KEBYTE :: PR: #6375
- [ASR] Conformer global tokens in local attention by @sam1373 :: PR: #6253
- fixed torch warning on using a list of numpy arrays by @MKNachesa :: PR: #6382
- Fix FastConformer config: correct bucketing strategy by @artbataev :: PR: #6413
- fixing the ability to use temp sampling with concat datasets by @bmwshop :: PR: #6423
- add conformer configs for hat model by @andrusenkoau :: PR: #6372
- [ASR] Add optimization util for linear sum assignment algorithm by @tango4j :: PR: #6349
- Added/updated new Conformer configs by @VahidooX :: PR: #6426
- Fix typos by @titu1994 :: PR: #6494
- Fix typos (#6523) by @titu1994 :: PR: #6539
- added back the fast emit section to the configs. by @VahidooX :: PR: #6540
- Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, BY by @KunalDhawan :: PR: #6549
- Add scores for FastConformer models by @titu1994 :: PR: #6557
- Patch transcribe and support offline transcribe for hybrid model by @fayejf :: PR: #6550
- More streaming conformer export fixes by @messiaen :: PR: #6567
- Documentation for ASR-TTS models by @artbataev :: PR: #6594
- Patch transcribe_util for streaming mode and add wer calculation back to inference scripts by @fayejf :: PR: #6601
- Add HAT image to docs by @andrusenkoau :: PR: #6619
- Patch decoding for PC models by @titu1994 :: PR: #6630
- Fix wer.py where 'errors' variable was not set by @stevehuang52 :: PR: #6633
- Fix for old models in change_attention_model by @VahidooX :: PR: #6635
TTS
Changelog
NLP / NMT
Changelog
- [Core] return_config=True now extracts just config, not full tarfile by @titu1994 :: PR: #6346
- restore path for p-tuning by @arendu :: PR: #6273
- taskname and early stopping for adapters by @arendu :: PR: #6366
- Adapter tuning accepts expanded language model dir by @arendu :: PR: #6376
- Update gpt_training.rst by @blisc :: PR: #6378
- Megatron GPT model finetuning by @MaximumEntropy :: PR: #6210
- [NeMo Megatron] Cleanup configs to infer the models TP PP config automatically by @titu1994 :: PR: #6368
- Fix prompt template unescaping by @MaximumEntropy :: PR: #6399
- Add support for Megatron GPT Untied Embd TP PP Change by @titu1994 :: PR: #6388
- Move Parallelism usage from Apex -> Megatron Core by @aklife97 :: PR: #6393
- Add ability to enable/disable act ckpt and seq parallelism in GPT by @markelsanz14 :: PR: #6327
- Refactor PP conversion + add support for TP only conversion by @titu1994 :: PR: #6419
- fix CPU overheads of GPT synthetic dataset by @xrennvidia :: PR: #6427
- check if grad is none before calling all_reduce by @arendu :: PR: #6428
- Fix replace_bos_with_pad not found by @aklife97 :: PR: #6443
- Support Swiglu in TP PP Conversion by @titu1994 :: PR: #6437
- BERT pre-training mp fork to spawn by @aklife97 :: PR: #6442
- Megatron encoder decoder fix for empty validation outputs by @michalivne :: PR: #6459
- Reduce workers on NMT CI by @aklife97 :: PR: #6472
- Switch to NVIDIA Megatron repo by @aklife97 :: PR: #6465
- Megatron KERPLE positional embeddings by @michalivne :: PR: #6478
- Support in external sample mapping for Megatron datasets by @michalivne :: PR: #6462
- Fix custom by @aklife97 :: PR: #6512
- GPT fp16 inference fix by @MaximumEntropy :: PR: #6543
- Fix for T5 FT model by @aklife97 :: PR: #6529
- Pass instead of scaler object to core by @aklife97 :: PR: #6545
- Change Megatron Enc Dec model to use persistent_workers by @aklife97 :: PR: #6548
- Turn autocast off when precision is fp32 by @aklife97 :: PR: #6554
- Fix batch size reconf for T5 FT for multi-validation by @aklife97 :: PR: #6582
- Make tensor split contiguous for qkv and kv in attention by @aklife97 :: PR: #6580
- Patches from main to r1.18.0 for Virtual Parallel by @titu1994 :: PR: #6592
- Create dummy iters to satisfy iter type len checks in core + update core commit by @aklife97 :: PR: #6600
- Restore GPT support for interleaved pipeline parallelism by @timmoon10 :: PR: #6528
- Add megatron_core to requirements by @ericharper :: PR: #6639
Export
Changelog
Bugfixes
Changelog
- Fix the GPT SFT datasets loss mask bug by @yidong72 :: PR: #6409
- [BugFix] Fix multi-processing bug in data simulator by @tango4j :: PR: #6310
- Fix cache aware hybrid bugs by @VahidooX :: PR: #6466
- [BugFix] Force _get_batch_preds() to keep logits in decoder timestamp… by @tango4j :: PR: #6500
- Fixing bug in unsort_tensor by @borisfom :: PR: #6320
- Bugfix for BF16 grad reductions with distopt by @timmoon10 :: PR: #6340
- Limit urllib3 version to patch issue with RTD by @aklife97 :: PR: #6568
General improvements
Changelog
- Pin the version to hopefully fix rtd build by @SeanNaren :: PR: #6334
- enabling diverse datasets in val / test by @bmwshop :: PR: #6306
- extract inference weights by @arendu :: PR: #6353
- Add opengraph support for NeMo docs by @titu1994 :: PR: #6380
- Adding basic preemption code by @athitten :: PR: #6161
- Add documentation for preemption support by @athitten :: PR: #6403
- Update hyperparameter recommendation based on experiments by @Zhilin123 :: PR: #6405
- exceptions with empty test / val ds config sections by @bmwshop :: PR: #6421
- Upgrade pt 23.03 by @ericharper :: PR: #6430
- Update README to add core installation by @aklife97 :: PR: #6488
- Not doing CastToFloat by default by @borisfom :: PR: #6524
- Update manifest.py for speedup by @stevehuang52 :: PR: #6565
- Update SDP docs by @erastorgueva-nv :: PR: #6485
- Update core commit hash in readme by @aklife97 :: PR: #6622
- Remove from jenkins by @ericharper :: PR: #6641
- Remove dup by @ericharper :: PR: #6643
NVIDIA Neural Modules 1.17.0
Highlights
NeMo ASR
- Online Clustering Diarizer
- High Level Diarization API
- PyCTC Decode Beam Search Support
- RNNT Beam Search Alignment Extraction
- InterCTC Loss
- AIStore Documentation
- ASR & AWS Multi-node Integration
- Convolution Invariant SDR losses
NeMo TTS
NeMo Megatron
- SquaredReLU, SwiGLU, No-Dropout
- Rotary Position Embedding
- Untie word embeddings and output projection
NeMo Core
- Dynamic freezing of modules during training
- NeMo Multi-Run Documentation
- ClearML Logging
- Early Stopping
- Experiment Manager Docs Update
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.02
Detailed Changelogs
ASR
Changelog
- Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
- Use module-based k2 import guard by @artbataev :: PR: #6006
- Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
- Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
- Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
- InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
- Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
- Convert esperanto into a notebook by @SeanNaren :: PR: #6070
- [ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
- [ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
- Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
- Add file class based inference API for diarization by @SeanNaren :: PR: #5945
- Ngram by @karpnv :: PR: #6063
- remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
- Streaming conformer CTC export by @messiaen :: PR: #5837
- [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
- Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
- ASR Beam search documentation by @titu1994 :: PR: #6244
TTS
Changelog
- [TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
- [TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
- [TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
- Added list_available_models by @treacker :: PR: #5967
- Update Fastpitch energy bug by @blisc :: PR: #5969
- removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
- ONNX export for RadTTS by @borisfom :: PR: #5880
- Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
- Vits doc by @treacker :: PR: #5989
- Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
- Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
- [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
- [TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
- [TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
- [TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
- [TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
- [TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
- [TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
- [TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
- [TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
- [TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
- [TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
- [TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
- remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
- [TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
- [TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
- [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
NLP / NMT
Changelog
- add new languages to doc by @yzhang123 :: PR: #5939
- Distributed Adam optimizer overlaps param all-gather with forward compute by @timmoon10 :: PR: #5684
- Refactor the retrieval services for microservice architecture by @yidong72 :: PR: #5910
- make validation accuracy reporting optional for adapters/ptuning by @arendu :: PR: #5843
- Add BERT support for overlapping forward compute with distopt communication by @timmoon10 :: PR: #6024
- [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
- adding early stop callback to ptuning by @arendu :: PR: #6028
- Pr doc tn by @yzhang123 :: PR: #6041
- Adds several configurable flags for Megatron GPT models by @MaximumEntropy :: PR: #5991
- P-tuning refactor Part 1/N by @arendu :: PR: #6054
- Fast glu activations by @MaximumEntropy :: PR: #6058
- P-tuning refactor Part 2/N by @arendu :: PR: #6056
- P-tuning refactor Part 3/N by @arendu :: PR: #6106
- Explicitly check for untied embeddings when logging params by @MaximumEntropy :: PR: #6085
- Add flag to get attention from fusion by @ericharper :: PR: #6049
- Improving text memmap generated index files error messages by @michalivne :: PR: #6093
- Megatron Encoder-Decoder Sampler Function by @michalivne :: PR: #6095
- Sentence piece legacy false compatibility by @arendu :: PR: #6154
- convert Megatron LM ckpt to NeMo PP support. by @yidong72 :: PR: #6159
- Avoid multiple warnings for loss mask by @mikolajblaz :: PR: #6062
- Propagate LayerNorm1P to TE by @mikolajblaz :: PR: #6061
- Filter p-tuning by example length by @arendu :: PR: #6182
- Add sequence parallel support to Rope positional embedding by @yidong72 :: PR: #6178
- Use a separate communicator for DP AMAX reduction by @erhoo82 :: PR: #6022
- Add persistent workers to GPT by @ericharper :: PR: #6205
- Micro batch loader for bert model by @shanmugamr1992 :: PR: #6046
- GPT P tuning Eval changes (#5952) by @aklife97 :: PR: #6272
- add template for taskname=taskname by @Zhilin123 :: PR: #6283
- added RPE + fixed RMSNorm by @Davood-M :: PR: #6304
- simplified notebook for p-tuning by @arendu :: PR: #6326
- Added num decoder blocks in megatron export by @Davood-M :: PR: #6331
Text Normalization / Inverse Text Normalization
Export
Changelog
- ONNX export for RadTTS by @borisfom :: PR: #5880
- Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
- Update docs for ExpManager and Exportable frameworks by @titu1994 :: PR: #6165
- Streaming conformer CTC export by @messiaen :: PR: #5837
- MixedFusedRMSNorm Export Fix by @Davood-M :: PR: #6296
- Added num decoder blocks in megatron export by @Davood-M :: PR: #6331
Bugfixes
Changelog
- Fix bug where GPT always enabled distopt overlapped param sync by @timmoon10 :: PR: #5995
- CS bugfix by @bmwshop :: PR: #6122
- RNNT patch by @titu1994 :: PR: #6231
- Notebook fixes by @titu1994 :: PR: #6212
- Small fixes for flashlight decoder by @trias702 :: PR: #6071
- Various fixes in docs and RNNT by @titu1994 :: PR: #6156
- Fix k2 and torchaudio installation (Docker, macOS) by @artbataev :: PR: #6094
- update and deprecate warning for Mic notebook by @fayejf :: PR: #6307
- small bugfix and add asr evaluator to doc by @fayejf :: PR: #6229
- Bug fixing for bucketing dataset by @VahidooX :: PR: #6191
- Fix character beam decoding algorithm with vocab index map by @titu1994 :: PR: #6140
- fix typo in asr evaluator readme by @fayejf :: PR: #6053
- Fix typos by @titu1994 :: PR: #6241
- [ASR]:fixed augmentor arguments for transcribe functionality of Hybrid CTC-RNNT model by @KunalDhawan :: PR: #6290
- Fix hybrid transcribe by @ArtyomZemlyak :: PR: #6003
- Fix bucketing seeding by @VahidooX :: PR: #6254
- Fix for CTC decoder setup by @vsl9 :: PR: #6303
- Fix RNNT Joint narrow() by @titu1994 :: PR: #6336
- Fix bugs with interctc mixin by @Kipok :: PR: #6228
- Update IPA dict path in tutorial by @redoctopus :: PR: #6208
- [TTS] fix broken tutorial for Tacotron2 by @XuesongYang :: PR: #6199
- [TTS] fix bugs for chinese and german tutorials. by @XuesongYang :: PR: #6216
- Fix radtts sort r17 by @borisfom :: PR: #6344
- Quick Fix for RadTTS test by @blisc :: PR: #6034
- Disabling radtts tests until we have real model by @borisfom :: PR: #6036
- fix val loss computation in megatron by @anmolgupt :: PR: #5871
- Fix incomplete batches by @mikolajblaz :: PR: #6083
- Avoid unnecessarily accessing data loader with pipeline parallelism by @timmoon10 :: PR: #6164
- bugfix: file handlers are not closed. by @XuesongYang :: PR: #5956
- Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator by @stevehuang52 :: PR: #5897
- Fix Windows bug with save_restore_connector by @trias702 :: PR: #5919
- fix broken link by @ericharper :: PR: #5968
- Fix torchaudio installation by @artbataev :: PR: #5850
- Fix reinstall.sh dependencies by @titu1994 :: PR: #6027
- Adding changes to fix the mv error by @tango4j :: PR: #6087
- Fix README by @flx42 :: PR: #6137
- Fix typos in voiceapp notebook by @titu1994 :: PR: #6262
- [BugFix] Fix diarization result path errors in tutorial notebook for r1.17.0 by @tango4j :: PR: #6234
- [BugFix] Fix ...
NVIDIA Neural Modules 1.16.0
Highlights
NeMo ASR
- ASR Evaluator
- Multi-channel dereverberation algorithm
- Hybrid ASR-TTS Models
- Flashlight Decoder Beam Search
- FastConformer Encoder with 8x subsampling
NeMo TTS
- SSL Voice Conversion
- Spectrogram Enhancer
- VITS
NeMo Megatron
- Per microbatch dataloader for GPT and BERT
- Adapters compatible with Faster Transformer
NeMo Core
- Nested model support
NeMo Tools
- NeMo Forced Aligner
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.01
ASR
Changelog
- Fix for incorrect computation of batched alignment in transducers by @Kipok :: PR: #5692
- Set the stream position to 0 for pydub by @jonghwanhyeon :: PR: #5752
- [Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
- ASR evaluator by @fayejf :: PR: #5728
- [ASR][Test] Enable test for cache audio with a single worker by @anteju :: PR: #5763
- Flashlight Decoder for Nemo by @trias702 :: PR: #5790
- Fix data simulator by @stevehuang52 :: PR: #5813
- [ASR] Mask-based dereverb algorithm by @anteju :: PR: #5693
- Concat dataset and aistore support for label models by @Kipok :: PR: #5826
- Adding new features and speed up for multi-speaker data simulator by @tango4j :: PR: #5846
- Add Esperanto ASR example by @andrusenkoau :: PR: #5772
- Fix memory allocation of NeMo Multi-speaker Data Simulator by @stevehuang52 :: PR: #5864
- [ASR] Separate Audio-to-Text (BPE, Char) dataset construction by @artbataev :: PR: #5774
- Reduce memory usage in getMultiScaleCosAffinityMatrix function by @gabitza-tech :: PR: #5876
- Hybrid ASR-TTS models by @artbataev :: PR: #5659
- Set providers for onnxruntime inference session by @athitten :: PR: #5903
- [ASR] Configurable metrics for audio-to-audio + removed experimental decorators by @anteju :: PR: #5827
- Correct doc for RNNT transcribe() function by @titu1994 :: PR: #5904
- Update isort to the latest version by @artbataev :: PR: #5895
- FilterbankFeaturesTA to match FilterbankFeatures by @msis :: PR: #5913
- Fix hybridasr bug by @VahidooX :: PR: #5950
- replace symbols by @nithinraok :: PR: #5974
- fast conformer configs and doc by @bmwshop :: PR: #5970
- Update TitaNet-L and MSDD models by @nithinraok :: PR: #6023
- Fix enhancer usage by @artbataev :: PR: #6059
- update librosa args by @nithinraok :: PR: #6086
- Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
- Fix k2 and torchaudio installation (Docker, macOS). Cherry-pick (#6094) by @artbataev :: PR: #6124
TTS
Changelog
- [TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
- [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
- No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
- Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
- Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
- Update radtts' infer path by @blisc :: PR: #5788
- [TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
- [TTS] porting VITS implementation by @treacker :: PR: #5600
- [TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
- [TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
- TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
- Remove MCD_DTW tarball by @redoctopus :: PR: #5889
- Hybrid ASR-TTS models by @artbataev :: PR: #5659
- Moved eval notebook data to aws by @redoctopus :: PR: #5911
- [G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
- [G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
- fix links, add missing file by @ekmb :: PR: #6044
- [TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
- [TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
- [TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
- Fix enhancer usage by @artbataev :: PR: #6059
- [TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
- Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
- [TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136
NLP / NMT
Changelog
- Fix P-Tuning Truncation by @vadam5 :: PR: #5663
- Adithyare/prompt learning seed by @arendu :: PR: #5749
- Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
- Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
- add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
- remove transformer version upper bound by @Zhilin123 :: PR: #5831
- Adithyare/adapter new placement by @arendu :: PR: #5791
- Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
- validation batch sizing and drop_last controls by @arendu :: PR: #5830
- Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
- Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
- RETRO model finetuning by @yidong72 :: PR: #5800
- Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
- Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
- set max_steps for lr decay through config by @anmolgupt :: PR: #5780
- Fix Prompt text space issue by @aklife97 :: PR: #5983
- Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091
NeMo Tools
Changelog
- [Tools] NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
- [Tools] Fix ctc segmentation: exclude audacity files by @ekmb :: PR: #6009
Export
Changelog
General Improvements
Changelog
- Pin lightning version less than 1.9.0 by @SeanNaren :: PR: #5822
- Davidm/cherrypick r1.16.0 by @Davood-M :: PR: #6082
- Update files for lightning 1.9.0 by @SeanNaren :: PR: #5823
- Tn doc 16 by @yzhang123 :: PR: #5954
- Ensure EMA checkpoints are also deleted when normal checkpoints are by @SeanNaren :: PR: #5724
- [Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
- Fix EMA topk checkpoint deletion by @SeanNaren :: PR: #5758
- [BugFix] decoder timestamp count has a mismatch when is decoded by @tango4j :: PR: #5825
- Update 00_NeMo_Primer.ipynb by @schaltung :: PR: #5740
- Sanitize params before DLLogger log_hyperparams by @milesial :: PR: #5736
- NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
- Add EMA Docs, fix common collection documentation by @SeanNaren :: PR: #5757
- Add container info to main page by @fayejf :: PR: #5816
- CommonVoice support for script by @SeanNaren :: PR: #5797
- Support nested NeMo models by @artbataev :: PR: #5671
- fix max len generation t5 by @ekmb :: PR: #5852
- NFA samples fix by @erastorgueva-nv :: PR: #5856
- fix(readme): fix typo by @jqueguiner :: PR: #5883
- Block large files from being merged into NeMo main by @SeanNaren :: PR: #5898
- Pin isort version by @artbataev :: PR: #5914
- fixed missing long_description_content_type by @XuesongYang :: PR: #5909
- Update container to 23.01 by @ericharper :: PR: #5917
- remove conda pynini install by @ekmb :: PR: #5921
- Update align.py by @Slyne :: PR: #6043
- Fixing data simulator argument and bash scripting error by @tango4j :: PR: #6112
- Update apex commit by @ericharper :: PR: #6148
NVIDIA Neural Modules 1.15.0
Highlights
NeMo ASR
- Hybrid Transducer-CTC ASR
- Greedy timestamp decoding with inference script
- MHA adapters
- Conformer local attention (longformer)
- High level beam search API
- Multiblank transducer
- Multi-channel audio processing model
- AIStore for ASR datasets
NeMo Megatron
- ALiBi position embeddings support for T5
NeMo TTS
- Chinese TTS pipeline with polyphone disambiguation
NeMo Core
- Optimizer based EMA
- MLFlow logger support
Models
- stt_eo_conformer_ctc_large (HF, NGC) Esperanto ASR model.
- stt_eo_conformer_transducer_large (HF, NGC) Esperanto ASR model.
Detailed Changelogs
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.12
ASR
Changelog
- optimized loop and bugfix by @Jorjeous :: PR: #5573
- Update torchmetrics by @nithinraok :: PR: #5566
- Add an option to defer data setup from init to setup by @anteju :: PR: #5569
- AIStore for ASR datasets by @anteju :: PR: #5462
- Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
- Update documentation and tutorials for Adapters by @titu1994 :: PR: #5610
- Conformer local attention by @sam1373 :: PR: #5525
- Add core classes and functions for online clustering diarizer part 1 by @tango4j :: PR: #5526
- [Add] ASR+VAD Inference Pipeline by @stevehuang52 :: PR: #5575
- [ASR] Audio processing base, multi-channel enhancement models by @anteju :: PR: #5356
- Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
- Add Beam Search support to ASR transcribe() by @titu1994 :: PR: #5443
- Multiblank Transducer by @hainan-xv :: PR: #5527
- pin torchmetrics version by @nithinraok :: PR: #5720
- Update torchaudio dependency version for tutorials by @titu1994 :: PR: #5781
- update torchmetrics to latest version by @nithinraok :: PR: #5801
- Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
- [BugFix] Updated CTC decoders installation in tutorial by @vsl9 :: PR: #5833
- update torchmetrics args confusionmatrix by @nithinraok :: PR: #5853
- indentation fix by @nithinraok :: PR: #5861
- Fix wrong label mapping in batch_inference for label_model by @fayejf :: PR: #5767
TTS
Changelog
- Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
- [TTS] fix ranges of char set for accented letters. by @XuesongYang :: PR: #5607
- [TTS] add type hints and change variable names for tokenizers and g2p by @XuesongYang :: PR: #5602
- Fixed RadTTS unit test by @borisfom :: PR: #5572
- [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch by @yuekaizhang :: PR: #5541
- Add duration padding support for RADTTS inference by @kevjshih :: PR: #5650
- [TTS] add tts dict cust notebook by @ekmb :: PR: #5662
- [TN/TTS docs] TN customization, g2p docs moved to tts by @ekmb :: PR: #5683
- typo and link fixed by @ekmb :: PR: #5741
- link fixed by @ekmb :: PR: #5745
- Update Tacotron2 NGC checkpoint load to latest version by @redoctopus :: PR: #5760
- Docs g2p update by @ekmb :: PR: #5769
- [TTS][ZH] bugfix import jieba errors. by @XuesongYang :: PR: #5776
NLP / NMT
Changelog
- Text generation improvement (UI client, data parallel support) by @yidong72 :: PR: #5437
- O2 style amp for gpt3 ptuning by @JimmyZhang12 :: PR: #5246
- Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
- Bert interleaved by @shanmugamr1992 :: PR: #5556
- Port stateless timer to exp manager by @MaximumEntropy :: PR: #5584
- Add interface for making amax reduction optional for FP8 by @ksivaman :: PR: #5447
- Propagate attention_dropout flag for GPT-3 by @mikolajblaz :: PR: #5669
- Enc-Dec model size reporting fixes by @MaximumEntropy :: PR: #5623
- Add prompt learning tests by @arendu :: PR: #5649
- Fix missing torchelastic fixes for PTL 1.8 by @MaximumEntropy :: PR: #5691
- ALiBi Positional Embeddings by @michalivne :: PR: #5467
- Megatron export triton update by @Davood-M :: PR: #5766
- Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
- Update description for question answering tutorial by @Zhilin123 :: PR: #5814
- TPMLP for T5-based models by @Davood-M :: PR: #5840
- Megatron positional encoding alibi fix by @michalivne :: PR: #5808
Export
Changelog
General Improvements
Changelog
- Update to pytorch 22.12 container by @ericharper :: PR: #5694
- optimized loop and bugfix by @Jorjeous :: PR: #5573
- Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
- remove useless files. by @XuesongYang :: PR: #5580
- [Fix] setup_multiple validation/test data by @anteju :: PR: #5585
- Move to optimizer based EMA implementation by @SeanNaren :: PR: #5169
- [Temp workaround] Disable test with cache_audio to unblock CI by @anteju :: PR: #5615
- [EMA] Change success message to reduce confusion by @SeanNaren :: PR: #5621
- Temporarily disable prompt learning CI tests by @ericharper :: PR: #5633
- [Dockerfile] Remove AIS archive from docker image by @anteju :: PR: #5629
- [workflow] add exclude labels option to ignore cherry-picks in releas… by @XuesongYang :: PR: #5645
- Add DLLogger support to exp_manager by @milesial :: PR: #5658
- Fix EMA restart by allowing device to be set by the class init by @SeanNaren :: PR: #5668
- Remove SDP (moved to separate repo) - merge to main by @erastorgueva-nv :: PR: #5630
- temp disable speaker recognition CI test by @fayejf :: PR: #5696
- Don't print exp_manager warning when max_steps == -1 by @milesial :: PR: #5725
- Add tabular data generation documents to the index file by @yidong72 :: PR: #5733
- fix token id bug by @yidong72 :: PR: #5777
- Update numpy requirements from 1.21 to 1.22 by @Zhilin123 :: PR: #5785
- Fix setuptools to usable version by @titu1994 :: PR: #5798
- add apt-get upgrade -y in dockerfile by @fayejf :: PR: #5817
- Update NeMo Multi-Run docs by @titu1994 :: PR: #5844
- add ambernet to readme by @fayejf :: PR: #5872
- update apex install instructions for 1.15 by @ericharper :: PR: #5901
NVIDIA Neural Modules 1.14.0
Highlights
NeMo ASR
- Hybrid CTC + Transducer loss ASR #5364
- Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
- ASR Adapters hyperparameter search scripts #5159
- RNNT {ONNX, TorchScript} x GPU export infer #5248
- Exportable MelSpectrogram (TorchScript) #5512
- Audio To Audio Dataset Processor #5196
- Multi Channel Audio Transcription #5479
- Silence Augmentation #5476
NeMo Megatron
- Support for the Mixture of Experts for T5
- Fix PTL model size output for GPT-3 and BERT
- BERT with Tensor Parallelism & Pipeline Parallel Support
NeMo Core
- Hydra Multirun core support + NeMo HP optim in YAML #5159
NeMo Models
Detailed Changelogs
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.11
ASR
Changelog
- [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
- Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
- Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
- Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
- Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
- bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
- Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
- Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
- Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
- Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
- [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
- Add Silence Augmentation by @fayejf :: PR: #5476
- add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
- add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
- [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
- Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
- Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
- Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
- Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
- Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
- Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
- [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639
TTS
Changelog
- [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
- [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
- [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
- [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
- [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
- [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
- Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
- [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
- [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
- [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
- [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
- [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
- [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
- [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
- [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
- [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
- [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
- JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
- [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
- TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
- [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
- [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
- [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
- [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
- [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
- [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666
NLP / NMT
Changelog
- Option to pad the last validation input sequence if it's smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
- Fixes bugs with loss averaging for Megatron GPT by @shanmugamr1992 :: PR: #5329
- Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
- support to disable sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
- [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
- Bug fix/gpt by @shanmugamr1992 :: PR: #5493
- prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
- Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
- NLP docs fixes by @vsl9 :: PR: #5528
- Switch order of args in optimizer_step override by @ericharper :: PR: #5549
- Upgrade to 22.11 by @ericharper :: PR: #5550
- Merge r1.13.0 main by @ericharper :: PR: #5570
- some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
- Remove cell output from tutorial by @ericharper :: PR: #5689
Text Normalization / Inverse Text Normalization
Changelog
- [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
- [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
Export
Changelog
- Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
- Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
- Fixes for Conformer-xl export by @borisfom :: PR: #5309
- Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
- add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
General Improvements
Changelog
- bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
- Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
- Better patch hydra by @titu1994 :: PR: #5591
- [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
- Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
- Update perturb.py by @stevehuang52 :: PR: #5231
- remove CV requirements. by @XuesongYang :: PR: #5233
- checks for accepted adapter type at module level by @arendu :: PR: #5194
- fix hypotheses return by @nithinraok :: PR: #5253
- Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
- update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
- Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
- created by @bmwshop :: PR: #5268
- Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
- O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
- Upperbound PTL by @titu1994 :: PR: #5302
- Update Interface(s) phonetic entry by @blisc :: PR: #5212
- add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
- Add italian model checkpoints by @Kipok :: PR: #5315
- Text Memmap Parsing Improvements by @michalivne :: PR: #5265
- Update librosa signature in HF processing script by @titu1994 :: PR: #5321
- Force wav file format for audio_filepath by @titu1994 :: PR: #5323
- Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
- [DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
- Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
- typo fix by @arendu :: PR: #5328
- add precommit hook to automatically sort entries in requirements. by @XuesongYang :: PR: #5333
- Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
- Fixing de-autocast by @borisfom :: PR: #5319
- [Bugfix] Added rm -f / wget -nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
- [DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
- Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
- removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
- Enable mlflow logger by @whrichd :: PR: #4893
- Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
- Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
- SpeakerClustering: fix tensor dimensions in forward() by @virajkarandikar :: PR: #5387
- add squad by @arendu :: PR: #5407
- added python and c++ alignment code by @yzhang123 :: PR: #5346
- Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
- Fix...
NVIDIA Neural Modules 1.13.0
Highlights
NeMo ASR
- Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
- Support for codeswitched manifests during training
- Support for Language ID during inference for ML models
- Support of cache-aware streaming for offline models
- Word confidence estimation for CTC & RNNT greedy decoding
NeMo Megatron
- Interleaved Pipeline schedule
- Transformer Engine for GPT
- HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
- IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
- Pipeline Parallel Support for T5 Prompt Learning
- MegatronNMT export
NeMo TTS
- TTS introductory tutorial
- Phonemizer/espeak removal (Spanish/German)
- Char-only support for Spanish/German models
- Documentation Refactor
NeMo Core
- Upgrade to NGC PyTorch 22.09 container
- Add pre-commit hooks
- Exponential moving average (EMA) of weights during training
NeMo Models
- ASR Conformer Croatian: stt_hr_conformer_ctc_large and stt_hr_conformer_transducer_large
- ASR Conformer Belarusian: stt_be_conformer_ctc_large and stt_be_conformer_transducer_large
- ASR Squeezeformer Librispeech: 6 checkpoints (XS, S, SM, M, ML, L)
- SLURP Intent Classification / Slot Filling: slu_conformer_transformer_large_slurp
- LanguageID AmberNet: langid_ambernet
Detailed Changelogs
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.09
Known Issues
Issues
- pytest tests for RadTTSModel_export_to_torchscript fail intermittently due to random input values. Fixed in main.
ASR
Changelog
- Add docs tutorial on kinyarwanda asr by @bene-ges :: PR: #4953
- Asr codeswitch by @bmwshop :: PR: #4821
- Add test for nested ASR model by @titu1994 :: PR: #5002
- Greedy decoding confidence for CTC and RNNT by @GNroy :: PR: #4931
- [ASR][Tools] RIR corpus generator by @anteju :: PR: #4927
- Add Squeezeformer CTC model checkpoints on Librispeech by @titu1994 :: PR: #5121
- adding loss normalization options to rnnt joint by @bmwshop :: PR: #4829
- Asr concat dataloader by @bmwshop :: PR: #5108
- Added ASR model comparison to SDE by @Jorjeous :: PR: #5043
- Add scripts for converting Spoken Wikipedia to asr dataset by @bene-ges :: PR: #5138
- ASR confidence bug fix for older Python versions by @GNroy :: PR: #5180
- Update ASR Scores and Results by @titu1994 :: PR: #5254
- [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer by @ssh-meister :: PR: #5340
TTS
Changelog
- [TTS] Adding speaker embedding conditioning in fastpitch by @subhankar-ghosh :: PR: #4986
- [TTS] Remove PhonemizerTokenizer by @rlangman :: PR: #4990
- [TTS] FastPitch speaker interpolation by @subhankar-ghosh :: PR: #4997
- RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
- [TTS] remove phonemizer.py by @XuesongYang :: PR: #5090
- [TTS] Add NeMo TTS Primer Tutorial by @rlangman :: PR: #4933
- [TTS] Add SpanishCharsTokenizer by @rlangman :: PR: #5135
- Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training by @Kipok :: PR: #5118
- refactor TTS documentation organization and add new contents. by @XuesongYang :: PR: #5137
- [TTS][DOC] update models trained on HifiTTS dataset. by @XuesongYang :: PR: #5173
- [TTS] Fix TTS Primer image markup by @rlangman :: PR: #5192
- [TTS] deprecate TextToWaveform base class. by @XuesongYang :: PR: #5205
- [TTS] remove the avoidance of circular imports by @XuesongYang :: PR: #5214
- [TTS] remove LinVocoder and apply Vocoder as parent class. by @XuesongYang :: PR: #5206
- [TTS] unify requirements_tts.txt and requirements_torch_tts.txt by @XuesongYang :: PR: #5232
- Minor typo fixes in TTS tutorial by @redoctopus :: PR: #5266
- Radtts 1.13 by @borisfom :: PR: #5451
- Radtts 1.13 plus by @borisfom :: PR: #5457
NLP / NMT
Changelog
- IA3 support for GPT and T5 by @arendu :: PR: #4909
- Fix and refactor consumed samples save/restore for Megatron models. by @MaximumEntropy :: PR: #5077
- Remove unsupported arguments from MegatronNMT by @MaximumEntropy :: PR: #5065
- Update megatron interface to dialogue by @Zhilin123 :: PR: #4936
- gpt ia3 CI tests by @arendu :: PR: #5140
- Fix NMT Eval Sampler by @aklife97 :: PR: #5154
- Add interleaved pipeline schedule to GPT by @ericharper :: PR: #5025
- fix for bug in bignlp by @arendu :: PR: #5172
- Fixes some args that were not removed properly for multilingual Megatron NMT by @MaximumEntropy :: PR: #5142
- Fix absolute path in GPT Adapter CI tests by @arendu :: PR: #5184
- Add ability to configure drop last batch for validation datasets with MegatronGPT by @shanmugamr1992 :: PR: #5067
- Megatron Export Update by @Davood-M :: PR: #5343
- Fix GPT generation when using sentencepiece tokenizer by @MaximumEntropy :: PR: #5413
- Disable sync_batch_comm in validation_step for GPT by @ericharper :: PR: #5397
- Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
- Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
Text Normalization / Inverse Text Normalization
Changelog
- [Chinese text normalization] speed up graph building by @pengzhendong :: PR: #5128
NeMo Tools
Export
Changelog
- Fix export bug by @VahidooX :: PR: #5009
- RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
- Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
- Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
- Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
- replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
- Megatron Export Update by @Davood-M :: PR: #5343
- Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
- export_utils bugfix by @Davood-M :: PR: #5480
- Export fixes for Riva by @borisfom :: PR: #5496
General Improvements and Bugfixes
Changelog
- don't use bfloat16 when in jit by @bmwshop :: PR: #5051
- Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
- Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
- Pin Transformers version to fix CI by @SeanNaren :: PR: #4955
- Fix changelog builder (#4962) by @titu1994 :: PR: #4963
- Checkpoint averaging class fix by @michalivne :: PR: #4946
- Add ability to give separate datasets for test, train and validation by @shanmugamr1992 :: PR: #4798
- Add simple pre-commit file by @SeanNaren :: PR: #4983
- Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execut… by @liji-nv :: PR: #4951
- Improvements to AMI script by @SeanNaren :: PR: #4974
- clean warnings from tests and CI runs, and prepare for upgrade to PTL 1.8 by @nithinraok :: PR: #4830
- Update libraries by @titu1994 :: PR: #5010
- add close inactive issues and PRs github action. by @XuesongYang :: PR: #5015
- Fix filename extraction in vad_utils.py by @GKPr0 :: PR: #4999
- Add black to pre-commit by @SeanNaren :: PR: #5027
- [CI] Enable previous build abort when new commit pushed by @SeanNaren :: PR: #5041
- Tutorials and Docs for Multi-scale Diarization Decoder by @tango4j :: PR: #4930
- Refactor output directory for MSDD Inference Notebook by @SeanNaren :: PR: #5044
- text_memmap dataset index range testing fix by @michalivne :: PR: #5034
- fix undefined constant in code example by @bene-ges :: PR: #5046
- Text generation refactor and RETRO text generation implementation by @yidong72 :: PR: #4985
- Lids by @bmwshop :: PR: #4820
- Add datasets folder, add diarization datasets voxconverse/aishell by @SeanNaren :: PR: #5042
- Fix the bugs in cache-aware streaming Conformer by @VahidooX :: PR: #5032
- Bug fix - Limit val batches set to 1.0 by @shanmugamr1992 :: PR: #5023
- [bug_fix] kv_channels is used when available by @arendu :: PR: #5066
- Add spe_split_by_unicode_script arg by @piraka9011 :: PR: #5072
- Transformer Engine Integration by @ericharper :: PR: #5104
- Text memmap dataset index memory efficiency by @michalivne :: PR: #5056
- Add NGC links for Aligner and FastPitch by @redoctopus :: PR: #5235
- Fix link to inference notebook by @redoctopus :: PR: #5247
- Fix links to speaker identification notebook by @SeanNaren :: PR: #5260
- Fix bug in Dialogue tutorial by @Zhilin123 :: PR: #5277
- PCLA tutorial typo fix by @jubick1337 :: PR: #5288
- Fix dialogue tutorial bug by @Zhilin123 :: PR: #5297
- small bugfix for r1.13.0 by @fayejf :: PR: #5310
- Add italian model checkpoints by @Kipok :: PR: #5316
- Pcla tutorial fixes by @jubick1337 :: PR: #5313
- Fix issue with HF Model upload tutorial by @titu1994 :: PR: #5359
- P&C LA tutorial fixes by @jubick1337 :: PR: #5354
- Add SDP documentation by @erastorgueva...
NVIDIA Neural Modules 1.12.0
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.08
ASR
Changelog
- Add support for RNNT Char/Word Timestamp Calculation by @titu1994 :: PR: #4665
- add conditional logic to rnnt_wer to handle when arrays have no elements by @mgoldey :: PR: #4776
- fix handling of the final word for rnnt word timestamps by @mgoldey :: PR: #4779
- amend rnnt word timestamps by @mgoldey :: PR: #4782
- fix type error in rnnt_wer.py, rnnt_wer_bpe.py, wer_bpe.py by @hainan-xv :: PR: #4822
- add kab language asr models by @nithinraok :: PR: #4819
- [Tutorial][ASR][Fix] Data paths in ASR with NeMo tutorial by @anteju :: PR: #4845
- [ASR] Fix for multi-channel signals in AudioSegment by @anteju :: PR: #4824
- [ASR] Generate multichannel noise by @anteju :: PR: #4870
- Fix asr model order by @nithinraok :: PR: #4959
- Fix ASR issues by @titu1994 :: PR: #4984
- Fix diarization ASR inference link in notebook by @SeanNaren :: PR: #5016
- Code switching by @KunalDhawan :: PR: #4784
- Release SOTA Lang ID model by @fayejf :: PR: #5080
- Stateless decoder for RNN-T by @hainan-xv :: PR: #4710
TTS
Changelog
- [TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
- TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
- [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
- ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
- IPA G2P bugfixes by @redoctopus :: PR: #4869
- [TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
- [TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
- [TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
- [TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
- [TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
- [TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
- [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087
NLP / NMT
Changelog
- Fix bug intent slot classification tokenizer to dialogue by @Zhilin123 :: PR: #4694
- Intent slot model onnx export test by @Zhilin123 :: PR: #4731
- Fix megatron p tuning notebook by @nithinraok :: PR: #4741
- Add support for Apex distributed Adam optimizer with GPT-3 by @timmoon10 :: PR: #4487
- Fixes NLPModel's load from checkpoint due to PTL private function changes by @MaximumEntropy :: PR: #4755
- Adapter tuning for Megatron GPT models by @arendu :: PR: #4717
- Megatron Encoder Decoder models with RPE and PP > 2 by @MaximumEntropy :: PR: #4663
- add kab language asr models by @nithinraok :: PR: #4819
- add chinese to language doc and fix bug by @yzhang123 :: PR: #4834
- Spoken Language Identification by @fayejf :: PR: #4846
- Fix decoding bug for megatron enc-dec models with O2 by @MaximumEntropy :: PR: #4989
- Updating Megatron LM conversion according to PTL 1.7 by @Davood-M :: PR: #5038
- Adding RETRO model Faiss sharding index and KNN sharding index by @yidong72 :: PR: #4713
- MLP Prompt Learning Encoder by @vadam5 :: PR: #4849
- Update the prompt learning to handle large language model by @yidong72 :: PR: #4906
Text Normalization / Inverse Text Normalization
Changelog
- [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
- [Chinese text normalization]Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
- Fix zh tn by @yzhang123 :: PR: #5035
- Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
- Added P&C lexical audio model by @jubick1337 :: PR: #4802
Export
Changelog
- Intent slot model onnx export test by @Zhilin123 :: PR: #4731
General Improvements
Changelog
- Fix logger reference by @SeanNaren :: PR: #4786
- Fix error with class method reference in msdd by @SeanNaren :: PR: #4865
- Add sync for logging calls to ensure aggregation across devices by @SeanNaren :: PR: #4876
- Fix saving the last checkpoint when using val check interval by @SeanNaren :: PR: #4905
- Add support for skipping validation on resume + extend saving last ckpt test by @SeanNaren :: PR: #4922
- Move trainer calls for ssl models to training and validation steps only by @sam1373 :: PR: #4685
- Change Num Partitions size expansion fix by @aklife97 :: PR: #4719
- upgrade to PTL 1.7 by @nithinraok :: PR: #4672
- Fixing outputs of infer() and use of NeMo length regulator helper by @borisfom :: PR: #4724
- bug fix: enable async grad reduction when DP > 1 by @erhoo82 :: PR: #4740
- Add LayerNorm1P, weight decay for LN and unscaled initialization by @mikolajblaz :: PR: #4743
- jenkins data simulator fix by @nithinraok :: PR: #4751
- Multiscale Diarization Decoder (MSDD) model and module files by @tango4j :: PR: #4650
- Fix logging in gradient clipping with PTL 1.7.2 by @MaximumEntropy :: PR: #4769
- Fix checkpoint restoring by @nithinraok :: PR: #4777
- avoid data clipping after convolution with rir samples by @nithinraok :: PR: #4806
- Fixed in_features dim if bidirectional is True by @farisalasmary :: PR: #4588
- Fix float/integer type error in WER.update() by @fujimotos :: PR: #4816
- [Speech Data Explorer] An option to explicitly specify the base dir by @anteju :: PR: #4678
- adding instancenorm as an option for conv normalization by @bmwshop :: PR: #4827
- Fix small spelling mistakes by @SeanNaren :: PR: #4839
- [Tutorials] Fix matplotlib version and directory name in Multispeaker_Simulator by @anteju :: PR: #4804
- Update diarization folder structure by @tango4j :: PR: #4823
- Missing types in clustering by @SeanNaren :: PR: #4858
- Fix decoding for T5 models with RPE by @MaximumEntropy :: PR: #4847
- Update Speaker Diarization notebooks with unknown oracle_num_speakers by @fayejf :: PR: #4861
- Fix mha bug by @yzhang123 :: PR: #4859
- Changes to MSDD code after review, fix test log call by @SeanNaren :: PR: #4881
- Fixed output of BERT to be [batch x seq x hidden] by @michalivne :: PR: #4887
- Add AMI dataset script by @SeanNaren :: PR: #4864
- Update label_models.py by @stevehuang52 :: PR: #4891
- Update tutorials.rst for question answering by @Zhilin123 :: PR: #4895
- removed unused imports for all domains. by @XuesongYang :: PR: #4901
- Fix ptl_load_state not providing cls by @MaximumEntropy :: PR: #4914
- Remove unused cv collection by @okuchaiev :: PR: #4907
- Add mixed-representation config to PhonemizerTokenizer by @rlangman :: PR: #4904
- Fix implicit bug in _AudioLabelDataset by @stevehuang52 :: PR: #4923
- Upgrade to NGC PyTorch 22.08 Container by @ericharper :: PR: #4929
- Fix cherry pick workflow by @ericharper :: PR: #4964
- check for active conda environment by @nithinraok :: PR: #4970
- fix label models restoring issue from weighted cross entropy by @nithinraok :: PR: #4968
- Add simple pre-commit file (#4983) by @SeanNaren :: PR: #4995
- Fix bug in Squeezeformer Conv block by @titu1994 :: PR: #5011
- Fix bugs by @Zhilin123 :: PR: #5036
- Add black to pre-commit (#5027) by @SeanNaren :: PR: #5045
- Fix bug in question answering tutorial by @Zhilin123 :: PR: #5049
- Missing fixes from r1.11.0 to T5 finetuning eval by @MaximumEntropy :: PR: #5054
- P&C docs by @jubick1337 :: PR: #5068
- probabilites -> probabilities by @nithinraok :: PR: #5078
- update strategy in notebook from ddp_fork to dp by @Zhilin123 :: PR: #5088
- Fix Unhashable type list for Numba Cuda spec augment kernel by @titu1994 :: PR: #5093
- T5 prompt learning fixes missing from r1.11.0 merge by @MaximumEntropy :: PR: #5075
- T5 Decoding with PP > 2 fix by @MaximumEntropy :: PR: #5091
- Multiprocessing fix by @jubick1337 :: PR: #5106
- bugfix: pybtex.database.InvalidNameString: Too many commas in author … by @XuesongYang :: PR: #5112
NVIDIA Neural Modules 1.11.0
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.07
ASR
Changelog
- Add ASR CTC Decoding module by @titu1994 :: PR: #4342
- Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
- Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
- Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
- Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
- Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
- Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
- Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
- Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
- Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
- Add Squeezeformer to ASR by @titu1994 :: PR: #4416
- Fix ASR notebooks by @titu1994 :: PR: #4738
- Add pretrained ASR models for Croatian by @anteju :: PR: #4682
- Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
- Multilingual VAD model by @fayejf :: PR: #4734
- Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
- Fp16 support for Conformer by @bmwshop :: PR: #4571
- Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
- Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
- Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465
TTS
Changelog
- Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
- Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
- Add static method decorator. by @XuesongYang :: PR: #4443
- Fix typo in HiFi-GAN config's max steps by @XuesongYang :: PR: #4450
- Relaxed support for both CPUs and GPUs by @XuesongYang :: PR: #4461
- Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4413
- Created the finetuning Hifigan 44100Hz recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4478
- Fix dataset parameter typo on tacotron2 example yaml by @saarus72 :: PR: #4471
- Update cmudict by @jasro23 :: PR: #4510
- Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
- Fix off-by-1 bug in Beta Binomial Prior by @rlangman :: PR: #4616
- G2P Aligner by @redoctopus :: PR: #4604
- RADTTS ADLR-NEMO porting by @MikyasDesta :: PR: #4538
- Fixed wrong pronunciations for r1.11. by @XuesongYang :: PR: #4677
- Incremented the version number to 22.08 in tutorials. by @XuesongYang :: PR: #4684
- Bugfix for missing configs. by @XuesongYang :: PR: #4725
- Fix pynini install in TTS tutorials by @redoctopus :: PR: #4729
- Updated config with a German IPA phoneme tokenizer by @XuesongYang :: PR: #4756
- Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4763
- Add single male speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4770
- Deprecated old scripts for ljspeech. by @XuesongYang :: PR: #4780
- Fix MixerTTS data loading index error by @redoctopus :: PR: #4811
- G2P docs by @ekmb :: PR: #4841
- NMESC speaker counting algorithm update by @tango4j :: PR: #4500
NLP / NMT
Changelog
- Add O2 support for RETRO model by @yidong72 :: PR: #4411
- Add MTEncDec Finetune support by @aklife97 :: PR: #4540
- Fix metric setup for finetuning without a test set by @MaximumEntropy :: PR: #4585
- T0 model and dataset by @MaximumEntropy :: PR: #4598
- Add prompt learning for T5 by @HeyyyyyyG :: PR: #4391
- Add MuTransfer Capability to RETRO model pretraining by @yidong72 :: PR: #4643
- Label Smoothing in VocabParallelCrossEntropy by @MaximumEntropy :: PR: #4602
- Megatron BART BOS / EOS bug fix by @michalivne :: PR: #4495
- GPT Prompt Learning Improvements by @vadam5 :: PR: #4496
- Megatron perceiver with tensor parallelism only by @MaximumEntropy :: PR: #4318
- Refactor for punctuation model by @jubick1337 :: PR: #4367
- Update megatron prompt learning interface to dialogue by @Zhilin123 :: PR: #4545
- Removed NLPDDPPlugin Import check by @vadam5 :: PR: #4555
- Option to disregard document boundaries for t5, bart, ul2 by @MaximumEntropy :: PR: #4481
- Add Tokenization and Normalization pre-processing script for NMT by @aklife97 :: PR: #4557
- Integrating support for GPT/T5/BART for Question Answering by @ameyasm1154 :: PR: #4532
- NeMo Megatron: Add sequence parallelism and selective activation checkpointing (rebased) by @ericharper :: PR: #4380
- Update megatron t5 interface to dialogue by @Zhilin123 :: PR: #4626
- Additional sentencepiece args - Byte fallback, split digits, split_on_whitespace by @MaximumEntropy :: PR: #4525
- Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training by @MaximumEntropy :: PR: #4396
- NeMo Megatron Doc updates1 by @okuchaiev :: PR: #4633
- Asymmetric Encoder and Decoder Configuration for Megatron Models by @MaximumEntropy :: PR: #4568
- Add sentencepiece legacy arg to megatron tokenizer configs by @MaximumEntropy :: PR: #4659
- Megatron encode function with RPE fix by @MaximumEntropy :: PR: #4692
- Updates to NeMo Megatron OSS docs by @okuchaiev :: PR: #4709
- Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
- fix bug relating to ddp strategy in joint intent slot classification … by @Zhilin123 :: PR: #4762
- Fix qa notebook typos and branch by @ericharper :: PR: #4788
- Colab py37 compatibility megatron by @Zhilin123 :: PR: #4791
- added/fixed export for Megatron models by @Davood-M :: PR: #4712
- Fix providing glue in seq2seq eval by @MaximumEntropy :: PR: #4843
- Fix Megatron NMT consumed samples and ckpt_to_nemo split rank by @MaximumEntropy :: PR: #4884
- Fixing Megatron BERT output dimensions to [batch x seq x hidden] by @michalivne :: PR: #4894
- Prompt Learning Inference Improvements by @vadam5 :: PR: #4566
- MegaMolBART Compatibility by @michalivne :: PR: #4603
Text Normalization / Inverse Text Normalization
Changelog
- Add ITN pt by @guidefloripa :: PR: #4516
- add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
- Fix ITN pt by @guidefloripa :: PR: #4623
- Bug fix hundred in Audio-based, added method to split text into sentences by @ekmb :: PR: #4610
- Fix itn pt time by @guidefloripa :: PR: #4630
- Pin lightning version to be < 1.7.0 by @MaximumEntropy :: PR: #4660
- G2P for OOV and heteronyms by @ekmb :: PR: #4624
- Publish pretrained itn t5 model for English by @bene-ges :: PR: #4748
- Added MLM Scoring by @yzhang123 :: PR: #4476
Export
Changelog
Bugfixes
Changelog
- Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
- Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
- Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
- Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
- Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
- Improve mAES algorithm with patches by @titu1994 :: PR: #4662
General Improvements
Changelog
- Option to disable mp in VAD via num_workers=1 by @gkucsko :: PR: #4317
- Remove redundant bias expand by @xrennvidia :: PR: #4382
- Add option for specifying wandb save_dir from config by @shan18 :: PR: #4379
- Quick wav2vec fix. In-place operation adding convolutional positions … by @bonham79 :: PR: #4383
- Fixing import error in some cases by @borisfom :: PR: #4401
- Update with new conformer checkpoints. by @VahidooX :: PR: #4417
- Wav2vec fix by @bonham79 :: PR: #4467
- Relative Audio Paths by @stevehuang52 :: PR: #4470
- Allow Noam lr scheduler to run for more than max_steps by @alancucki :: PR: #4472
- Support for Different LRs with Param Groups by @stevehuang52 :: PR: #4508
- Fix runtime check by @borisfom :: PR: #4501
- Update finetune label models by @nithinraok :: PR: #4504
- Weighted bucketing by @tbartley94 :: PR: #4530
- Relative Audio Path by @stevehuang52 :: PR: #4520
- Fix duplex inference with grammars by @ekmb :: PR: #4517
- Add nsys profiling by @ericharper :: PR: #4539
- Remove the variable that is not used in the context. by @XuesongYang :: PR: #4547
- Adding multispeaker fastpitch and hifigan en model links to available… by @subhankar-ghosh :: PR: #4550
- Add length ratio filtering script by @MaximumEntropy :: PR: #4551
- Relative audio path in speech data explorer by @anteju :: PR: #4570
- Dividing generative question-answering CI tests by @ameyasm1154 :: PR: #4600
- Updating the default parameters in the example adapters config file by @shan18 :: PR: #4607
- Improve normalize_batch ValueError message by @piraka9011 :: PR: #4614
- Support listing Hugging Face model info by @titu1994 :: PR: #4619
- Update diarization data loader to train meeting data by @tango4j :: PR: #4567
- Fix HF check for model card info by @titu1994 :: PR: #4628
- Add Github Action for auto webpage build by @titu1994 :: PR: #4645
- Empty commit by @ti...
NVIDIA Neural Modules 1.10.0
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.05
Known Issues
Issues
- Tutorial: Fastpitch_Training_GermanTTS.ipynb is experimental and still being tested.
ASR
Changelog
- Multilang asr tutorial by @bmwshop :: PR: #3931
- Add ASR with Adapters Tutorial by @titu1994 :: PR: #4149
- Add support for Decoder + Joint Adapters for ASR by @titu1994 :: PR: #4189
- updating PretrainedModelInfo and benchmark sheet for ASR models by @krishnacpuvvada :: PR: #4259
- Remove verbose flag from Dali Index Creator by @titu1994 :: PR: #4309
- updating PretrainedModelInfo for ASR SSL models by @krishnacpuvvada :: PR: #4292
- Adding docs for ASR SSL by @krishnacpuvvada :: PR: #4303
- Add ASR Scores to Docs by @titu1994 :: PR: #4412
- [ASR] Replace all paths with /content/ by @titu1994 :: PR: #4427
- added conformer mandarin model. by @VahidooX :: PR: #4201
- Runtime audio segment sampling for SSL by @krishnacpuvvada :: PR: #4126
TTS
Changelog
- [TTS] Add volume passthrough to fp for riva by @blisc :: PR: #4167
- Update TTS Configs from LAMB to AdamW by @redoctopus :: PR: #4233
- Add benchmark=false to all TTS configs by @redoctopus :: PR: #4263
- [TTS] add staticmethod decoration for BetaBinomialInterpolator by @XuesongYang :: PR: #4319
- [TTS] capture exception of non-supported windows. by @XuesongYang :: PR: #4320
- [TTS] enforced pin_memory = True by @XuesongYang :: PR: #4341
- [TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels by @aroraakshit :: PR: #4266
- IPA support for TTS by @redoctopus :: PR: #4310
- Bits of RADTTS support by @borisfom :: PR: #4343
NLP / NMT
Changelog
- Megatron NMT Restore from T5/BART and finetune by @MaximumEntropy :: PR: #3977
- Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo by @MaximumEntropy :: PR: #4137
- Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
- Removes debug logging statements in Megatron NMT by @MaximumEntropy :: PR: #4312
- Raise error if trainer object is None for MegatronBaseModel by @MaximumEntropy :: PR: #4356
- Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
- unify intent slot dataset util functions in tutorials by @Zhilin123 :: PR: #4445
- Fix for TP=2,PP=2 decoding with megatron encoder-decoder models by @MaximumEntropy :: PR: #4484
- Add RETRO model for pretraining by @yidong72 :: PR: #4121
- Add async grad allreduce and chunk optimization by @xrennvidia :: PR: #4084
- Implements the UL2 Dataset and config by @MaximumEntropy :: PR: #4184
- Add RETRO indexed dataset and inference by @yidong72 :: PR: #4220
- Finetune T5 on the prefix-lm objective by @MaximumEntropy :: PR: #4328
- Fuse bias with geglu in ParallelMLP by @xrennvidia :: PR: #4213
- Support larger datasets for question answering by @Zhilin123 :: PR: #4205
- Refactor bias act fusion by @MaximumEntropy :: PR: #4376
- Prompt Learning Pipeline Parallel by @vadam5 :: PR: #4291
- Text memmap dataset by @michalivne :: PR: #4068
- Fuse grad division into async grad allreduce by @xrennvidia :: PR: #4327
Text Normalization / Inverse Text Normalization
Changelog
- [TN] WFST to normalize punctuation by @ekmb :: PR: #4108
- [TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
- Tn tutorial by @yzhang123 :: PR: #4090
- Tn add rules by @yzhang123 :: PR: #4302
- Tn install by @yzhang123 :: PR: #4055
- Fix electronic bug, new time ITN rule by @ekmb :: PR: #4355
- [TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph by @ekmb :: PR: #4463
- Configure T5 finetuning metrics by @MaximumEntropy :: PR: #4122
Export
Core
Changelog
General Improvements and Fixes
Changelog
- Update container to 22.05 by @ericharper :: PR: #4329
- Fix PTL step calculation by @titu1994 :: PR: #4307
- [NLP] P&C Fix multi node cache issue, add pynini guard by @ekmb :: PR: #4410
- NeMo Megatron GPT Unit Tests by @ericharper :: PR: #4099
- Add the PP2 GPT eval CI test by @yidong72 :: PR: #4168
- BigNLP perf regression fix by @MaximumEntropy :: PR: #4267
- Fixes for Megatron Base Model Artifacts by @MaximumEntropy :: PR: #4248
- Fix a wrong description in offline_diarization_with_asr.yaml by @tango4j :: PR: #4141
- bugfix for import error in Offline_ASR_with_VAD_for_CTC_models by @fayejf :: PR: #4424
- [Fix] ASR RNNT Tutorial by @stevehuang52 :: PR: #4352
- [TTS] Fix Hifigan finetune tutorial by @subhankar-ghosh :: PR: #4182
- [Bugfix][TTS] wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4432
- [bugfix][TTS] pitch, voiced_mask, prob_voiced have the same values. by @XuesongYang :: PR: #4435
- [TTS] [bugfix] German FastPitch HiFi-GAN tutorial and lr by @aroraakshit :: PR: #4459
- [TTS] [bugfix] update indentation by @aroraakshit :: PR: #4468
- Fix some 's' cases for IPA G2P by @redoctopus :: PR: #4460
- Fix ASR Typos in tutorials by @titu1994 :: PR: #4384
- Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
- Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
- Dialogue tasks unit test by @Zhilin123 :: PR: #4112
- fix error by @yzhang123 :: PR: #4120
- fix typo by @stevehuang52 :: PR: #4134
- Fix cmudict typo: phoneme YI1 -> IY1 in NVME by @redoctopus :: PR: #4139
- transcribe: scan directories recursively by @virajkarandikar :: PR: #4159
- Add 44KHz yaml file for Fastpitch training by @subhankar-ghosh :: PR: #4161
- [bugfix] consistent highfreq to both fastpitch and hifigan in their 44100 configs. by @XuesongYang :: PR: #4177
- Upperbound OmegaConf by @titu1994 :: PR: #4191
- Prompt tokenization bugfix by @vadam5 :: PR: #4197
- Updated Prompt Learning Model to Use Distributed Sampler by @vadam5 :: PR: #4208
- Freesound fixes by @virajkarandikar :: PR: #4155
- Patch Hydra by @titu1994 :: PR: #4202
- Prompt Learning Model Saving Changes by @vadam5 :: PR: #4212
- Speakertasks manifest by @yzhang123 :: PR: #4185
- SSL Multi-loss Update by @sam1373 :: PR: #4186
- Support load_adapters with just adapter_name by @titu1994 :: PR: #4255
- Add special tokens to existing (trained) SentencePiece models by @aklife97 :: PR: #4203
- Fixing the speed slow-down for speech models. by @VahidooX :: PR: #4260
- Fix and add functions in speaker utils by @tango4j :: PR: #4138
- pt container 1.10->1.11.0 by @ekmb :: PR: #4273
- ssl fixes by @sam1373 :: PR: #4268
- Save Virtual Prompt Weights Only by @vadam5 :: PR: #4237
- add 'relative positional embedding (RPE)' feature - re-creating after… by @khcs :: PR: #4256
- Docs CSS: Update h4 tag style for the right side bar by @nickolyamba :: PR: #4284
- Fix Docs CSS: align docs left and increase width for large screens by @nickolyamba :: PR: #4154
- remove redundant condition for fastpitch. by @XuesongYang :: PR: #4281
- [Add] automatically resolving relative audio path by @stevehuang52 :: PR: #4277
- forcing conv subsampling to 32 bit by @bmwshop :: PR: #4293
- Add library name and version when downloading from the Hugging Face Hub by @osanseviero :: PR: #4304
- clear access registry when adding if not empty by @sam1373 :: PR: #4306
- [collections] bugfix for capturing NotImplementedError of non-supported sup data types. by @XuesongYang :: PR: #4297
- Adjust lr for AdamW from LAMB default by @redoctopus :: PR: #4308
- Fix bugs in indexed dataset exam script by @yidong72 :: PR: #4325
- Torchaudio installation fix by @GNroy :: PR: #4330
- Speedup the speech commands dataset processing script by @shan18 :: PR: #4347
- fix wrong requirement by @yzhang123 :: PR: #4349
- Refactored path to manifest by @treacker :: PR: #4251
- Fix the post LN bug by @yidong72 :: PR: #4350
- [Fix] Hanging for Fully Randomized Bucketing by @stevehuang52 :: PR: #4348
- Auto-switch the input dimensions in the conformer encoder adapter to correct value by @shan18 :: PR: #4354
- Set headscale false by @MaximumEntropy :: PR: #4364
- Add wandb as dependency by @titu1994 :: PR: #4365
- Fix trainer.global_steps in WandB logging by @titu1994 :: PR: #4366
- Finetuning changes for BART by @MaximumEntropy :: PR: #4003
- Make position embedding expansion specific to a batch to avoid checkpoint size mismatches by @MaximumEntropy :: PR: #4357
- Correct support for dataclasses in default module dim by @titu1994 :: PR: #4372
- Fix no attribute 'pad_id' bug when pre-processing by @yidong72 :: PR: #4377
- Question answering bug fix by @Zhilin123 :: PR: #4381
- Docs for NeMo Adapters by @titu1994 :: PR: #4369
- Update NeMo docs by @titu1994 :: PR: #4397
- Fixing import error in some cases by @borisfom :: PR: #4402
- Fix tutorial typos and docs by @titu1994 :: PR: #4415
- Add reconfigure on validation epoch start by @MaximumEntropy :: PR: #4393
- Re-apply fixes from r1.9.0 by @redoctopus :: PR: #4425
- Fix...
NVIDIA Neural Modules 1.9.0
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.04
ASR
Changelog
- Fix changed function name in offline vad asr notebook by @fayejf :: PR: #4007
- NeMo Adapters Support + ASR Adapters by @titu1994 :: PR: #3942
- Update ASR configs with num_workers and pin_memory by @titu1994 :: PR: #4270
- Verbose k2 install, skip if failed by @GNroy :: PR: #4289
- Torch conversion for VAD-Diarization pipeline by @tango4j :: PR: #3930
- Multiprocess improvements by @nithinraok :: PR: #4127
TTS
Changelog
- Tn tts e by @ekmb :: PR: #3988
- Remove AudioToCharWithPriorAndPitchDataset dependency from fastpitch by @subhankar-ghosh :: PR: #4008
- Deprecation by @blisc :: PR: #4082
- FastPitch FT notebook - Improving Speech Quality clarifications by @redoctopus :: PR: #3954
NLP / NMT
Changelog
- Option to remove bias terms from Megatron transformers by @MaximumEntropy :: PR: #3973
- Add NMT method to translate with TN/ITN pre/post-processing by @MaximumEntropy :: PR: #4009
- Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
- Fix GPT model parallel eval by @yidong72 :: PR: #4054
- Updating with main by @jpilaul :: PR: #4073
- Cherry-pick fix for megatron ckpt conversion script when using BCP by @ericharper :: PR: #4089
- Check implicit grad acc in GLUE dataset building by @MaximumEntropy :: PR: #4123
- Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
- Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
- Raise error if bicleaner is not installed in NMT Data preprocessing notebook by @MaximumEntropy :: PR: #4264
- Fix epoch end for NeMo NMT by @MaximumEntropy :: PR: #4265
- Update YAML with trainer.benchmark=False for NLP by @MaximumEntropy :: PR: #4261
- Continuous prompt refactor by @vadam5 :: PR: #3877
- T5 finetuning for generic small text-to-text datasets by @MaximumEntropy :: PR: #4032
Text Normalization / Inverse Text Normalization
Changelog
- Tn special text support by @yzhang123 :: PR: #3969
- Tn update numbers by @yzhang123 :: PR: #3992
- Tn tts e by @ekmb :: PR: #3988
- Itn vi by @yzhang123 :: PR: #4029
- Refactor tn data folder, and update of measure by @yzhang123 :: PR: #4028
- Remove conda dependency for tn by @yzhang123 :: PR: #4057
- Tn electronic by @yzhang123 :: PR: #4053
- ThutmoseTaggerModel, a new model for inverse text normalization by @bene-ges :: PR: #4011
- Tutorial on ITN with Thutmose tagger and small fixes by @bene-ges :: PR: #4117
- Cleaned up TN/ ITN doc by @yzhang123 :: PR: #4119
- Update default for SH by @ekmb :: PR: #4135
- Update ContextNet version by @titu1994 :: PR: #4207
NeMo Tools
NeMo Core
Changelog
- Support pre-extracted nemo checkpoint for restoration by @titu1994 :: PR: #4061
- Fix type checking to be compatible with named tuples by @artbataev :: PR: #3986
- Update num worker calculation due to PTL flag changes by @redoctopus :: PR: #4056
- Refresh NeMo documentation to Sphinx Book Theme by @titu1994 :: PR: #3996
- Generalize adapter merge strategy for future adapters by @titu1994 :: PR: #4091
General Improvements
Changelog
- Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
- Fix restoring from checkpoint for case when is provided by @PeganovAnton :: PR: #4136
- Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
- Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
- Ability to set log_prediction to false by @bmwshop :: PR: #3929
- Glu activation variants by @MaximumEntropy :: PR: #3951
- Ranking merge by @yzhang123 :: PR: #3906
- Fix path in doc by @nithinraok :: PR: #3979
- Adding fisher audio conversion script from old NeMo branch by @jbalam-nv :: PR: #3991
- improvements to get_commonvoice_data script by @bmwshop :: PR: #3999
- Bugfix and variable name change for clustering code by @tango4j :: PR: #4023
- Exp manager log rank 0 only arguments by @MaximumEntropy :: PR: #4026
- Force import test on PR by @titu1994 :: PR: #4037
- Drop support for kaldi-io by @titu1994 :: PR: #4042
- Cherry pick HF integration and bug fixes from 1.8.1 by @ericharper :: PR: #4052
- Make saving prompt encoder embeddings non-configurable by @vadam5 :: PR: #4071
- Replace sampled tokens with EOD after EOD has been sampled once by @vadam5 :: PR: #4070
- Added answer only loss for prompt learning by @vadam5 :: PR: #4069
- added stacking support to conformer. by @VahidooX :: PR: #4045
- Update LJSpeech whitelist file path by @redoctopus :: PR: #4078
- Added check for microbatch calculator by @vadam5 :: PR: #4043
- Prompt Learning Docs by @vadam5 :: PR: #4046
- Fix link to prompt tuning page by @SeanNaren :: PR: #4081
- Add docs for by @titu1994 :: PR: #4079
- Dialogue task by @Zhilin123 :: PR: #3884
- RMSNorm, Normformer and fixes from merging 1.8.0 into main by @MaximumEntropy :: PR: #4048
- Correct link to PTL by @titu1994 :: PR: #4088
- Added encoder and decoder modules for RETRO model by @yidong72 :: PR: #4038
- Upgrade container to NGC PyTorch 22.04 by @ericharper :: PR: #4085
- Tarred fix label models by @nithinraok :: PR: #4092
- Fix link to tutorial in dialogue docs by @Zhilin123 :: PR: #4093
- Prompt learning Notebook by @vadam5 :: PR: #4031
- Add more papers by @yzhang123 :: PR: #4097
- Ignore speakers with few utterances by @nithinraok :: PR: #3722
- Access mixin by @sam1373 :: PR: #4098
- Add CharParser for Cyrillic letters by @karpov-nick :: PR: #4101
- Restored tests previously disabled for 22.03 base by @borisfom :: PR: #4109
- Add augmentation to label models by @nithinraok :: PR: #4113
- Fix register artifacts by @ramanathan831 :: PR: #4116
- Fix typo by @yzhang123 :: PR: #4140
- bug_fix_diarization_manifest_creation by @yzhang123 :: PR: #4125
- Tacotron2 retrain by @treacker :: PR: #4103
- WaveGlow input type fixes by @redoctopus :: PR: #4151
- Notebooks' link, typo and import fix by @fayejf :: PR: #4158
- Thutmose tagger bug fixes by @bene-ges :: PR: #4162
- Update speaker docs by @nithinraok :: PR: #4164
- Set plugin to None when no apex by @ekmb :: PR: #4171
- Fix doc by @yzhang123 :: PR: #4152
- Small import name fix by @fayejf :: PR: #4180
- Rename folder VAD -> vad by @fayejf :: PR: #4163
- Fix the server key value problem in the notebook by @yidong72 :: PR: #4196
- Pin omegaconf for r1.9.0 by @ericharper :: PR: #4195
- Fix cherrypicks by @titu1994 :: PR: #4204
- Fix bugs for dialogue tutorial by @Zhilin123 :: PR: #4211
- Tacotron2 1.9.0 bugfixes by @redoctopus :: PR: #4209
- Add docs for Thutmose Tagger by @bene-ges :: PR: #4173
- Dialogue tutorial fix by @Zhilin123 :: PR: #4221
- Fix syntax error in ipynb-file by @bene-ges :: PR: #4228
- Fix JSON serialization problem by @yidong72 :: PR: #4235
- Prompt Learning Typo Fixes by @vadam5 :: PR: #4238
- Fixing bug 3642622 by @pasandi20 :: PR: #4250
- Fix broken link in the tutorial by @bene-ges :: PR: #4257
- Prompt learning notebook bugfix by @vadam5 :: PR: #4262
- Fix missing validation dataset, whitelist certain keywords for datasets by @titu1994 :: PR: #4269
- Set Save on train end to false by @vadam5 :: PR: #4274
- Updated config to fix CI test OOM error by @vadam5 :: PR: #4279
- Changed total virtual prompt tokens by @vadam5 :: PR: #4295