Release NVIDIA Neural Modules 1.11.0 · NVIDIA/NeMo

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.07

ASR

Changelog

Add ASR CTC Decoding module by @titu1994 :: PR: #4342
Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
Add Squeezeformer to ASR by @titu1994 :: PR: #4416
Fix ASR notebooks by @titu1994 :: PR: #4738
Add pretrained ASR models for Croatian by @anteju :: PR: #4682
Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
Multilingual VAD model by @fayejf :: PR: #4734
Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
Fp16 support for Conformer by @bmwshop :: PR: #4571
Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465

TTS

Changelog

Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
Add static method decorator. by @XuesongYang :: PR: #4443
Fix typo in HiFi-GAN config's max steps by @XuesongYang :: PR: #4450
Relaxed support for both CPUs and GPUs by @XuesongYang :: PR: #4461
Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4413
Created the finetuning Hifigan 44100Hz recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4478
Fix dataset parameter typo on tacotron2 example yaml by @saarus72 :: PR: #4471
Update cmudict by @jasro23 :: PR: #4510
Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
Fix off-by-1 bug in Beta Binomial Prior by @rlangman :: PR: #4616
G2P Aligner by @redoctopus :: PR: #4604
RADTTS ADLR-NEMO porting by @MikyasDesta :: PR: #4538
Fixed wrong pronunciations for r1.11. by @XuesongYang :: PR: #4677
Incremented the version number to 22.08 in tutorials. by @XuesongYang :: PR: #4684
Bugfix for missing configs. by @XuesongYang :: PR: #4725
Fix pynini install in TTS tutorials by @redoctopus :: PR: #4729
Updated config with a German IPA phoneme tokenizer by @XuesongYang :: PR: #4756
Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4763
Add single male speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4770
Deprecated old scripts for ljspeech. by @XuesongYang :: PR: #4780
Fix MixerTTS data loading index error by @redoctopus :: PR: #4811
G2P docs by @ekmb :: PR: #4841
NMESC speaker counting algorithm update by @tango4j :: PR: #4500

NLP / NMT

Changelog

Add O2 support for RETRO model by @yidong72 :: PR: #4411
Add MTEncDec Finetune support by @aklife97 :: PR: #4540
Fix metric setup for finetuning without a test set by @MaximumEntropy :: PR: #4585
T0 model and dataset by @MaximumEntropy :: PR: #4598
Add prompt learning for T5 by @HeyyyyyyG :: PR: #4391
Add MuTransfer Capability to RETRO model pretraining by @yidong72 :: PR: #4643
Label Smoothing in VocabParallelCrossEntropy by @MaximumEntropy :: PR: #4602
Megatron BART BOS / EOS bug fix by @michalivne :: PR: #4495
GPT Prompt Learning Improvements by @vadam5 :: PR: #4496
Megatron perceiver with tensor parallelism only by @MaximumEntropy :: PR: #4318
Refactor for punctuation model by @jubick1337 :: PR: #4367
Update megatron prompt learning interface to dialogue by @Zhilin123 :: PR: #4545
Removed NLPDDPPlugin Import check by @vadam5 :: PR: #4555
Option to disregard document boundaries for t5, bart, ul2 by @MaximumEntropy :: PR: #4481
Add Tokenization and Normalization pre-processing script for NMT by @aklife97 :: PR: #4557
Integrating support for GPT/T5/BART for Question Answering by @ameyasm1154 :: PR: #4532
NeMo Megatron: Add sequence parallelism and selective activation checkpointing (rebased) by @ericharper :: PR: #4380
Update megatron t5 interface to dialogue by @Zhilin123 :: PR: #4626
Additional sentencepiece args - Byte fallback, split digits, split_on_whitespace by @MaximumEntropy :: PR: #4525
Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training by @MaximumEntropy :: PR: #4396
NeMo Megatron Doc updates1 by @okuchaiev :: PR: #4633
Asymmetric Encoder and Decoder Configuration for Megatron Models by @MaximumEntropy :: PR: #4568
Add sentencepiece legacy arg to megatron tokenizer configs by @MaximumEntropy :: PR: #4659
Megatron encode function with RPE fix by @MaximumEntropy :: PR: #4692
Updates to NeMo Megatron OSS docs by @okuchaiev :: PR: #4709
Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
fix bug relating to ddp strategy in joint intent slot classification … by @Zhilin123 :: PR: #4762
Fix qa notebook typos and branch by @ericharper :: PR: #4788
Colab py37 compatibility megatron by @Zhilin123 :: PR: #4791
Added/fixed export for Megatron models by @Davood-M :: PR: #4712
Fix providing glue in seq2seq eval by @MaximumEntropy :: PR: #4843
Fix Megatron NMT consumed samples and ckpt_to_nemo split rank by @MaximumEntropy :: PR: #4884
Fixing Megatron BERT output dimensions to [batch x seq x hidden] by @michalivne :: PR: #4894
Prompt Learning Inference Improvements by @vadam5 :: PR: #4566
MegaMolBART Compatibility by @michalivne :: PR: #4603

Text Normalization / Inverse Text Normalization

Changelog

Add ITN pt by @guidefloripa :: PR: #4516
Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
Fix ITN pt by @guidefloripa :: PR: #4623
Bug fix for "hundred" in audio-based normalization, added method to split text into sentences by @ekmb :: PR: #4610
Fix itn pt time by @guidefloripa :: PR: #4630
Pin lightning version to be < 1.7.0 by @MaximumEntropy :: PR: #4660
G2P for OOV and heteronyms by @ekmb :: PR: #4624
Publish pretrained itn t5 model for English by @bene-ges :: PR: #4748
Added MLM Scoring by @yzhang123 :: PR: #4476

Export

Changelog

Update FastPitch to add export controls by @blisc :: PR: #4509
Fix Fastpitch Export by @blisc :: PR: #4676
Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
Added/fixed export for Megatron models by @Davood-M :: PR: #4712

Bugfixes

Changelog

Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
Improve mAES algorithm with patches by @titu1994 :: PR: #4662

General Improvements

Changelog

Option to disable mp in VAD via num_workers=1 by @gkucsko :: PR: #4317
Remove redundant bias expand by @xrennvidia :: PR: #4382
Add option for specifying wandb save_dir from config by @shan18 :: PR: #4379
Quick wav2vec fix. In-place operation adding convolutional positions … by @bonham79 :: PR: #4383
Fixing import error in some cases by @borisfom :: PR: #4401
Update with new conformer checkpoints. by @VahidooX :: PR: #4417
Wav2vec fix by @bonham79 :: PR: #4467
Relative Audio Paths by @stevehuang52 :: PR: #4470
Allow Noam lr scheduler to run for more than max_steps by @alancucki :: PR: #4472
Support for Different LRs with Param Groups by @stevehuang52 :: PR: #4508
Fix runtime check by @borisfom :: PR: #4501
Update finetune label models by @nithinraok :: PR: #4504
Weighted bucketing by @tbartley94 :: PR: #4530
Relative Audio Path by @stevehuang52 :: PR: #4520
Fix duplex inference with grammars by @ekmb :: PR: #4517
Add nsys profiling by @ericharper :: PR: #4539
Remove the variable that is not used in the context. by @XuesongYang :: PR: #4547
Adding multispeaker fastpitch and hifigan en model links to available… by @subhankar-ghosh :: PR: #4550
Add length ratio filtering script by @MaximumEntropy :: PR: #4551
Relative audio path in speech data explorer by @anteju :: PR: #4570
Dividing generative question-answering CI tests by @ameyasm1154 :: PR: #4600
Updating the default parameters in the example adapters config file by @shan18 :: PR: #4607
Improve normalize_batch ValueError message by @piraka9011 :: PR: #4614
Support listing Hugging Face model info by @titu1994 :: PR: #4619
Update diarization data loader to train meeting data by @tango4j :: PR: #4567
Fix HF check for model card info by @titu1994 :: PR: #4628
Add Github Action for auto webpage build by @titu1994 :: PR: #4645
Empty commit by @titu1994 :: PR: #4646
Force git config for doc build by @titu1994 :: PR: #4647
Correct branch name for github page source by @titu1994 :: PR: #4648
Adding lang id to shard by @bmwshop :: PR: #4649
Fix special tokens in vocab to arguments of constructor by @gwarmstrong :: PR: #4631
Fix apex for r1.11 by @michalivne :: PR: #4666
Update readme by @nithinraok :: PR: #4667
Removed trailing spaces in CI test by @vadam5 :: PR: #4671
Pynini dependency fix by @ekmb :: PR: #4674
Fix for incorrect batch size issue while decoding by @rilango :: PR: #4675
Fix to fetch config file by @nithinraok :: PR: #4699
Fix notebook for buffered inference by @titu1994 :: PR: #4703
Prompt Learning Notebook Bug Fix by @vadam5 :: PR: #4689
Add psutils to mock imports by @ericharper :: PR: #4728
Update Aligner model and tutorial to add NGC checkpoint loading by @redoctopus :: PR: #4714
Updated docs and doc paths by @vadam5 :: PR: #4754
Update r1.11 to new heteronyms list by @redoctopus :: PR: #4745
Update CMUdict with more recent 0.7b entries by @redoctopus :: PR: #4768
Add pynini to Docker container by @artbataev :: PR: #4733
Fix tutorial formatting by @redoctopus :: PR: #4778
Fix initializing weights from ptl ckpt with exclude by @sam1373 :: PR: #4807
T5 prompt learning fixes by @MaximumEntropy :: PR: #4771
Updated inference code and squad scripts by @vadam5 :: PR: #4835
Fix uppercasing mismatch for IPA heteronyms by @redoctopus :: PR: #4860
Set the number of workers to 0 for validation and test sets in all enc-dec models by @MaximumEntropy :: PR: #4790
Fix mha by @yzhang123 :: PR: #4866
IPA bug fix by @ekmb :: PR: #4871
Added utf8 encoding by @vadam5 :: PR: #4892
Fix question answering docs r1p11 by @Zhilin123 :: PR: #4897
