NVIDIA Neural Modules 1.11.0
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.07
ASR
Changelog
- Add ASR CTC Decoding module by @titu1994 :: PR: #4342
- Fixing bugs in calling method ctc_decoder_predictions_tensor by @VahidooX :: PR: #4414
- Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
- Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
- Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
- Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
- Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
- Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
- Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
- Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
- Add Squeezeformer to ASR by @titu1994 :: PR: #4416
- Fix ASR notebooks by @titu1994 :: PR: #4738
- Add pretrained ASR models for Croatian by @anteju :: PR: #4682
- Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
- Multilingual VAD model by @fayejf :: PR: #4734
- Adding support for models trained with full context for cache-aware streaming by @VahidooX :: PR: #4687
- Fp16 support for Conformer by @bmwshop :: PR: #4571
- Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
- Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
- Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465
TTS
Changelog
- Fix wrong order of returned tuple for general_collate_fn by @XuesongYang :: PR: #4388
- Fix pitch, voiced_mask, and prob_voiced unexpectedly having the same values by @XuesongYang :: PR: #4392
- Add static method decorator by @XuesongYang :: PR: #4443
- Fix typo in HiFi-GAN config's max steps by @XuesongYang :: PR: #4450
- Relaxed support for both CPUs and GPUs by @XuesongYang :: PR: #4461
- Multi-speaker FastPitch model training recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4413
- Created the HiFi-GAN 44100Hz finetuning recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4478
- Fix dataset parameter typo on tacotron2 example yaml by @saarus72 :: PR: #4471
- Update cmudict by @jasro23 :: PR: #4510
- Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
- Fix off-by-1 bug in Beta Binomial Prior by @rlangman :: PR: #4616
- G2P Aligner by @redoctopus :: PR: #4604
- RADTTS ADLR-NEMO porting by @MikyasDesta :: PR: #4538
- Fixed wrong pronunciations for r1.11 by @XuesongYang :: PR: #4677
- Incremented the version number to 22.08 in tutorials by @XuesongYang :: PR: #4684
- Bugfix for missing configs by @XuesongYang :: PR: #4725
- Fix pynini install in TTS tutorials by @redoctopus :: PR: #4729
- Updated config with a German IPA phoneme tokenizer by @XuesongYang :: PR: #4756
- Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4763
- Add single male speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4770
- Deprecated old scripts for ljspeech by @XuesongYang :: PR: #4780
- Fix MixerTTS data loading index error by @redoctopus :: PR: #4811
- G2P docs by @ekmb :: PR: #4841
- NMESC speaker counting algorithm update by @tango4j :: PR: #4500
NLP / NMT
Changelog
- Add O2 support for RETRO model by @yidong72 :: PR: #4411
- Add MTEncDec Finetune support by @aklife97 :: PR: #4540
- Fix metric setup for finetuning without a test set by @MaximumEntropy :: PR: #4585
- T0 model and dataset by @MaximumEntropy :: PR: #4598
- Add prompt learning for T5 by @HeyyyyyyG :: PR: #4391
- Add MuTransfer Capability to RETRO model pretraining by @yidong72 :: PR: #4643
- Label Smoothing in VocabParallelCrossEntropy by @MaximumEntropy :: PR: #4602
- Megatron BART BOS / EOS bug fix by @michalivne :: PR: #4495
- GPT Prompt Learning Improvements by @vadam5 :: PR: #4496
- Megatron perceiver with tensor parallelism only by @MaximumEntropy :: PR: #4318
- Refactor for punctuation model by @jubick1337 :: PR: #4367
- Update megatron prompt learning interface to dialogue by @Zhilin123 :: PR: #4545
- Removed NLPDDPPlugin Import check by @vadam5 :: PR: #4555
- Option to disregard document boundaries for t5, bart, ul2 by @MaximumEntropy :: PR: #4481
- Add Tokenization and Normalization pre-processing script for NMT by @aklife97 :: PR: #4557
- Integrating support for GPT/T5/BART for Question Answering by @ameyasm1154 :: PR: #4532
- NeMo Megatron: Add sequence parallelism and selective activation checkpointing (rebased) by @ericharper :: PR: #4380
- Update megatron t5 interface to dialogue by @Zhilin123 :: PR: #4626
- Additional sentencepiece args - Byte fallback, split digits, split_on_whitespace by @MaximumEntropy :: PR: #4525
- Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training by @MaximumEntropy :: PR: #4396
- NeMo Megatron Doc updates1 by @okuchaiev :: PR: #4633
- Asymmetric Encoder and Decoder Configuration for Megatron Models by @MaximumEntropy :: PR: #4568
- Add sentencepiece legacy arg to megatron tokenizer configs by @MaximumEntropy :: PR: #4659
- Megatron encode function with RPE fix by @MaximumEntropy :: PR: #4692
- Updates to NeMo Megatron OSS docs by @okuchaiev :: PR: #4709
- Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
- Fix bug relating to DDP strategy in joint intent slot classification … by @Zhilin123 :: PR: #4762
- Fix qa notebook typos and branch by @ericharper :: PR: #4788
- Colab py37 compatibility megatron by @Zhilin123 :: PR: #4791
- Added/fixed export for Megatron models by @Davood-M :: PR: #4712
- Fix providing glue in seq2seq eval by @MaximumEntropy :: PR: #4843
- Fix Megatron NMT consumed samples and ckpt_to_nemo split rank by @MaximumEntropy :: PR: #4884
- Fixing Megatron BERT output dimensions to [batch x seq x hidden] by @michalivne :: PR: #4894
- Prompt Learning Inference Improvements by @vadam5 :: PR: #4566
- MegaMolBART Compatibility by @michalivne :: PR: #4603
Text Normalization / Inverse Text Normalization
Changelog
- Add ITN pt by @guidefloripa :: PR: #4516
- Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
- Fix ITN pt by @guidefloripa :: PR: #4623
- Bug fix for "hundred" in Audio-based, added method to split text into sentences by @ekmb :: PR: #4610
- Fix itn pt time by @guidefloripa :: PR: #4630
- Pin lightning version to be < 1.7.0 by @MaximumEntropy :: PR: #4660
- G2P for OOV and heteronyms by @ekmb :: PR: #4624
- Publish pretrained itn t5 model for English by @bene-ges :: PR: #4748
- Added MLM Scoring by @yzhang123 :: PR: #4476
Export
Changelog
Bugfixes
Changelog
- Fix wrong order of returned tuple for general_collate_fn by @XuesongYang :: PR: #4388
- Fix pitch, voiced_mask, and prob_voiced unexpectedly having the same values by @XuesongYang :: PR: #4392
- Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
- Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
- Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
- Improve mAES algorithm with patches by @titu1994 :: PR: #4662
General Improvements
Changelog
- Option to disable mp in VAD via num_workers=1 by @gkucsko :: PR: #4317
- Remove redundant bias expand by @xrennvidia :: PR: #4382
- Add option for specifying wandb save_dir from config by @shan18 :: PR: #4379
- Quick wav2vec fix. In-place operation adding convolutional positions … by @bonham79 :: PR: #4383
- Fixing import error in some cases by @borisfom :: PR: #4401
- Update with new Conformer checkpoints by @VahidooX :: PR: #4417
- Wav2vec fix by @bonham79 :: PR: #4467
- Relative Audio Paths by @stevehuang52 :: PR: #4470
- Allow Noam lr scheduler to run for more than max_steps by @alancucki :: PR: #4472
- Support for Different LRs with Param Groups by @stevehuang52 :: PR: #4508
- Fix runtime check by @borisfom :: PR: #4501
- Update finetune label models by @nithinraok :: PR: #4504
- Weighted bucketing by @tbartley94 :: PR: #4530
- Relative Audio Path by @stevehuang52 :: PR: #4520
- Fix duplex inference with grammars by @ekmb :: PR: #4517
- Add nsys profiling by @ericharper :: PR: #4539
- Remove the variable that is not used in the context by @XuesongYang :: PR: #4547
- Adding multispeaker fastpitch and hifigan en model links to available… by @subhankar-ghosh :: PR: #4550
- Add length ratio filtering script by @MaximumEntropy :: PR: #4551
- Relative audio path in speech data explorer by @anteju :: PR: #4570
- Dividing generative question-answering CI tests by @ameyasm1154 :: PR: #4600
- Updating the default parameters in the example adapters config file by @shan18 :: PR: #4607
- Improve normalize_batch ValueError message by @piraka9011 :: PR: #4614
- Support listing Hugging Face model info by @titu1994 :: PR: #4619
- Update diarization data loader to train meeting data by @tango4j :: PR: #4567
- Fix HF check for model card info by @titu1994 :: PR: #4628
- Add Github Action for auto webpage build by @titu1994 :: PR: #4645
- Empty commit by @titu1994 :: PR: #4646
- Force git config for doc build by @titu1994 :: PR: #4647
- Correct branch name for github page source by @titu1994 :: PR: #4648
- Adding lang id to shard by @bmwshop :: PR: #4649
- Fix special tokens in vocab to arguments of constructor by @gwarmstrong :: PR: #4631
- Fix apex for r1.11 by @michalivne :: PR: #4666
- Update readme by @nithinraok :: PR: #4667
- Removed trailing spaces in CI test by @vadam5 :: PR: #4671
- Pynini dependency fix by @ekmb :: PR: #4674
- Fix for incorrect batch size issue while decoding by @rilango :: PR: #4675
- Fix to fetch config file by @nithinraok :: PR: #4699
- Fix notebook for buffered inference by @titu1994 :: PR: #4703
- Prompt Learning Notebook Bug Fix by @vadam5 :: PR: #4689
- Add psutils to mock imports by @ericharper :: PR: #4728
- Update Aligner model and tutorial to add NGC checkpoint loading by @redoctopus :: PR: #4714
- Updated docs and doc paths by @vadam5 :: PR: #4754
- Update r1.11 to new heteronyms list by @redoctopus :: PR: #4745
- Update CMUdict with more recent 0.7b entries by @redoctopus :: PR: #4768
- Add pynini to Docker container by @artbataev :: PR: #4733
- Fix tutorial formatting by @redoctopus :: PR: #4778
- Fix initializing weights from ptl ckpt with exclude by @sam1373 :: PR: #4807
- T5 prompt learning fixes by @MaximumEntropy :: PR: #4771
- Updated inference code and squad scripts by @vadam5 :: PR: #4835
- Fix uppercasing mismatch for IPA heteronyms by @redoctopus :: PR: #4860
- Set the number of workers to 0 for validation and test sets in all enc-dec models by @MaximumEntropy :: PR: #4790
- Fix mha by @yzhang123 :: PR: #4866
- IPA bug fix by @ekmb :: PR: #4871
- Added utf8 encoding by @vadam5 :: PR: #4892
- Fix question answering docs r1p11 by @Zhilin123 :: PR: #4897