Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.08

ASR

Changelog

Add support for RNNT Char/Word Timestamp Calculation by @titu1994 :: PR: #4665
add conditional logic to rnnt_wer to handle when arrays have no elements by @mgoldey :: PR: #4776
fix handling of the final word for rnnt word timestamps by @mgoldey :: PR: #4779
amend rnnt word timestamps by @mgoldey :: PR: #4782
fix type error in rnnt_wer.py, rnnt_wer_bpe.py, wer_bpe.py by @hainan-xv :: PR: #4822
add kab language asr models by @nithinraok :: PR: #4819
[Tutorial][ASR][Fix] Data paths in ASR with NeMo tutorial by @anteju :: PR: #4845
[ASR] Fix for multi-channel signals in AudioSegment by @anteju :: PR: #4824
[ASR] Generate multichannel noise by @anteju :: PR: #4870
Fix asr model order by @nithinraok :: PR: #4959
Fix ASR issues by @titu1994 :: PR: #4984
Fix diarization ASR inference link in notebook by @SeanNaren :: PR: #5016
Code switching by @KunalDhawan :: PR: #4784
Release SOTA Lang ID model by @fayejf :: PR: #5080
Stateless decoder for RNN-T by @hainan-xv :: PR: #4710

TTS

Changelog

[TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
[TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
IPA G2P bugfixes by @redoctopus :: PR: #4869
[TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
[TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
[TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
[TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
[TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
[TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
[TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087

NLP / NMT

Changelog

Fix bug intent slot classification tokenizer to dialogue by @Zhilin123 :: PR: #4694
Intent slot model onnx export test by @Zhilin123 :: PR: #4731
Fix megatron p tuning notebook by @nithinraok :: PR: #4741
Add support for Apex distributed Adam optimizer with GPT-3 by @timmoon10 :: PR: #4487
Fixes NLPModel's load from checkpoint due to PTL private function changes by @MaximumEntropy :: PR: #4755
Adapter tuning for Megatron GPT models by @arendu :: PR: #4717
Megatron Encoder Decoder models with RPE and PP > 2 by @MaximumEntropy :: PR: #4663
add kab language asr models by @nithinraok :: PR: #4819
add chinese to language doc and fix bug by @yzhang123 :: PR: #4834
Spoken Language Identification by @fayejf :: PR: #4846
Fix decoding bug for megatron enc-dec models with O2 by @MaximumEntropy :: PR: #4989
Updating Megatron LM conversion according to PTL 1.7 by @Davood-M :: PR: #5038
Adding RETRO model Faiss sharding index and KNN sharding index by @yidong72 :: PR: #4713
MLP Prompt Learning Encoder by @vadam5 :: PR: #4849
Update the prompt learning to handle large language model by @yidong72 :: PR: #4906

Text Normalization / Inverse Text Normalization

Changelog

[TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
[Chinese text normalization] Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
Fix zh tn by @yzhang123 :: PR: #5035
Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
Added P&C lexical audio model by @jubick1337 :: PR: #4802

Export

Changelog

Intent slot model onnx export test by @Zhilin123 :: PR: #4731

General Improvements

Changelog

Fix logger reference by @SeanNaren :: PR: #4786
Fix error with class method reference in msdd by @SeanNaren :: PR: #4865
Add sync for logging calls to ensure aggregation across devices by @SeanNaren :: PR: #4876
Fix saving the last checkpoint when using val check interval by @SeanNaren :: PR: #4905
Add support for skipping validation on resume + extend saving last ckpt test by @SeanNaren :: PR: #4922
Move trainer calls for ssl models to training and validation steps only by @sam1373 :: PR: #4685
Change Num Partitions size expansion fix by @aklife97 :: PR: #4719
upgrade to PTL 1.7 by @nithinraok :: PR: #4672
Fixing outputs of infer() and use of NeMo length regulator helper by @borisfom :: PR: #4724
bug fix: enable async grad reduction when DP > 1 by @erhoo82 :: PR: #4740
Add LayerNorm1P, weight decay for LN and unscaled initialization by @mikolajblaz :: PR: #4743
Data Simulator by @chooper1 :: PR: #4686
jenkins data simulator fix by @nithinraok :: PR: #4751
Multiscale Diarization Decoder (MSDD) model and module files by @tango4j :: PR: #4650
Fix logging in gradient clipping with PTL 1.7.2 by @MaximumEntropy :: PR: #4769
Fix checkpoint restoring by @nithinraok :: PR: #4777
avoid data clipping after convolution with rir samples by @nithinraok :: PR: #4806
Fixed in_features dim if bidirectional is True by @farisalasmary :: PR: #4588
Fix float/integer type error in WER.update() by @fujimotos :: PR: #4816
[Speech Data Explorer] An option to explicitly specify the base dir by @anteju :: PR: #4678
adding instancenorm as an option for conv normalization by @bmwshop :: PR: #4827
Fix small spelling mistakes by @SeanNaren :: PR: #4839
[Tutorials] Fix matplotlib version and directory name in Multispeaker_Simulator by @anteju :: PR: #4804
Update diarization folder structure by @tango4j :: PR: #4823
Missing types in clustering by @SeanNaren :: PR: #4858
add new models by @Jorjeous :: PR: #4852
Fix decoding for T5 models with RPE by @MaximumEntropy :: PR: #4847
Update Speaker Diarization notebooks with unknown oracle_num_speakers by @fayejf :: PR: #4861
Fix mha bug by @yzhang123 :: PR: #4859
Updates to adapter training by @arendu :: PR: #4842
Changes to MSDD code after review, fix test log call by @SeanNaren :: PR: #4881
Fixed output of BERT to be [batch x seq x hidden] by @michalivne :: PR: #4887
Add AMI dataset script by @SeanNaren :: PR: #4864
Update label_models.py by @stevehuang52 :: PR: #4891
Update tutorials.rst for question answering by @Zhilin123 :: PR: #4895
removed unused imports for all domains. by @XuesongYang :: PR: #4901
Fix ptl_load_state not providing cls by @MaximumEntropy :: PR: #4914
Remove unused cv collection by @okuchaiev :: PR: #4907
Add mixed-representation config to PhonemizerTokenizer by @rlangman :: PR: #4904
Fix implicit bug in _AudioLabelDataset by @stevehuang52 :: PR: #4923
Fix and refactor label models by @fayejf :: PR: #4913
Sparrowhawk deployment fix by @ekmb :: PR: #4928
Upgrade to NGC PyTorch 22.08 Container by @ericharper :: PR: #4929
Fixes for Cherry Picked PRs by @titu1994 :: PR: #4962
Fix cherry pick workflow by @ericharper :: PR: #4964
check for active conda environment by @nithinraok :: PR: #4970
fix label models restoring issue from weighted cross entropy by @nithinraok :: PR: #4968
Add simple pre-commit file (#4983) by @SeanNaren :: PR: #4995
Fix bug in Squeezeformer Conv block by @titu1994 :: PR: #5011
Fix bugs by @Zhilin123 :: PR: #5036
Add black to pre-commit (#5027) by @SeanNaren :: PR: #5045
Fix bug in question answering tutorial by @Zhilin123 :: PR: #5049
Missing fixes from r1.11.0 to T5 finetuning eval by @MaximumEntropy :: PR: #5054
P&C docs by @jubick1337 :: PR: #5068
probabilites -> probabilities by @nithinraok :: PR: #5078
Notebook bug fixes by @vadam5 :: PR: #5084
update strategy in notebook from ddp_fork to dp by @Zhilin123 :: PR: #5088
Fix Unhashable type list for Numba Cuda spec augment kernel by @titu1994 :: PR: #5093
Remove numba import by @titu1994 :: PR: #5095
T5 prompt learning fixes missing from r1.11.0 merge by @MaximumEntropy :: PR: #5075
T5 Decoding with PP > 2 fix by @MaximumEntropy :: PR: #5091
Multiprocessing fix by @jubick1337 :: PR: #5106
[Bug fix] PC lexical + audio by @ekmb :: PR: #5109
bugfix: pybtex.database.InvalidNameString: Too many commas in author … by @XuesongYang :: PR: #5112

NVIDIA Neural Modules 1.12.0
