NVIDIA Neural Modules 1.12.0
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.08
ASR
Changelog
- Add support for RNNT Char/Word Timestamp Calculation by @titu1994 :: PR: #4665
- add conditional logic to rnnt_wer to handle when arrays have no elements by @mgoldey :: PR: #4776
- fix handling of the final word for rnnt word timestamps by @mgoldey :: PR: #4779
- amend rnnt word timestamps by @mgoldey :: PR: #4782
- fix type error in rnnt_wer.py, rnnt_wer_bpe.py, wer_bpe.py by @hainan-xv :: PR: #4822
- add kab language asr models by @nithinraok :: PR: #4819
- [Tutorial][ASR][Fix] Data paths in ASR with NeMo tutorial by @anteju :: PR: #4845
- [ASR] Fix for multi-channel signals in AudioSegment by @anteju :: PR: #4824
- [ASR] Generate multichannel noise by @anteju :: PR: #4870
- Fix asr model order by @nithinraok :: PR: #4959
- Fix ASR issues by @titu1994 :: PR: #4984
- Fix diarization ASR inference link in notebook by @SeanNaren :: PR: #5016
- Code switching by @KunalDhawan :: PR: #4784
- Release SOTA Lang ID model by @fayejf :: PR: #5080
- Stateless decoder for RNN-T by @hainan-xv :: PR: #4710
TTS
Changelog
- [TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
- TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
- [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
- ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
- IPA G2P bugfixes by @redoctopus :: PR: #4869
- [TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
- [TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
- [TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
- [TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
- [TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
- [TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
- [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087
NLP / NMT
Changelog
- Fix bug intent slot classification tokenizer to dialogue by @Zhilin123 :: PR: #4694
- Intent slot model onnx export test by @Zhilin123 :: PR: #4731
- Fix megatron p tuning notebook by @nithinraok :: PR: #4741
- Add support for Apex distributed Adam optimizer with GPT-3 by @timmoon10 :: PR: #4487
- Fixes NLPModel's load from checkpoint due to PTL private function changes by @MaximumEntropy :: PR: #4755
- Adapter tuning for Megatron GPT models by @arendu :: PR: #4717
- Megatron Encoder Decoder models with RPE and PP > 2 by @MaximumEntropy :: PR: #4663
- add kab language asr models by @nithinraok :: PR: #4819
- add chinese to language doc and fix bug by @yzhang123 :: PR: #4834
- Spoken Language Identification by @fayejf :: PR: #4846
- Fix decoding bug for megatron enc-dec models with O2 by @MaximumEntropy :: PR: #4989
- Updating Megatron LM conversion according to PTL 1.7 by @Davood-M :: PR: #5038
- Adding RETRO model Faiss sharding index and KNN sharding index by @yidong72 :: PR: #4713
- MLP Prompt Learning Encoder by @vadam5 :: PR: #4849
- Update the prompt learning to handle large language model by @yidong72 :: PR: #4906
Text Normalization / Inverse Text Normalization
Changelog
- [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
- [Chinese text normalization] Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
- Fix zh tn by @yzhang123 :: PR: #5035
- Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
- Added P&C lexical audio model by @jubick1337 :: PR: #4802
Export
Changelog
- Intent slot model onnx export test by @Zhilin123 :: PR: #4731
General Improvements
Changelog
- Fix logger reference by @SeanNaren :: PR: #4786
- Fix error with class method reference in msdd by @SeanNaren :: PR: #4865
- Add sync for logging calls to ensure aggregation across devices by @SeanNaren :: PR: #4876
- Fix saving the last checkpoint when using val check interval by @SeanNaren :: PR: #4905
- Add support for skipping validation on resume + extend saving last ckpt test by @SeanNaren :: PR: #4922
- Move trainer calls for ssl models to training and validation steps only by @sam1373 :: PR: #4685
- Change Num Partitions size expansion fix by @aklife97 :: PR: #4719
- upgrade to PTL 1.7 by @nithinraok :: PR: #4672
- Fixing outputs of infer() and use of NeMo length regulator helper by @borisfom :: PR: #4724
- bug fix: enable async grad reduction when DP > 1 by @erhoo82 :: PR: #4740
- Add LayerNorm1P, weight decay for LN and unscaled initialization by @mikolajblaz :: PR: #4743
- jenkins data simulator fix by @nithinraok :: PR: #4751
- Multiscale Diarization Decoder (MSDD) model and module files by @tango4j :: PR: #4650
- Fix logging in gradient clipping with PTL 1.7.2 by @MaximumEntropy :: PR: #4769
- Fix checkpoint restoring by @nithinraok :: PR: #4777
- avoid data clipping after convolution with rir samples by @nithinraok :: PR: #4806
- Fixed in_features dim if bidirectional is True by @farisalasmary :: PR: #4588
- Fix float/integer type error in WER.update() by @fujimotos :: PR: #4816
- [Speech Data Explorer] An option to explicitly specify the base dir by @anteju :: PR: #4678
- adding instancenorm as an option for conv normalization by @bmwshop :: PR: #4827
- Fix small spelling mistakes by @SeanNaren :: PR: #4839
- [Tutorials] Fix matplotlib version and directory name in Multispeaker_Simulator by @anteju :: PR: #4804
- Update diarization folder structure by @tango4j :: PR: #4823
- Missing types in clustering by @SeanNaren :: PR: #4858
- Fix decoding for T5 models with RPE by @MaximumEntropy :: PR: #4847
- Update Speaker Diarization notebooks with unknown oracle_num_speakers by @fayejf :: PR: #4861
- Fix mha bug by @yzhang123 :: PR: #4859
- Changes to MSDD code after review, fix test log call by @SeanNaren :: PR: #4881
- Fixed output of BERT to be [batch x seq x hidden] by @michalivne :: PR: #4887
- Add AMI dataset script by @SeanNaren :: PR: #4864
- Update label_models.py by @stevehuang52 :: PR: #4891
- Update tutorials.rst for question answering by @Zhilin123 :: PR: #4895
- removed unused imports for all domains. by @XuesongYang :: PR: #4901
- Fix ptl_load_state not providing cls by @MaximumEntropy :: PR: #4914
- Remove unused cv collection by @okuchaiev :: PR: #4907
- Add mixed-representation config to PhonemizerTokenizer by @rlangman :: PR: #4904
- Fix implicit bug in _AudioLabelDataset by @stevehuang52 :: PR: #4923
- Upgrade to NGC PyTorch 22.08 Container by @ericharper :: PR: #4929
- Fix cherry pick workflow by @ericharper :: PR: #4964
- check for active conda environment by @nithinraok :: PR: #4970
- fix label models restoring issue from weighted cross entropy by @nithinraok :: PR: #4968
- Add simple pre-commit file (#4983) by @SeanNaren :: PR: #4995
- Fix bug in Squeezeformer Conv block by @titu1994 :: PR: #5011
- Fix bugs by @Zhilin123 :: PR: #5036
- Add black to pre-commit (#5027) by @SeanNaren :: PR: #5045
- Fix bug in question answering tutorial by @Zhilin123 :: PR: #5049
- Missing fixes from r1.11.0 to T5 finetuning eval by @MaximumEntropy :: PR: #5054
- P&C docs by @jubick1337 :: PR: #5068
- probabilites -> probabilities by @nithinraok :: PR: #5078
- update strategy in notebook from ddp_fork to dp by @Zhilin123 :: PR: #5088
- Fix Unhashable type list for Numba Cuda spec augment kernel by @titu1994 :: PR: #5093
- T5 prompt learning fixes missing from r1.11.0 merge by @MaximumEntropy :: PR: #5075
- T5 Decoding with PP > 2 fix by @MaximumEntropy :: PR: #5091
- Multiprocessing fix by @jubick1337 :: PR: #5106
- bugfix: pybtex.database.InvalidNameString: Too many commas in author … by @XuesongYang :: PR: #5112