Releases: NVIDIA/NeMo

NVIDIA Neural Modules 1.18.0

12 May 17:49

Highlights

Models

NeMo ASR

  • Hybrid Autoregressive Transducer (HAT) #6260
  • Apple MPS Support for ASR Inference #6289
  • InterCTC Support for Hybrid ASR Models #6215
  • RNNT N-Gram Fusion with mAES algo #6118
  • ASR + Apple M2 CPU/GPU MPS #6289

NeMo TTS

  • TTS directory structure refactor
  • User-set symbol vocabulary #6172

NeMo Megatron

  • Model parallelism from Megatron Core #6393
  • Continued training for P-tuning #6273
  • SFT for GPT-3 #6210
  • Tensor and pipeline model parallel conversion #6218
  • Megatron NMT Export to Riva

NeMo Core

Detailed Changelogs

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Export

Changelog

Bugfixes

Changelog
  • Fix the GPT SFT datasets loss mask bug by @yidong72 :: PR: #6409
  • [BugFix] Fix multi-processing bug in data simulator by @tango4j :: PR: #6310
  • Fix cache aware hybrid bugs by @VahidooX :: PR: #6466
  • [BugFix] Force _get_batch_preds() to keep logits in decoder timestamp… by @tango4j :: PR: #6500
  • Fixing bug in unsort_tensor by @borisfom :: PR: #6320
  • Bugfix for BF16 grad reductions with distopt by @timmoon10 :: PR: #6340
  • Limit urllib3 version to patch issue with RTD by @aklife97 :: PR: #6568

General improvements

Changelog

NVIDIA Neural Modules 1.17.0

05 Apr 00:10
d3017e4

Highlights

NeMo ASR

  • Online Clustering Diarizer
  • High Level Diarization API
  • PyCTC Decode Beam Search Support
  • RNNT Beam Search Alignment Extraction
  • InterCTC Loss
  • AIStore Documentation
  • ASR & AWS Multi-node Integration
  • Convolution Invariant SDR losses

NeMo TTS

NeMo Megatron

  • SquaredReLU, SwiGLU, No-Dropout
  • Rotary Position Embedding
  • Untie word embeddings and output projection

NeMo Core

  • Dynamic freezing of modules during training
  • NeMo Multi-Run Documentation
  • ClearML Logging
  • Early Stopping
  • Experiment Manager Docs Update

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.02

Detailed Changelogs

ASR

Changelog
  • Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
  • Use module-based k2 import guard by @artbataev :: PR: #6006
  • Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
  • Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
  • Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
  • InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
  • Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
  • Convert esperanto into a notebook by @SeanNaren :: PR: #6070
  • [ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
  • [ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
  • Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
  • Add file class based inference API for diarization by @SeanNaren :: PR: #5945
  • Ngram by @karpnv :: PR: #6063
  • remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
  • Streaming conformer CTC export by @messiaen :: PR: #5837
  • [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
  • Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
  • ASR Beam search documentation by @titu1994 :: PR: #6244

TTS

Changelog
  • [TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
  • [TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
  • [TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
  • Added list_available_models by @treacker :: PR: #5967
  • Update Fastpitch energy bug by @blisc :: PR: #5969
  • removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
  • ONNX export for RadTTS by @borisfom :: PR: #5880
  • Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
  • Vits doc by @treacker :: PR: #5989
  • Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
  • Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
  • [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
  • [TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
  • [TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
  • [TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
  • [TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
  • [TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
  • [TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
  • [TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
  • [TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
  • [TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
  • [TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
  • [TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
  • remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
  • [TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
  • [TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
  • [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982

Export

Changelog

Bugfixes

Changelog

NVIDIA Neural Modules 1.16.0

08 Mar 04:35
1631118

Highlights

NeMo ASR

  • ASR Evaluator
  • Multi-channel dereverberation algorithm
  • Hybrid ASR-TTS Models
  • Flashlight Decoder Beam Search
  • FastConformer Encoder with 8x subsampling

NeMo TTS

  • SSL Voice Conversion
  • Spectrogram Enhancer
  • VITS

NeMo Megatron

  • Per microbatch dataloader for GPT and BERT
  • Adapters compatible with Faster Transformer

NeMo Core

  • Nested model support

NeMo Tools

  • NeMo Forced Aligner

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.01

ASR

Changelog

TTS

Changelog
  • [TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
  • [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
  • No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
  • Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
  • Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
  • Update radtts' infer path by @blisc :: PR: #5788
  • [TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
  • [TTS] porting VITS implementation by @treacker :: PR: #5600
  • [TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
  • [TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
  • TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
  • Remove MCD_DTW tarball by @redoctopus :: PR: #5889
  • Hybrid ASR-TTS models by @artbataev :: PR: #5659
  • Moved eval notebook data to aws by @redoctopus :: PR: #5911
  • [G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
  • [G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
  • fix links, add missing file by @ekmb :: PR: #6044
  • [TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
  • [TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
  • [TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
  • Fix enhancer usage by @artbataev :: PR: #6059
  • [TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
  • Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
  • [TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136

NLP / NMT

Changelog
  • Fix P-Tuning Truncation by @vadam5 :: PR: #5663
  • Adithyare/prompt learning seed by @arendu :: PR: #5749
  • Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
  • Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
  • add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
  • remove transformer version upper bound by @Zhilin123 :: PR: #5831
  • Adithyare/adapter new placement by @arendu :: PR: #5791
  • Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
  • validation batch sizing and drop_last controls by @arendu :: PR: #5830
  • Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
  • Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
  • RETRO model finetuning by @yidong72 :: PR: #5800
  • Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
  • Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
  • set max_steps for lr decay through config by @anmolgupt :: PR: #5780
  • Fix Prompt text space issue by @aklife97 :: PR: #5983
  • Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091

NeMo Tools

Changelog

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.15.0

02 Feb 00:49
8c785ec

Highlights

NeMo ASR

  • Hybrid Transducer-CTC ASR
  • Greedy timestamp decoding with inference script
  • MHA adapters
  • Conformer local attention (longformer)
  • High level beam search API
  • Multiblank transducer
  • Multi-channel audio processing model
  • AIStore for ASR datasets

NeMo Megatron

  • ALiBi position embeddings support for T5

NeMo TTS

  • Chinese TTS pipeline with polyphone disambiguation

NeMo Core

  • Optimizer based EMA
  • MLFlow logger support

Models

  • stt_eo_conformer_ctc_large (HF, NGC) Esperanto ASR model.
  • stt_eo_conformer_transducer_large (HF, NGC) Esperanto ASR model.

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.12

ASR

Changelog

TTS

Changelog
  • Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
  • [TTS] fix ranges of char set for accented letters. by @XuesongYang :: PR: #5607
  • [TTS] add type hints and change variable names for tokenizers and g2p by @XuesongYang :: PR: #5602
  • Fixed RadTTS unit test by @borisfom :: PR: #5572
  • [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch by @yuekaizhang :: PR: #5541
  • Add duration padding support for RADTTS inference by @kevjshih :: PR: #5650
  • [TTS] add tts dict cust notebook by @ekmb :: PR: #5662
  • [TN/TTS docs] TN customization, g2p docs moved to tts by @ekmb :: PR: #5683
  • typo and link fixed by @ekmb :: PR: #5741
  • link fixed by @ekmb :: PR: #5745
  • Update Tacotron2 NGC checkpoint load to latest version by @redoctopus :: PR: #5760
  • Docs g2p update by @ekmb :: PR: #5769
  • [TTS][ZH] bugfix import jieba errors. by @XuesongYang :: PR: #5776

NLP / NMT

Changelog

Export

Changelog
  • Add keep_initializers_as_inputs to _export method by @pks :: PR: #5731
  • Megatron export triton update by @Davood-M :: PR: #5766

General Improvements

Changelog

NVIDIA Neural Modules 1.14.0

24 Dec 02:49

Highlights

NeMo ASR

  • Hybrid CTC + Transducer loss ASR #5364
  • Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
  • ASR Adapters hyper parameter search scripts #5159
  • RNNT {ONNX, TorchScript} x GPU export infer #5248
  • Exportable MelSpectrogram (TorchScript) #5512
  • Audio To Audio Dataset Processor #5196
  • Multi Channel Audio Transcription #5479
  • Silence Augmentation #5476

NeMo Megatron

  • Support for the Mixture of Experts for T5
  • Fix PTL model size output for GPT-3 and BERT
  • BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

  • Hydra Multirun core support + NeMo HP optim in YAML #5159

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog
  • [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
  • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
  • Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
  • Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
  • Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
  • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
  • Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
  • Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
  • Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
  • Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
  • [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
  • Add Silence Augmentation by @fayejf :: PR: #5476
  • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
  • add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
  • [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
  • Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
  • Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
  • Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
  • Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
  • Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
  • Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
  • [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639

TTS

Changelog
  • [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
  • [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
  • [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
  • [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
  • [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
  • [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
  • Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
  • [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
  • [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
  • [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
  • [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
  • [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
  • [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
  • [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
  • [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
  • [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
  • [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
  • JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
  • [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
  • TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
  • [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
  • [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
  • [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
  • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
  • [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
  • [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
  • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.13.0

07 Dec 21:14

Highlights

NeMo ASR

  • Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
  • Support for codeswitched manifests during training
  • Support for Language ID during inference for ML models
  • Support of cache-aware streaming for offline models
  • Word confidence estimation for CTC & RNNT greedy decoding

NeMo Megatron

  • Interleaved Pipeline schedule
  • Transformer Engine for GPT
  • HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
  • IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
  • Pipeline Parallel Support for T5 Prompt Learning
  • MegatronNMT export

NeMo TTS

  • TTS introductory tutorial
  • Phonemizer/espeak removal (Spanish/German)
  • Char-only support for Spanish/German models
  • Documentation Refactor

NeMo Core

  • Upgrade to NGC PyTorch 22.09 container
  • Add pre-commit hooks
  • Exponential moving average (EMA) of weights during training

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.09

Known Issues

Issues
  • The pytest tests for RadTTSModel_export_to_torchscript fail intermittently because of random input values. Fixed in main.

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog

NeMo Tools

Changelog

Export

Changelog
  • Fix export bug by @VahidooX :: PR: #5009
  • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
  • Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
  • Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
  • Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
  • replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
  • Megatron Export Update by @Davood-M :: PR: #5343
  • Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
  • export_utils bugfix by @Davood-M :: PR: #5480
  • Export fixes for Riva by @borisfom :: PR: #5496

General Improvements and Bugfixes

Changelog

NVIDIA Neural Modules 1.12.0

10 Oct 22:11
dd9a30f

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.08

ASR

Changelog

TTS

Changelog
  • [TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
  • TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
  • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
  • ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
  • IPA G2P bugfixes by @redoctopus :: PR: #4869
  • [TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
  • [TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
  • [TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
  • [TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
  • [TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
  • [TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
  • [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
  • [Chinese text normalization] Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
  • Fix zh tn by @yzhang123 :: PR: #5035
  • Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
  • Added P&C lexical audio model by @jubick1337 :: PR: #4802

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.11.0

08 Sep 17:06

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.07

ASR

Changelog
  • Add ASR CTC Decoding module by @titu1994 :: PR: #4342
  • Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
  • Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
  • Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
  • Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
  • Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
  • Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
  • Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
  • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
  • Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
  • Add Squeezeformer to ASR by @titu1994 :: PR: #4416
  • Fix ASR notebooks by @titu1994 :: PR: #4738
  • Add pretrained ASR models for Croatian by @anteju :: PR: #4682
  • Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
  • Multilingual VAD model by @fayejf :: PR: #4734
  • Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
  • Fp16 support for Conformer by @bmwshop :: PR: #4571
  • Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
  • Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
  • Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog

Export

Changelog

Bugfixes

Changelog
  • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
  • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
  • Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
  • Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
  • Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
  • Improve mAES algorithm with patches by @titu1994 :: PR: #4662

General Improvements

Changelog

NVIDIA Neural Modules 1.10.0

01 Jul 22:14

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.05

Known Issues

Issues
  • Tutorial: Fastpitch_Training_GermanTTS.ipynb is experimental and still being tested.

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [TN] WFST to normalize punctuation by @ekmb :: PR: #4108
  • [TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
  • Tn tutorial by @yzhang123 :: PR: #4090
  • Tn add rules by @yzhang123 :: PR: #4302
  • Tn install by @yzhang123 :: PR: #4055
  • Fix electronic bug, new time ITN rule by @ekmb :: PR: #4355
  • [TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph by @ekmb :: PR: #4463
  • Configure T5 finetuning metrics by @MaximumEntropy :: PR: #4122

Export

Changelog

Core

Changelog

General Improvements and Fixes

Changelog

NVIDIA Neural Modules 1.9.0

03 Jun 20:40

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.04

ASR

Changelog
  • Fix changed function name in offline vad asr notebook by @fayejf :: PR: #4007
  • NeMo Adapters Support + ASR Adapters by @titu1994 :: PR: #3942
  • Update ASR configs with num_workers and pin_memory by @titu1994 :: PR: #4270
  • Verbose k2 install, skip if failed by @GNroy :: PR: #4289
  • Torch conversion for VAD-Diarization pipeline by @tango4j :: PR: #3930
  • Multiprocess improvements by @nithinraok :: PR: #4127

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog

NeMo Tools

Changelog
  • Added exception handling for audio player in SDE by @vsl9 :: PR: #4077

NeMo Core

Changelog
  • Support pre-extracted nemo checkpoint for restoration by @titu1994 :: PR: #4061
  • Fix type checking to be compatible with named tuples by @artbataev :: PR: #3986
  • Update num worker calculation due to PTL flag changes by @redoctopus :: PR: #4056
  • Refresh NeMo documentation to Sphinx Book Theme by @titu1994 :: PR: #3996
  • Generalize adapter merge strategy for future adapters by @titu1994 :: PR: #4091

General Improvements

Changelog