NVIDIA Neural Modules 1.18.0
Highlights
Models
- GPT-2B-001, trained on 1.1T tokens with 4K sequence length.
- STT En Fast Conformer-CTC Large
- STT En Fast Conformer-Transducer Large
- STT En Fast Conformer-Transducer Large LibriSpeech
- STT En FastConformer Hybrid Transducer-CTC Large P&C
- STT De FastConformer Hybrid Transducer-CTC Large P&C
- STT Es FastConformer Hybrid Transducer-CTC Large P&C
- STT It FastConformer Hybrid Transducer-CTC Large P&C
- STT Pl FastConformer Hybrid Transducer-CTC Large P&C
- STT Ua FastConformer Hybrid Transducer-CTC Large P&C
- STT Hr FastConformer Hybrid Transducer-CTC Large P&C
- STT By Conformer-RNNT Large
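The released checkpoints can be pulled by name. A minimal loading sketch, assuming the NGC identifier for the English Large P&C model is `stt_en_fastconformer_hybrid_large_pc` (check the model card for the exact string):

```python
import nemo.collections.asr as nemo_asr

# Load one of the checkpoints listed above by its (assumed) NGC name and
# transcribe a local file; "sample.wav" is a hypothetical placeholder.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_fastconformer_hybrid_large_pc")
hyps = asr_model.transcribe(paths2audio_files=["sample.wav"])
print(hyps[0])
```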
NeMo ASR
- Hybrid Autoregressive Transducer (HAT) #6260
- Apple MPS Support for ASR Inference #6289 (see the inference sketch after this list)
- InterCTC Support for Hybrid ASR Models #6215
- RNNT N-Gram Fusion with the mAES Algorithm #6118
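A minimal sketch of ASR inference on Apple Silicon via the MPS backend (PR #6289), assuming a Mac with an M-series chip and a PyTorch build that exposes `torch.backends.mps`; see PR #6377 for the Mac installation instructions:

```python
import torch
import nemo.collections.asr as nemo_asr

# Fall back to CPU when the MPS backend is unavailable.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_fastconformer_ctc_large")
model = model.to(device).eval()
hyps = model.transcribe(paths2audio_files=["sample.wav"])  # hypothetical local file
```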
NeMo TTS
- TTS directory structure refactor
- User-set symbol vocabulary #6172
NeMo Megatron
- Model parallelism from Megatron Core #6393
- Continued training for P-tuning #6273
- SFT for GPT-3 #6210
- Tensor and pipeline model parallel conversion #6218 (see the conversion sketch after this list)
- Megatron NMT Export to Riva
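A sketch of the TP/PP repartitioning workflow (PRs #6218, #6419). The script path and flag names below are assumptions based on the NeMo examples tree, so verify them against examples/nlp/language_modeling in your checkout:

```python
import subprocess

# Repartition a (hypothetical) TP=2/PP=1 GPT checkpoint into a TP=1/PP=2 layout.
subprocess.run(
    [
        "python", "examples/nlp/language_modeling/megatron_change_num_partitions.py",
        "--model_file", "gpt_tp2_pp1.nemo",          # hypothetical source checkpoint
        "--target_file", "gpt_tp1_pp2.nemo",         # hypothetical output checkpoint
        "--tensor_model_parallel_size", "2",
        "--target_tensor_model_parallel_size", "1",
        "--pipeline_model_parallel_size", "1",
        "--target_pipeline_model_parallel_size", "2",
    ],
    check=True,
)
```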
NeMo Core
Detailed Changelogs
ASR
Changelog
- minor cleanup by @messiaen :: PR: #6311
- docs on the use of heterogeneous test / val manifests by @bmwshop :: PR: #6352
- [WIP] add buffered chunked streaming for nemo force aligner by @Slyne :: PR: #6185
- Word boosting for Flashlight decoder by @trias702 :: PR: #6367
- Add installation and ASR inference instructions for Mac by @artbataev :: PR: #6377
- specaug speedup by @1-800-BAD-CODE :: PR: #6347
- updated lr for FC configs by @bmwshop :: PR: #6379
- Make it possible to control the tqdm progress bar in ASR models by @SN4KEBYTE :: PR: #6375 (see the example after this changelog)
- [ASR] Conformer global tokens in local attention by @sam1373 :: PR: #6253
- fixed torch warning on using a list of numpy arrays by @MKNachesa :: PR: #6382
- Fix FastConformer config: correct bucketing strategy by @artbataev :: PR: #6413
- fixing the ability to use temperature sampling with concat datasets by @bmwshop :: PR: #6423
- add conformer configs for hat model by @andrusenkoau :: PR: #6372
- [ASR] Add optimization util for linear sum assignment algorithm by @tango4j :: PR: #6349
- Added/updated new Conformer configs by @VahidooX :: PR: #6426
- Fix typos by @titu1994 :: PR: #6494
- Fix typos (#6523) by @titu1994 :: PR: #6539
- added back the fast emit section to the configs. by @VahidooX :: PR: #6540
- Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, BY by @KunalDhawan :: PR: #6549
- Add scores for FastConformer models by @titu1994 :: PR: #6557
- Patch transcribe and support offline transcribe for hybrid model by @fayejf :: PR: #6550
- More streaming conformer export fixes by @messiaen :: PR: #6567
- Documentation for ASR-TTS models by @artbataev :: PR: #6594
- Patch transcribe_util for streaming mode and add WER calculation back to inference scripts by @fayejf :: PR: #6601
- Add HAT image to docs by @andrusenkoau :: PR: #6619
- Patch decoding for PC models by @titu1994 :: PR: #6630
- Fix wer.py where 'errors' variable was not set by @stevehuang52 :: PR: #6633
- Fix for old models in change_attention_model by @VahidooX :: PR: #6635
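For PR #6375 above, a minimal sketch of suppressing the tqdm progress bar during batch transcription; the keyword name (`verbose`) is an assumption, so check the `transcribe()` signature in your version:

```python
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_small")
# verbose=False is assumed to disable the tqdm bar shown during batch transcription.
hyps = model.transcribe(paths2audio_files=["a.wav", "b.wav"], batch_size=2, verbose=False)
```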
TTS
Changelog
NLP / NMT
Changelog
- [Core] return_config=True now extracts just config, not full tarfile by @titu1994 :: PR: #6346 (see the example after this changelog)
- restore path for p-tuning by @arendu :: PR: #6273
- taskname and early stopping for adapters by @arendu :: PR: #6366
- Adapter tuning accepts expanded language model dir by @arendu :: PR: #6376
- Update gpt_training.rst by @blisc :: PR: #6378
- Megatron GPT model finetuning by @MaximumEntropy :: PR: #6210
- [NeMo Megatron] Cleanup configs to infer the models TP PP config automatically by @titu1994 :: PR: #6368
- Fix prompt template unescaping by @MaximumEntropy :: PR: #6399
- Add support for Megatron GPT Untied Embd TP PP Change by @titu1994 :: PR: #6388
- Move Parallelism usage from Apex -> Megatron Core by @aklife97 :: PR: #6393
- Add ability to enable/disable act ckpt and seq parallelism in GPT by @markelsanz14 :: PR: #6327
- Refactor PP conversion + add support for TP only conversion by @titu1994 :: PR: #6419
- fix CPU overheads of GPT synthetic dataset by @xrennvidia :: PR: #6427
- check if grad is none before calling all_reduce by @arendu :: PR: #6428
- Fix replace_bos_with_pad not found by @aklife97 :: PR: #6443
- Support Swiglu in TP PP Conversion by @titu1994 :: PR: #6437
- BERT pre-training mp fork to spawn by @aklife97 :: PR: #6442
- Megatron encoder decoder fix for empty validation outputs by @michalivne :: PR: #6459
- Reduce workers on NMT CI by @aklife97 :: PR: #6472
- Switch to NVIDIA Megatron repo by @aklife97 :: PR: #6465
- Megatron KERPLE positional embeddings by @michalivne :: PR: #6478
- Support in external sample mapping for Megatron datasets by @michalivne :: PR: #6462
- Fix custom by @aklife97 :: PR: #6512
- GPT fp16 inference fix by @MaximumEntropy :: PR: #6543
- Fix for T5 FT model by @aklife97 :: PR: #6529
- Pass `.scale` instead of scaler object to core by @aklife97 :: PR: #6545
- Change Megatron Enc Dec model to use persistent_workers by @aklife97 :: PR: #6548
- Turn autocast off when precision is fp32 by @aklife97 :: PR: #6554
- Fix batch size reconf for T5 FT for multi-validation by @aklife97 :: PR: #6582
- Make tensor split contiguous for qkv and kv in attention by @aklife97 :: PR: #6580
- Patches from main to r1.18.0 for Virtual Parallel by @titu1994 :: PR: #6592
- Create dummy iters to satisfy iter type len checks in core + update core commit by @aklife97 :: PR: #6600
- Restore GPT support for interleaved pipeline parallelism by @timmoon10 :: PR: #6528
- Add megatron_core to requirements by @ericharper :: PR: #6639
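For PR #6346 above, a minimal sketch of pulling only the config out of a .nemo archive without unpacking the whole tarfile; the checkpoint path is a hypothetical placeholder:

```python
from nemo.core import ModelPT

# With return_config=True, restore_from returns the OmegaConf config only;
# no weights are loaded and the archive is not fully extracted.
cfg = ModelPT.restore_from("megatron_gpt.nemo", return_config=True)
print(cfg.keys())
```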
Export
Changelog
Bugfixes
Changelog
- Fix the GPT SFT datasets loss mask bug by @yidong72 :: PR: #6409
- [BugFix] Fix multi-processing bug in data simulator by @tango4j :: PR: #6310
- Fix cache aware hybrid bugs by @VahidooX :: PR: #6466
- [BugFix] Force _get_batch_preds() to keep logits in decoder timestamp… by @tango4j :: PR: #6500
- Fixing bug in unsort_tensor by @borisfom :: PR: #6320
- Bugfix for BF16 grad reductions with distopt by @timmoon10 :: PR: #6340
- Limit urllib3 version to patch issue with RTD by @aklife97 :: PR: #6568
General improvements
Changelog
- Pin the version to hopefully fix rtd build by @SeanNaren :: PR: #6334
- enabling diverse datasets in val / test by @bmwshop :: PR: #6306
- extract inference weights by @arendu :: PR: #6353
- Add opengraph support for NeMo docs by @titu1994 :: PR: #6380
- Adding basic preemption code by @athitten :: PR: #6161
- Add documentation for preemption support by @athitten :: PR: #6403
- Update hyperparameter recommendation based on experiments by @Zhilin123 :: PR: #6405
- exceptions with empty test / val ds config sections by @bmwshop :: PR: #6421
- Upgrade pt 23.03 by @ericharper :: PR: #6430
- Update README to add core installation by @aklife97 :: PR: #6488
- Not doing CastToFloat by default by @borisfom :: PR: #6524
- Update manifest.py for speedup by @stevehuang52 :: PR: #6565
- Update SDP docs by @erastorgueva-nv :: PR: #6485
- Update core commit hash in readme by @aklife97 :: PR: #6622
- Remove from jenkins by @ericharper :: PR: #6641
- Remove dup by @ericharper :: PR: #6643