Releases · InternLM/InternEvo
InternEvo-v0.5.2dev20240525
What's Changed
- fix(moe): fix interface for megablock by @blankde in #233
- feat(npu): support npu fused adamw by @SolenoidWGT in #188
- feat(npu): support npu fusion rotary mul by @SolenoidWGT in #187
- Feat(RMSNorm NPU): Add RMSNormNPU and CI by @li126com in #203
- fix(QA): add real tgs to train_CI by @li126com in #227
- Fix(mha,linear): fix norm_head and mha inference by @KimmiShi in #234
Full Changelog: v0.5.1dev20240517...v0.5.2dev20240525
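For background on the RMSNormNPU entry above: the NPU kernel fuses the standard RMSNorm computation into a single op. A minimal pure-Python sketch of that computation (illustrative only, not InternEvo's implementation):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """RMSNorm: scale x by the reciprocal root-mean-square of its elements,
    then apply a learned per-element gain. `x` and `weight` are plain lists
    here; real implementations operate on tensors and fuse this into one kernel.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```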
InternEvo-v0.5.1dev20240517
What's Changed
- fix(model): fix model forward when checkpoint=true by @mwiacx in #219
- fix(pipeline_scheduler): fix recv_obj_meta args by @mwiacx in #218
- fix(pipeline_scheduler): fix interleaved load_micro_batch by @mwiacx in #217
- fix test_pipeline by @sallyjunjun in #210
- remove flash_attn related operator dependency by @sallyjunjun in #214
- fix(tgs all): set very_beginning_time by @li126com in #207
- fix(ci): fix command and branch by @kkscilife in #216
- Feat (doc): add torch_npu installing by @li126com in #213
Full Changelog: v0.5.0dev20240510...v0.5.1dev20240517
InternEvo-v0.5.0dev20240510
What's Changed
Full Changelog: v0.4.1dev20240510...v0.5.0dev20240510
InternEvo-v0.4.1dev20240510
What's Changed
- feat(moe): support topk gating (k>2) by @blankde in #171
- fix(ci): check job status by @kkscilife in #148
- fix get_accelerator error by @sallyjunjun in #179
- feat(internlm): remove use_cuda_flash_attn by @SolenoidWGT in #175
- fix(dipu): fix dipu import rotary by @SolenoidWGT in #183
- fix(model/utils.py): fix unpack data inference squeeze dim and cuda linear wgrad by @huangting4201 in #184
- fix(mlp): fix mlp ckpt save/load by @SolenoidWGT in #181
- Feat(logger): add real tgs computing and logging by @li126com in #174
- Fix(QA): fix some QA code for new version by @li126com in #189
- Fix(CI): fix little bug in yaml by @li126com in #190
- adapt for the newest deeplink by @SolenoidWGT in #186
- fix(logger): add filehandler by @JiaoPL in #180
- Fix(CI): fix little bug in yaml once more by @li126com in #191
- feat(multimodal): support train llava with dummy dataset by @Khoray in #91
- fix(logger.py): fix logger that print info twice by @huangting4201 in #192
- fix(logger): no log files by @JiaoPL in #193
- fix test model error by @sallyjunjun in #185
- fix(ci): rm parameter 'update_panel' by @JiaoPL in #194
- fix(solver): fix gpu fused adamw condition by @SolenoidWGT in #196
- fix(multimodal): handle the case when 'input_ids' is None by @JiaoPL in #197
- fix(train/utils.py): fix moe and fp32 param group split when model dtype is fp32 by @huangting4201 in #198
- fix(utils/common.py): assert PYTORCH_CUDA_ALLOC_CONF is None and fix loss test ckpt load failed by @huangting4201 in #201
- Fix(QA): fix monthly test by @li126com in #202
Full Changelog: v0.4.0dev20240403...v0.4.1dev20240510
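The top-k gating entry above (#171) generalizes MoE routing beyond k=2. As background, a minimal pure-Python sketch of softmax top-k gating for one token (illustrative, not InternEvo's router):

```python
import math

def topk_gating(logits, k):
    """Pick the top-k experts for a token and renormalize their softmax
    probabilities so the selected gate weights sum to 1.
    Returns {expert_index: gate_weight}.
    """
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}
```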
InternEvo-v0.4.0dev20240403
What's Changed
- feat(internlm): refactor code structure based on InternTrain by @huangting4201 in #82
- fix(tokenized/packed_dataset.py): fix packed dataset when train_folder is not None by @huangting4201 in #88
- fix(transformers): fix no white space when chatting with fast tokenizer by @x54-729 in #90
- Fix(TrainState): fix trainstate batch sampler by @zigzagcai in #102
- feat: rm grad profiling by @JiaoPL in #100
- fix(train/pipeline.py): fix nan grad norm by @huangting4201 in #103
- Fix(QA): fix check ckpt loss by @li126com in #89
- improve zero grad communication overlap with pp by @mwiacx in #104
- feat(optimizer/hybrid_zero_optim.py): remove two stage compute norm by @huangting4201 in #106
- fix(embedding.py): fix triton apply_rotary to rotary_emb version by @sallyjunjun in #105
- feat(npu): add Ascend 910B support by @SolenoidWGT in #110
- feat(tokenized/dummy_dataset.py): support fixed seqlen for random dataset samples by @huangting4201 in #119
- Fix(QA): fix test optimizer and no_fa_output by @li126com in #124
- feat(initialize/launch.py): support switch use_packed_dataset by @huangting4201 in #117
- fix apply_rotary_torch not inplace problem by @sallyjunjun in #123
- fix(npu): refactor split_half_float_double and remove str key by @SolenoidWGT in #131
- feat(launch.py): update assert info for use_packed_dataset and fix backend accelerator get error by @huangting4201 in #125
- remove global variable internlm_accelerator by @sallyjunjun in #133
- fix(gpc): remove unused num_processes_on_current_node by @SolenoidWGT in #136
- Fix(support npu): some little bugs for npu support by @li126com in #129
- feat(model): extend dim bsz for packed data for standardizing the sp processing dimension by @huangting4201 in #141
- Fix(device name): use a consistent way to get device by @li126com in #139
- replace is_cuda with get_accelerator_backend by @sallyjunjun in #143
- Feat(npu): change current_time format to adapt npu profiler by @li126com in #147
- fix INTERNLM2_PUBLIC by @sallyjunjun in #150
- feat(eval): optimize evaluation context and remove DtypeTensor by @huangting4201 in #149
- fix(QA): fix test_forward_output_no_fa by @li126com in #151
- fix(QA): re-adapt some QA code for new version by @li126com in #146
- fix(unpack_data): pad -100 on labels by @sunpengsdu in #154
- feat(attn): support npu flash attention by @SolenoidWGT in #145
- fix(dummy_dataset): fixed_random_dataset_seqlen default is true by @sunpengsdu in #156
- fix(npu): fix attn mask move device by @SolenoidWGT in #159
- fix: little bug by @JiaoPL in #160
- refactor(moe): expose more interfaces for moe by @blankde in #157
- set dummy data fix length false in ci by @sunpengsdu in #163
- feat(mlp): support mlp layer fusion by @SolenoidWGT in #161
- feat(deeplink): add deeplink as new backend by @caikun-pjlab in #168
- fix(optimizer): skip param with requires grad is False by @huangting4201 in #169
- fix internlm_accelerator by @sallyjunjun in #166
- remove timer_diagnosis and bench_gpu by @sallyjunjun in #170
- feat(model): support npu with packed data by @huangting4201 in #167
- fix(modules/multi_head_attention.py): fix distributed attn argument err in npu by @huangting4201 in #172
- fix(utils/logger.py): remove uniscale logger in public repo by @huangting4201 in #118
- fix(activation_checkpoint.py): fix rng mode in activation ckpt by @huangting4201 in #177
New Contributors
- @JiaoPL made their first contribution in #100
- @SolenoidWGT made their first contribution in #110
- @caikun-pjlab made their first contribution in #168
Full Changelog: v0.3.3dev20240315...v0.4.0dev20240403
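Several entries above touch rotary embeddings (#105, #123, including the fix for `apply_rotary_torch` mutating its input). As background, an out-of-place pure-Python sketch of the rotation applied to one (even, odd) feature pair (illustrative, not InternEvo code):

```python
import math

def apply_rotary(pair, pos, dim_idx, dim, base=10000.0):
    """Rotate one (even, odd) feature pair by the position-dependent angle
    used in rotary position embeddings. Out-of-place: the input is untouched.
    """
    x1, x2 = pair
    theta = pos * base ** (-2.0 * dim_idx / dim)  # frequency falls with dim_idx
    return (x1 * math.cos(theta) - x2 * math.sin(theta),
            x1 * math.sin(theta) + x2 * math.cos(theta))
```

Because it is a pure rotation, the pair's norm is preserved, and position 0 is the identity.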
InternEvo-v0.3.3dev20240315
What's Changed
- remove dependency of flash_attn when use_flash_attn is set to false by @sallyjunjun in #20
- fix(transformers): fix parameter error of `safe_open` in revert scripts by @x54-729 in #74
- Update version.txt by @sunpengsdu in #81
- fix(embedding.py): fix flash attn error of llama and internlm2 by @sallyjunjun in #83
- fix(ckpt): fix load funcs when loading llama & hf_llama by @gaoyang07 in #79
- Fix missing requirements for NUMA by @Godricly in #80
- test(workflow): add workflow for norm_weight_test by @kkscilife in #70
- feat(moe): impl moe with megablock kernel by @blankde in #76
Full Changelog: v0.3.2dev20240313...v0.3.3dev20240315
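The first entry above (#20) makes `flash_attn` an optional dependency gated by `use_flash_attn`. The usual pattern is to defer the import until the feature is requested and fall back gracefully; a generic sketch (function name and return values are hypothetical, not InternEvo's API):

```python
def get_attention_impl(use_flash_attn=False):
    """Select an attention backend, importing flash_attn only when it is
    actually requested, so environments without the package still work.
    """
    if use_flash_attn:
        try:
            import flash_attn  # heavy optional dependency, may be absent
            return "flash"
        except ImportError:
            return "torch"  # graceful fallback when flash_attn is missing
    return "torch"
```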
InternEvo-v0.3.2dev20240313
What's Changed
- Delete .github/workflows/stale.yml by @del-zhenwu in #66
- Fix (unit tests, interleaved pp and other bugs): re-adapt unit tests for isp and adapt interleaved pp for no flash_attention by @li126com in #52
- feat(model/linear.py): support norm head for model internlm2 by @huangting4201 in #68
- feat(modeling_internlm2.py): update model type to INTERNLM2_PUBLIC by @huangting4201 in #69
- feat(ckpt): optimize model checkpointing in Volc and Ali by @zigzagcai in #65
- fix(communication/isp.py): fix redundant callback and remove head embed hook by @huangting4201 in #72
- (feat/demo) add internlm2 1.8b config by @00INDEX in #73
- Feat(QA): temp no fa by @li126com in #75
New Contributors
- @del-zhenwu made their first contribution in #66
- @zigzagcai made their first contribution in #65
- @00INDEX made their first contribution in #73
Full Changelog: v0.3.1dev20240229...v0.3.2dev20240313
InternEvo-v0.3.1dev20240229
What's Changed
- Feat(QA): check output for no fa by @li126com in #42
- feat(model): update modeling_internlm2 with configs by @gaoyang07 in #15
- feat(tests): update ci e2e tests by @huangting4201 in #45
- fix(moe): fix bugs for moe sequence parallel and memory pool by @blankde in #50
- fix(optimizer/hybrid_zero_optim.py): fix layer norm grad allreduce when sp is True by @huangting4201 in #53
- test(workflow): change env into flash2 and add rerun workflow by @kkscilife in #48
- feat(code-docs): update doc tensor parallel by @huangting4201 in #43
- feat(parallel_context.py): add gqa process group to allreduce dkv by @huangting4201 in #54
- fix(context/process_group_initializer.py): fix gqa process group by @huangting4201 in #58
- feat(*): remove unnecessary communication by @mwiacx in #60
- Fix(param overlap): fix overlap of broadcasting and computing by @li126com in #46
- test(ci): add write permissions for actions by @kkscilife in #56
- feat(model): update model internlm2 by @huangting4201 in #47
- Feat(QA norm):check norm weights for different ranks by @li126com in #62
- feat(switch topology): add control switch by @li126com in #55
- Fix/fix broadcast overlap with isp by @mwiacx in #64
- fix(QA): fix test_swap_nb_loss_and_gradnorm by @li126com in #63
Full Changelog: v0.3.0dev20240201...v0.3.1dev20240229
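The GQA process-group entries above (#54, #58) allreduce dKV across query heads that share a key/value head. The sharing pattern is the standard grouped-query-attention mapping, where consecutive query heads share one KV head; a minimal sketch (illustrative, not InternEvo code):

```python
def kv_head_for_query(q_head, num_q_heads, num_kv_heads):
    """Map a query head index to the key/value head it shares under
    grouped-query attention (GQA). Requires num_q_heads to divide evenly
    into num_kv_heads groups.
    """
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size
```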
InternEvo-v0.2.4-internlm2
Full Changelog: v0.2.3dev20240201...v0.2.4-internlm2
InternEvo-v0.3.0dev20240201
What's Changed
- feat/refactor partition strategy by @huangting4201 in #13
Full Changelog: v0.2.3dev20240201...v0.3.0dev20240201