Issues: NVIDIA/Megatron-LM
[QUESTION] Where can I download the tokenizer for the model mcore-llava-mistral-7b-instruct-clip336-pretraining? (#1281, opened Nov 11, 2024 by herolxl)
[QUESTION] Is there any restriction on using allgather with moe_expert_capacity_factor? (#1277, opened Nov 7, 2024 by Louis-J)
[BUG] TP-comm-overlap bug when replacing TELayerNormColumnParallelLinear with TEColumnParallelLinear (#1275, opened Nov 6, 2024 by wplf)
[BUG] The cached_loss_mask may be modified unexpectedly in GPTDataset (#1269, opened Nov 1, 2024 by shmily326)
[QUESTION] How to use loader_mcore, and why does it require torch distributed? (#1266, opened Oct 29, 2024 by KookHoiKim)
[ENHANCEMENT] Enable LR scaling for a specific layer (e.g. down-projection...) during pretraining (#1263, opened Oct 28, 2024 by dhia680)
[ENHANCEMENT] Add layer name in a layer to improve code debugging (#1198, opened Oct 4, 2024 by rybakov)
[BUG] "ValueError: optimizer got an empty parameter list" under pipeline parallelism (#1166, opened Oct 2, 2024 by takuya576)
[QUESTION] Why aren't all SMs active when NCCL kernels and compute kernels overlap? (#1161, opened Sep 27, 2024 by yu-depend)