[CUDA] Multi-GPU for CUDA Version #6138

Open. Wants to merge 65 commits into base: master.


shiyu1994
Collaborator

This PR integrates multi-GPU support into the CUDA version, using NCCL.

shiyu1994 self-assigned this on Oct 10, 2023
shiyu1994 changed the title from "[CUDA] Multi-GPU for CUDA Version" to "[WIP] [CUDA] Multi-GPU for CUDA Version" on Oct 10, 2023
shiyu1994 changed the title from "[WIP] [CUDA] Multi-GPU for CUDA Version" to "[CUDA] Multi-GPU for CUDA Version" on Dec 15, 2023
shiyu1994 closed this on Dec 15, 2023
shiyu1994 reopened this on Dec 15, 2023
@shiyu1994
Collaborator Author

@guolinke This is almost ready. You may review this when you have time. Thanks.

@guolinke
Collaborator

Are there any tests for multi-GPU training?


- ``num_gpu`` :raw-html:`<a id="num_gpu" title="Permalink to this parameter" href="#num_gpu">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, constraints: ``num_gpu > 0``
Collaborator


I'm ok with changing the main parameter name to `num_gpus`, but can we please keep `num_gpu` as a parameter alias, so that existing code using that parameter isn't broken?
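
To illustrate the kind of backward compatibility being asked for, here is a minimal sketch of alias resolution. It is not LightGBM's actual implementation: the `PARAM_ALIASES` table and the `resolve_aliases` helper are hypothetical names introduced only for this example, and the rule that an explicitly given main name wins over its alias is an assumption.

```python
# Hypothetical alias table (illustration only): alias -> main parameter name.
PARAM_ALIASES = {"num_gpu": "num_gpus"}


def resolve_aliases(params, aliases=PARAM_ALIASES):
    """Return a copy of params with alias keys mapped to their main names.

    Assumption: if both the alias and the main name are supplied,
    the value given under the main name takes precedence.
    """
    # Keep every key that is not an alias (this includes main names).
    resolved = {k: v for k, v in params.items() if k not in aliases}
    # Map each supplied alias to its main name unless the main name is already set.
    for alias, main in aliases.items():
        if alias in params and main not in resolved:
            resolved[main] = params[alias]
    return resolved


old_style = resolve_aliases({"device_type": "cuda", "num_gpu": 2})
new_style = resolve_aliases({"device_type": "cuda", "num_gpus": 2})
```

Under these assumptions, existing code that passes `num_gpu` ends up with the same resolved parameters as code that uses the new `num_gpus` name.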

3 participants