Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Adapt calls, and remove device-side structs #2118

Closed
wants to merge 2 commits into from

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Jan 6, 2025

In efforts to support conversion between cpu and gpu, we need the device-side objects to retain as much as possible so that we can convert back and forth without losing information.

This PR does a few things:

  • Abstracts away getproperty calls to mpi context for mpicomm
  • Fixes some adapt calls-- we only want adapt_structure to return device structs when called with a CUDA.KernelAdaptor.
  • Remove the device side structs, and instead use the ClimaComms objects.

This should hopefully work for all cases except for CUDA-MPI jobs, since adapting from CPU to GPU with mpicomm (a mutable object) cannot reside on the device. This limitation should hopefully be fixed by CliMA/ClimaComms.jl#102.

Also, I noticed that the main branch has some typing issues. In particular:

## aliases
const RectilinearSpectralElementGrid2D =
    SpectralElementGrid2D{<:Topologies.RectilinearTopology2D}

does not apply to device-side objects, since those would need to be

const RectilinearSpectralElementGrid2D =
    DeviceSpectralElementGrid2D{<:Topologies.RectilinearTopology2D}

(assuming that their type parameters are synchronized)

@sriharshakandala
Copy link
Member

sriharshakandala commented Jan 6, 2025

Would we be able to

  1. Move all Adapt code out of src/ to ext/?
  2. Avoid adding Adapt explicitly to the source project and extensions as a dependency, instead access it through CUDA.Adapt or Metal.Adapt inside the extensions?
    For example, something similar to https://github.com/CliMA/ClimaInterpolations.jl/pull/3/files#diff-3702a35dcf2bad10ba7fc72fc0c50366627cd0679c4e3afda9d24c602ad82a8eR117 &

@charleskawczynski
Copy link
Member Author

Would we be able to

  1. Move all Adapt code out of src/ to ext/?
  2. Avoid adding Adapt explicitly to the source project and extensions as a dependency, instead access it through CUDA.Adapt or Metal.Adapt inside the extensions?
    For example, something similar to https://github.com/CliMA/ClimaInterpolations.jl/pull/3/files#diff-3702a35dcf2bad10ba7fc72fc0c50366627cd0679c4e3afda9d24c602ad82a8eR117 &

I think we should be able to move adapt code into ext/, if not all of them, but there may be some cases where we need cpu adapt code.

In particular, there may be a difference between cuda adapting for the purposes of prepping data for kernels, and us adapting data between the cpu and gpu.

@charleskawczynski
Copy link
Member Author

Actually, we can overload getproperty on the ClimaComms context, and make it backwards compatible. So we don't even need to change the .mpicomm calls here. Going to revert that.

@charleskawczynski
Copy link
Member Author

Closing in favor of #2120 and #2114.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants