Define cpu<->gpu conversions #2114

Merged
merged 1 commit into from
Jan 10, 2025

charleskawczynski
Member

I ran into this again today, and I think it's about time we fix support for this.

@charleskawczynski force-pushed the ck/adapt_cpu_gpu branch 2 times, most recently from a7e72fb to b5816bc, December 25, 2024 01:40
@charleskawczynski
Member Author

I spoke with @juliasloan25 earlier today, and she and I both ran into a few issues that I'll summarize here:

Partially implemented mirrored gpu objects

We have mirror objects for the gpu (e.g., Grids.DeviceExtrudedFiniteDifferenceGrid) because (AFAICT) their counterparts (i.e., Grids.ExtrudedFiniteDifferenceGrid) are mutable, and mutable objects cannot be put on the gpu. Furthermore, whether by design or by accident, DeviceSpectralElementGrid2D only has some of the objects that SpectralElementGrid2D has, so we cannot adapt to CuArray and then adapt back to Array: we throw out some information when adapting to DeviceSpectralElementGrid2D. We could use a different mechanism (e.g., define "device_adapt" or something); however, cuda kernels may not work in that case. We could try to synchronize those data structures more, but that could lead to metadata issues (if removing some things was a deliberate choice).

Another option would be to add small bits into our data structures that would allow us to recreate them. For example, IntervalMesh discards stretch once faces is established. Not discarding stretch could allow us to lazily represent the interval mesh (discussed in #1790) and reduce the amount of data needed to export / import for restarts (plus we could recreate everything we want here on the fly).

I'm generally in favor of preserving small constructor metadata because of these nice properties, and I think it could offer us some alternative solutions. But I'm not sure it's worth the effort for this particular task at this moment.
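To make the "preserve constructor metadata" idea concrete, here is a minimal sketch. All names (ToyLazyMesh, ToyUniform, ToyPower, stretch_map, faces) are hypothetical and are not ClimaCore types or functions; the sketch only assumes that a mesh can be rebuilt from its domain bounds, its element count, and its stretching rule.

```julia
# Hypothetical stretching rules: each maps a uniform coordinate ζ in [0, 1]
# to a stretched coordinate in [0, 1]. These are not ClimaCore types.
struct ToyUniform end
struct ToyPower
    p::Float64
end
stretch_map(::ToyUniform, ζ) = ζ
stretch_map(s::ToyPower, ζ) = ζ^s.p

# Toy mesh that stores only the constructor metadata, not the face array.
struct ToyLazyMesh{S}
    zmin::Float64
    zmax::Float64
    nelems::Int
    stretch::S
end

# Faces are recomputed on demand from the stored metadata.
faces(m::ToyLazyMesh) =
    [m.zmin + (m.zmax - m.zmin) * stretch_map(m.stretch, i / m.nelems) for i in 0:m.nelems]

mesh = ToyLazyMesh(0.0, 10.0, 4, ToyUniform())
faces(mesh)  # [0.0, 2.5, 5.0, 7.5, 10.0]
```

Because the struct holds only a few scalars and the stretching rule, it is cheap to export for restarts, and the face array can be regenerated on either the cpu or the gpu side.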

I think the value of having convenient cpu<->gpu conversions is worth exploring the route of synchronizing the structs so that we can convert everything back and forth.

ClimaComms device ambiguity

If we start with a ClimaComms.CUDADevice, and we want to adapt to the cpu, then it's unclear whether that conversion should be to a CPUSingleThreaded or CPUMultiThreaded, because there's simply no history to determine which the user wants. We could write wrappers around our adapt functions to do this, but that will require threading the device information all the way through to wherever it's needed. I'm inclined to say that, for this particular situation, using CPUSingleThreaded as a default may be worth the loss of flexibility.
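A rough sketch of that default, using toy stand-in types so that it runs without ClimaComms. The Toy* types and the cpu_counterpart helper are hypothetical; they only mirror the device names mentioned above and are not part of any package.

```julia
# Toy stand-ins for the device types named above (not the ClimaComms types).
abstract type ToyDevice end
struct ToyCPUSingleThreaded <: ToyDevice end
struct ToyCPUMultiThreaded <: ToyDevice end
struct ToyCUDADevice <: ToyDevice end

# Hypothetical helper: which device should an object live on after adapting to Array?
# A gpu device carries no record of the cpu flavor it came from,
# so fall back to the single-threaded cpu device.
cpu_counterpart(::ToyCUDADevice) = ToyCPUSingleThreaded()

# A cpu device is already on the cpu, so it is returned unchanged.
cpu_counterpart(d::Union{ToyCPUSingleThreaded, ToyCPUMultiThreaded}) = d
```

The cost of this choice is that a gpu -> cpu conversion can never land on the multi-threaded device; supporting that would mean passing the desired device through the adapt call chain, as described above.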

@charleskawczynski force-pushed the ck/adapt_cpu_gpu branch 2 times, most recently from f4513b9 to 4ec39fa, January 9, 2025 17:52
@charleskawczynski marked this pull request as ready for review January 9, 2025 17:52
@charleskawczynski
Member Author

I also spoke with @Sbozzolo recently about fixing this, and I think the conclusion is that we'll keep the on-device objects (or perhaps better named in-kernel objects) as slimmed down versions of the cpu versions, but we can allow adapting the cpu versions to on-device (gpu), with all of the same information.

This way we can keep some distance from the metadata issues while still being able to switch between cpu and gpu.

The way this works is:

  • Adapt.adapt(Array, x) returns a new object of x on the cpu
  • Adapt.adapt(CUDA.CuArray, x) returns a new object of x on the gpu
  • Adapt.adapt(CUDA.KernelAdaptor, x) returns a new object of x that is in-kernel compatible
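As an illustration of the split described in the list above, here is a minimal cpu-only sketch. ToyGrid, ToyDeviceGrid, and ToyKernelAdaptor are hypothetical names (the last one stands in for the kernel adaptor); only Adapt.adapt and Adapt.adapt_structure are real Adapt.jl functions, and the sketch assumes Adapt.jl is installed. It is not the implementation in this PR.

```julia
using Adapt

# Hypothetical full (host-side) object: array data plus bookkeeping.
struct ToyGrid{A <: AbstractArray}
    coords::A
    label::String  # stands in for metadata that is not needed inside a kernel
end

# Hypothetical slimmed-down, in-kernel counterpart: array data only.
struct ToyDeviceGrid{A <: AbstractArray}
    coords::A
end

# Toy adaptor standing in for the kernel adaptor.
struct ToyKernelAdaptor end

# Array-type adaptors (Array here, a gpu array type in practice) keep every
# field, so the conversion can be reversed.
Adapt.adapt_structure(to, g::ToyGrid) = ToyGrid(Adapt.adapt(to, g.coords), g.label)

# The kernel adaptor drops the metadata, so this direction is one-way.
Adapt.adapt_structure(to::ToyKernelAdaptor, g::ToyGrid) =
    ToyDeviceGrid(Adapt.adapt(to, g.coords))

grid = ToyGrid([0.0, 0.5, 1.0], "toy")
grid_cpu = Adapt.adapt(Array, grid)                  # still a ToyGrid, label preserved
grid_kernel = Adapt.adapt(ToyKernelAdaptor(), grid)  # a ToyDeviceGrid
```

With CUDA.jl loaded, the same first method would also handle a gpu array type as the target, since it makes no assumption about the adaptor.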

Closes #2091.

@charleskawczynski
Member Author

Closes #1296.

@charleskawczynski force-pushed the ck/adapt_cpu_gpu branch 2 times, most recently from 4ec39fa to dcbec76, January 9, 2025 18:26
@Sbozzolo
Member

Sbozzolo commented Jan 9, 2025

I think it would still be useful to provide a todevice function (as suggested in #1296), at least defined on Fields and FieldVectors, that internally calls the correct Adapt method.

Maybe something like this?

import Adapt
import ClimaComms

"""
    todevice(device, field)

Move the `Field` `field` to the given `device`.

This is particularly useful to move `Field`s from CPUs to GPUs and vice versa.

If the `field` is already defined on the target device, return a copy.
"""
function todevice(device::ClimaComms.AbstractDevice, field)
    # Pick the array type that matches the device, then let Adapt do the conversion
    return Adapt.adapt(ClimaComms.array_type(device), field)
end

# Multi-threaded CPU targets are explicitly rejected in this proposal
todevice(::ClimaComms.CPUMultiThreaded, _) = error("Not supported")

(I don't know if adapting returns a copy)

This is so that downstream packages don't have to explicitly depend on Adapt and use it. I think Adapt comes with a learning curve and is not the most intuitive of the packages. The documentation does not make it clear how to use it and is not very beginner friendly. In addition to this, ClimaComms defines a notion of devices, and it would be more coherent to stick to them, instead of requiring one to understand Array vs CuArray.

In any case, all of this (both the user facing aspects and the implementation details) should be documented in the documentation.

@charleskawczynski
Member Author

> I think it would still be useful to provide a todevice function (as suggested in #1296), at least defined on Fields and FieldVectors, that internally calls the correct Adapt.
>
> Maybe something like this?
>
> """
>     todevice(device, field)
>
> Move the `Field` `field` to the given `device`.
>
> This is particularly useful to move `Field`s from CPUs to GPUs and vice versa.
>
> If the `field` is already defined on the target device, return a copy.
> """
> function todevice(device::ClimaComms.AbstractDevice, field)
>     return Adapt.adapt(ClimaComms.array_type(device), field)
> end
>
> todevice(::ClimaComms.CPUMultiThreaded, _) = error("Not supported")

Sure, I think we can do this since it's a separate name (we shouldn't have any ambiguities). Speaking of which, I need to fix the ambiguity in the tests.

> This is so that downstream packages don't have to explicitly depend on Adapt and use it. I think Adapt comes with a learning curve and is not the most intuitive of the packages. The documentation does not make it clear how to use it and is not very beginner friendly. In addition to this, ClimaComms defines a notion of devices, and it would be more coherent to stick to them, instead of requiring one to understand Array vs CuArray.
>
> In any case, all of this (both the user facing aspects and the implementation details) should be documented in the documentation.

I don't think that we can/should remove adapt as an extension because it's generic and lightweight. But, I agree that introducing todevice is easier to document / explain. So I think that's another good reason to add it.

@charleskawczynski
Member Author

One downside of defining a new function, todevice, ourselves is that we need to decide where it lives. Should it live in ClimaCore.Fields? In ClimaCore?

@charleskawczynski
Member Author

It feels like todevice can be added after the fact, since it's just a wrapper method. Let me see if there's a way to document the adapt methods.

@charleskawczynski
Member Author

Ah, Aqua is (rightfully so) complaining about piracy. Let's move those definitions to ClimaComms.

@charleskawczynski
Member Author

I need to strip out the ClimaComms pieces. This PR will depend on CliMA/ClimaComms.jl#103.

Try to fix downgrade ci
@charleskawczynski
Member Author

The main functionality is fully implemented without the wrapper, so let's follow up with a wrapper in a subsequent PR.

@Sbozzolo
Member

Sbozzolo commented Jan 10, 2025

Can you make sure to add documentation in the next PR? You added a new feature and you clarified the role of certain structs, but this GitHub discussion is the only place where one can find information about them.

@charleskawczynski
Member Author

> Can you make sure to add documentation in the next PR? You added a new feature and you clarified the role of certain structs, but this GitHub discussion is the only place where one can find information about them

Yes, there are a handful of doc pieces I’d like to update, and I thought that it’d be easier to group them together.

@charleskawczynski merged commit a83ceb7 into main on Jan 10, 2025
32 of 34 checks passed
@charleskawczynski deleted the ck/adapt_cpu_gpu branch January 10, 2025 23:18