phi-related IR downgrade issue #488

Closed
efaulhaber opened this issue Dec 2, 2024 · 5 comments · Fixed by #489
Labels
kernels Things about kernels and how they are compiled.

@efaulhaber

In my kernel, I get the following error.

ERROR: Compilation to native code failed; see below for details.
If you think this is a bug, please file an issue and attach the following files:
- /var/folders/gx/ckq0xt295kxgf7ygv1xxprhr0000gn/T/jl_tWts1wHcfG.ll
- /var/folders/gx/ckq0xt295kxgf7ygv1xxprhr0000gn/T/jl_9KVmf6XORU.air
- /var/folders/gx/ckq0xt295kxgf7ygv1xxprhr0000gn/T/jl_b2jmZmHy3w.metallib
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/compilation.jl:206 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/os.jl:264 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/compilation.jl:183 [inlined]
  [5] (::Metal.var"#172#173"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})()
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:637
  [6] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:565 [inlined]
  [7] macro expansion
    @ ./lock.jl:273 [inlined]
  [8] ObjectiveC.Foundation.NSAutoreleasePool(f::Metal.var"#172#173"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})
    @ ObjectiveC.Foundation ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:557
  [9] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{…}; return_function::Bool)
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:636
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/8Yisz/src/execution.jl:262
 [11] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/8Yisz/src/execution.jl:151
 [12] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:189 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] mtlfunction(f::typeof(PointNeighbors.gpu_foreach_neighbor_double_buffer), tt::Type{…}; name::Nothing, kwargs::@Kwargs{})
    @ Metal ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:184
 [15] mtlfunction(f::typeof(PointNeighbors.gpu_foreach_neighbor_double_buffer), tt::Type{Tuple{…}})
    @ Metal ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:182
 [16] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:85 [inlined]
 [17] (::KernelAbstractions.Kernel{…})(::Function, ::Vararg{…}; ndrange::Int64, workgroupsize::Nothing)
    @ Metal.MetalKernels ~/.julia/packages/Metal/6SEqe/src/MetalKernels.jl:110
 [18] Kernel
    @ ~/.julia/packages/Metal/6SEqe/src/MetalKernels.jl:106 [inlined]
 [19] #foreach_point_neighbor_localmem#115
    @ ~/git/PointNeighbors.jl/src/nhs_grid.jl:416 [inlined]
 [20] foreach_point_neighbor_localmem
    @ ~/git/PointNeighbors.jl/src/nhs_grid.jl:403 [inlined]
 [21] foreach_point_neighbor(f::Function, system::WeaklyCompressibleSPHSystem{…}, neighbor_system::WeaklyCompressibleSPHSystem{…}, system_coords::MtlMatrix{…}, neighbor_coords::MtlMatrix{…}, neighborhood_search::GridNeighborhoodSearch{…}; points::Base.OneTo{…}, parallel::Bool)
    @ TrixiParticles ~/git/TrixiParticles.jl/src/general/neighborhood_search.jl:20
 [22] foreach_point_neighbor
    @ ~/git/TrixiParticles.jl/src/general/neighborhood_search.jl:13 [inlined]
 [23] interact!
    @ ~/git/TrixiParticles.jl/src/schemes/fluid/weakly_compressible_sph/rhs.jl:23 [inlined]
 [24] var"##core#950"(dv#923::MtlMatrix{…}, v#924::MtlMatrix{…}, u#925::MtlMatrix{…}, v#926::MtlMatrix{…}, u#927::MtlMatrix{…}, nhs#928::GridNeighborhoodSearch{…}, system#929::WeaklyCompressibleSPHSystem{…}, system#930::WeaklyCompressibleSPHSystem{…})
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:561
 [25] var"##sample#951"(::Tuple{…}, __params::BenchmarkTools.Parameters)
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:570
 [26] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:187
 [27] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:182
 [28] #invokelatest#2
    @ ./essentials.jl:1055 [inlined]
 [29] invokelatest
    @ ./essentials.jl:1052 [inlined]
 [30] #lineartrial#46
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:51 [inlined]
 [31] lineartrial
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:50 [inlined]
 [32] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:300
 [33] tune! (repeats 2 times)
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289 [inlined]
 [34] macro expansion
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:447 [inlined]
 [35] benchmark_wcsph_fp32(neighborhood_search::GridNeighborhoodSearch{…}, coordinates_::Matrix{…}; parallel::MetalBackend)
    @ Main ~/git/PointNeighbors.jl/benchmarks/smoothed_particle_hydrodynamics.jl:110
 [36] plot_benchmarks(benchmark::typeof(benchmark_wcsph_fp32), n_points_per_dimension::Tuple{…}, iterations::Int64; parallel::MetalBackend, title::String, seed::Int64, perturbation_factor_position::Float64)
    @ Main ~/git/PointNeighbors.jl/benchmarks/plot.jl:133
 [37] top-level scope
    @ REPL[3]:1

caused by: NSError: Failed to materializeAll. (AGXMetalG14X, code 3)
Stacktrace:
  [1] Metal.MTL.MTLComputePipelineState(dev::Metal.MTL.MTLDeviceInstance, fun::Metal.MTL.MTLFunctionInstance)
    @ Metal.MTL ~/.julia/packages/Metal/6SEqe/lib/mtl/compute_pipeline.jl:60
  [2] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/compilation.jl:188 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/os.jl:264 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/compilation.jl:183 [inlined]
  [5] (::Metal.var"#172#173"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})()
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:637
  [6] macro expansion
    @ ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:565 [inlined]
  [7] macro expansion
    @ ./lock.jl:273 [inlined]
  [8] ObjectiveC.Foundation.NSAutoreleasePool(f::Metal.var"#172#173"{Bool, GPUCompiler.CompilerJob{…}, @NamedTuple{…}})
    @ ObjectiveC.Foundation ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:557
  [9] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{…}; return_function::Bool)
    @ Metal ~/.julia/packages/ObjectiveC/C7BVt/src/foundation.jl:636
 [10] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/8Yisz/src/execution.jl:262
 [11] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/8Yisz/src/execution.jl:151
 [12] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:189 [inlined]
 [13] macro expansion
    @ ./lock.jl:273 [inlined]
 [14] mtlfunction(f::typeof(PointNeighbors.gpu_foreach_neighbor_double_buffer), tt::Type{…}; name::Nothing, kwargs::@Kwargs{})
    @ Metal ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:184
 [15] mtlfunction(f::typeof(PointNeighbors.gpu_foreach_neighbor_double_buffer), tt::Type{Tuple{…}})
    @ Metal ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:182
 [16] macro expansion
    @ ~/.julia/packages/Metal/6SEqe/src/compiler/execution.jl:85 [inlined]
 [17] (::KernelAbstractions.Kernel{…})(::Function, ::Vararg{…}; ndrange::Int64, workgroupsize::Nothing)
    @ Metal.MetalKernels ~/.julia/packages/Metal/6SEqe/src/MetalKernels.jl:110
 [18] Kernel
    @ ~/.julia/packages/Metal/6SEqe/src/MetalKernels.jl:106 [inlined]
 [19] #foreach_point_neighbor_localmem#115
    @ ~/git/PointNeighbors.jl/src/nhs_grid.jl:416 [inlined]
 [20] foreach_point_neighbor_localmem
    @ ~/git/PointNeighbors.jl/src/nhs_grid.jl:403 [inlined]
 [21] foreach_point_neighbor(f::Function, system::WeaklyCompressibleSPHSystem{…}, neighbor_system::WeaklyCompressibleSPHSystem{…}, system_coords::MtlMatrix{…}, neighbor_coords::MtlMatrix{…}, neighborhood_search::GridNeighborhoodSearch{…}; points::Base.OneTo{…}, parallel::Bool)
    @ TrixiParticles ~/git/TrixiParticles.jl/src/general/neighborhood_search.jl:20
 [22] foreach_point_neighbor
    @ ~/git/TrixiParticles.jl/src/general/neighborhood_search.jl:13 [inlined]
 [23] interact!
    @ ~/git/TrixiParticles.jl/src/schemes/fluid/weakly_compressible_sph/rhs.jl:23 [inlined]
 [24] var"##core#950"(dv#923::MtlMatrix{…}, v#924::MtlMatrix{…}, u#925::MtlMatrix{…}, v#926::MtlMatrix{…}, u#927::MtlMatrix{…}, nhs#928::GridNeighborhoodSearch{…}, system#929::WeaklyCompressibleSPHSystem{…}, system#930::WeaklyCompressibleSPHSystem{…})
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:561
 [25] var"##sample#951"(::Tuple{…}, __params::BenchmarkTools.Parameters)
    @ Main ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:570
 [26] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:187
 [27] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:182
 [28] #invokelatest#2
    @ ./essentials.jl:1055 [inlined]
 [29] invokelatest
    @ ./essentials.jl:1052 [inlined]
 [30] #lineartrial#46
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:51 [inlined]
 [31] lineartrial
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:50 [inlined]
 [32] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::@Kwargs{})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:300
 [33] tune! (repeats 2 times)
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:289 [inlined]
 [34] macro expansion
    @ ~/.julia/packages/BenchmarkTools/QNsku/src/execution.jl:447 [inlined]
 [35] benchmark_wcsph_fp32(neighborhood_search::GridNeighborhoodSearch{…}, coordinates_::Matrix{…}; parallel::MetalBackend)
    @ Main ~/git/PointNeighbors.jl/benchmarks/smoothed_particle_hydrodynamics.jl:110
 [36] plot_benchmarks(benchmark::typeof(benchmark_wcsph_fp32), n_points_per_dimension::Tuple{…}, iterations::Int64; parallel::MetalBackend, title::String, seed::Int64, perturbation_factor_position::Float64)
    @ Main ~/git/PointNeighbors.jl/benchmarks/plot.jl:133
 [37] top-level scope
    @ REPL[3]:1
Some type information was truncated. Use `show(err)` to see complete types.

Archive 2.zip

The code for the kernel (using KernelAbstractions.jl) is here:
https://github.com/trixi-framework/PointNeighbors.jl/pull/73/files#diff-d920170c38b8042080898ea9c427ca54e7f718b067a5a9bca914c89dedbbdeba

The failing kernel is foreach_neighbor_double_buffer. The similar kernel foreach_neighbor_localmem compiles fine (although it runs very slowly).
Both kernels run on NVIDIA and AMD GPUs.

@maleadt
Member

maleadt commented Dec 2, 2024

Which version of Metal.jl are you using? Please share the output of Metal.versioninfo() as well as a Manifest.

@maleadt
Member

maleadt commented Dec 2, 2024

In any case, this looks like a bug in the LLVM IR downgrader. Here is a reduced reproducer:

@threadgroup_memory = external global [256 x i8]

define void @kernel() {
entry:
  br label %exit

loop_entry:
  %0 = phi i32* [ bitcast ([256 x i8]* @threadgroup_memory to i32*), %loop_cont1 ], [ null, %loop_cont2 ]
  br label %exit

loop_cont1:
  br i1 false, label %exit, label %loop_entry

loop_cont2:
  br label %loop_entry

exit:
  ret void
}
❯ ./metallib-as reduced.ll -o - | ./metallib-dis -S - -o /dev/null
ERROR: LoadError: LLVM error: Invalid phi record (Producer: 'LLVM16.0.6' Reader: 'LLVM 16.0.6jl')

I'll have a look later this week.

@maleadt maleadt self-assigned this Dec 2, 2024
@maleadt maleadt added the kernels Things about kernels and how they are compiled. label Dec 2, 2024
@maleadt maleadt changed the title NSError: Failed to materializeAll. (AGXMetalG14X, code 3) phi-related IR downgrade issue Dec 2, 2024
@efaulhaber
Author

efaulhaber commented Dec 2, 2024

julia> Metal.versioninfo()
macOS 14.5.0, Darwin 23.5.0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

Julia packages: 
- Metal.jl: 1.4.0
- GPUArrays: 11.1.0
- GPUCompiler: 1.0.1
- KernelAbstractions: 0.9.29
- ObjectiveC: 3.1.0
- LLVM: 9.1.3
- LLVMDowngrader_jll: 0.4.0+0

1 device:
- Apple M2 Pro (64.000 KiB allocated)

Manifest.toml.zip

Edit: This is using the latest main of Metal.jl and master of GPUCompiler.jl because of #480.

@maleadt
Member

maleadt commented Dec 4, 2024

#489 should fix this; can you verify?

@efaulhaber
Author

Yes, working now. Thanks a lot!
