Metal.randn! produces Nan #474
#321 is part of 1.4.2, so cc @christiangnrd.
I'll look into this some more when I have time, but it seems to be an upstream bug. See pytorch/pytorch#89283.
Happens in Swift too. You even get the same NaNs when using the same seed.
Julia example:
julia> using Metal; N = 100000000; X0 = Metal.zeros(Float32, N); rng = MPS.RNG(device(), 1234); Metal.randn!(rng, X0); collect(1:N)[Array(isnan.(X0))]
9-element Vector{Int64}:
5281172
18887991
23601183
27449378
28126551
43658369
53930280
71339505
76705941
Swift equivalent MWE:
import Metal
import MetalPerformanceShaders

func main(T: Float.Type = Float32.self, N: Int = 100000000, seed: Int = 1234) {
    guard let device = MTLCreateSystemDefaultDevice(),
          let commandQueue = device.makeCommandQueue() else {
        fatalError("Metal device or command queue could not be created")
    }

    // Destination vector, initialised to 1 so untouched entries are easy to spot
    var a = [Float](repeating: 1, count: N)
    let aBuffer = device.makeBuffer(bytes: &a, length: MemoryLayout<Float>.size * N, options: [])
    let aVectorDescriptor = MPSVectorDescriptor(length: N, dataType: .float32)
    let aVector = MPSVector(buffer: aBuffer!, descriptor: aVectorDescriptor)

    // Standard normal distribution, Philox generator with a fixed seed
    let randesc = MPSMatrixRandomDistributionDescriptor.normalDistributionDescriptor(withMean: 0, standardDeviation: 1)
    let rand = MPSMatrixRandomPhilox(device: device, destinationDataType: .float32, seed: seed, distributionDescriptor: randesc)

    let commandBuffer = commandQueue.makeCommandBuffer()!
    rand.encode(commandBuffer: commandBuffer, destinationVector: aVector)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Check for NaNs in the result vector and print their (0-based) indices
    let aPointer = aBuffer!.contents().bindMemory(to: Float.self, capacity: N)
    var c = 0
    print("Found NaNs:")
    for j in 0..<N {
        if aPointer[j].isNaN {
            c += 1
            print(j)
        }
    }
    print(String(format: "\n%d NaN in result\n", c))
}

main()
Swift Output:
Did you post the Swift MWE on the Apple forums to see what they have to say? I suppose an autorelease pool will fix this as always.
https://developer.apple.com/forums/thread/767452
I don't think so this time. Using a uniform generator has no issues. That, and the consistent locations of the NaNs in the result, lead me to believe it's a bug in their shaders.
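The "consistent locations" observation can be checked mechanically by comparing NaN positions across two runs with the same seed, rather than only counting them. A minimal sketch in plain Python follows; the helper names `nan_indices` and `same_nan_locations` are hypothetical, and the lists are stand-in data, not output from Metal:

```python
import math
from typing import Iterable, List


def nan_indices(values: Iterable[float]) -> List[int]:
    """Return the 0-based positions of every NaN in `values`."""
    return [i for i, v in enumerate(values) if math.isnan(v)]


def same_nan_locations(run_a: Iterable[float], run_b: Iterable[float]) -> bool:
    """True when both runs have NaNs at exactly the same positions."""
    return nan_indices(run_a) == nan_indices(run_b)


# Stand-in data (assumption): two "runs" with identical NaN positions,
# and a third run without any NaNs.
run_1 = [0.3, float("nan"), -1.2, float("nan")]
run_2 = [0.3, float("nan"), -1.2, float("nan")]
run_3 = [0.3, 0.1, -1.2, 0.7]

print(nan_indices(run_1))
print(same_nan_locations(run_1, run_2))
print(same_nan_locations(run_1, run_3))
```

When comparing across languages, note that the Julia listing above is 1-based while the Swift loop prints 0-based indices, so matching positions differ by one.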
For what it's worth:
In [4]: import mlx.core as mx
   ...: n = 100000000
   ...: for i in range(10):
   ...:     a = mx.random.normal((1000000000,), stream = mx.gpu)
   ...:     print(mx.sum(mx.isnan(a)))
   ...:
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
array(0, dtype=int32)
I received a reply from an Apple engineer on my forum post today. They just asked me to report it via Feedback Assistant, but it's a sign they may look into it.
Hi,
On an M2 Max with
I observe:
I hope this is not too difficult to fix.
best regards