improve performance #38
This appears to give ~10% on my benchmarks, but I guess the gain would be much higher for more numerical data. Added it to my test branch, see #34
This doesn't seem safe in general? What if
I tried this on VGG19 from Metalhead and got
julia> @time Metalhead.weights("vgg19.bson")
0.662478 seconds (1.65 M allocations: 631.681 MiB, 15.95% gc time)
Dict{Symbol,Any} with 38 entries:
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x1f353f3d -- getindex at .\array.jl:730 [inlined]
getindex at .\subarray.jl:228 [inlined]
_show_nonempty at .\arrayshow.jl:386
in expression starting at no file:0
getindex at .\array.jl:730 [inlined]
getindex at .\subarray.jl:228 [inlined]
_show_nonempty at .\arrayshow.jl:386
#383 at .\arrayshow.jl:404
jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2219
show_nd at .\arrayshow.jl:294
_show_nonempty at .\arrayshow.jl:403 [inlined]
show at .\arrayshow.jl:421
Guess it's unsafe indeed :P
Yeah, probably. #46 is at least a lot better than what we have now.
Yup, I was thinking of just submitting something like #46, but then I got greedy :D
I'd say just merge #46 and leave this unmerged until we find something safer ;)
Interestingly, just defining
function reinterpret_(::Type{T}, x) where T
    reinterpret(T, x)
end
I can load VGG19 pretty damn fast
julia> @time Metalhead.weights("vgg19.bson")
0.269944 seconds (1.97 k allocations: 548.196 MiB, 37.98% gc time)
Dict{Symbol,Any} with 38 entries:
but it crashes when testing BSON itself.
@KristofferC probably because reinterpret returns
function reinterpret_(::Type{T}, x) where T
    len = Int(length(x) * (sizeof(eltype(x)) / sizeof(T)))
    GC.@preserve x begin
        return unsafe_wrap(Array, Ptr{T}(pointer(x)), len)
You may need to set own = true for unsafe_wrap.
OK, that doesn't work on its own, but if I copy the memory then it works:
function reinterpret_(::Type{T}, x::Vector{S}) where {T, S}
    length(x) < 1 && return T[]
    len = Int(length(x) * (sizeof(eltype(x)) / sizeof(T)))
    GC.@preserve x begin
        p = Ptr{UInt8}(pointer(x))
        t = Ptr{UInt8}(Base.Libc.calloc(len, sizeof(T)))
        ccall(:memset, Cvoid, (Ptr{UInt8}, Cint, Csize_t), t, 0, len * sizeof(T))
        ccall(:memcpy, Cvoid, (Ptr{UInt8}, Ptr{UInt8}, Csize_t), t, p, sizeof(x))
        Base.unsafe_wrap(Array, Ptr{T}(t), len; own=true)
    end
end
This brings the time down on my fork to 0.479740 seconds from 0.51184. It would be better to copy the bytes directly from the input stream and only memset the trailing bytes.
I also tried creating a second reference to the original memory and wrapping it, but unsafe_wrap just treats the Ref as a pointer AFAICT.
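For comparison, here is a minimal sketch of the copy-based idea without manual calloc/memcpy. The names `reinterpret_copy` and `read_elements` are hypothetical and not part of BSON.jl, and the sketch assumes plain bits element types whose byte length divides evenly (it throws rather than padding trailing bytes). The first function copies into a Julia-owned vector, so the result stays valid after `x` is collected; the second shows one way to copy directly from an input stream using `read!`.

```julia
# Hypothetical helper (not BSON.jl API): copy the bytes of `x` into a new,
# Julia-owned Vector{T}, so the result does not alias `x`.
function reinterpret_copy(::Type{T}, x::Vector{S}) where {T, S}
    nbytes = sizeof(x)
    # Assumption: require an exact fit instead of zero-padding trailing bytes.
    nbytes % sizeof(T) == 0 ||
        throw(ArgumentError("byte length $nbytes is not a multiple of sizeof($T)"))
    out = Vector{T}(undef, div(nbytes, sizeof(T)))
    # Keep both arrays alive while raw pointers to them are in use.
    GC.@preserve x out begin
        unsafe_copyto!(Ptr{UInt8}(pointer(out)), Ptr{UInt8}(pointer(x)), nbytes)
    end
    return out
end

# Hypothetical helper: read `n` elements of type T straight from a stream,
# without building an intermediate byte vector first.
read_elements(io::IO, ::Type{T}, n::Integer) where {T} =
    read!(io, Vector{T}(undef, n))
```

Since the output is an ordinary `Vector{T}` managed by the GC, neither `own=true` nor a matching `free` is needed.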
If I add mmap and copy directly from the IOBuffer's byte vector, then we get the following:
julia> @timev BSONqs.load("../../../julia/serbench/vgg19.bson"; mmap=true);
0.250118 seconds (15.08 k allocations: 548.829 MiB, 19.26% gc time)
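
As a rough illustration of the mmap-then-copy approach (not the actual BSONqs implementation), the sketch below maps a file with the `Mmap` standard library and copies a typed slice out of the mapped bytes. The function name `load_mapped` and its `offset` keyword are assumptions made for this example.

```julia
using Mmap

# Hypothetical helper (not the BSONqs API): memory-map `path` and copy `n`
# elements of type T, starting at byte `offset`, into a Julia-owned vector.
function load_mapped(path::AbstractString, ::Type{T}, n::Integer;
                     offset::Integer = 0) where {T}
    open(path, "r") do io
        bytes = Mmap.mmap(io, Vector{UInt8})  # maps the whole file read-only
        offset + n * sizeof(T) <= length(bytes) ||
            throw(ArgumentError("requested range exceeds file size"))
        out = Vector{T}(undef, n)
        # Copy out of the mapping so `out` does not depend on it afterwards.
        GC.@preserve bytes out begin
            unsafe_copyto!(Ptr{UInt8}(pointer(out)),
                           pointer(bytes, offset + 1),
                           n * sizeof(T))
        end
        return out
    end
end
```

The copy costs one pass over the data, but the returned vector is independent of the mapping and of the file handle.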
After benchmarking BSON to figure out if it's a good replacement for JSON, I noticed that it's by far the slowest. This PR partly fixes the performance.
It's still slower than CBOR, which also isn't as fast as it could be, so I guess there is still some other low-hanging fruit in here ;)