
Speed up (pre)compile and load times #681

Merged: 2 commits, Sep 21, 2020

@jmert (Contributor) commented Sep 20, 2020

This PR does two primary things with the goal of reducing load time:

  1. The three global Dicts that are rehashed on init are eliminated, replaced with if-else chains at their use or in new internal helper functions. This was motivated by looking at the output of @snoopi using HDF5 (where I started Julia with julia --compiled-modules=no) and noticed that many of the methods near the end with largest times were Dict methods.

One minor advantage of doing this is that property names can now be valid for multiple HDF5 classes. For instance, the __init__ code had to call h5p_set_char_encoding for ASCII_ATTRIBUTE_PROPERTIES and UTF8_ATTRIBUTE_PROPERTIES because :char_encoding was already mapped to H5P_LINK_CREATE, so it couldn't be used for these two property lists, which belong to the H5P_ATTRIBUTE_CREATE class. After replacing the Dict lookup, though, the _prop_set! function can have a :char_encoding case for H5P_ATTRIBUTE_CREATE without any conflicts.

To evaluate the effectiveness of this change, I've timed loading (using HDF5) with a missing precompiled cache, loading with a valid precompiled cache, and the time to run the entire test suite (with the precompiled cache active). The times comparing current master versus the first commit from this PR (run with near-latest Julia, average and std over 10 runs) show:

 master precompile: 3.145 ± 0.0295
no dict precompile: 1.927 ± 0.0164

 master pkg load: 0.706 ± 0.0084
no dict pkg load: 0.564 ± 0.0158

 master pkg test: 32.016 ± 0.9132
no dict pkg test: 31.306 ± 0.4119
  2. The second commit enables a lower optimization level for the entire module in Julia 1.5+. The motivation is that many functions do not infer well anyway, so we might as well tell the compiler not to try too hard.

Running the same timing tests again:

 master precompile: 3.145 ± 0.0295
no dict precompile: 1.927 ± 0.0164
low opt precompile: 1.860 ± 0.0156

 master pkg load: 0.706 ± 0.0084
no dict pkg load: 0.564 ± 0.0158
low opt pkg load: 0.502 ± 0.0123

 master pkg test: 32.016 ± 0.9132
no dict pkg test: 31.306 ± 0.4119
low opt pkg test: 27.085 ± 0.4015
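The relative improvements implied by the two tables can be worked out directly from the reported means. The short sketch below does only that arithmetic: the rows are copied from the tables, the PR does not state the time unit so only ratios are used, and the reduction helper is an illustrative name.

```julia
# Mean timings copied from the tables: (name, master, no dict, low opt).
rows = [
    ("precompile", 3.145, 1.927, 1.860),
    ("pkg load",   0.706, 0.564, 0.502),
    ("pkg test",   32.016, 31.306, 27.085),
]

# Percentage reduction of `new` relative to `old` (positive means faster).
reduction(old, new) = 100 * (1 - new / old)

for (name, master, nodict, lowopt) in rows
    println(rpad(name, 12),
            "no dict vs master: ", round(reduction(master, nodict); digits = 1), "%, ",
            "low opt vs no dict: ", round(reduction(nodict, lowopt); digits = 1), "%, ",
            "low opt vs master: ", round(reduction(master, lowopt); digits = 1), "%")
end
```

On these means, the second commit on its own trims about 11% from package load time and roughly 3.5% from precompilation, while both commits together cut precompilation time by roughly 41% relative to master.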

  • I've noticed at least one case where type inference improves despite the lower optimization level: getindex(::Union{HDF5File, HDF5Group}, ::String) used to infer as Any but with this PR is inferred returning an HDF5Object.

  • My hacked-together timing script can be found in this gist.

@jmert requested a review from @musm on September 20, 2020 20:43
src/HDF5.jl

    native_size === Csize_t(2) ? (return Int16) :
    native_size === Csize_t(4) ? (return Int32) :
    native_size === Csize_t(8) ? (return Int64) :
    throw(KeyError((class_id, is_signed, native_size)))
@jmert (Contributor Author):

I'm not sure this line (and the similar ones in the other branches) is actually necessary: in principle, the final size-8 check could just be the else case of the size-4 ternary. Thoughts?

    (H5T_FLOAT, nothing, Csize_t(4)) => Float32,
    (H5T_FLOAT, nothing, Csize_t(8)) => Float64,
    )
    function _hdf5_type_map(class_id, is_signed, native_size)
@musm (Member):

👍
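The branch-based type lookup discussed in this thread can be sketched as follows. This is a minimal illustration, not HDF5.jl's actual _hdf5_type_map: CLASS_INTEGER and CLASS_FLOAT are hypothetical stand-ins for the libhdf5 class ids, and julia_type is a made-up name. Floats pass nothing for the signedness, mirroring the old Dict keys.

```julia
# Hypothetical stand-ins for libhdf5's integer and float class ids.
const CLASS_INTEGER = 0
const CLASS_FLOAT = 1

# Map (class, signedness, size in bytes) to a Julia type using plain
# branches instead of a global Dict lookup.
function julia_type(class_id, is_signed, native_size::Csize_t)
    key = (class_id, is_signed, native_size)
    if class_id == CLASS_INTEGER && is_signed === true
        native_size === Csize_t(1) ? (return Int8) :
        native_size === Csize_t(2) ? (return Int16) :
        native_size === Csize_t(4) ? (return Int32) :
        native_size === Csize_t(8) ? (return Int64) :
        throw(KeyError(key))
    elseif class_id == CLASS_INTEGER && is_signed === false
        native_size === Csize_t(1) ? (return UInt8) :
        native_size === Csize_t(2) ? (return UInt16) :
        native_size === Csize_t(4) ? (return UInt32) :
        native_size === Csize_t(8) ? (return UInt64) :
        throw(KeyError(key))
    elseif class_id == CLASS_FLOAT
        # Floats carry no signedness flag (`nothing`), as in the old Dict keys.
        native_size === Csize_t(4) ? (return Float32) :
        native_size === Csize_t(8) ? (return Float64) :
        throw(KeyError(key))
    end
    # Unknown class: same error as a failed Dict lookup.
    throw(KeyError(key))
end
```

Keeping the explicit size-8 check means an unsupported width such as 16 bytes still raises a KeyError; making Int64 the else case would instead silently map any unknown size to Int64.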

    fun, propids = hdf5_obj_open[objtype]
    props = [p_create(prop; pv...) for prop in propids]
    obj = fun(parent, path, props...)
    if objtype == H5I_DATASET
@musm (Member) commented Sep 20, 2020:

👍
I also bet this is simply far faster.

    UTF8_ATTRIBUTE_PROPERTIES[] = p_create(H5P_ATTRIBUTE_CREATE)
    h5p_set_char_encoding(UTF8_ATTRIBUTE_PROPERTIES[].id, H5T_CSET_UTF8)

    rehash!(hdf5_type_map, length(hdf5_type_map.keys))
@musm (Member):

yay, glad we can get rid of this now

@musm (Member) commented Sep 20, 2020

> I've noticed at least one case where type inference improves despite the lower optimization level: getindex(::Union{HDF5File, HDF5Group}, ::String) used to infer as Any but with this PR is inferred returning an HDF5Object.

It might then just be worth it to add type annotations where necessary to help the compiler with inference in these cases, instead of (or in addition to) @eval Base.Experimental.@optlevel 1.

I kind of think adding the line

    @eval Base.Experimental.@optlevel 1

isn't that elegant, given its current experimental status; however, the latency reduction (about 10% faster package loading) and the improved inference are very welcome.

Overall, very nice PR. Still reviewing some final changes, but 👍
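The guarded @optlevel line quoted above can be sketched in a throwaway module. This assumes the common feature-detection guard (checking that the macro exists rather than comparing VERSION), which is not necessarily the exact code merged in this PR; LowOptDemo and increment_any are hypothetical names.

```julia
module LowOptDemo

# Base.Experimental.@optlevel only exists on Julia 1.5 and later, so detect it
# instead of assuming it. The @eval quotes the macro call, which defers its
# expansion until this branch actually runs.
if isdefined(Base, :Experimental) && isdefined(Base.Experimental, Symbol("@optlevel"))
    @eval Base.Experimental.@optlevel 1
end

# Methods in this module are now compiled at optimization level 1 at most,
# trading some run-time speed for lower compile latency. Code like this,
# which inference cannot make concrete, gains little from heavy optimization.
increment_any(x) = Any[x][1] + 1

end # module
```

The @eval matters because a top-level if block is macro-expanded as a whole before it runs; without the quoting, a Julia version lacking the macro would fail during expansion even though the condition is false.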

@jmert (Contributor Author) commented Sep 20, 2020

> I've noticed at least one case where type inference improves despite the lower optimization level: getindex(::Union{HDF5File, HDF5Group}, ::String) used to infer as Any but with this PR is inferred returning an HDF5Object.

> It might then just be worth it to add type annotations where necessary to help the compiler with inference in these cases, instead of (or in addition to) @eval Base.Experimental.@optlevel 1.

Yeah, a more systematic review of types and of what inference understands about the code is worth doing, but it wasn't such easy low-hanging fruit. To clarify, in case I was unclear: the inference change for getindex comes from rewriting it to not use a dictionary and is independent of the @optlevel 1 addition (i.e. we'll get it even if the second commit is dropped).
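The inference effect described here, where dropping a dictionary lets inference see the result type, can be reproduced with a small self-contained sketch. GroupObj, DatasetObj, OPENERS and the two open_via_* functions are hypothetical stand-ins rather than HDF5.jl code; Base.return_types reports what inference concludes for a given argument signature.

```julia
abstract type AbstractObj end
struct GroupObj <: AbstractObj end
struct DatasetObj <: AbstractObj end

# Dict-based dispatch: the value type is Any, so inference cannot tell which
# constructor gets called and the result is inferred as Any.
const OPENERS = Dict{Symbol,Any}(:group => GroupObj, :dataset => DatasetObj)
open_via_dict(kind::Symbol) = OPENERS[kind]()

# Branch-based dispatch: each branch returns a concrete type, so inference
# sees a small Union of the possible results.
function open_via_branches(kind::Symbol)
    if kind === :group
        return GroupObj()
    elseif kind === :dataset
        return DatasetObj()
    else
        throw(KeyError(kind))
    end
end

println(Base.return_types(open_via_dict, (Symbol,)))      # expect Any
println(Base.return_types(open_via_branches, (Symbol,)))  # expect a Union of GroupObj and DatasetObj
```

Both functions behave identically at run time; only what the compiler can prove about the result differs, which is why the change is independent of the optimization level.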

    @@ -978,19 +994,62 @@ end
    t_commit(parent::Union{HDF5File, HDF5Group}, path::String, dtype::HDF5Datatype) = t_commit(parent, path, dtype, p_create(H5P_LINK_CREATE))

    a_create(parent::Union{HDF5File, HDF5Object}, name::String, dtype::HDF5Datatype, dspace::HDF5Dataspace) = HDF5Attribute(h5a_create(checkvalid(parent).id, name, dtype.id, dspace.id), file(parent))

    function _prop_set!(p::HDF5Properties, name::Symbol, val, check::Bool = true)
@musm (Member) commented Sep 21, 2020:

Is this check arg really needed? It seems a bit superfluous. I think the rationale is that we can squeeze out a bit of performance by skipping error checks for internal use, but perhaps the compiler is good enough that we don't need such fine-grained tuning.

Perhaps these should all be written like this (not sure whether that helps the compiler guess the right branch):

    if class == H5P_FILE_CREATE

    elseif class == H5P_FILE_ACCESS

    elseif class == H5P_GROUP_CREATE
    ....

    else
        error(....
    end

OTOH, the code as currently written is pretty clear.

@musm (Member) commented Sep 21, 2020:

EDIT:

Actually, never mind: I don't think it's that feasible given how the code is currently structured, and it would remove some important error checking in the if-clauses.

@jmert (Contributor Author):

> Is this check arg really needed? It seems a bit superfluous.

The reason I added it was to maintain the current behavior: p_create does not error even if you pass it properties that are not appropriate for a particular property class, while setindex! did (simply by throwing when the underlying h5p_set_* function errored). The two functions now pass check = false and check = true, respectively, to preserve that.
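The combination of an if-else chain over property classes and the check flag can be sketched in miniature. This is a hypothetical model, not HDF5.jl's _prop_set!: PropClass, Props, prop_set! and the property names are illustrative, and settings are stored in a plain Dict instead of calling h5p_set_* functions. It shows both points from the PR: :char_encoding can be accepted by two classes, and an invalid name errors only when check is true.

```julia
# Hypothetical property-list classes standing in for H5P_* class ids.
@enum PropClass LINK_CREATE ATTRIBUTE_CREATE DATASET_CREATE

struct Props
    class::PropClass
    settings::Dict{Symbol,Any}
end
Props(class::PropClass) = Props(class, Dict{Symbol,Any}())

# Set `name` on `p` if the class accepts it. Because validity is decided per
# class, the same name may be valid for several classes. With check = false
# (like p_create) invalid names are ignored; with check = true (like
# setindex!) they raise an error.
function prop_set!(p::Props, name::Symbol, val, check::Bool = true)
    class = p.class
    valid = if class == LINK_CREATE
        name === :char_encoding || name === :create_intermediate_group
    elseif class == ATTRIBUTE_CREATE
        name === :char_encoding
    elseif class == DATASET_CREATE
        name === :chunk
    else
        false
    end
    if valid
        p.settings[name] = val
    elseif check
        throw(ArgumentError("property :$name is not valid for class $class"))
    end
    return p
end
```

Setting :char_encoding on an ATTRIBUTE_CREATE list then works directly, with no separate workaround needed at initialization.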

Co-authored-by: Mustafa M <[email protected]>
@musm merged commit d1e542a into JuliaIO:master on Sep 21, 2020
@musm (Member) commented Sep 21, 2020

Awesome!

@jmert deleted the no_dicts branch on September 21, 2020 17:46
@musm (Member) commented Sep 21, 2020

I kind of wonder whether we should just disable the integration tests; it's pretty annoying to see a red X on PRs that pass.
