Large JLSO files #75
- Replaces the "objects" dict with "object-names" and "object-nbytes" vectors, which tell us how to read the raw object bytes after the BSON doc.
- Writes all object bytes as one contiguous block after writing the BSON doc.
- Updates tests that previously threw an InexactError when writing large objects/docs.
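To make the layout concrete, here is a minimal Python sketch of how a reader could slice the trailing byte block using the two vectors. It assumes the names and sizes have already been decoded from the BSON doc and that objects are stored back to back in the same order as the vectors. `split_objects` is a hypothetical helper, not part of JLSO.

```python
def split_objects(names, nbytes, block):
    """Slice a contiguous byte block into per-object chunks.

    Hypothetical helper: `names` and `nbytes` stand in for the
    "object-names" and "object-nbytes" vectors from the BSON doc.
    """
    if len(names) != len(nbytes):
        raise ValueError("names and nbytes must have the same length")
    if sum(nbytes) != len(block):
        raise ValueError("byte counts do not add up to the block size")

    objects = {}
    offset = 0
    for name, n in zip(names, nbytes):
        # Each object occupies the next `n` bytes of the block.
        objects[name] = block[offset:offset + n]
        offset += n
    return objects


example = split_objects(["a", "b"], [3, 2], b"abcde")
```

Because only offsets are tracked, no single BSON element ever has to hold a large object, which is presumably what avoids the size overflow mentioned above.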
Codecov Report
@@            Coverage Diff             @@
##           master      #75      +/-   ##
==========================================
+ Coverage   94.90%   97.57%   +2.66%
==========================================
  Files           6        6
  Lines         157      206      +49
==========================================
+ Hits          149      201      +52
+ Misses          8        5       -3
Not reviewed properly, but this seems like a sensible way to store the data.
Don't forget to add files to the specimens.
https://github.com/invenia/JLSO.jl/tree/master/test/specimens
My main concern is how a generic BSON reader knows when to stop reading the BSON header.
Do we have a generic BSON reader to test what it does?
Maybe a Python one?
The sensible thing to do is to stop when every opening { has been closed.
But I worry that some might just read the whole thing and then freak out about the extra stuff.
- const READABLE_VERSIONS = semver_spec("1, 2, 3")
- const WRITEABLE_VERSIONS = semver_spec("3")
+ const READABLE_VERSIONS = semver_spec("1, 2, 3, 4")
+ const WRITEABLE_VERSIONS = semver_spec("3, 4")
👍 Having thought about this for a bit, I think it makes sense to keep supporting writing v3 for a while. It could be useful for porting data from v4 back to v3 if, for some reason, you are constrained to reading it on a system that doesn't yet support reading v4 (i.e., one that can't use this version of JLSO).
@@ -124,6 +124,7 @@ function _upgrade_env(pkgs::Dict{String, VersionNumber})
     try
         mktempdir() do tmp
             Pkg.activate(tmp)
+            isempty(pkgs) && return _env()
Why is this change here?
Necessary to get tests to pass on Julia 1.5 / nightly. I can move it to a separate pull request though.
I have no idea what this is doing.
require_not_empty was added to Pkg.jl for 1.5. This creates an early exit condition to work around that.
This otherwise errors on the Pkg.add call in the catch below.
This idea is all sound.
I can't think of any additional feedback I might need to give later.
Approved
Cool, thanks for reviewing, @oxinabox. I'll make those changes and merge.
The BSON header has its own EOF byte that should tell any reader to stop reading the doc at the end of the header. That assumes the reader is following the BSON spec, though. I'll add some tests using the Python bson package since that seems like the easiest option for Julia interop.
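Alongside the terminating byte, the BSON spec gives every document a little-endian int32 length prefix that counts the prefix itself and the trailing 0x00 byte, so a spec-following reader can find the end of the doc without scanning. A minimal sketch in Python using only the standard library; `split_bson_header` is a hypothetical name, and treating everything after the doc as raw object bytes is an assumption based on the description of this PR.

```python
import struct


def split_bson_header(data):
    """Split `data` into (bson_doc, trailing_bytes) using the length prefix."""
    if len(data) < 5:
        raise ValueError("too short to hold a BSON document")
    # First 4 bytes: total document size as a little-endian int32.
    (size,) = struct.unpack_from("<i", data, 0)
    if size < 5 or size > len(data):
        raise ValueError("invalid BSON document length")
    # The spec requires the document to end with a null byte.
    if data[size - 1] != 0:
        raise ValueError("BSON document is missing its terminating null byte")
    return data[:size], data[size:]


# The empty document is 5 bytes: the length prefix (5) plus the terminator.
doc, rest = split_bson_header(b"\x05\x00\x00\x00\x00" + b"raw object bytes")
```

A reader that instead hands the whole file to a decoder may or may not tolerate the trailing bytes, which is the behaviour the proposed interop tests would check.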
They don't need to be CI tests.
…likely hit OOM errors.
… to extract object bytes in python-bson
Okay, yeah. I even have a little Python snippet for extracting raw object data from the file in Python using the
47cd4f7 to be58e4c
Closing #21
Pros:
- read/write functions. (e.g., v3 and v4 input/output is the same)
Cons: