Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe change the default? #97

Open
PallHaraldsson opened this issue Feb 17, 2021 · 4 comments
Open

Maybe change the default? #97

PallHaraldsson opened this issue Feb 17, 2021 · 4 comments

Comments

@PallHaraldsson
Copy link

PallHaraldsson commented Feb 17, 2021

"or in the native Julia serialization format (default)" caught my eye, and I've been meaning to post to discourse warning/clarifying it/asking if my understanding it's dangerous is still true.

Some time back (not long ago), I did see a thread on discourse on OrderedDict having an issue with serialization, it not being supported across even minor versions (Dict also has some issues with serialization). It might be ok, but for some later version it failed, so I wouldn't trust it going forward, as it doesn't make a claim it should be ok. Also even if it did, I sometimes get packages downgraded...

I was thinking what should be the alternative, to serialize general objects. Arrow, JLD etc. Or this package? I would prefer something that's same by default. Safe and fast... but that seem to be opposite goals.

@oxinabox
Copy link
Member

Safe and General to arbitary types is impossible AFAICT.

Every package that accepts reading arbitary types eventually falls back to the julia serializer, or internal code that is behind the serializer. Includign JLD and BSON.jl
Arrow doesn't allow arbitrary types.

I recommend not serializing arbitary types if you can avoid it.
Then you can use JSON, or Arrow, or CSV.
It would be nice if we had a plain BSON package, but BSON.jl is an extention to BSON that can also handle abitary types.

I am tempted to close this issue as nonactionable.
Our alternative to the Julia Serializer is BSON which is exactly as flawed.

@PallHaraldsson
Copy link
Author

Safe and General to arbitrary types is impossible AFAICT.

I haven't really looked into why impossible, it seems then best to keep the serialize (as it should always work?), and fix the most important package/types e.g. OrderedDict. Is it possible for at least it to have a guarantee?

[FYI: I changed your quote to have the missing r in arbitrary. Just a friendly pointer since you mistyped twice.]

@rofinn
Copy link
Member

rofinn commented Feb 17, 2021

I haven't really looked into why impossible

It's impossible because to guarantee safety, you'd need to:

  1. Manually serialize to a static set of primitive types that will never change for the lifetime of the file format (e.g., CSV, JSON, TOML) and
  2. Have types/structs version their serialization calls such that they can handle schema changes (i.e., fields added/removed/renamed). This is the same is if you did a schema change on the objects in a JSON file or renamed the columns in a CSV.

it seems then best to keep the serialize (as it should always work?)

Do you mean keep the current default of BSON + gzipped julia serialization?

fix the most important package/types e.g. OrderedDict. Is it possible for at least it to have a guarantee?

It might help if you linked to the specific thread and included a stacktrace because neither JLSO.jl or BSON.jl explicitly depend on OrderedDict for their internal representations. I don't think it should be the responsibility of this package to dictate the serializability of other arbitrary types in the julia ecosystem. We're happy to support more serialization formats (e.g., Arrow, CSV, JLD2) though, but they'll all have their own strengths and weaknesses (safety vs generalizability), as @oxinabox has already pointed out.

@PallHaraldsson
Copy link
Author

PallHaraldsson commented Feb 17, 2021

Do you mean keep the current default of BSON + gzipped julia serialization?

No, because I thought BSON was non-default. I had BSON in mind when opening the issue, as a new default but if unworkable, then basically changed my mind, with "keep the serialize" meaning statue quo here, and the fix needing to be elsewhere.

The thread I had in mind with "I did [see] a thread" (fixed missing "see"):

https://discourse.julialang.org/t/orderedcollections-not-binary-compatible-with-prior-version/54540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants