Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete/Deprecate toArray, toCanonical, and toString #61

Open
blake-regalia opened this issue Oct 24, 2020 · 3 comments
Open

Delete/Deprecate toArray, toCanonical, and toString #61

blake-regalia opened this issue Oct 24, 2020 · 3 comments

Comments

@blake-regalia
Copy link
Contributor

blake-regalia commented Oct 24, 2020

toArray and toString are critically memory-intensive calls for any large dataset. We are already operating in ES environments with limited memory and these methods will punish the library when GC panics.

What is the purpose of toArray anyway? The way I see it, there are no legitimate use-cases for an array of quads that cannot be handled by Dataset alone or in tandem with a serializer.

toString is also redundant. This sort of task should be handled by piping to a serializer. We should document all these as examples.

toCanonical is another dangerous 'buffer-everything-into-a-string' method. This should be replaced with a method that returns a new Dataset, and complemented with another method that returns the normalized hash string of a dataset.

@bergos
Copy link
Member

bergos commented Oct 26, 2020

If you want to hand over Quads to a library that can handle only arrays, .toArray() was very useful. I remember that I used it for streaming libraries and loops in UI libraries. But with the ES iterator feature, it can be replaced with [...dataset]. 👍 for removing it.

Using .toString() is way simpler than using a serializer. All methods of the Dataset could be moved to a static util class and the same argument could be used to avoid having the Dataset interface at all. Memory usage should be also not an argument against a method. If developers try very hard to crash something, they will find a way. If you know you deal with small datasets, having a debug function or a function that can be somehow used to diff can be very useful. Nobody stops you from writing an implementation that checks the quad count and throws an error if it's too high.

Anyway, the interface is experimental, so I would not care much about it. If more people are serious about changing the interface and remove the experimental status from it, we should set a scope and time frame. Otherwise, I fear discussions will not be very productive or won't happen at all.

@blake-regalia
Copy link
Contributor Author

Memory usage should be also not an argument against a method. If developers try very hard to crash something, they will find a way. If you know you deal with small datasets, having a debug function or a function that can be somehow used to diff can be very useful.

The danger is that users may not know better and simply use .toString() in production without realizing the need for streaming with larger datasets. If you want something like this for debugging, that can be up to impl, but it shouldn't be a method in the spec.

If more people are serious about changing the interface and remove the experimental status from it, we should set a scope and time frame. Otherwise, I fear discussions will not be very productive or won't happen at all.

Sure, I think we can have both. It seems now there are more devs/users interested in the Dataset spec so hopefully we have enough interest to drive productivity.

@tpluscode
Copy link

I use all of those all the time. The toArray (or indeed more often destructuring) is the only DatasetCore way to quickly create a dataset from another like dataset([...another, ...yetAnother]). So as long as we keep the iterable interface...

I agree with the other two points. toString is a method of the base class anyway, so an implementation is free to do what it wants IMO

And toCanonical is a candidate for a tool package so that this functionality could be used on any DatasetCore. Right now it requires me to use rdf-ext but I would gladly import { toCanonical } from '@rdfjs/to-canonical' which would be universal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants