Delete/Deprecate `toArray`, `toCanonical`, and `toString` #61

blake-regalia · 2020-10-24T18:35:44Z

toArray and toString are critically memory-intensive calls for any large dataset. We are already operating in ES environments with limited memory and these methods will punish the library when GC panics.

What is the purpose of toArray anyway? The way I see it, there are no legitimate use-cases for an array of quads that cannot be handled by Dataset alone or in tandem with a serializer.

toString is also redundant. This sort of task should be handled by piping to a serializer. We should document all these as examples.

toCanonical is another dangerous 'buffer-everything-into-a-string' method. This should be replaced with a method that returns a new Dataset, and complemented with another method that returns the normalized hash string of a dataset.

The text was updated successfully, but these errors were encountered:

bergos · 2020-10-26T11:16:04Z

If you want to hand over Quads to a library that can handle only arrays, .toArray() was very useful. I remember that I used it for streaming libraries and loops in UI libraries. But with the ES iterator feature, it can be replaced with [...dataset]. 👍 for removing it.

Using .toString() is way simpler than using a serializer. All methods of the Dataset could be moved to a static util class and the same argument could be used to avoid having the Dataset interface at all. Memory usage should be also not an argument against a method. If developers try very hard to crash something, they will find a way. If you know you deal with small datasets, having a debug function or a function that can be somehow used to diff can be very useful. Nobody stops you from writing an implementation that checks the quad count and throws an error if it's too high.

Anyway, the interface is experimental, so I would not care much about it. If more people are serious about changing the interface and remove the experimental status from it, we should set a scope and time frame. Otherwise, I fear discussions will not be very productive or won't happen at all.

blake-regalia · 2020-10-26T19:03:50Z

Memory usage should be also not an argument against a method. If developers try very hard to crash something, they will find a way. If you know you deal with small datasets, having a debug function or a function that can be somehow used to diff can be very useful.

The danger is that users may not know better and simply use .toString() in production without realizing the need for streaming with larger datasets. If you want something like this for debugging, that can be up to impl, but it shouldn't be a method in the spec.

If more people are serious about changing the interface and remove the experimental status from it, we should set a scope and time frame. Otherwise, I fear discussions will not be very productive or won't happen at all.

Sure, I think we can have both. It seems now there are more devs/users interested in the Dataset spec so hopefully we have enough interest to drive productivity.

tpluscode · 2020-11-10T07:48:13Z

I use all of those all the time. The toArray (or indeed more often destructuring) is the only DatasetCore way to quickly create a dataset from another like dataset([...another, ...yetAnother]). So as long as we keep the iterable interface...

I agree with the other two points. toString is a method of the base class anyway, so an implementation is free to do what it wants IMO

And toCanonical is a candidate for a tool package so that this functionality could be used on any DatasetCore. Right now it requires me to use rdf-ext but I would gladly import { toCanonical } from '@rdfjs/to-canonical' which would be universal

blake-regalia mentioned this issue Jul 6, 2021

remove toArray() #64

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delete/Deprecate `toArray`, `toCanonical`, and `toString` #61

Delete/Deprecate `toArray`, `toCanonical`, and `toString` #61

blake-regalia commented Oct 24, 2020 •

edited

Loading

bergos commented Oct 26, 2020

blake-regalia commented Oct 26, 2020

tpluscode commented Nov 10, 2020

Delete/Deprecate toArray, toCanonical, and toString #61

Delete/Deprecate toArray, toCanonical, and toString #61

Comments

blake-regalia commented Oct 24, 2020 • edited Loading

bergos commented Oct 26, 2020

blake-regalia commented Oct 26, 2020

tpluscode commented Nov 10, 2020

Delete/Deprecate `toArray`, `toCanonical`, and `toString` #61

Delete/Deprecate `toArray`, `toCanonical`, and `toString` #61

blake-regalia commented Oct 24, 2020 •

edited

Loading