Refactors ArrayIterator to use an internal index instead of Array#shift() #47

Merged
merged 6 commits from jacoscaz:faster-arrayiterator into RubenVerborgh:main on Mar 26, 2022
Conversation

@jacoscaz
Collaborator

As measured on Node 16.x running on a 13'' 2020 MacBook Pro (Apple Silicon, M1, 16 GB RAM), it takes ~3000ms for the current implementation of ArrayIterator to run through a 200k-item array.

Research for #38 and #44 pointed at Array#shift() being a potential bottleneck, hence this refactored version, which uses an internal index instead of modifying the array. With this version, the same test terminates in ~8ms. The difference is so big as to be almost unbelievable, but I've managed to reproduce it multiple times. The following should suffice:

import { ArrayIterator } from 'asynciterator';

// Build a 200k-item array of incrementing integers and measure how long
// it takes ArrayIterator to emit all of them.
let i = 0;
const arr = new Array(200000).fill(true).map(() => i++);
const now = Date.now();
new ArrayIterator(arr).on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - now);
});
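
For context, the core of the change can be illustrated with a simplified sketch (illustrative only, not the library's actual code): reading via Array#shift() re-indexes the remaining elements on every call, while reading via an internal index is constant-time.

// Simplified sketch contrasting the two read strategies (illustrative only).

// Before: each read removes the first element, so the engine has to shift
// every remaining element down; draining the whole array is O(n^2).
function readWithShift<T>(buffer: T[]): T | undefined {
  return buffer.shift();
}

// After: each read returns the element at an internal index and advances it,
// which is O(1) per call and leaves the array untouched.
function makeIndexedReader<T>(buffer: T[]): () => T | undefined {
  let currentIndex = 0;
  return () => (currentIndex < buffer.length ? buffer[currentIndex++] : undefined);
}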

@RubenVerborgh
Owner

Hah, great! It does make sense, but we need to clear our buffers every once in a while, right? Now memory will start being flooded pretty soon.

@jacoscaz
Collaborator Author

Hah, great! It does make sense, but we need to clear our buffers every once in a while, right? Now memory will start being flooded pretty soon.

I am afraid I don't follow. Or maybe I do, not sure. Isn't the array already in memory as it is passed to ArrayIterator? As long as we make sure to drop our references when the iterator is done/closed/destroyed, peak memory usage should remain the same. It is true that, on average, heavy use of ArrayIterator might lead to more array items staying in memory, but I think that's worth trading for the 350x to 500x performance boost.

@RubenVerborgh
Owner

Right, for ArrayIterator you have a point indeed (it would be more difficult for BufferedIterator).

peak memory usage should remain the same

It all depends on what the software is doing. If the iterator is short-lived, then yes; but it could be longer-lived and could then occupy more memory than it needs.

Could we maybe quickly have a look at how it impacts performance to, let's say, truncate the array every 64 elements or so? So whenever _currentIndex reaches 64, _buffer is truncated? Might be best of both worlds.

@jacoscaz
Collaborator Author

Could we maybe quickly have a look at how it impacts performance to, let's say, truncate the array every 64 elements or so? So whenever _currentIndex reaches 64, _buffer is truncated? Might be best of both worlds.

Ah, good idea! Will tinker and report back.

@RubenVerborgh
Owner

^ That same approach could work for BufferedIterator btw (but still curious to see #46).

@coveralls

coveralls commented Mar 24, 2022


Coverage remained the same at 100.0% when pulling 838f389 on jacoscaz:faster-arrayiterator into 857652f on RubenVerborgh:main.

@jacoscaz
Collaborator Author

jacoscaz commented Mar 24, 2022

A good tradeoff seems to be calling splice() on the array every 64 items (twice as fast as slicing it into a new array). On 200k items, this brings the total time to 61ms, so roughly one order of magnitude more than the non-splicing version, which still makes for a 50x gain.
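
A minimal sketch of how such periodic splicing could look (illustrative only; the constant and field names below are made up for the example, not taken from the PR):

// Index-based reads with periodic truncation of the consumed prefix
// (illustrative sketch; names are hypothetical).
const TRUNCATION_THRESHOLD = 64;

function readNext<T>(state: { buffer: T[]; index: number }): T | undefined {
  if (state.index >= state.buffer.length)
    return undefined;
  const item = state.buffer[state.index++];
  // Once enough items have been consumed, drop the already-read prefix
  // so a long-lived iterator does not keep the whole source array alive.
  if (state.index >= TRUNCATION_THRESHOLD) {
    state.buffer.splice(0, state.index);
    state.index = 0;
  }
  return item;
}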

@jacoscaz
Collaborator Author

jacoscaz commented Mar 24, 2022

Setting the splicing threshold to 256 brings the time down to roughly 3x the non-splicing version, rather than 10x. Perhaps making this configurable via an appropriate constructor parameter would be a good idea? EDIT: that was fairly easy, so I went ahead and did it.

@RubenVerborgh
Owner

Great stuff, will have a closer look and merge. Thanks a bunch!

@jeswr
Collaborator

jeswr commented Mar 25, 2022

@jacoscaz - IMO it would be useful to still have an option on ArrayIterator (or a separate iterator) that disables splicing entirely, and to have the internal toArray method overridden to just return the internal array.

I think this would be useful, for instance, in some parts of the Comunica Reasoning components, where I am passing around rules but need to work with them as arrays in some components.

@jacoscaz
Collaborator Author

@jeswr passing Infinity as the splicing threshold practically disables splicing. Good idea on overriding the toArray() method, although I think it would have to return something like this._buffer.slice(this._currentIndex).
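
A minimal sketch of what that could look like (illustrative only: the class below is standalone, it merely assumes the _buffer and _currentIndex fields mentioned above, and the library's actual toArray() may have a different, promise-based signature):

// Illustrative sketch, not the merged implementation.
class IndexedArraySource<T> {
  private _currentIndex = 0;

  constructor(private _buffer: T[]) {}

  // Return only the items that have not been read yet,
  // without mutating the underlying array.
  toArray(): T[] {
    return this._buffer.slice(this._currentIndex);
  }
}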

@jacoscaz
Collaborator Author

@jeswr done!

@RubenVerborgh
Owner

I wasn't too sure about passing a setting to ArrayIterator that affected only its internals, so I have instead added the preserve setting. This gives the following options:

new ArrayIterator(array);                           // Does not modify array, does not truncate
new ArrayIterator(array,      { preserve: false }); // Directly modifies array by truncating every 64 items
new ArrayIterator([...array], { preserve: false }); // Does not modify array, truncates every 64 items

With preserve: false on an array we own, we gain additional efficiency by not having to copy the source array.

@jacoscaz
Collaborator Author

Niiiiice! Thank you for merging!

@RubenVerborgh
Owner

Thanks! This PR is now part of v3.4.0.

@jacoscaz deleted the faster-arrayiterator branch on March 26, 2022, at 18:00.