Refactors ArrayIterator to use an internal index instead of Array#shift() #47

Merged
merged 6 commits from jacoscaz:faster-arrayiterator into RubenVerborgh:main on Mar 26, 2022
Conversation

@jacoscaz
Collaborator

As measured on Node 16.x running on a 13'' 2020 MacBook Pro (Apple Silicon, M1, 16 GB RAM), it takes ~3000ms for the current implementation of ArrayIterator to run through a 200k-item array.

Research for #38 and #44 pointed at Array#shift() being a potential bottleneck, hence this refactored version, which uses an internal index instead of modifying the array. With this version, the same test terminates in ~8ms. The difference is so big as to be almost unbelievable, but I've managed to reproduce it multiple times. The following should suffice:

import { ArrayIterator } from 'asynciterator';

// Build a 200k-item array of incrementing integers and measure how long
// it takes ArrayIterator to emit all of them.
let i = 0;
const arr = new Array(200000).fill(true).map(() => i++);
const now = Date.now();
new ArrayIterator(arr).on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - now);
});
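
For context, the core of the change can be illustrated with a simplified sketch (illustrative only, not the library's actual code): reading via Array#shift() re-indexes the remaining elements on every call, while reading via an internal index is constant-time.

// Simplified sketch contrasting the two read strategies (illustrative only).

// Before: each read removes the first element, so the engine has to shift
// every remaining element down; draining the whole array is O(n^2).
function readWithShift<T>(buffer: T[]): T | undefined {
  return buffer.shift();
}

// After: each read returns the element at an internal index and advances it,
// which is O(1) per call and leaves the array untouched.
function makeIndexedReader<T>(buffer: T[]): () => T | undefined {
  let currentIndex = 0;
  return () => (currentIndex < buffer.length ? buffer[currentIndex++] : undefined);
}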

@RubenVerborgh
Owner

Hah, great! It does make sense, but we need to clear our buffers every once in a while, right? Now memory will start being flooded pretty soon.

@jacoscaz
Collaborator Author

Hah, great! It does make sense, but we need to clear our buffers every once in a while, right? Now memory will start being flooded pretty soon.

I am afraid I don't follow. Or maybe I do, not sure. Isn't the array already in memory as it is passed to ArrayIterator? As long as we make sure to drop our references when the iterator is done/closed/destroyed, peak memory usage should remain the same. It is true that, on average, heavy use of ArrayIterator might lead to more array items staying in memory, but I think that's worth trading for the 350x to 500x performance boost.

@RubenVerborgh
Owner

Right, for ArrayIterator you have a point indeed (it would be more difficult for BufferedIterator).

peak memory usage should remain the same

It all depends on what the software is doing. If the iterator is short-lived, then yes; but it could be longer-lived and could then occupy more memory than it needs.

Could we maybe quickly have a look at how it impacts performance to, let's say, truncate the array every 64 elements or so? So whenever _currentIndex reaches 64, _buffer is truncated? Might be best of both worlds.

@jacoscaz
Collaborator Author

Could we maybe quickly have a look at how it impacts performance to, let's say, truncate the array every 64 elements or so? So whenever _currentIndex reaches 64, _buffer is truncated? Might be best of both worlds.

Ah, good idea! Will tinker and report back.

@RubenVerborgh
Owner

^ That same approach could work for BufferedIterator btw (but still curious to see #46).

@coveralls

coveralls commented Mar 24, 2022


Coverage remained the same at 100.0% when pulling 838f389 on jacoscaz:faster-arrayiterator into 857652f on RubenVerborgh:main.

@jacoscaz
Collaborator Author

jacoscaz commented Mar 24, 2022

A good tradeoff seems to be calling splice() on the array every 64 items (twice as fast as slicing it into a new array). On 200k items, this brings the total time to 61ms, so roughly one order of magnitude more than the non-splicing version, which still makes for a 50x gain.
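
A minimal sketch of how such periodic splicing could look (illustrative only; the constant and field names below are made up for the example, not taken from the PR):

// Index-based reads with periodic truncation of the consumed prefix
// (illustrative sketch; names are hypothetical).
const TRUNCATION_THRESHOLD = 64;

function readNext<T>(state: { buffer: T[]; index: number }): T | undefined {
  if (state.index >= state.buffer.length)
    return undefined;
  const item = state.buffer[state.index++];
  // Once enough items have been consumed, drop the already-read prefix
  // so a long-lived iterator does not keep the whole source array alive.
  if (state.index >= TRUNCATION_THRESHOLD) {
    state.buffer.splice(0, state.index);
    state.index = 0;
  }
  return item;
}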

@jacoscaz
Collaborator Author

jacoscaz commented Mar 24, 2022

Setting the splicing threshold to 256 brings the time down to roughly 3x the non-splicing version, rather than 10x. Perhaps making this configurable via an appropriate constructor parameter would be a good idea? EDIT: that was fairly easy, so I went ahead and did it.

@RubenVerborgh
Owner

Great stuff, will have a closer look and merge. Thanks a bunch!

@jeswr
Collaborator

jeswr commented Mar 25, 2022

@jacoscaz - IMO it would be useful to still have an option on ArrayIterator (or a separate iterator) that disables splicing entirely, and to have the internal toArray method overridden to just return the internal array.

I think this would be useful, for instance, in some parts of the Comunica Reasoning components, where I am passing around rules but need to work with them as arrays in some components.

@jacoscaz
Collaborator Author

@jeswr passing Infinity as the splicing threshold practically disables splicing. Good idea on overriding the toArray() method, although I think it would have to return something like this._buffer.slice(this._currentIndex).
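
A minimal sketch of what that could look like (illustrative only: the class below is standalone, it merely assumes the _buffer and _currentIndex fields mentioned above, and the library's actual toArray() may have a different, promise-based signature):

// Illustrative sketch, not the merged implementation.
class IndexedArraySource<T> {
  private _currentIndex = 0;

  constructor(private _buffer: T[]) {}

  // Return only the items that have not been read yet,
  // without mutating the underlying array.
  toArray(): T[] {
    return this._buffer.slice(this._currentIndex);
  }
}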

@jacoscaz
Collaborator Author

@jeswr done!

@RubenVerborgh
Owner

I wasn't too sure about passing a setting to ArrayIterator that affected only its internals, so I have instead added the preserve setting. This gives the following options:

new ArrayIterator(array);                           // Does not modify array, does not truncate
new ArrayIterator(array,      { preserve: false }); // Directly modifies array by truncating every 64 items
new ArrayIterator([...array], { preserve: false }); // Does not modify array, truncates every 64 items

With preserve: false on an array we own, we gain additional efficiency by not having to copy the source array.

@jacoscaz
Collaborator Author

Niiiiice! Thank you for merging!

@RubenVerborgh
Owner

Thanks! This PR is now part of v3.4.0.

@jacoscaz deleted the faster-arrayiterator branch on March 26, 2022, at 18:00.