optimization: Reduce overhead of SimpleTransformIterator #44
Indeed, chained buffering steps might explain the slowdowns I've seen. Normally I would expect performance to degrade linearly with the number of chained transforms, but I often see slowdowns that seem closer to exponential. However, I do not have hard numbers at hand. @jeswr, have you done some profiling on this? |
@jeswr Do you think the slowdown is caused by the complexity of the transformation code? |
@rubensworks I'd suspect you're correct in that the intermediary buffers are the main overhead. I was thinking that one way around this is to have a 'chained transform' which applies multiple map/filter/transform operations in one go, so long as each part of that chain is synchronous - is this equivalent to what you mean by disabling internal buffers? |
@jacoscaz - no, I have not done profiling; it was just an observation I made when digging through the code while working on a PR I made earlier. |
Would this not require knowing what those transformations look like ahead of time? Unless, perhaps, one were to have all intermediate actors return a transform function and have the final actor concatenate these. However, if all intermediate steps were to be implemented as iterators without internal buffering as you suggested, provided an initial buffered iterator for I/O queries (i.e. reading from a file), the resulting performance should not be too far from that of a pure concatenation of functions while allowing us to keep the current internal structure. We should come up with a couple of examples that clearly demonstrate whether there's a real issue here or not. One possible way of doing this would be to manually implement a no-buffering map iterator and profile it against the current implementation. |
I was thinking that methods like
That sounds great, but I'd wait for #43 to be merged, as that may introduce some small performance changes. |
I suspect that the concatenation mechanism might be a non-trivial one, and it would have to be instantiated even for single-op iterators unless we were to make it lazy, which would make it more complicated. Nonetheless, I think there's definitely something to be said for combining operations. For example, could Comunica benefit from a combined operator of this kind? In any case, we should probably look at these as orthogonal issues:
Both definitely sound worth investigating IMHO.
Seems reasonable! I may start setting up a simple benchmark and then pause and wait for that PR to be merged. |
Is the SimpleTransformIterator the culprit here? My goal for it was to be used for simple cases, and definitely not for chaining. For chaining, you want to define your own sync or async transform iterator. |
Absolutely, thanks for finding this. We probably want:
|
@jeswr could you have a look at this gist of mine? https://gist.github.com/jacoscaz/e20812991714092dbb20cfd432d1bf00 Either I got something wrong or there's virtually no performance difference between the two approaches. |
To summarize the results in the comments that follow: combining all of these factors can result in a speed-up of 100x or greater when doing 50 chained transforms.

This gist includes some experiments and results that are discussed in the following comments. @jacoscaz - the vast majority of your time is spent on the array iterator because of #38; I believe Node does not implement optimizations similar to Firefox's. (@RubenVerborgh for reference; my machine took about 5x longer to run the implementation @jacoscaz had using the array iterator.) I get the following results for the code I have attached below:
If you do 50 mappings, the results become:
Now here is where it gets more interesting: if we use the array iterator for 50 mappings, we get:
So this means that combining these optimizations yields a large speed-up. The code used for these experiments:

import {AsyncIterator, range} from 'asynciterator';
class MappingIterator<I, O> extends AsyncIterator<O> {
constructor(source: AsyncIterator<I>, map: (item: I) => O) {
super();
source.on('end', () => {
this.close();
});
source.on('readable', () => {
this.emit('readable');
});
let item: I | null;
this.read = (): O | null => {
return ((item = source.read()) !== null)
? map(item)
: null;
};
}
}
const time = (createStream: (max: number) => AsyncIterator<number>): Promise<number> => {
const max = 200000
const str = createStream(max);
let count = 0;
return new Promise((resolve, reject) => {
const now = Date.now();
str.on('data', () => {
count += 1;
})
.on('end', () => {
const then = Date.now();
if (count != max + 1) {
reject(new Error('Bad count'));
return;
}
resolve(then - now);
});
})
}
const main = async () => {
const mapMethodTime = await time((max) => {
let iterator: AsyncIterator<number> = range(0, max);
for (let i = 0; i < 5; i++) {
iterator = iterator.map(item => item);
}
return iterator;
});
const mappingIteratorTime = await time((max) => {
let iterator: AsyncIterator<number> = range(0, max);
for (let i = 0; i < 5; i++) {
iterator = new MappingIterator(iterator, item => item);
}
return iterator;
});
console.log(`ArrayIterator#map(): ${mapMethodTime}`);
console.log(`MappingIterator: ${mappingIteratorTime}`);
};
main().catch((err) => {
console.error(err);
process.exit(1);
});

Environment:
Run on a Dell XPS 15 with 32 GB RAM. |
With the PR (#45) I have made, I get the following results; for the map() method:
and for the MappingIterator:
As a summary, using the
To do this I used the following:

class MappingIterator<I, O> extends AsyncIterator<O> {
constructor(source: AsyncIterator<I>, map: (item: I) => O) {
super();
source.on('end', () => {
this.close();
});
source.on('readable', () => {
this.emit('readable');
});
this.read = (): O | null => {
const item = source.read();
if (item === null) {
// @ts-ignore
if (this._state === CLOSED)
this._end();
return null
}
return map(item);
};
}
} |
I've also implemented a FilterIterator and get the following results:

ArrayIterator#filter(): 72
FilterIterator: 13

class FilterIterator<I> extends AsyncIterator<I> {
constructor(source: AsyncIterator<I>, filter: (item: I) => boolean) {
super();
source.on('readable', () => {
this.emit('readable');
});
this.read = (): I | null => {
let item: I | null;
while ((item = source.read()) !== null) {
if (filter(item))
return item;
}
// @ts-ignore
if (source._state === ENDED || source._state === CLOSED)
this._end();
return null;
};
}
} |
Sorry to keep spamming this thread - I've also implemented a CompositeMapFilter that applies a whole chain of map/filter steps in a single iterator:
import { CLOSED, ENDED } from 'asynciterator';
import { AsyncIterator, ArrayIterator, range } from './asynciterator';
type Transform = {
type: 'filter';
function: (elem: any) => boolean
} | {
type: 'map';
function: (elem: any) => any
}
class CompositeMapFilter<T> extends AsyncIterator<T> {
private transforms: Transform[] = [];
constructor(private source: AsyncIterator<T>) {
super();
source.on('readable', () => {
this.emit('readable');
});
}
read(): T | null {
let item = this.source.read();
for (let i = 0; i < this.transforms.length && item !== null; i += 1) {
const transform = this.transforms[i];
switch (transform.type) {
case 'map':
item = transform.function(item);
break;
case 'filter':
if (!transform.function(item)) {
// item rejected: pull the next one and restart the chain
item = this.source.read();
i = -1;
}
break;
}
}
// @ts-ignore
if (item === null && this.source._state === ENDED) {
this._end();
}
return item;
}
// @ts-ignore
filter(filter: (item: T) => boolean, self?: any): CompositeMapFilter<T> {
this.transforms.push({ type: 'filter', function: filter });
return this;
}
// @ts-ignore
map(map: (item: T) => T): CompositeMapFilter<T> {
this.transforms.push({ type: 'map', function: map });
return this;
}
}
class MappingIterator<I, O> extends AsyncIterator<O> {
constructor(source: AsyncIterator<I>, map: (item: I) => O) {
super();
source.on('readable', () => {
this.emit('readable');
});
this.read = (): O | null => {
const item = source.read();
if (item === null) {
// @ts-ignore
if (source._state === ENDED || source._state === CLOSED)
this._end();
return null
}
return map(item);
};
}
}
class FilterIterator<I> extends AsyncIterator<I> {
constructor(source: AsyncIterator<I>, filter: (item: I) => boolean) {
super();
source.on('readable', () => {
this.emit('readable');
});
this.read = (): I | null => {
let item: I | null;
while ((item = source.read()) !== null) {
if (filter(item))
return item;
}
// @ts-ignore
if (source._state === ENDED || source._state === CLOSED)
this._end();
return null;
};
}
}
const generateArr = (): number[] => {
let i = 0;
return new Array(200000)
.fill(true)
.map(() => i++);
};
const time = (createStream: (arr: number[]) => AsyncIterator<number> | CompositeMapFilter<number>): Promise<number> => {
const arr = generateArr();
const str = createStream(arr);
let count = 0;
return new Promise((resolve, reject) => {
const now = Date.now();
str.on('data', () => {
count += 1;
})
.on('end', () => {
const then = Date.now();
// console.log(count, arr.length / 2, arr)
if (count != arr.length / 2) {
// reject(new Error('Bad count'));
// return;
}
resolve(then - now);
});
})
}
const main = async () => {
const mapMethodTime = await time((arr) => {
let iterator: AsyncIterator<number> = range(0, arr.length - 1);
for (let i = 0; i < 50; i++) {
iterator = iterator.filter(item => item % 2 === 0);
iterator = iterator.map(item => item);
}
return iterator;
});
const mappingIteratorTime = await time((arr) => {
let iterator: AsyncIterator<number> = range(0, arr.length - 1);
for (let i = 0; i < 50; i++) {
iterator = new FilterIterator(iterator, item => item % 2 === 0);
iterator = new MappingIterator(iterator, item => item);
}
return iterator;
});
const compIteratorTime = await time((arr) => {
let iterator: CompositeMapFilter<number> = new CompositeMapFilter(range(0, arr.length - 1));
for (let i = 0; i < 50; i++) {
iterator = iterator.filter(item => item % 2 === 0);
iterator = iterator.map(item => item);
}
return iterator;
});
console.log(`ArrayIterator#map(): ${mapMethodTime}`);
console.log(`MappingIterator: ${mappingIteratorTime}`);
console.log(`CompositeIterator: ${compIteratorTime}`);
};
main().catch((err) => {
console.error(err);
process.exit(1);
}); |
You can see a further 50-100% performance improvement by 'pre-compiling' the chain of transforms into a single function. |
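To illustrate what such pre-compiling could look like (a sketch with hypothetical names such as `compile` and `SKIP`, not the actual code from the gist): the chain of map/filter steps is folded once into a single closure, so each item pays one function call instead of one call per step.

```typescript
// Hypothetical sketch: collapse a chain of map/filter steps into one function.
// SKIP is a sentinel meaning "this item was filtered out".
type Step<T> =
  | { type: 'map'; fn: (item: T) => T }
  | { type: 'filter'; fn: (item: T) => boolean };

const SKIP = Symbol('skip');

function compile<T>(steps: Step<T>[]): (item: T) => T | typeof SKIP {
  // Build a single closure once, instead of looping over the steps per item
  return steps.reduceRight<(item: T) => T | typeof SKIP>(
    (next, step) => {
      if (step.type === 'map')
        return (item) => next(step.fn(item));
      return (item) => (step.fn(item) ? next(item) : SKIP);
    },
    (item) => item,
  );
}

// Usage: one function call per item instead of one per transform
const fn = compile<number>([
  { type: 'filter', fn: (n) => n % 2 === 0 },
  { type: 'map', fn: (n) => n * 10 },
]);
console.log([1, 2, 3, 4].map(fn).filter((x) => x !== SKIP)); // [ 20, 40 ]
```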
@jeswr if all spam were like this we'd all be much happier about spam. Seriously, awesome work.
Although I was not expecting such a large difference! The latest revision of my gist at https://gist.github.com/jacoscaz/e20812991714092dbb20cfd432d1bf00/e6cbf3caab930e88edb62f34562619603572c42b has a version of this, too.
Can confirm. When going from calling
Haven't tested this yet but I have no reason to object.
I can see the benefits of doing this but I'm wary of the fact that it breaks iterator immutability, which can be a source of bugs. I would still try to reap as much as possible by offering a
Indeed, performance-wise this should turn out to be an extremely productive and effective investment of time. |
Impressive - and I can't see any problems with it; though I wonder why it also seems to be a much greater improvement than using
Makes sense - I think there should be ways to give the feel of immutability to the API (or at the very least throw meaningful errors for devs when they misuse a mutable AsyncIterator) while using mutations like this under the hood for performance. This is probably more appropriate in a separate package, but one option I was thinking of is as follows:

class PlaceholderIterator extends AsyncIterator<never> {
error() {
throw new Error('You are trying to use methods on an iterator - which has been destroyed by another transform operation')
}
map = this.error;
read = this.error;
...
}
const placeHolderIterator = new PlaceholderIterator();
class ComposeMapFilterIterator<T> extends AsyncIterator<T> {
map(fn) {
this.transformers.push(fn);
const that = this;
this = placeHolderIterator // pseudocode: `this` cannot actually be reassigned
return that;
}
}

So this means that any usage that would cause bugs, such as reading from an iterator that has been handed off to another transform, would surface as a meaningful error. In fact, I'm inclined to argue that such an implementation may lead to fewer bugs in some cases where devs may be trying to read data from an iterator that they shouldn't be. |
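A runnable approximation of this idea (the `Consumable` class and its shape are hypothetical; since `this` cannot actually be reassigned in JavaScript, the consumed handle is instead flagged and made to throw):

```typescript
// Hypothetical sketch of the "destroy on transform" idea: once an iterator
// is consumed by a transform, using the old handle throws a meaningful error.
class Consumable<T> {
  private consumed = false;
  constructor(private items: T[]) {}

  read(): T | null {
    if (this.consumed)
      throw new Error('This iterator was consumed by a transform operation');
    return this.items.length > 0 ? this.items.shift()! : null;
  }

  map<O>(fn: (item: T) => O): Consumable<O> {
    if (this.consumed)
      throw new Error('This iterator was consumed by a transform operation');
    const result = new Consumable(this.items.map(fn));
    this.consumed = true; // mutate: the old handle is now a dead placeholder
    return result;
  }
}

const source = new Consumable([1, 2, 3]);
const mapped = source.map((n) => n * 2);
console.log(mapped.read()); // 2
try {
  source.read(); // using the consumed handle throws a meaningful error
} catch (e) {
  console.log((e as Error).message);
}
```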
@jeswr as I was investigating this, I came up with the same idea as #38 (comment) and had a go at it in https://github.com/jacoscaz/AsyncIterator/blob/42311b68988a971fc072c896100d7082b4162470/asynciterator.ts#L744-L783 . I'll make a PR later, but I just realized we've put quite a lot of things on the table and we should probably make a plan of action, including what should go into which PRs and releases.
Tagging @rubensworks and @RubenVerborgh to keep everyone in the loop. |
Indeed - but it's also probably worth tracking https://bugs.chromium.org/p/v8/issues/detail?id=12730 in case the V8 team solves this upstream for us.
Nice!
Standalone it is only about a 2x improvement - IIRC the 100x figure came from compounding most of the factors above.
I'd say this should be a new library of its own e.g. 'asynciterator-compose.js' |
Makes sense! In theory, this should be mitigated by #46, although my testing has shown a significantly smaller performance gain than I had expected. @jeswr do you have time to do some testing of that branch on your use cases? Still, doing away with buffering entirely would be a good idea, given the gains seen in #48. I think we could port the source-wrapping part of that approach. |
I've not done any benchmarking for
Sounds good to me |
Happy to have it here, really. Could be a separate file though (and maybe we could also split the code into more separate files in general). |
As discussed with @jacoscaz, the proposed timeline for resolving this issue is:

(1) Get full coverage on the current state of #57.
(2) Merge #45, because this resolves a lot of issues I had.
(3) Address SyncUnionIterator.
(4) Address cloning.

Pinging @RubenVerborgh |
In addition to the above plan shared by @jeswr , if nobody else needs these changes to live within a single branch for testing purposes I would suggest breaking the work done across the following PRs, to be refined and merged in order:
When taken all together, these changes will make for dramatic performance gains. I have one use case, which I unfortunately cannot share ATM, where things become 1200x faster simply by switching to 1 + 2 + 4. |
Thanks for all the work and please proceed, with the following caveats:
|
@RubenVerborgh is it the overall approach that does not convince you, or is it the fact that we might have pushed it too far in places? |
At the moment, I can't see the forest for the trees; the code needs a bit of love still. And some bits might take it too far indeed. |
A few comments to keep everyone in the loop and for posterity. While working on 5 (faster cloning), I've realized that my assumptions about the relative performance of the approaches involved may not hold. @jeswr has also been working on a non-breaking alternative to #54 in #65 for faster unions, which also brings other optimizations. All this said, I think that in order to close this issue and reap the benefits in Comunica we're looking at the following:
Also tagging @rubensworks . |
That is expected; but what we need to know is the behavior for repeated pushes and pulls, cf. #60 (comment) |
As far as my understanding goes, and limited to cloning, nothing is ever pulled from the backing buffer, as the entire history starting from when the first clone is created (EDIT) is maintained for future cloned iterators to go through; we only care about pushes done at asynciterator.ts line 1845 (commit fec14f4).
Nonetheless, I've done some work on a |
But that's because no data is ever pulled from it. |
Yes, precisely! What I meant in #44 (comment) is that, given that |
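The cloning behavior being discussed, where clones replay a shared append-only history rather than pulling items out of a buffer, can be sketched roughly like this (hypothetical classes, not asynciterator's actual implementation):

```typescript
// Hypothetical sketch of clone semantics: clones share one append-only
// history buffer; each clone keeps its own read offset into it. Items are
// pushed into the history but never removed from it.
class CloneSource<T> {
  readonly history: T[] = [];
  constructor(private pull: () => T | null) {}
  // Ensure the shared history contains at least n + 1 items, if available
  fill(n: number): void {
    while (this.history.length <= n) {
      const item = this.pull();
      if (item === null) break;
      this.history.push(item);
    }
  }
}

class Clone<T> {
  private offset = 0;
  constructor(private source: CloneSource<T>) {}
  read(): T | null {
    this.source.fill(this.offset);
    return this.offset < this.source.history.length
      ? this.source.history[this.offset++]
      : null;
  }
}

let i = 0;
const source = new CloneSource(() => (i < 3 ? i++ : null));
const a = new Clone(source);
const b = new Clone(source);
console.log(a.read(), a.read()); // 0 1
console.log(b.read()); // 0  (clones replay from the start of the history)
```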
Update for posterity and to keep track of the general progress on this. Tagging @rubensworks @jeswr @RubenVerborgh . Open PRs that we're working on, to be reviewed / merged / rebased in order:
|
We're halfway through! Faster unions are next. @RubenVerborgh @jeswr @rubensworks do you think we can get these merged by the end of August? |
@jacoscaz 4 might be a significant amount of work to rebase and check for correctness, so I wouldn't bet on that by the end of August. The other two should be much less work but will depend on everyone's schedules. |
My plan is to wrap up AsyncIterator v3.x soonish; the moment we start tackling #45, we're in the v4.x territory. |
@RubenVerborgh @jeswr perhaps it would make sense to do faster unions and the synchronous transform iterator first, wrapping up 3.x, and leave end on read last. Would that work for you guys? |
I agree with the strategy - will try and wrap up work on the faster unions today |
Update for posterity and to keep track of the general progress. Tagging @rubensworks @jeswr @RubenVerborgh . Open PRs that we're working on, to be reviewed / merged / rebased in order: non-breaking, to be released in minor 3.x versions:
breaking, to be released in 4.0.0:
EDITED 2022-08-26: faster unions are now breaking changes, moved to 4.x |
I kinda lost track of where things are. Is the v4 PR still being worked on? |
Somewhat fell off my radar for a bit; I do have most of the code for a new version locally, but there is still work to be done for the tests. |
For basic methods like .map and .filter on AsyncIterator, using the SimpleTransformIterator introduces unnecessary overhead via the readAndTransformSimple operator. To improve performance in applications like Comunica, it may be worth creating custom iterators with simpler read methods; for instance, a synchronous MapIterator could skip buffering entirely and read straight from its source.

@jacoscaz I suspect this is the cause of the potential slowdown you were encountering when doing a large amount of chaining.
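A minimal sketch of what such a buffer-free synchronous mapper could look like (illustrative only; `SyncMapIterator` and `ArraySource` are hypothetical names, and the source is assumed to expose a synchronous read() plus 'readable'/'end' events, as asynciterator sources do):

```typescript
import { EventEmitter } from 'events';

// Buffer-free synchronous map iterator: no internal queue, the
// readable/end signals pass straight through from the source.
class SyncMapIterator<I, O> extends EventEmitter {
  constructor(
    private source: { read(): I | null } & EventEmitter,
    private mapFn: (item: I) => O,
  ) {
    super();
    source.on('readable', () => this.emit('readable'));
    source.on('end', () => this.emit('end'));
  }

  read(): O | null {
    const item = this.source.read();
    return item === null ? null : this.mapFn(item);
  }
}

// Tiny stand-in source for demonstration purposes
class ArraySource<T> extends EventEmitter {
  constructor(private items: T[]) { super(); }
  read(): T | null { return this.items.length > 0 ? this.items.shift()! : null; }
}

const it = new SyncMapIterator(new ArraySource([1, 2, 3]), (n) => n * 2);
console.log(it.read(), it.read(), it.read(), it.read()); // 2 4 6 null
```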