Performance: Improved performance for async iterables #42618
Comments
Are the suggested changes passing all of node's tests? |
It wouldn't, because this library is missing a few elements that the stream API requires. |
If it passes all tests, then feel free to submit a PR. |
@nodejs/streams |
Hey, thanks for asking. I don't think we spent a lot of time optimizing this area in the past and were mostly waiting for people to ask. Is the ask specifically to make async iteration over Readable faster? |
I don't think we can just replace the current implementation. Would break too much stuff. |
The current implementation doesn't refer to the whole of streams. I think the ask is to improve our performance for the async iterator API for Readable? |
I'm not sure what the ask here is. 😕 |
I think this needs to be a bit more specific... I'm not sure what optimisations we should apply. |
For arrays, something like this could work:

```diff
diff --git a/lib/internal/streams/from.js b/lib/internal/streams/from.js
index d3d43f7dfb..ad2afa578c 100644
--- a/lib/internal/streams/from.js
+++ b/lib/internal/streams/from.js
@@ -25,10 +25,28 @@ function from(Readable, iterable, opts) {
     });
   }
 
+  if (Array.isArray(iterable)) {
+    let i = 0;
+    return new Readable({
+      objectMode: true,
+      ...opts,
+      read() {
+        if (i < iterable.length) {
+          this.push(iterable[i++]);
+        } else {
+          this.push(null);
+        }
+      }
+    });
+  }
+
   let isAsync;
   if (iterable && iterable[SymbolAsyncIterator]) {
     isAsync = true;
     iterator = iterable[SymbolAsyncIterator]();
   } else if (iterable && iterable[SymbolIterator]) {
     isAsync = false;
     iterator = iterable[SymbolIterator]();
```

I'm not sure it's worth it though? Statically creating readables with arrays seems like a fairly unusual case. |
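For a rough sense of whether such a fast path pays off, here is a sketch of how one might measure array consumption through Readable.from against a plain loop. This is my own illustration, not a benchmark from this thread, and timings will vary by machine:

```js
import { Readable } from 'node:stream';
import { performance } from 'node:perf_hooks';

// Drain a 200k-element array two ways and compare wall-clock time.
const data = Array.from({ length: 200_000 }, (_, i) => i);

async function viaReadable() {
  const start = performance.now();
  let sum = 0;
  for await (const x of Readable.from(data)) sum += x;
  console.log('Readable.from:', (performance.now() - start).toFixed(1), 'ms', sum);
}

function viaPlainLoop() {
  const start = performance.now();
  let sum = 0;
  for (const x of data) sum += x;
  console.log('plain loop:   ', (performance.now() - start).toFixed(1), 'ms', sum);
}

viaReadable().then(viaPlainLoop);
```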
We could probably make that even faster by just injecting the array as the readable's buffer. But again this seems like a quite unusual use case. |
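A minimal sketch of that idea using only public APIs: push the whole array up front so read() never runs per element. The name readableFromArray is hypothetical, backpressure (highWaterMark) is deliberately ignored, and writing straight into the internal _readableState.buffer is not shown here:

```js
import { Readable } from 'node:stream';

// Preload the entire array into the readable's internal buffer.
// push() returning false is ignored on purpose; growing the buffer up front
// is exactly what "inject the array as the buffer" implies.
function readableFromArray(array, opts) {
  const readable = new Readable({ objectMode: true, ...opts, read() {} });
  for (const item of array) readable.push(item);
  readable.push(null); // signal end of stream
  return readable;
}

// Usage:
// for await (const x of readableFromArray([1, 2, 3])) console.log(x);
```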
I don't really understand what's the use case for this. I'm totally ok to have the perf improvement in, but what are we optimizing for in this case? I would not recommend anyone to process an Array using an AsyncIterator, it'll definitely be slower than just processing the Array. |
Apologies for the lack of clarity in my original issue. The main point was to highlight that there is room to make pull-based/asynchronous streams a lot faster, which has the potential to speed up downstream applications; I see how the choice to benchmark using Readable.from may have caused confusion. @jacoscaz reported to me that he has applications (not using Readable.from or its equivalent) that are now 600-900x faster as a result of porting from Readable to the current release of the AsyncIterator package. I also note that I'm not intimately familiar with the Node.js codebase, and all comments are based on the observed performance comparison between Readable and the AsyncIterator package. The ask is more specifically to optimize (in order of importance) the speed of:

Once/if these have been applied, there are potentially some further optimisations that can be looked into, based on the thread I linked to in my original message. |
@jeswr I'm not sure I follow. |
Note: I'll try and revisit this once I've had a chance to dig into this codebase further; I don't have as much time as I would like to dig into this right now. Off the top of my head, there is a very naive way to get improved performance without breaking backwards compat, which is to have an opt-in constructor parameter. However, I think there should also be ways of achieving this without downstream libs having to add this extra parameter. (The way this kind of thing is achieved in the AsyncIterator package is covered in the thread I linked to in my original message.)

In addition, let me allay some confusion by giving an example without Readable.from. First, using Readable:

```js
import { Readable } from 'node:stream';

let i = 0;
let SIZE = 100_000;
let iterator = new Readable({
  read() {
    if (i < SIZE) {
      this.push(i++);
      this.push(i++);
      this.push(i++);
    } else {
      this.push(null);
    }
  },
  objectMode: true
});

for (let i = 0; i < 10; i++) {
  iterator = iterator.map(x => x + 1);
}

const now = performance.now();
iterator.on('data', () => {}).on('end', () => {
  console.log(performance.now() - now);
});
```

And the equivalent using the AsyncIterator package:

```js
import { BufferedIterator } from './dist/asynciterator.js';

let i = 0;
let SIZE = 100_000;
class MyIterator extends BufferedIterator {
  _read(count, done) {
    if (i < SIZE) {
      this._push(i++);
      this._push(i++);
      this._push(i++);
      done();
    } else {
      this.close();
      done();
    }
  }
}

let iterator = new MyIterator();
for (let i = 0; i < 10; i++) {
  iterator = iterator.map(x => x + 1);
}

const now = performance.now();
iterator.on('data', () => {}).on('end', () => {
  console.log(performance.now() - now);
});
```
|
I think the problem here is that map is async. If you remove the asynchronicity it gets much faster. |
I'm not sure why I'd use an async iterator with a synchronous map, but we can definitely make that much, much faster: for example, fall back on a regular for loop for a synchronous map on the stream's buffer, and even do some of the tricks @jeswr is trying to apply (like transducing several map calls into a single function call). |
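To illustrate the transducing idea mentioned above (a sketch of my own, not code from this thread; fuseMaps is a hypothetical helper): several synchronous map callbacks can be fused into one function, so a chain of ten .map() calls costs a single call per chunk instead of ten queued hops.

```js
// Fuse a list of synchronous mapping functions into a single function.
function fuseMaps(fns) {
  return (value) => {
    for (const fn of fns) value = fn(value);
    return value;
  };
}

// A 10-deep chain collapses into one callback:
const add1Ten = fuseMaps(Array.from({ length: 10 }, () => (x) => x + 1));
console.log(add1Ten(0)); // 10

// Applied to a stream, one map() call could replace ten chained ones, e.g.:
// readable.map(fuseMaps(callbacks))
```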
One use case is this configurable query engine, which takes data from pull-based sources and then applies various transforms to the data during query evaluation. Here is a link to a subset of the places it applies such transforms: RubenVerborgh/AsyncIterator#44 (comment) |
Elaborating on @jeswr's comments with my own thoughts, I think the underlying issue here is that developers are often tempted to use streams (as in the API implemented by the stream module) for pull-based, item-by-item processing that the stream API was not primarily designed to optimize for. In one use case, of which I am unfortunately unable to share the code, we achieved a 150x perf. increase simply by moving from a pre-existing stream-based implementation to the AsyncIterator package.

And I am unsure as to whether the stream API should do so, either; at least not as a primary goal. After all, it was born to deal with a different problem. However, especially for newcomers to either Node.js and/or stream-based processing, the convenience of simplified construction and of chainable transformation methods makes streams an attractive default. If I had to formulate an ask, I guess it might be two-fold:
|
I should note that we did not consider those optimizations in our benchmarks. Would a contribution along these lines be welcome? |
Definitely! Any contribution is welcomed |
There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment. For more information on how the project manages feature requests, please consult the feature request management document. |
There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment. For more information on how the project manages feature requests, please consult the feature request management document. |
What is the problem this feature will solve?
I have recently been working on optimizations for this implementation of asynchronous iterators (the AsyncIterator package). I did a quick benchmark between that implementation and the Readable API for Node v17.4.0 and found that some of our latest (yet to be released) optimizations are orders of magnitude faster than the Readable API. As an example, we can apply 10 maps to 200_000 elements in 20ms, whilst it takes the Readable API ~2s (tested on a Dell XPS 15 with 32 GB RAM).
The recent optimizations we have done include:

- Making `Array.shift` an O(1) operation (#42449)

This (somewhat long) issue is where we have had most of the discussion around the recent improvements we have been working on.
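For context on what an O(1) dequeue looks like compared to Array.shift, here is a sketch of my own (not the AsyncIterator implementation; the Queue name is illustrative):

```js
// Minimal linked-list queue: push and shift are both O(1), whereas
// Array.prototype.shift reindexes every remaining element (O(n)).
class Queue {
  constructor() {
    this.head = null;
    this.tail = null;
    this.length = 0;
  }
  push(value) {
    const node = { value, next: null };
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    this.length++;
  }
  shift() {
    if (!this.head) return undefined;
    const { value } = this.head;
    this.head = this.head.next;
    if (!this.head) this.tail = null;
    this.length--;
    return value;
  }
}

// Usage: a stream buffer could enqueue incoming chunks and dequeue on read.
const q = new Queue();
q.push('a'); q.push('b');
console.log(q.shift(), q.shift(), q.shift()); // a b undefined
```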
What is the feature you are proposing to solve the problem?
Use some of the optimizations already implemented in https://github.com/RubenVerborgh/AsyncIterator (see RubenVerborgh/AsyncIterator#59).
What alternatives have you considered?
No response