Add MappingIterator for synchronous transformations #59

jeswr · 2022-04-05T03:19:02Z

An alternative to #58 which uses callbacks to close the iterator rather than creating a new class.

coveralls · 2022-04-05T03:20:20Z

Coverage remained the same at 100.0% when pulling 3be4178 on feat/faster-transforming-no-limit-class into a536bca on main.

jeswr · 2022-04-05T03:31:52Z

@RubenVerborgh When I run it locally node 10 coveralls check is complaining about a comment in line 38 - so not sure how to resolve that one...

RubenVerborgh · 2022-04-05T20:16:17Z

Rather than close, how about we call the passed map function with (item, iterator) as arguments?
Then limit can close the iterator itself, and we don't need an exception for it.

asynciterator.ts

jeswr · 2022-04-06T08:28:36Z

This is ready for review @RubenVerborgh - given that this is now much cleaner than #58 I suggest that we merge the branch more-or-less in this state and worry about possible performance tweaks in the order of 10-20% at a later date.

RubenVerborgh · 2022-04-06T09:15:28Z

Excellent. Thanks, @jeswr; I can work with this!

RubenVerborgh · 2022-06-24T18:02:55Z

@jeswr Thanks for your patience. Gave this a deep review.

This is almost good to go, except the readable getter, which uses a trick that will get us into trouble.
The readable mechanism is quite intricate, and is also used to fire the readable event at the correct times.
So we really need something that sets this.readable = true when the source becomes readable, similar to how TransformIterator has source.on('readable', destinationFillBuffer).

As a fun challenge (and looping in @joachimvh), I gave up on the typing of private readonly _mappings: MapFunction<any, any, MappingIterator<S, D>>[]; which should really be private readonly _mappings: MapChain<S, D> where MapChain<S, D> is recursively defined as [MapFunction<S, D>] | [MapFunction<S, X>, ...MapChain<X, D>] with some infer X magic. However, I couldn't do it. Not a necessity though 🙂

RubenVerborgh · 2022-06-24T18:03:53Z

asynciterator.ts

+  private readonly _mappingRoot: InternalSource<any>;
+
+  // This is wrong: readable should be set by listening to source events
+  get readable() {


This breaks things, because certain readable events will not be fired. We need to actually set readable = true.

asynciterator.ts

RubenVerborgh · 2022-06-24T21:04:32Z

asynciterator.ts

@@ -25,6 +25,11 @@ export function setTaskScheduler(scheduler: TaskScheduler): void {
  taskScheduler = scheduler;
 }

+/** Binds a function to an object */
+function bind(fn: Function, self?: object) {


Suggested change

function bind(fn: Function, self?: object) {

function bind<T extends Function>(fn: T, self?: object): T {

is the correct one, but that leads to errors below.

RubenVerborgh · 2022-06-24T21:06:21Z

asynciterator.ts

+  }
+
+  map<K>(map: (item: D, it: AsyncIterator<any>) => K | null, self?: any): AsyncIterator<K> {
+    return new MappingIterator<S, K>(this._source, [...this._mappings, bind(map, self)], this);


Something is off here.

MappingIterator (overload 2) takes as constructor parameters the direct source (should be this), the mappings, and the mapping root (should be this._mappingRoot). Yet the tests do not catch this problem.

RubenVerborgh · 2022-06-24T21:11:08Z

asynciterator.ts

+    let mapped : any = null;
+    while (mapped === null && (mapped = this._source.read()) !== null) {
+      for (let i = 0; i < this._mappings.length; i++) {
+        mapped = this._mappings[i](mapped, this);


The this pointer is not correct here; that should be the MappingIterator at each level.

I think you are right but I am still context switching back to this issue so not fully confident yet.

RubenVerborgh · 2022-06-24T21:56:11Z

Sorry for the haphazard comments, @jeswr! They may look random, but it's actually three separate waves of working on this, and some crucial thinking work in between.

However, this is the conclusion I've come to:

Let's first merge this without the repeated .map optimization.
- That will allow us to get readable right.
- There are a couple of conceptual difficulties with the combined mapping from the root. One of them is the exposure of the iterator argument in the map callback. When you compress multiple mappings into one, this argument becomes meaningless/powerless: calling .end on these intermediate iterators becomes hard to define.
We should not expose the iterator argument in the map function, as this excludes certain optimizations in the future.
- We could define a special return value (a Symbol or so) that means "end the iterator", just like null now means "don't emit this item". I don't think that's a great solution though in general, to have such special return values. Or we could throw a special kind of error. But still not my favorite.
The previous point probably also means we need a different take iterator; I propose we just inline the solution of feat: faster transforming #58 (but not have a separate HeadIterator, that seems overkill).
In this simpler implementation, let's focus on getting all edge cases right.
- For instance, the source should not be read from if the source if closed, or if the mapped iterator is closed.
- And we'll need tests for all these (similar to TransformIterator).

I know some of the above go against my earlier suggestions, but reviewing the resulting code in depth brought new insights.

RubenVerborgh · 2022-06-24T22:31:56Z

asynciterator.ts

+
+  read(): D | null {
+    let mapped : any = null;
+    while (mapped === null && (mapped = this._source.read()) !== null) {


Let's not read the source if we are closed. Need to guard for this. And test.

jeswr · 2022-06-25T07:35:56Z

I haven't fully context switched back to this issue so apologies if any of these q's/notes are obvious or obviously incorrect.

For reference here is the original state of the PR when it was opened which I will also refer to a few times below.

Firstly I just wanted to make a clarification that I suspect answers this comment. The source is what you refer to as root in this description. The mappingRoot (previously called upstream) would be the iterator constituting root.map(f1).map(f2) if we are considering the iterator for root.map(f1).map(f2).map(f3) as our 'current iterator'.

The original PR also did not contain constructor overloads so I'll need a little more time to understand what has been done there before I can comment on whether there is indeed a bug now.

This is almost good to go, except the readable getter, which uses a trick that will get us into trouble.

I'm still not convinced that this was a problem when I started this PR. The logic flow I was following is that when .readable is set to true via a getter, and in turn set true on the source (i.e. root) as is done in my original version of the PR, the consequence of this is that the root becomes readable and emits a readable event. Then as a result of this line the iterator for root.map(f1) would emit readable, and in turn root.map(f1).map(f2) would emit readable and finally root.map(f1).map(f2).map(f3) would emit readable.

This ensures that readable events are fired when required, whilst also ensuring that an iterator is readable if and only if (at least while neither iterator is closed) the source is readable which is to be expected when doing synchronous transformations.

Since the setter has since been removed from the original PR there will indeed be a bug around readable events.

As for the issue around ending, this was the reason for checking whether the state is CLOSED or not for readability. This means that if the root has ended then root.map(f1), root.map(f1).map(f2) and root.map(f1).map(f2).map(f3) can never be readable again - and also this line ensures that each of the downstream iterators have the end event called on them. Furthermore, similarly to the readable case, if the iterator corresponding to root.map(f1).map(f2) were ended then only it and root.map(f1).map(f2).map(f3) would be ended.

We should not expose the iterator argument in the map function, as this excludes certain optimizations in the future

What kind of optimisations does this exclude? Importantly, can they be achieved if we just pass a close callback rather than the full iterator (this was the set-up when the PR was originally opened). A close callback would also avoid the need for a custom take iterator.

RubenVerborgh · 2022-06-25T17:19:11Z

I haven't fully context switched back to this issue so apologies if any of these q's/notes are obvious or obviously incorrect.

Yeah, sorry that I had to leave this one for a while. I think it's the last thing I worked on before I caught covid, so I've never been able to complete my original review. Back on things now though!

The original PR also did not contain constructor overloads

The only thing this does is allowing people to pass in a single mapping function, so it not needing to be a one-element array. The array constructor is considered private, given that it is an internal optimization; so I essentially split the two cases (external versus internal construction).

I'm still not convinced that this was a problem when I started this PR. The logic flow I was following is that when .readable is set to true via a getter, and in turn set true on the source (i.e. root) as is done in my original version of the PR, the consequence of this is that the root becomes readable and emits a readable event.

Right.
I think what happened is that I started adjusting the unit tests a bit, and noticed weird failures. So the unit tests now are correct, but I believe they don't work with the original PR because of readable events (some iterators were hanging).

My memory is that I fixed the readable to the point the tests were running, but I knew there would be other problems, hence the comment I wrote. For instance, given my (this._state < CLOSED) && edit, I presume that readable events were being fired at times where this was not possible. So even though my fix worked, I could think of other scenarios that would go wrong.

The main challenge with AsyncIterator is correctness; it's sometimes quite hard to reason about edge cases. All tests might work, and suddenly 1 out of a 100 times a 1 million stream will stall and it will be impossible to figure out why. So that's why I really suggest sticking to the existing readable mechanism, that has all the state checks etc., and has all the corresponding order-based unit tests attached to it. Or, if overwriting the mechanism, then a whole bunch of new readable tests will be necessary (and they will surface a couple of bugs).

Since the setter has since been removed from the original PR there will indeed be a bug around readable events.

Yes, but having the setter might give rise to a whole other range of problems. The readable setter was actually supposed to be private, but TypeScript doesn't allow such an asymmetry. Messing with the state of other iterators can be tricky.

I think if you put the setter back, you will see test failures.

What kind of optimisations does this exclude? Importantly, can they be achieved if we just pass a close callback rather than the full iterator (this was the set-up when the PR was originally opened).

Whether we pass an iterator or a close method tied to an iterator, it means that there is always the concept of an iterator at that stage. I.e., the map function then has knowledge about the execution. If we take a shortcut, where actually one iterator performs the work of 3 others in a single go, then things get confusing. So I don't think we should pass anything.

A close callback would also avoid the need for a custom take iterator.

Indeed, but at a cost. Perhaps it does make sense to have an UntilIterator or so, which takes a function as an argument that, when true, stops iteration. Then we have generalized stopping.

The main question is how we want to approach this PR now, I realize it's very inconvenient that, due to circumstances, there were 2 months between your and my initial work on this, and today. It's not easy to get the context back from then.

Do you still have the bandwidth to work on this? Just because you were volunteering back then, does not mean you still have the time today, and I want to be respectful of your contributions.

A way forward that I see at the moment is, given the current unit tests (which are a great harness to have), to re-envision the implementation a bit. I.e., if this were myself working on it, I would start with the existing code from the main branch, and make it work with the unit tests from this branch by incrementally copying in the code from this branch. I'd get a stable version without the sequential map optimization, merge it, and then add the optimization as needed.

Let me know what you think and what's feasible for you. In any case, thanks a lot for working on this.

jeswr · 2022-06-25T22:59:00Z

Do you still have the bandwidth to work on this?

Yep - I can have a go today branching off main, use the current test harness, and then follow the incremental approach suggested. The responses above appear to provide some very useful explanations that I can use to help inform that design. If after that there are still some conceptual problems then it might be faster for you to take over rather than having to review several more iterations of my code - but will see how I go today before making that call.

thanks a lot for working on this.

No worries! I always enjoy working on OS code and I've learned a lot from the work and reviews. Thanks for all your work on the reviews!

RubenVerborgh · 2022-06-26T20:25:20Z

The responses above appear to provide some very useful explanations that I can use to help inform that design. If after that there are still some conceptual problems then it might be faster for you to take over rather than having to review several more iterations of my code - but will see how I go today before making that call.

You're absolutely nailing it in #75, so let's finish things there 👍👍

jeswr requested a review from RubenVerborgh April 5, 2022 08:30

RubenVerborgh reviewed Apr 5, 2022

View reviewed changes

asynciterator.ts Outdated Show resolved Hide resolved

RubenVerborgh reviewed Apr 5, 2022

View reviewed changes

asynciterator.ts Outdated Show resolved Hide resolved

jeswr requested a review from RubenVerborgh April 5, 2022 23:34

This was referenced Apr 6, 2022

Performance: Improved performance for async iterables nodejs/node#42618

Closed

feat: faster transforming #58

Closed

RubenVerborgh self-assigned this Apr 6, 2022

This was referenced Apr 7, 2022

Faster filtering, mapping, limiting, skipping and wrapping #57

Closed

feat: faster wrapping #63

Merged

RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from b5f2dd0 to 6b4f039 Compare April 7, 2022 20:32

jeswr mentioned this pull request Apr 8, 2022

feat: add uniq method to AsyncIterator #64

Merged

RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from 6b4f039 to 34e1a60 Compare April 8, 2022 15:57

RubenVerborgh changed the title ~~Feat/faster transforming no limit class~~ Add MappingIterator for synchronous transformations Apr 8, 2022

RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch 2 times, most recently from 7beffc0 to 4652546 Compare April 8, 2022 16:01

This was referenced Apr 10, 2022

WIP: V3 Sync Unions #65

Closed

Chore: remove unecessary transform calls comunica/comunica#977

Open

jacoscaz mentioned this pull request Apr 20, 2022

optimization: Reduce overhead of SimpleTransformIterator #44

Open

jeswr mentioned this pull request Jun 13, 2022

feat: maybeIterator #68

Draft

RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from 4652546 to 38ae72f Compare June 24, 2022 16:18

Introduce MappingIterator.

ba64a35

RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch 2 times, most recently from 095b446 to fd69434 Compare June 24, 2022 17:58

RubenVerborgh removed their assignment Jun 24, 2022

RubenVerborgh assigned jeswr Jun 24, 2022

RubenVerborgh requested changes Jun 24, 2022

View reviewed changes

RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from fd69434 to e8b1a7c Compare June 24, 2022 18:50

RubenVerborgh reviewed Jun 24, 2022

View reviewed changes

WIP: Review.

3be4178

RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from e8b1a7c to 3be4178 Compare June 24, 2022 21:09

RubenVerborgh reviewed Jun 24, 2022

View reviewed changes

jeswr mentioned this pull request Jun 26, 2022

Draft: Feat/mapping iterator #75

Merged

RubenVerborgh closed this Jun 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MappingIterator for synchronous transformations #59

Add MappingIterator for synchronous transformations #59

jeswr commented Apr 5, 2022

coveralls commented Apr 5, 2022 •

edited

Loading

jeswr commented Apr 5, 2022

RubenVerborgh commented Apr 5, 2022

jeswr commented Apr 6, 2022

RubenVerborgh commented Apr 6, 2022

RubenVerborgh commented Jun 24, 2022 •

edited

Loading

RubenVerborgh Jun 24, 2022

RubenVerborgh Jun 24, 2022

RubenVerborgh Jun 24, 2022

RubenVerborgh Jun 24, 2022

jeswr Jun 25, 2022

RubenVerborgh commented Jun 24, 2022 •

edited

Loading

RubenVerborgh Jun 24, 2022 •

edited

Loading

jeswr commented Jun 25, 2022 •

edited

Loading

RubenVerborgh commented Jun 25, 2022 •

edited

Loading

jeswr commented Jun 25, 2022

RubenVerborgh commented Jun 26, 2022

	function bind(fn: Function, self?: object) {
	function bind<T extends Function>(fn: T, self?: object): T {

Add MappingIterator for synchronous transformations #59

Add MappingIterator for synchronous transformations #59

Conversation

jeswr commented Apr 5, 2022

coveralls commented Apr 5, 2022 • edited Loading

jeswr commented Apr 5, 2022

RubenVerborgh commented Apr 5, 2022

jeswr commented Apr 6, 2022

RubenVerborgh commented Apr 6, 2022

RubenVerborgh commented Jun 24, 2022 • edited Loading

RubenVerborgh Jun 24, 2022

Choose a reason for hiding this comment

RubenVerborgh Jun 24, 2022

Choose a reason for hiding this comment

RubenVerborgh Jun 24, 2022

Choose a reason for hiding this comment

RubenVerborgh Jun 24, 2022

Choose a reason for hiding this comment

jeswr Jun 25, 2022

Choose a reason for hiding this comment

RubenVerborgh commented Jun 24, 2022 • edited Loading

RubenVerborgh Jun 24, 2022 • edited Loading

Choose a reason for hiding this comment

jeswr commented Jun 25, 2022 • edited Loading

RubenVerborgh commented Jun 25, 2022 • edited Loading

jeswr commented Jun 25, 2022

RubenVerborgh commented Jun 26, 2022

coveralls commented Apr 5, 2022 •

edited

Loading

RubenVerborgh commented Jun 24, 2022 •

edited

Loading

RubenVerborgh commented Jun 24, 2022 •

edited

Loading

RubenVerborgh Jun 24, 2022 •

edited

Loading

jeswr commented Jun 25, 2022 •

edited

Loading

RubenVerborgh commented Jun 25, 2022 •

edited

Loading