Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MappingIterator for synchronous transformations #59

Closed
wants to merge 2 commits into from

Conversation

jeswr
Copy link
Collaborator

@jeswr jeswr commented Apr 5, 2022

An alternative to #58 which uses callbacks to close the iterator rather than creating a new class.

@coveralls
Copy link

coveralls commented Apr 5, 2022

Coverage Status

Coverage remained the same at 100.0% when pulling 3be4178 on feat/faster-transforming-no-limit-class into a536bca on main.

@jeswr
Copy link
Collaborator Author

jeswr commented Apr 5, 2022

@RubenVerborgh When I run it locally node 10 coveralls check is complaining about a comment in line 38 - so not sure how to resolve that one...

@jeswr jeswr requested a review from RubenVerborgh April 5, 2022 08:30
@RubenVerborgh
Copy link
Owner

Rather than close, how about we call the passed map function with (item, iterator) as arguments?
Then limit can close the iterator itself, and we don't need an exception for it.

asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
@jeswr
Copy link
Collaborator Author

jeswr commented Apr 6, 2022

This is ready for review @RubenVerborgh - given that this is now much cleaner than #58 I suggest that we merge the branch more-or-less in this state and worry about possible performance tweaks in the order of 10-20% at a later date.

@RubenVerborgh
Copy link
Owner

Excellent. Thanks, @jeswr; I can work with this!

@RubenVerborgh RubenVerborgh self-assigned this Apr 6, 2022
@RubenVerborgh RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from b5f2dd0 to 6b4f039 Compare April 7, 2022 20:32
@RubenVerborgh RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from 6b4f039 to 34e1a60 Compare April 8, 2022 15:57
@RubenVerborgh RubenVerborgh changed the title Feat/faster transforming no limit class Add MappingIterator for synchronous transformations Apr 8, 2022
@RubenVerborgh RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch 2 times, most recently from 7beffc0 to 4652546 Compare April 8, 2022 16:01
@jeswr jeswr mentioned this pull request Jun 13, 2022
@RubenVerborgh RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from 4652546 to 38ae72f Compare June 24, 2022 16:18
@RubenVerborgh RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch 2 times, most recently from 095b446 to fd69434 Compare June 24, 2022 17:58
@RubenVerborgh
Copy link
Owner

RubenVerborgh commented Jun 24, 2022

@jeswr Thanks for your patience. Gave this a deep review.

This is almost good to go, except the readable getter, which uses a trick that will get us into trouble.
The readable mechanism is quite intricate, and is also used to fire the readable event at the correct times.
So we really need something that sets this.readable = true when the source becomes readable, similar to how TransformIterator has source.on('readable', destinationFillBuffer).

As a fun challenge (and looping in @joachimvh), I gave up on the typing of private readonly _mappings: MapFunction<any, any, MappingIterator<S, D>>[]; which should really be private readonly _mappings: MapChain<S, D> where MapChain<S, D> is recursively defined as [MapFunction<S, D>] | [MapFunction<S, X>, ...MapChain<X, D>] with some infer X magic. However, I couldn't do it. Not a necessity though 🙂

@RubenVerborgh RubenVerborgh removed their assignment Jun 24, 2022
private readonly _mappingRoot: InternalSource<any>;

// This is wrong: readable should be set by listening to source events
get readable() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks things, because certain readable events will not be fired. We need to actually set readable = true.

asynciterator.ts Show resolved Hide resolved
@RubenVerborgh RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from fd69434 to e8b1a7c Compare June 24, 2022 18:50
@@ -25,6 +25,11 @@ export function setTaskScheduler(scheduler: TaskScheduler): void {
taskScheduler = scheduler;
}

/** Binds a function to an object */
function bind(fn: Function, self?: object) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function bind(fn: Function, self?: object) {
function bind<T extends Function>(fn: T, self?: object): T {

is the correct one, but that leads to errors below.

}

map<K>(map: (item: D, it: AsyncIterator<any>) => K | null, self?: any): AsyncIterator<K> {
return new MappingIterator<S, K>(this._source, [...this._mappings, bind(map, self)], this);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is off here.

MappingIterator (overload 2) takes as constructor parameters the direct source (should be this), the mappings, and the mapping root (should be this._mappingRoot). Yet the tests do not catch this problem.

@RubenVerborgh RubenVerborgh force-pushed the feat/faster-transforming-no-limit-class branch from e8b1a7c to 3be4178 Compare June 24, 2022 21:09
let mapped : any = null;
while (mapped === null && (mapped = this._source.read()) !== null) {
for (let i = 0; i < this._mappings.length; i++) {
mapped = this._mappings[i](mapped, this);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The this pointer is not correct here; that should be the MappingIterator at each level.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right but I am still context switching back to this issue so not fully confident yet.

@RubenVerborgh
Copy link
Owner

RubenVerborgh commented Jun 24, 2022

Sorry for the haphazard comments, @jeswr! They may look random, but it's actually three separate waves of working on this, and some crucial thinking work in between.

However, this is the conclusion I've come to:

  • Let's first merge this without the repeated .map optimization.
    • That will allow us to get readable right.
    • There are a couple of conceptual difficulties with the combined mapping from the root. One of them is the exposure of the iterator argument in the map callback. When you compress multiple mappings into one, this argument becomes meaningless/powerless: calling .end on these intermediate iterators becomes hard to define.
  • We should not expose the iterator argument in the map function, as this excludes certain optimizations in the future.
    • We could define a special return value (a Symbol or so) that means "end the iterator", just like null now means "don't emit this item". I don't think that's a great solution though in general, to have such special return values. Or we could throw a special kind of error. But still not my favorite.
  • The previous point probably also means we need a different take iterator; I propose we just inline the solution of feat: faster transforming #58 (but not have a separate HeadIterator, that seems overkill).
  • In this simpler implementation, let's focus on getting all edge cases right.
    • For instance, the source should not be read from if the source if closed, or if the mapped iterator is closed.
    • And we'll need tests for all these (similar to TransformIterator).

I know some of the above go against my earlier suggestions, but reviewing the resulting code in depth brought new insights.


read(): D | null {
let mapped : any = null;
while (mapped === null && (mapped = this._source.read()) !== null) {
Copy link
Owner

@RubenVerborgh RubenVerborgh Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not read the source if we are closed. Need to guard for this. And test.

@jeswr
Copy link
Collaborator Author

jeswr commented Jun 25, 2022

I haven't fully context switched back to this issue so apologies if any of these q's/notes are obvious or obviously incorrect.

For reference here is the original state of the PR when it was opened which I will also refer to a few times below.


Firstly I just wanted to make a clarification that I suspect answers this comment. The source is what you refer to as root in this description. The mappingRoot (previously called upstream) would be the iterator constituting root.map(f1).map(f2) if we are considering the iterator for root.map(f1).map(f2).map(f3) as our 'current iterator'.

The original PR also did not contain constructor overloads so I'll need a little more time to understand what has been done there before I can comment on whether there is indeed a bug now.

This is almost good to go, except the readable getter, which uses a trick that will get us into trouble.

I'm still not convinced that this was a problem when I started this PR. The logic flow I was following is that when .readable is set to true via a getter, and in turn set true on the source (i.e. root) as is done in my original version of the PR, the consequence of this is that the root becomes readable and emits a readable event. Then as a result of this line the iterator for root.map(f1) would emit readable, and in turn root.map(f1).map(f2) would emit readable and finally root.map(f1).map(f2).map(f3) would emit readable.

This ensures that readable events are fired when required, whilst also ensuring that an iterator is readable if and only if (at least while neither iterator is closed) the source is readable which is to be expected when doing synchronous transformations.

Since the setter has since been removed from the original PR there will indeed be a bug around readable events.

As for the issue around ending, this was the reason for checking whether the state is CLOSED or not for readability. This means that if the root has ended then root.map(f1), root.map(f1).map(f2) and root.map(f1).map(f2).map(f3) can never be readable again - and also this line ensures that each of the downstream iterators have the end event called on them. Furthermore, similarly to the readable case, if the iterator corresponding to root.map(f1).map(f2) were ended then only it and root.map(f1).map(f2).map(f3) would be ended.

We should not expose the iterator argument in the map function, as this excludes certain optimizations in the future

What kind of optimisations does this exclude? Importantly, can they be achieved if we just pass a close callback rather than the full iterator (this was the set-up when the PR was originally opened). A close callback would also avoid the need for a custom take iterator.

@RubenVerborgh
Copy link
Owner

RubenVerborgh commented Jun 25, 2022

I haven't fully context switched back to this issue so apologies if any of these q's/notes are obvious or obviously incorrect.

Yeah, sorry that I had to leave this one for a while. I think it's the last thing I worked on before I caught covid, so I've never been able to complete my original review. Back on things now though!

The original PR also did not contain constructor overloads

The only thing this does is allowing people to pass in a single mapping function, so it not needing to be a one-element array. The array constructor is considered private, given that it is an internal optimization; so I essentially split the two cases (external versus internal construction).

I'm still not convinced that this was a problem when I started this PR. The logic flow I was following is that when .readable is set to true via a getter, and in turn set true on the source (i.e. root) as is done in my original version of the PR, the consequence of this is that the root becomes readable and emits a readable event.

Right.
I think what happened is that I started adjusting the unit tests a bit, and noticed weird failures. So the unit tests now are correct, but I believe they don't work with the original PR because of readable events (some iterators were hanging).

My memory is that I fixed the readable to the point the tests were running, but I knew there would be other problems, hence the comment I wrote. For instance, given my (this._state < CLOSED) && edit, I presume that readable events were being fired at times where this was not possible. So even though my fix worked, I could think of other scenarios that would go wrong.

The main challenge with AsyncIterator is correctness; it's sometimes quite hard to reason about edge cases. All tests might work, and suddenly 1 out of a 100 times a 1 million stream will stall and it will be impossible to figure out why. So that's why I really suggest sticking to the existing readable mechanism, that has all the state checks etc., and has all the corresponding order-based unit tests attached to it. Or, if overwriting the mechanism, then a whole bunch of new readable tests will be necessary (and they will surface a couple of bugs).

Since the setter has since been removed from the original PR there will indeed be a bug around readable events.

Yes, but having the setter might give rise to a whole other range of problems. The readable setter was actually supposed to be private, but TypeScript doesn't allow such an asymmetry. Messing with the state of other iterators can be tricky.

I think if you put the setter back, you will see test failures.

What kind of optimisations does this exclude? Importantly, can they be achieved if we just pass a close callback rather than the full iterator (this was the set-up when the PR was originally opened).

Whether we pass an iterator or a close method tied to an iterator, it means that there is always the concept of an iterator at that stage. I.e., the map function then has knowledge about the execution. If we take a shortcut, where actually one iterator performs the work of 3 others in a single go, then things get confusing. So I don't think we should pass anything.

A close callback would also avoid the need for a custom take iterator.

Indeed, but at a cost. Perhaps it does make sense to have an UntilIterator or so, which takes a function as an argument that, when true, stops iteration. Then we have generalized stopping.


The main question is how we want to approach this PR now, I realize it's very inconvenient that, due to circumstances, there were 2 months between your and my initial work on this, and today. It's not easy to get the context back from then.

Do you still have the bandwidth to work on this? Just because you were volunteering back then, does not mean you still have the time today, and I want to be respectful of your contributions.

A way forward that I see at the moment is, given the current unit tests (which are a great harness to have), to re-envision the implementation a bit. I.e., if this were myself working on it, I would start with the existing code from the main branch, and make it work with the unit tests from this branch by incrementally copying in the code from this branch. I'd get a stable version without the sequential map optimization, merge it, and then add the optimization as needed.

Let me know what you think and what's feasible for you. In any case, thanks a lot for working on this.

@jeswr
Copy link
Collaborator Author

jeswr commented Jun 25, 2022

Do you still have the bandwidth to work on this?

Yep - I can have a go today branching off main, use the current test harness, and then follow the incremental approach suggested. The responses above appear to provide some very useful explanations that I can use to help inform that design. If after that there are still some conceptual problems then it might be faster for you to take over rather than having to review several more iterations of my code - but will see how I go today before making that call.

thanks a lot for working on this.

No worries! I always enjoy working on OS code and I've learned a lot from the work and reviews. Thanks for all your work on the reviews!

@RubenVerborgh
Copy link
Owner

The responses above appear to provide some very useful explanations that I can use to help inform that design. If after that there are still some conceptual problems then it might be faster for you to take over rather than having to review several more iterations of my code - but will see how I go today before making that call.

You're absolutely nailing it in #75, so let's finish things there 👍👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants