Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Feat/mapping iterator #75

Merged
merged 1 commit into from
Jul 4, 2022
Merged

Draft: Feat/mapping iterator #75

merged 1 commit into from
Jul 4, 2022

Conversation

jeswr
Copy link
Collaborator

@jeswr jeswr commented Jun 26, 2022

A fresh branch, that does not use transduction, as discussed in #59.

I think this resolves all readability problems @RubenVerborgh though I'm hesitant to say that for certain until I look with fresh eyes in the morning.

Getting close (I think) to correct readable behaviour - I just need to get one more set of new tests that I just created working. The currently failing tests test the case where the upstream iterator becomes non-readable without closing.

I'll come back to this in a few hrs later this evening :)

Repository owner deleted a comment from coveralls Jun 26, 2022
@jeswr
Copy link
Collaborator Author

jeswr commented Jun 26, 2022

Just added a quick performance check we have

Main:
20000000 elems 20 maps 49566.92061899975
20000000 elems 20 maps 20 filters 24873.365212999284
100_000 iterators each with 5 maps and 100 elements 8609.967392999679

This PR
20000000 elems 20 maps 3584.7544269990176
20000000 elems 20 maps 20 filters 3637.598729001358
100_000 iterators each with 5 maps and 100 elements 1951.0139819998294

@jeswr jeswr requested a review from RubenVerborgh June 26, 2022 11:54
@jeswr jeswr marked this pull request as ready for review June 26, 2022 11:57
Copy link
Owner

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good stuff, @jeswr. We're absolutely on the right track.

In general, we need to be conservative/defensive when programming derived classes, and not assume too many behaviors of them. I should probably have a list of allowed assumptions, but it's not black and white. Briefly, let's not assume that source is not one of the implementations we've written, but rather a more unstable one. We have to assume its state and its readable are correct, but that's about it.

asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated
Comment on lines 844 to 847
if (this._source.done) {
this.close();
return null;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (this._source.done) {
this.close();
return null;
}
if (!this._source.readable) {
this.readable = false;
if (this._source.done)
this.close();
return null;
}

^ We should be maximally well-behaved and not attempt to call .read on the source unless it advertises itself as potentially readable.

Copy link
Collaborator Author

@jeswr jeswr Jun 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RubenVerborgh Is it worth opening an issue to apply this idea to other iterators, for instance in _readAndTransform and _readAndTransformSimple.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that TransformIterator and SimpleTransformIterator also accept regular Node.js Readable instances, which do not define readable. So the goal there is to also work with non-AsyncIterators (see wrap).

We can have the same goal for MappingIterator, but that requires different tests.

asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Show resolved Hide resolved
perf/MappingIterator-perf.js Outdated Show resolved Hide resolved
perf/MappingIterator-perf.js Outdated Show resolved Hide resolved
@RubenVerborgh
Copy link
Owner

Thanks, @jeswr, looking really good. Amazing what two fresh pairs of eyes can do, innit? 😉
I will make some small stylistic changes the code here and there, and then assign back to you for final approval.
Then we merge and release a new minor (unless you want to have take / the UntilIterator in as well).

@jeswr
Copy link
Collaborator Author

jeswr commented Jun 28, 2022

I think that since this is close, clean it up and then we can release. After that is done we could put most of the logic into an abstract class that MapIterator, UntilIterator and, eventually CompositeIterator can share.

I have in mind that it will look something like this (which does have all tests current passing and is over 2x faster on the current performance tests):

/**
 * A synchronous mapping function from one element to another.
 * A return value of `null` means that nothing should be emitted for a particular item.
 */
export type MapFunction<S, D = S> = (item: S) => D | null;

/**
 An iterator that calls a synchronous operation
 on every item from its source iterator.
 @extends module:asynciterator.AsyncIterator
*/
export abstract class SynchronousTransformIterator<S, D = S> extends AsyncIterator<D> {
  private readonly _destroySource: boolean;
  protected _source: InternalSource<S>;

  /**
   * Applies the given mapping to the source iterator.
   */
  constructor(
    _source: AsyncIterator<S>,
    options: SourcedIteratorOptions = {}
  ) {
    super();
    this._source = _validateSource(_source);
    this._destroySource = options.destroySource !== false;

    if (_source.done) {
      this.close();
    }
    else {
      this._source._destination = this;
      this._source.on('end', destinationClose);
      this._source.on('error', destinationEmitError);
      this._source.on('readable', destinationSetReadable);
      // If we are given a source that is already readable
      // then we need to set the state of this iterable to readable
      // also as there is no guarantee that the is no forthcoming
      // readable event from the source
      this.readable = this._source.readable;
    }
  }

  protected abstract safeRead(): D | null;

  read(): D | null {
    // Do not read the source if the current iterator is ended
    if (this.done)
      return null;

    // A source should only be read from if readable is true
    if (!this._source.readable) {
      this.readable = false;
      if (this._source.done)
        this.close();
      return null;
    }

    const item = this.safeRead();

    if (item !== null) {
      return item;
    }

    // This will set readable to false on the current iterator
    // if the source is no longer readable
    this.readable = false;
    return null;
  }

  /* Cleans up the source iterator and ends. */
  protected _end(destroy: boolean) {
    this._source.removeListener('end', destinationClose);
    this._source.removeListener('error', destinationEmitError);
    this._source.removeListener('readable', destinationSetReadable);
    delete this._source._destination;
    if (this._destroySource)
      this._source.destroy();
    super._end(destroy);
  }
}

export class MappingIterator<S, D = S> extends SynchronousTransformIterator<S, D> {
  constructor(
    source: AsyncIterator<S>,
    private readonly _map: MapFunction<S, D>,
    options: SourcedIteratorOptions = {}
  ) {
    super(source, options);
  }

  safeRead() {
    let item: S | null;
    while ((item = this._source.read()) !== null) {
      const mapped = this._map(item);
      if (mapped !== null)
        return mapped;
    }
    return null;
  }

  map<K>(map: MapFunction<D, K>, self?: any): AsyncIterator<K> {
    return new CompositeMappingIterator(this._source, [this._map, bind(map, self)], this);
  }
}

export class CompositeMappingIterator<S, D = S> extends SynchronousTransformIterator<S, D> {
  constructor(
    private root: AsyncIterator<S>,
    private mappings: MapFunction<any, any>[] = [],
    source: AsyncIterator<any>,
    options: SourcedIteratorOptions = {},
  ) {
    super(source, options);
  }

  safeRead() {
    // TODO: See if this is actually necessary
    // A source should only be read from if readable is true
    if (!this.root.readable) {
      this.readable = false;
      // TODO: See if this should be here
      if (this.root.done)
        this.close();
      return null;
    }

    let mapped : any = null;
    while (mapped === null && (mapped = this.root.read()) !== null) {
      for (let i = 0; i < this.mappings.length; i++) {
        mapped = this.mappings[i](mapped);
        if (mapped === null)
          break;
      }
    }
    return mapped;
  }

  map<K>(map: MapFunction<D, K>, self?: any): AsyncIterator<K> {
    return new CompositeMappingIterator(this.root, [...this.mappings, bind(map, self)], this);
  }
}

export class UntilIterator<S> extends SynchronousTransformIterator<S> {
  constructor(
    source: AsyncIterator<S>,
    private limit: number,
    options: SourcedIteratorOptions = {}
  ) {
    super(source, options);
  }

  safeRead() {
    if (this.limit <= 0) {
      this.close();
      return null;
    }
    this.limit -= 1;
    return this._source.read();
  }
}

I'll open up a branch with a cleaned up, and more heavily tested version of the code above (minus the CompositeMappingIterator) once the current branch is merged. The CompositeMappingIterator can then be cleanly addressed later on.

@RubenVerborgh
Copy link
Owner

@jeswr I have done a review and have arrived at the code you can now see in this PR.
Apart from an overall approval, 3 questions:

@jeswr
Copy link
Collaborator Author

jeswr commented Jul 4, 2022

Any concerns about the simplified read logic?

  • I usually call this._source.read() outside of if statements as this reduces the amount of branching required to get to the call which I assume to be the most common case when .read() is called. Would need to do perf. checks to see if this actually has any performance impact though. Looks like this doesn't make a difference in practise though
Original

[jeswr@orangejuly AsyncIterator]$ node perf/MappingIterator-perf.js 
20000000 elems 20 maps                                   1678.1333249993622
20000000 elems 20 maps 20 filter                         1854.2675620019436
100_000 iterators each with 5 maps and 100 elements      2297.956739999354
[jeswr@orangejuly AsyncIterator]$ node perf/MappingIterator-perf.js 
20000000 elems 20 maps                                   1611.3272369988263
20000000 elems 20 maps 20 filter                         1734.021374002099
100_000 iterators each with 5 maps and 100 elements      2286.2911899983883
[jeswr@orangejuly AsyncIterator]$ node perf/MappingIterator-perf.js 
20000000 elems 20 maps                                   1594.8407809995115
20000000 elems 20 maps 20 filter                         1748.050572000444
100_000 iterators each with 5 maps and 100 elements      2127.236763998866

---

Now

[jeswr@orangejuly AsyncIterator]$ node perf/MappingIterator-perf.js 
20000000 elems 20 maps                                   1703.3545530028641
20000000 elems 20 maps 20 filter                         1865.0185250006616
100_000 iterators each with 5 maps and 100 elements      2192.7923329994082
[jeswr@orangejuly AsyncIterator]$ node perf/MappingIterator-perf.js 
20000000 elems 20 maps                                   1602.2818830013275
20000000 elems 20 maps 20 filter                         1791.3512960001826
100_000 iterators each with 5 maps and 100 elements      2325.8517739996314
[jeswr@orangejuly AsyncIterator]$ node perf/MappingIterator-perf.js 
20000000 elems 20 maps                                   1611.5303689986467
20000000 elems 20 maps 20 filter                         1763.5889509990811
100_000 iterators each with 5 maps and 100 elements      2306.6571919992566

@jeswr
Copy link
Collaborator Author

jeswr commented Jul 4, 2022

Any concerns about thereadable logic? eeb4796#diff-8c1065e179bf8c72bbaa6ae735f3be78aac1e609456c0086be7d2b3c1dae66ecL1334-L1337 The original was a bit naughty as superfluous events would be emitted; I'm just checking if there was a reason for that. The unit tests worked fine without it.

Ooh interesting - I thought those were failing before without. I think it was historical when @jacoscaz needed a similar pattern to get unit tests passing.

If tests are working then I think it is safe to get rid of it provided we have good enough coverage with asynchronous sources for the iterator.

@jeswr
Copy link
Collaborator Author

jeswr commented Jul 4, 2022

Any concerns related to the changes and your proposed SynchronousTransformIterator?

Nope - tests still pass making the changes to convert to the SynchronousTransformIterator proposal.

I don't think so - it should just be a matter of moving the .safeRead() call. I'll quickly run some tests to confirm.

Copy link
Collaborator Author

@jeswr jeswr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Approved" @RubenVerborgh with one suggested change (I don't have permissions to click the actual review button :))

asynciterator.ts Outdated
return mapped;
}
}
else if (this._source.done) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
else if (this._source.done) {
if (this._source.done) {

Only one suggested change - see #75 (comment).

@RubenVerborgh
Copy link
Owner

RubenVerborgh commented Jul 4, 2022

I usually call this._source.read() outside of if statements as this reduces the amount of branching

Yeah; and I tend to optimize for fewer return points. The performance should not be impacted (and thanks for confirming that).

Ooh interesting - I thought those were failing before without.
If tests are working then I think it is safe to get rid of it provided we have good enough coverage with asynchronous sources for the iterator.

Yes—and that's the benefit of not messing with the base readable mechanism. It contains several guards and is ridiculously well tested for all kinds of orders. So we use the same logic now and inherit the same protections.

"Approved" @RubenVerborgh with one suggested change (I don't have permissions to click the actual review button :))

And fully agreed—the code is even leaner now. I really like what we landed on:

  read(): D | null {
    if (!this.done) {
      // Try to read an item that maps to a non-null value
      if (this._source.readable) {
        let item: S | null, mapped: D | null;
        while ((item = this._source.read()) !== null) {
          if ((mapped = this._map(item)) !== null)
            return mapped;
        }
      }
      this.readable = false;

      // Close this iterator if the source is empty
      if (this._source.done)
        this.close();
    }
    return null;
  }

This has been a common occurrence for me with AsyncIterator: lots of trying and fiddling, and eventually, you end up with really simple code. However, all of the fiddling resulted in lots of tests for edge cases that you wouldn't have found even if you would have thought of the simple code at the start (and you'd probably have gotten 1 or 2 lines wrong, but wouldn't know).

@RubenVerborgh RubenVerborgh assigned RubenVerborgh and unassigned jeswr Jul 4, 2022
@RubenVerborgh RubenVerborgh force-pushed the feat/mappingIterator branch from 43c502f to 42cae48 Compare July 4, 2022 11:01
@RubenVerborgh RubenVerborgh force-pushed the feat/mappingIterator branch from 42cae48 to def9c09 Compare July 4, 2022 11:49
return this.transform({ filter: self ? filter.bind(self) : filter });
return this.map(function (this: any, item: T) {
return filter.call(self || this, item) ? item : null;
});
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeswr Found this nasty bug still when restoring all of the main tests. (The this pointer filter is called with would be wrong because of the nested function.)

@RubenVerborgh RubenVerborgh merged commit def9c09 into main Jul 4, 2022
@RubenVerborgh RubenVerborgh deleted the feat/mappingIterator branch July 4, 2022 11:55
@RubenVerborgh
Copy link
Owner

Thanks a lot, @jeswr, excellent work. Thanks @jacoscaz as well.

@jacoscaz
Copy link
Collaborator

jacoscaz commented Jul 4, 2022

I had very little to do with this! Great job. I'll rebase #63 ASAP.

@RubenVerborgh
Copy link
Owner

@jacoscaz Reviews and chats are important! See Gitter 😉

@rubensworks
Copy link
Collaborator

Nice work!

@jeswr Do you have any insights into the performance impact on Comunica?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants