
FixedQueue #1

Open
conartist6 opened this issue Dec 21, 2021 · 26 comments

@conartist6

I can't tell what you were going for with the FixedQueue class. It has some beautiful ASCII art in it, but it seems to be part of a design that was not the final one. In particular it seems to assume that it will be fed data character by character, and thus that a pure linked list would have too much memory overhead. In fact though only chunks are ever stored in it so it could be trivially replaced with something like @iter-tools/queue, which would also allow you to remove the queueSize variable.
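A sketch of the kind of swap I mean (this assumes Queue exposes push/shift and a size property, and a CommonJS default export; the package README is authoritative):

const Queue = require('@iter-tools/queue');

const HIGH_WATER_MARK = 2048; // the threshold this package currently tracks by hand

const queue = new Queue();

queue.push('first chunk');
queue.push('second chunk');

// Queue tracks its own length, so the separate queueSize variable goes away:
if (queue.size > HIGH_WATER_MARK) {
  // this is where the code would pause the input
}

console.log(queue.shift()); // 'first chunk', FIFO order preserved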

I also think the dynamic requires on the queue are premature optimization.

@Farenheith
Owner

I took FixedQueue directly from the Node.js source code. I used it because I intended to open a PR to them, and this queue implementation showed very good performance. It's not actually my code.
Unfortunately the implementation in this package doesn't completely cover the test cases from the Node.js project, so I dropped the idea of the PR.

@conartist6
Author

I know, I was reading the thread.

I could be open to giving it a place in the iter-tools organization, though.

@Farenheith
Owner

@conartist6 I took a look at @iter-tools/queue and it actually follows logic quite similar to a minimalist implementation I wrote first (just on my machine):

function simpleQueue() {
  let first;
  let last;
  return {
    isEmpty() {
      return !first;
    },
    push(value) {
      const node = { value };
      if (last) {
        // links the old tail node to the new one, then moves the tail
        // pointer onto it (the member reference is evaluated first)
        last.next = last = node;
      } else {
        last = first = node;
      }
    },
    shift() {
      if (first) {
        const { value } = first;
        first = first.next;
        if (!first) {
          last = first; // queue drained; clear the tail too
        }
        return value;
      }
    }
  };
}
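For reference, the minimal queue above would be used like this:

const queue = simpleQueue();

queue.push('a');
queue.push('b');

console.log(queue.shift()); // 'a' (FIFO order)
console.log(queue.shift()); // 'b'
console.log(queue.isEmpty()); // true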

With this implementation, performance in the benchmark seemed a bit worse than with FixedQueue, or at best about the same. Since Node.js already had that implementation in hand, and the intention of this package was to offer easy copy-and-paste code, I decided to use it. I trust that the Node.js team arrived at this solution after some study of V8's behavior and memory management, so this is a question I really want to dig into later.

About using your package: I don't have anything against it. I can do that, but I really want to take some time to better understand the results I got in my preliminary tests.

@conartist6
Author

Your code does its best to keep no more than a single item in the buffer. I'd think anything should perform well, even an array.

@Farenheith
Owner

Actually, my code tries to pause the readable interface when the queue reaches 2048 items, so it looks like that would be the maximum size of the queue and I'd have only one array in memory, but it's not quite like that. I noticed that, even after the call to the pause method, I can still receive line events, so it's not guaranteed that I'll stay at 2048 items, probably due to the size of the chunks readline is reading. In my tests I actually got almost 5000 items, so with the FixedQueue implementation I ended up with 3 arrays of 2048 in memory in that situation, and it probably performed better because allocating an array of that size is optimized in V8, compared to creating 5000 linked objects.

But I'm really curious about it and I'll try to understand it better when I get some time.
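To sketch the pattern being described here (variable names are illustrative, not the package's actual code):

const fs = require('fs');
const readline = require('readline');

const HIGH_WATER_MARK = 2048;
const queue = [];
let paused = false;

const rl = readline.createInterface({ input: fs.createReadStream('big.txt') });

rl.on('line', (line) => {
  queue.push(line);
  if (!paused && queue.length >= HIGH_WATER_MARK) {
    paused = true;
    // best effort: lines already split from the current chunk still arrive,
    // so the queue can overshoot 2048
    rl.pause();
  }
});

function nextLine() {
  const line = queue.shift();
  if (paused && queue.length < HIGH_WATER_MARK) {
    paused = false;
    rl.resume();
  }
  return line;
}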

@conartist6
Author

The readable interface says that the data event is called with a chunk. You're saying the queue contains 2048 chunks? Or have I misunderstood something else?

@conartist6
Author

Also, to your point about the @iter-tools/queue code being like your minimalist implementation: that is very much on purpose. My goal with many of these projects is to ensure they are not significantly more complicated to understand or use than writing your own copy, because it will often be easy to build a simple version of the functionality I offer.

@Farenheith
Owner

In the FixedQueue implementation, instead of creating one object per node, it starts by creating an array of 2048 slots that is used as a circular queue. If you need more than 2048 items in the queue, FixedQueue creates another array, managing them as a "queue of queues".
About the chunks: the readable-interface implementation receives chunk after chunk from the original stream and splits each chunk into lines. What I meant is that the test I mentioned probably got big chunks that turned out to contain thousands of lines each. For example, it could get a first chunk with 2000 lines, split it, then a second chunk with 3000 lines, split that, and only then would my package call pause. The pause prevents it from getting further chunks, but not from processing the lines already split, so even though I called pause at 2048 lines, I still received 2952 additional lines from the readline interface.
The effect with the FixedQueue implementation is that it instantiates three circular queues of 2048 slots to deal with all 5000 lines. I don't know how much memory it uses but, say it's 2048 * 16 bytes per array, that's about 32 KB each. The decision for this strategy is the Node.js team's, but I believe it's well optimized for the V8 engine.

I pointed that out because you said my code keeps no more than a single item in the buffer, but that's not actually true: I got almost 5000 items in one of my tests. Or maybe I misunderstood what you said.
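For anyone following along, here is a stripped-down sketch of that structure. It's my paraphrase of the idea, not the actual Node.js source:

const SIZE = 2048; // a power of two, so wrap-around can use a bitmask

class FixedCircularBuffer {
  constructor() {
    this.list = new Array(SIZE); // one allocation instead of SIZE linked nodes
    this.top = 0;     // next write index
    this.bottom = 0;  // next read index
    this.next = null; // link to the next buffer in the "queue of queues"
  }
  isEmpty() { return this.top === this.bottom; }
  isFull() { return ((this.top + 1) & (SIZE - 1)) === this.bottom; }
  push(data) {
    this.list[this.top] = data;
    this.top = (this.top + 1) & (SIZE - 1);
  }
  shift() {
    const item = this.list[this.bottom];
    if (item === undefined) return null;
    this.list[this.bottom] = undefined;
    this.bottom = (this.bottom + 1) & (SIZE - 1);
    return item;
  }
}

class FixedQueue {
  constructor() { this.head = this.tail = new FixedCircularBuffer(); }
  isEmpty() { return this.head.isEmpty(); }
  push(data) {
    if (this.head.isFull()) {
      // grow by chaining a fresh 2048-slot buffer onto the head
      this.head = this.head.next = new FixedCircularBuffer();
    }
    this.head.push(data);
  }
  shift() {
    const tail = this.tail;
    const next = tail.shift();
    if (tail.isEmpty() && tail.next !== null) {
      this.tail = tail.next; // drop the drained buffer so it can be collected
    }
    return next;
  }
}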

@conartist6
Author

conartist6 commented Dec 21, 2021

You're missing the point. A chunk is by default 64K. If you fill up even one circular buffer (2048 items) that would be 131M of data sitting in buffers.

@Farenheith
Owner

Farenheith commented Dec 21, 2021

What I mean by chunk is each message read from the original stream. I won't have chunks in the queue, I'll have lines, which are much smaller strings. Maybe we're having some language difficulty here, as English is not my first language.

@conartist6
Author

Your English is great!

I think the difference stems from the fact that I would prefer to see splitting into lines happen as an operation on an iterable, not on a stream. But the code you've written, particularly if it were adopted into Node, has little control over what size chunks the stream is split into, meaning it must work equally well for big or small chunks.

@conartist6
Author

What I really want to be writing is openFile('foo.js').splitOnAny(['\r\n', '\n']). But I'm held up because it implies tiny chunks (individual characters) in the output iterable, something which cannot possibly be performant enough without optimizations to avoid promises and intermediate objects.

@conartist6
Author

But it's the only way you can write any kind of code without creating new edge cases when the things you need to recognize cross chunk boundaries.
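To show what I mean, here's a hand-rolled sketch of line splitting over a character iterable (splitOnAny above is the API I'd want; this is the manual equivalent). Because input arrives one character at a time, a '\r\n' pair can never be torn apart by a chunk boundary, since there are no chunks:

async function* splitLinesFromChars(chars) {
  let line = '';
  let sawCR = false;
  for await (const char of chars) {
    if (sawCR) {
      sawCR = false;
      yield line;
      line = '';
      if (char === '\n') continue; // the '\n' completes a '\r\n' pair
    }
    if (char === '\r') { sawCR = true; continue; }
    if (char === '\n') { yield line; line = ''; continue; }
    line += char;
  }
  if (sawCR || line !== '') yield line;
}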

@Farenheith
Owner

Farenheith commented Dec 21, 2021

@conartist6 oh yeah, I get your point now, you're right! I was also worried about that, but then I checked the later implementation using Readable and this also happens there, so it's actually not a disadvantage of the new implementation; it just hasn't improved on this point.

But thanks to you I just got an idea of how to deal with that. As soon as possible I'll try it out. Thank you

@conartist6
Author

Haha glad I could help.

@Farenheith
Owner

@conartist6 I created a version that uses a custom splitLine and avoids the readline package completely. It won't have problems with the edge cases where each message from the stream has too many lines, but for some reason the performance is a bit worse than the current one.
While the current version is 44% faster, the new one is just 34% faster. I haven't gone deep into it yet, but maybe there's something that could be done better in this new version.

@Farenheith
Owner

I just did a benchmark of this new split, and it loses badly to the vanilla one:

    vanilla split x 239 ops/sec ±8.54% (73 runs sampled)
    new split x 45.29 ops/sec ±10.60% (48 runs sampled)

I uploaded the benchmark code.
Regardless, I tried a version of the new method using the vanilla split, but it still improved by just 34%, not 44% like the current one, so it's still not quite clear why.

@conartist6
Author

You have

function* splitLines(string: string) {
  // a function* is already a generator: calling it returns an iterator,
  // so returning this object means the generator body never yields at all
  return {
    next() {
      /* ... */
    },
    [Symbol.iterator]() { return this; },
  };
}

You need either a generator function or a plain function that returns an iterator object, but not both. I can't tell what the "vanilla" code really does, but it's not what you think.
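To spell that out, either of these shapes would be coherent (the names are mine, and both split on '\n' only for brevity):

// Option 1: a generator function; calling it already produces an iterator
function* splitLinesGen(string) {
  for (const line of string.split('\n')) {
    yield line;
  }
}

// Option 2: a plain function that returns a hand-rolled iterator
function splitLinesIter(string) {
  let index = 0;
  return {
    next() {
      if (index >= string.length) return { done: true, value: undefined };
      let newline = string.indexOf('\n', index);
      if (newline === -1) newline = string.length;
      const value = string.slice(index, newline);
      index = newline + 1;
      return { done: false, value };
    },
    [Symbol.iterator]() { return this; },
  };
}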

@conartist6
Author

Also, when the chunk boundary falls between '\r' and '\n', your vanilla splitLines would fail. I'm not sure it matters for a benchmark, but that's why it's so nice to just reason about character iterables (where you may have to wait for any given character).
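For contrast, here's roughly what a chunk-based splitter has to do to get that boundary right; the carried state is exactly the complexity that character iterables make disappear. A sketch, not this package's code:

async function* splitLinesFromChunks(chunks) {
  let buffered = '';
  for await (const chunk of chunks) {
    buffered += chunk;
    // A trailing '\r' might be the first half of a '\r\n' pair whose '\n'
    // is in the next chunk; hold it back until we can see what follows.
    let splittable = buffered;
    let held = '';
    if (splittable.endsWith('\r')) {
      held = '\r';
      splittable = splittable.slice(0, -1);
    }
    const parts = splittable.split(/\r\n|\r|\n/);
    buffered = parts.pop() + held; // the unterminated tail carries over
    yield* parts;
  }
  if (buffered !== '') {
    // a lone trailing '\r' at end-of-input is itself a line break
    const endsCR = buffered.endsWith('\r');
    if (endsCR) buffered = buffered.slice(0, -1);
    if (buffered !== '' || endsCR) yield buffered;
  }
}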

@conartist6
Author

conartist6 commented Dec 27, 2021

Actually I guess both/all the implementations you benchmark are subject to that bug. In iter-tools asyncSplitOnAnySeq(['\r\n', '\n']) would first do a window transform then do its comparisons against a circular buffer of two values.

Since that doesn't generalize nicely to more arbitrary kinds of searches (how big does the buffer need to be to ensure correctness?) I also wrote @iter-tools/regex.

@conartist6
Author

conartist6 commented Dec 27, 2021

Oh, another fun thing to think about here is how the result of a split is created and used, particularly in light of the way browsers make substrings, which is to say by creating a pointer to a certain range of indexes in the original string. Thus after you split a string into parts this way, the complete string must remain in memory even if only one part (one line, in your case) is still referenced. Building strings up from character iterables ensures that this sort of memory leak cannot happen.
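To make the hazard concrete, here's a sketch. The retention behavior is engine-dependent, so treat this as an illustration of the mechanism rather than a guarantee:

// The first line is tiny, but in engines that implement split results as
// (parent, start, end) slices, `keeper` can pin the whole parent string.
let hugeText = 'short first line\n' + 'x'.repeat(100 * 1024 * 1024);
const keeper = hugeText.split('\n')[0]; // 'short first line'
hugeText = null; // the ~100 MB string may still be retained: keeper points into it

// Building the line up character by character instead produces a fresh,
// self-contained string with no pointer back into the parent.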

@Farenheith
Owner

About the generator: that was an error in the benchmark; in the real code in test.js it's not like that, so thank you for pointing it out. I'll fix it and see the result.

@Farenheith
Owner

Farenheith commented Dec 27, 2021

Your insight about string construction was quite nice, thank you, but this new version is experimental; I actually only wanted to test a concept that doesn't use the string.split method, so no array of strings is created, which would consume more memory.
Unfortunately, this version is not faster than just using the split method, even after fixing the benchmark bug:

vanilla split x 57.84 ops/sec ±3.79% (61 runs sampled)
new split x 69.24 ops/sec ±2.73% (71 runs sampled)

So, unless some faster idea comes up, I would stick with the original implementation that uses FixedQueue plus the split method, which is what the readline module currently uses internally.

@conartist6
Author

conartist6 commented Dec 27, 2021

Yeah, that looks more like I thought it would. You're going to be hard-pressed to beat a native method for pure speed. Using memory is fast.

The advantage of iterables is not raw speed, it's that memory usage is very predictable. Imagine someone sends you a stream that's 2GB and contains no newlines. Now you need 2GB of memory to do anything at all. If that same split were implemented as pure iterables (as iter-tools does it), you could process the whole thing without ever using more than a few megabytes of memory, allowing the code to run on even the most resource-constrained devices.
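Concretely, here's an O(1)-memory scan sketch over a character iterable (the name is mine). Even a 2GB input with no newlines needs only a couple of counters, because no line string is ever materialized:

async function longestLineLength(chars) {
  let longest = 0;
  let current = 0;
  for await (const char of chars) {
    if (char === '\n' || char === '\r') {
      if (current > longest) longest = current;
      current = 0; // a '\r\n' pair resets twice, which is harmless here
    } else {
      current += 1;
    }
  }
  return Math.max(longest, current); // account for an unterminated last line
}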

Of course the iterable solution isn't really using any less memory, as it has to allocate all those {done, value} objects, but those have a very short lifespan and the garbage collector is optimized for collecting them. Still, repeated garbage collection is a costly part of running the algorithm, which is why I'm so interested in avoiding the need to allocate those objects.

@Farenheith
Owner

Farenheith commented Dec 27, 2021

Actually, an iterable solution really does use less memory, because each IteratorResult is created one at a time, giving virtually O(1) space complexity, but the trade-off needs to be measured very carefully in this case.
Given that I'm using this library to test a concept for a PR to Node.js, and the first version has no trade-off, only a performance optimization, I can come back to this later and run some benchmarks to show how much it's worth changing the strategy to use less memory. Although it would be best to have some native function doing that.

@Farenheith
Owner

Farenheith commented Dec 29, 2021

I did this change on a separate branch of my node fork and ran a benchmark of the readline iterable. Here are the results I got:

                                        confidence improvement accuracy (*)    (**)   (***)
readline/readline-iterable.js n=10                     -5.98 %       ±8.16% ±10.86% ±14.13%
readline/readline-iterable.js n=100              *     -8.82 %       ±8.04% ±10.70% ±13.92%
readline/readline-iterable.js n=1000           ***    -13.69 %       ±4.31%  ±5.74%  ±7.48%
readline/readline-iterable.js n=10000           **     -8.04 %       ±4.76%  ±6.34%  ±8.26%
readline/readline-iterable.js n=100000                 -1.45 %       ±3.60%  ±4.79%  ±6.24%
readline/readline-iterable.js n=1000000                -2.60 %       ±3.20%  ±4.26%  ±5.55%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 6 comparisons, you can thus expect the following amount of false-positive results:
  0.30 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.06 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

So it seems we have a loss of performance with the change to the generator instead of native string.split, but that loss is often within the error margin, so I'd say it's a good enough change if we benchmark the memory cost as well and the benefits are really good; and maybe the code I wrote to generate the iterable can be improved to gain performance too.
