FixedQueue #1
I took FixedQueue directly from the Node.js source code. I used it because I intend to open a PR to them, and this queue implementation showed very good performance. It's not actually my code.
I know, I was reading the thread. I could be open to giving it a place in the iter-tools organization though.
@conartist6 I took a look at @iter-tools/queue and it actually follows a logic quite similar to a minimalist implementation I did first (just on my machine):

```js
function simpleQueue() {
  let first; // head: the next node to shift
  let last;  // tail: the most recently pushed node
  return {
    isEmpty() {
      return !first;
    },
    push(value) {
      const node = { value };
      if (last) {
        // Link the new node after the tail and advance the tail.
        last = last.next = node;
      } else {
        // Empty queue: the new node is both head and tail.
        last = first = node;
      }
    },
    shift() {
      if (first) {
        const { value } = first;
        first = first.next;
        if (!first) {
          // Drained: clear the tail so push() starts a fresh list.
          last = first;
        }
        return value;
      }
    },
  };
}
```

With this implementation, performance in the benchmark seemed a bit worse than with FixedQueue, or at best the same. As Node already had that implementation in hand, and the intention of this package was to be easy copy-and-paste code, I decided to use it. I trusted that the Node.js team came up with this solution after some study of V8's nature and memory management, so this is a question I really want to get deep into later. About using your package, I don't have anything against it. I can do that, but I really want to take some time to understand better the results I got in my preliminary tests.
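For illustration, the queue above can be exercised like this (a quick sketch using only the snippet's own API):

```js
// Quick sanity check of simpleQueue(): FIFO order, then empty.
const q = simpleQueue();
q.push('a');
q.push('b');
console.log(q.shift());   // 'a'
console.log(q.shift());   // 'b'
console.log(q.isEmpty()); // true
```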
Your code does its best to keep no more than a single item in the buffer. Anything should perform well I'd think, even an array.
Actually, my code tries to pause the readable interface when the queue reaches 2048 items, so it looks like that will be the biggest size of the queue and I'll have only one array in memory, but it's not quite like that. I noticed that, even after the call to the pause method, I can still receive line events, so it's not guaranteed that I'll stay at 2048 items, probably due to the size of the chunks readline is reading. In my tests I actually got almost 5000, so with this FixedQueue implementation I got 3 arrays of 2048 in memory in that situation, and it probably performed better because allocating an array of that size is optimized in V8, compared to creating 5000 linked objects. But I'm really curious about it and I'll try to understand it better when I get some time.
The readable interface says that the …
Also to your point of the …
In the FixedQueue implementation, instead of creating one object for each node, it starts by creating an array of 2048 items that is used as a circular queue. If you need more than 2048 items in the queue, FixedQueue creates another array, managing them as a "queue of queues". I pointed that out because you said my code keeps no more than a single item in the buffer, but that's not actually true, because I got almost 5000 items in one of my tests. Or maybe I misunderstood something you said.
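To make the "queue of queues" idea concrete, here is a sketch along the lines of Node's internal fixed_queue.js; it is paraphrased from memory, so treat names and details as approximate rather than authoritative:

```js
const kSize = 2048;
const kMask = kSize - 1;

// One fixed-size circular buffer. One slot is left unused so that
// full and empty states can be told apart from the indices alone.
class FixedCircularBuffer {
  constructor() {
    this.bottom = 0;              // next index to read
    this.top = 0;                 // next index to write
    this.list = new Array(kSize);
    this.next = null;             // buffer that overflow spills into
  }
  isEmpty() { return this.top === this.bottom; }
  isFull() { return ((this.top + 1) & kMask) === this.bottom; }
  push(data) {
    this.list[this.top] = data;
    this.top = (this.top + 1) & kMask;
  }
  shift() {
    const nextItem = this.list[this.bottom];
    if (nextItem === undefined) return null;
    this.list[this.bottom] = undefined;
    this.bottom = (this.bottom + 1) & kMask;
    return nextItem;
  }
}

class FixedQueue {
  constructor() {
    this.head = this.tail = new FixedCircularBuffer();
  }
  isEmpty() { return this.head.isEmpty(); }
  push(data) {
    if (this.head.isFull()) {
      // Current buffer full: allocate a fresh 2048-slot one and link it.
      this.head = this.head.next = new FixedCircularBuffer();
    }
    this.head.push(data);
  }
  shift() {
    const tail = this.tail;
    const next = tail.shift();
    if (tail.isEmpty() && tail.next !== null) {
      this.tail = tail.next; // drop the drained buffer for the GC
    }
    return next;
  }
}
```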
You're missing the point. A chunk is by default 64K. If you fill up even one circular buffer (2048 items × 64K per chunk) that would be 131M of data sitting in buffers.
What I mean by chunk is each message that is read from the original stream. I'll not have chunks in the queue, I'll have lines, which are much smaller strings. Maybe we're having some language difficulty here, as English is not my first language.
Your English is great! I think our difference stems from the fact that I would prefer to see splitting into lines happen as an operation on an iterable, not on a stream. But the code you've written, particularly if it were adopted into Node, has little control over what size chunks the stream is split into, meaning it must work equally well for big or small chunks.
What I really want to be writing is …
But it's the only way you can write any kind of code without creating new edge cases when things you need to understand cross chunk boundaries. |
@conartist6 Oh yeah, I get your point now, you're right! I was also worried about that, but then I checked the later implementation using Readable and this also happens there, so it's actually not a disadvantage of the new implementation, it just hasn't improved on this point. But thanks to you I just got an idea of how to deal with it. As soon as possible I'll try it out. Thank you!
Haha, glad I could help.
@conartist6 I created a version that uses a custom splitLine and also completely avoids the use of the readline package. I will not have problems with the edge cases where each message from the stream has too many lines, but for some reason the performance is a bit worse than the current one.
I just did a benchmark of this new split and it greatly loses to the vanilla one: …
I uploaded the benchmark code.
You have:

```ts
function* splitLines(string: string) {
  return {
    next() {
      /* ... */
    },
    [Symbol.iterator]() {
      return this;
    },
  };
}
```

You need to either use a generator function or return an iterator object, but not both. I can't tell what the "vanilla" code really does, but it's not what you think.
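Concretely, the two valid shapes look like this. The real splitting logic is elided in the thread, so the bodies below are illustrative stand-ins that split on bare newlines:

```js
// Shape 1: a generator function. Calling it returns an object that is
// both an iterator and iterable, so nothing needs to be built by hand.
function* splitLines(string) {
  let start = 0;
  let index;
  while ((index = string.indexOf('\n', start)) !== -1) {
    yield string.slice(start, index);
    start = index + 1;
  }
  yield string.slice(start);
}

// Shape 2: a plain function that returns a hand-rolled iterator object.
function splitLinesIterator(string) {
  let start = 0;
  let done = false;
  return {
    next() {
      if (done) return { value: undefined, done: true };
      const index = string.indexOf('\n', start);
      if (index === -1) {
        done = true;
        return { value: string.slice(start), done: false };
      }
      const value = string.slice(start, index);
      start = index + 1;
      return { value, done: false };
    },
    [Symbol.iterator]() {
      return this;
    },
  };
}
```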
Also when the chunk boundary falls between …
Actually I guess both/all the implementations you benchmark are subject to that bug. In iter-tools … Since that doesn't generalize nicely to more arbitrary kinds of searches (how big does the buffer need to be to ensure correctness?), I also wrote @iter-tools/regex.
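To make the boundary bug concrete: a splitter that looks at one chunk at a time mis-splits whenever a line, or a `\r\n` pair, straddles two chunks (the split `\r\n` is the classic case). Here is a sketch, not iter-tools' actual code, of a splitter over an async iterable of string chunks that buffers only the unterminated tail:

```js
// Sketch: chunk-boundary-safe line splitting over an async iterable of
// string chunks. Memory held between chunks is only the unterminated
// tail of the current line (plus at most one pending '\r').
async function* splitLinesOfChunks(chunks) {
  let tail = '';
  for await (const chunk of chunks) {
    let text = tail + chunk;
    // Hold back a trailing '\r': it may be the first half of a '\r\n'
    // pair whose '\n' arrives in the next chunk.
    let heldCR = '';
    if (text.endsWith('\r')) {
      heldCR = '\r';
      text = text.slice(0, -1);
    }
    const parts = text.split(/\r\n|\r|\n/);
    // The last part is the (possibly empty) unterminated line; keep it
    // buffered until more input arrives.
    tail = parts.pop() + heldCR;
    yield* parts;
  }
  if (tail !== '') {
    // End of input: a bare trailing '\r' still terminates the line.
    yield tail.endsWith('\r') ? tail.slice(0, -1) : tail;
  }
}
```

For example, feeding it `['foo\r', '\nbar']` yields `'foo'` then `'bar'`, where a naive per-chunk split would emit spurious empty lines.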
Oh another fun thing to think about here is how the result of a split is created and used, particularly in light of the way browsers make substrings, which is to say by creating a pointer to a certain range of indexes in the original string. Thus after you split a string into parts this way, the complete string must remain in memory even if only one part (line, in your case) is still referenced. Building strings up from character iterables ensures that that sort of memory leak cannot happen.
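A sketch of that retention hazard, with the caveat that sliced-string behavior is an engine implementation detail (V8 is known to do this for sufficiently long substrings), not a language guarantee:

```js
// The hazard: after splitting, keeping one small piece may pin the
// whole parent string in memory if the engine represents substrings
// as pointers into the parent (V8's "sliced strings").
const bigChunk = 'x'.repeat(64 * 1024) + '\na line of ordinary length here';
const lines = bigChunk.split('\n');
let kept = lines[1]; // may retain all 64K of bigChunk, engine-dependent

// A common workaround: force a flat copy so the parent can be freed.
// This relies on implementation details, not on any spec guarantee.
kept = (' ' + kept).slice(1);
```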
About the generator, it was an error in the benchmark; in the real code in test.js it's not like that, so thank you for pointing it out. I'll fix it and see the result.
Your insight about string construction was quite nice, thank you, but this new version is experimental and, actually, I only wanted to test a concept that doesn't use the string.split method, so no array of strings is created consuming more memory.
So, unless some faster idea comes up, I'll stick with the original implementation that uses the FixedQueue + the split method, which is what the readline module currently uses internally.
Yeah, that looks more like I thought it would. You're going to be hard pressed to beat a native method for pure speed. Using memory is fast. The advantage of iterables is not raw speed, it's that memory usage is very predictable. Imagine someone sends you a stream that's 2GB and contains no newlines. Now you need 2GB of memory to do anything at all. If you had done that same split implemented as pure iterables (as iter-tools does) you could process the whole thing without ever using more than a few megabytes of memory, allowing the code to run on even the most resource-constrained devices. Of course the iterable solution isn't really using any less memory, as it has to allocate all those IteratorResult objects …
Actually, the space complexity of an iterable solution really is lower, because each IteratorResult is created one at a time, giving virtually O(1) space; but the trade-off needs to be measured very carefully in this case.
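A tiny illustration of the allocation pattern in question, assuming nothing beyond standard generators:

```js
// Each next() call on a generator allocates a fresh { value, done }
// IteratorResult object, but consumers typically hold only the latest
// one, so peak memory stays O(1) even over unbounded input.
function* naturals() {
  for (let i = 0; ; i++) yield i;
}

const it = naturals();
const a = it.next(); // { value: 0, done: false }
const b = it.next(); // { value: 1, done: false }, a fresh object
console.log(a.value, b.value, a !== b); // 0 1 true
```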
I did this change on a separate branch of my node fork and ran a benchmark of the readline iterable. Here are the results I got: …
So, it seems like we have a loss of performance with the change to the generator instead of native string.split, but that loss is often within the error margin, so I'd say it's a good enough change if we benchmark the memory cost as well and the benefits are really good; and maybe the code I wrote to generate the iterable can be improved to raise its performance too.
I can't tell what you were going for with the `FixedQueue` class. It has some beautiful ASCII art in it, but it seems to be part of a design that was not the final one. In particular it seems to assume that it will be fed data character by character, and thus that a pure linked list would have too much memory overhead. In fact though only chunks are ever stored in it, so it could be trivially replaced with something like `@iter-tools/queue`, which would also allow you to remove the `queueSize` variable.

I also think the dynamic requires on the queue are premature optimization.