Diesel Performance Fixes, Batching Improvements, New Allocator #262
Conversation
HashSet: about 0.3% CPU time improvement
FxHashMap: about 0.25% perf improvement
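The hasher swap mentioned above can be sketched as follows. This is a minimal illustration assuming the rustc-hash crate (the usual source of `FxHashMap`/`FxHashSet`); the keys and values are placeholders, not the PR's actual types.

```rust
use rustc_hash::{FxHashMap, FxHashSet};

fn main() {
    // FxHash trades cryptographic strength for speed on small keys,
    // which is fine for in-process lookup tables.
    let mut seen: FxHashSet<u32> = FxHashSet::default();
    seen.insert(42);

    let mut by_topic: FxHashMap<&str, u32> = FxHashMap::default();
    by_topic.insert("sensor/voltage", 1);
    assert!(seen.contains(&42));
}
```

The drop-in nature of the change (same API, different default hasher) is why it shows up as a small flat CPU win rather than a structural change.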
tokio::task::spawn_blocking(move || {
    DbHandler::batch_upload(owned, pool)
});
let msg_len = msgs.len();
let chunk_size = msg_len / ((msg_len / 8190) + 1);
Why are you now dividing by 8190 instead of 16380? Is there something special about dividing max params by 8 instead of 4?
Oh, I forgot to investigate that. The switch to diesel-async doubled the number of instructions per insert. I'll do some investigating there. It's a very annoying limit.
@@ -81,17 +110,16 @@ impl DbHandler {
// libpq has max 65535 params, therefore batch
I actually never understood this fully. Why is the batching logic needed for Diesel? Is it because it lets you try to insert more data than libpq can handle? Does Prisma just manage this itself?
I guess Prisma splits the queries as needed. I doubt they work around libpq, since it's kinda the best way to communicate with Postgres. It's a very annoying limit, but kinda inherent to Postgres.
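The batching being discussed can be sketched as follows: each row contributes one bind parameter per column, so the number of rows per INSERT is capped by libpq's 65,535-parameter limit. The `PARAMS_PER_ROW = 8` column count here is an assumption for illustration, not the actual `DataInsert` schema.

```rust
// libpq caps a single prepared statement at 65_535 bind parameters.
const MAX_PARAMS: usize = 65_535;
// Assumed column count per row (illustrative, not the real schema).
const PARAMS_PER_ROW: usize = 8;

/// Number of INSERT statements needed to upload `n_rows` rows
/// without exceeding the parameter cap.
fn batches_needed(n_rows: usize) -> usize {
    let max_rows = MAX_PARAMS / PARAMS_PER_ROW; // 8191 rows per statement
    (n_rows + max_rows - 1) / max_rows          // ceiling division
}

fn main() {
    // 20_000 rows fit in 3 statements (8191 + 8191 + 3618 rows).
    println!("{}", batches_needed(20_000));
}
```

An ORM like Prisma hides this by splitting the VALUES list internally; with Diesel the caller does the split, which is why this code carries its own batching logic.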
Integration tests sometimes fail when the database is disabled in Docker or hasn't been set up yet. I think just adding a sleep, like your GitHub tests do, would work. For now we can probably just restart or remove the database if it already exists as a Docker container but is disabled. I can push changes for this if you want.
Everything compiles, tests, and runs fine on Mac M1.
Can you un-add the fix to the integration tests? Thanks.
LGTM
Changes
Rationale: Performance of blocking threads decreases as more threads are in use, affecting large chunk batching and the [Scylla] - Investigate a new CSV handler #243 code
Rationale: The system allocator was not effectively freeing the memory used by the batching tasks, resulting in excessive memory usage. It now takes approximately 1-2 GB of RAM to upload an hour of data, and that RAM is released back when the upload is complete. Previously it was 5-10 GB, and the memory was not freed.
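Switching the global allocator is a one-line change in Rust. A minimal sketch, assuming the tikv-jemallocator crate is the one used here (the exact crate choice is an assumption; `jemallocator` is the other common option):

```rust
// Sketch: opting into jemalloc as the global allocator.
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Every heap allocation below now goes through jemalloc, which
    // returns freed pages to the OS more readily than the system
    // allocator under this bursty allocate-then-free workload.
    let big: Vec<u8> = vec![0; 1 << 20];
    drop(big);
}
```

Because this only changes where allocations are serviced, no other code needs to be touched, which is why the memory-usage fix lands without API changes.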
Rationale (found in the docs for DataInsert): This aligns more closely with the actual data we receive, which is non-null 4-byte floating-point values. This allows less re-allocation of data when converting our received protobuf to the data the database is looking for, improving performance.
Rationale: Before, chunks of data to upload were split into a few even chunks and one chunk of only a few points. The new algorithm better evens out the chunks. It still could use improvement, and more investigation into the libpq instruction limit could be warranted.
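The evened chunking described above matches the `chunk_size` expression in the diff. A sketch of the idea, with the 8190-row cap taken from the diff (whether 8190 is the right cap is still under discussion above):

```rust
/// Chunk size that divides `msg_len` rows into near-equal chunks,
/// each at most 8190 rows, instead of fixed 8190-row chunks plus a
/// tiny remainder chunk.
fn chunk_size(msg_len: usize) -> usize {
    // (msg_len / 8190) + 1 is the number of chunks; dividing evenly
    // spreads the remainder across all of them.
    msg_len / ((msg_len / 8190) + 1)
}

fn main() {
    // Old scheme: 20_000 rows -> 8190 + 8190 + a tail of 3620.
    // New scheme: three chunks of roughly 6666 rows each.
    println!("{}", chunk_size(20_000));
}
```

The last chunk can still pick up a few extra rows from integer truncation, which is the "could use improvement" noted above.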
Notes
Jemalloc may fail to build on aarch64 (hence the CI run), so we should pay attention to any unsoundness or stability issues.
Test Cases
To Do
Any remaining things that need to get done
Checklist
It can be helpful to check the Checks and Files changed tabs.
Please review the contributor guide and reach out to your Tech Lead if anything is unclear.
Please request reviewers and ping on slack only after you've gone through this whole checklist.
package-lock.json changes (unless dependencies have changed)
Closes #244
Closes #184