
feature/mulithreading-changes #168

Merged: 11 commits merged into master on Mar 3, 2024

Conversation

ndortega
Member

found a 10 - 15% performance gain by removing the current queue + multithreading approach and leaning harder on the scheduler built into Julia using a cooperative multitasking strategy.

Additionally, the amount of code to get this working allowed me to completely remove the StreamUtil module, some state variables, and type definitions in favor of a single function - which makes it much easier to maintain.
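
For readers following along, here is a minimal, illustrative sketch of the general pattern (not the actual Oxygen internals): each request handler is spawned as a task and left to Julia's cooperative scheduler instead of being pushed through a custom queue.

```julia
using HTTP

# Illustrative sketch only: wrap a handler so each request runs as a task that
# Julia's scheduler is free to place on any available thread.
function parallel_handler(handler)
    return function (req::HTTP.Request)
        task = Threads.@spawn handler(req)   # hand the work to the scheduler
        return fetch(task)                   # wait for the result while yielding the thread
    end
end
```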


codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 98.11321%, with 1 line in your changes missing coverage. Please review.

Project coverage is 99.20%. Comparing base (73a8f0a) to head (40700e8).
Report is 1 commit behind head on master.

Files         Patch %   Missing Lines
src/core.jl   97.43%    1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #168      +/-   ##
==========================================
+ Coverage   97.49%   99.20%   +1.71%     
==========================================
  Files          20       19       -1     
  Lines        1157     1138      -19     
==========================================
+ Hits         1128     1129       +1     
+ Misses         29        9      -20     

☔ View full report in Codecov by Sentry.

@ndortega
Member Author

Hi @JanisErdmanis,

I have tested both approaches locally on my laptop with Julia 1.6 and 1.10 and found either the same or better performance with these changes. The results of this test almost feel too good to be true, which makes me a bit hesitant.

What do you think?

@JanisErdmanis
Contributor

JanisErdmanis commented Mar 2, 2024

Hi @ndortega

Sorry, I did not see the message. I am unfamiliar with the parallelism internals and don't yet use them; thus, I can't comment on the performance. But I am happy that there is such a feature and may find it helpful in the future. For now, I can only offer some ideas to consider:

  • serve and serveparallel have very similar functional structures and almost the same kwargs. Perhaps it would be wise to combine them into a single serve where parallelism can be enabled with a keyword argument (see the sketch after this list).
  • It looks strange and cumbersome to have so many keyword arguments to startserver, to which a closure is also passed. I would suggest putting the startserver function body into serve.
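
A rough sketch of what that combined entry point could look like, purely to illustrate the suggestion. The serve signature, the parallel kwarg, and the build_router/parallel_handler helpers here are invented placeholders, not the actual Oxygen.jl API.

```julia
using HTTP

build_router() = req -> HTTP.Response(200, "ok")                     # placeholder for Oxygen's router setup
parallel_handler(handler) = req -> fetch(Threads.@spawn handler(req)) # run each request on a spawned task

# Hypothetical combined entry point: parallelism is opted into with a keyword argument.
function serve(; host = "127.0.0.1", port = 8080, parallel::Bool = false, kwargs...)
    handler = build_router()
    if parallel
        handler = parallel_handler(handler)
    end
    # startserver's body could live here instead of being a separate closure-taking helper
    return HTTP.serve(handler, host, port; kwargs...)
end
```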

I am rather busy and can't help with coding.

EDIT: I think I am able to grasp how parallelism is implemented now. I find it concerning that there is shared mutable state between threads, introduced by the metrics history object. A simple fix would be to create a history object for each thread and then aggregate those into a single history object, which can be shown on the metrics page. But perhaps that is unnecessary for CircularDeque.
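
To make the idea concrete, a small sketch of per-thread histories. The names and container types are illustrative; the real implementation uses a CircularDeque and richer records.

```julia
# One history buffer per thread: each thread appends only to its own buffer,
# so the hot path needs no lock; the metrics page aggregates on demand.
const THREAD_HISTORIES = [NamedTuple[] for _ in 1:Threads.nthreads()]

record!(entry) = push!(THREAD_HISTORIES[Threads.threadid()], entry)

aggregate_history() = reduce(vcat, THREAD_HISTORIES)
```

One caveat: on newer Julia versions tasks can migrate between threads, so threadid()-indexed buffers would need the recording task to be pinned or otherwise guarded in a real implementation.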

@ndortega
Member Author

ndortega commented Mar 3, 2024

Hi @JanisErdmanis,

I'm glad you brought up that point about the shared History object. I've used a ReentrantLock to make writing to and reading from the history object thread-safe. That being said, I'll need to look into memoizing the metrics outputs, since these calculations can take a while when there are hundreds of thousands of requests and we call the metrics endpoint on an interval.
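
Roughly, the lock-based approach looks like this (a simplified sketch, with a plain vector standing in for the actual history container):

```julia
const HISTORY = NamedTuple[]          # stand-in for the shared history object
const HISTORY_LOCK = ReentrantLock()

# Writers (one per request) and the reader (the metrics endpoint) both take the lock.
function push_history!(entry)
    lock(HISTORY_LOCK) do
        push!(HISTORY, entry)
    end
end

function read_history()
    lock(HISTORY_LOCK) do
        copy(HISTORY)   # copy out under the lock so metrics can be computed lock-free
    end
end
```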

And I agree with your second point: with these latest changes the functions are basically identical to one another and could be merged into a single one.

@ndortega ndortega merged commit db0852d into master Mar 3, 2024
8 checks passed
@JanisErdmanis
Contributor

That being said, I'll need to look into memoizing the metrics outputs, since these calculations can take a while when there are hundreds of thousands of requests and we call the metrics endpoint on an interval

You could look into implementing a function merge!(history, thread_1_history), which perhaps can be made fast. I don't think there should be concern about making the metrics dashboard fast yet, as this could complicate matters.
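
One possible shape for that helper, assuming each per-thread history is a plain vector of records (the real Oxygen types may differ); the name merge_histories! is used here just to keep the example self-contained:

```julia
# Illustrative only: bulk-merge one thread's history into the shared one.
function merge_histories!(history::Vector, thread_history::Vector)
    append!(history, thread_history)   # one bulk copy instead of per-request locking
    empty!(thread_history)             # reset the per-thread buffer for reuse
    return history
end
```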

An alternative to using a ReentrantLock, which may imply switching between threads, is to run the metrics on a single thread only and do the parallelization after that. Right now your handler is schematically parallelizer(metrics(router)), but it could be beneficial to rewrite it as metrics(parallelizer(router)): then you would no longer need a ReentrantLock, which should be faster since it does not need synchronization between tasks.
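
A toy illustration of the two orderings, with every name here a stand-in rather than an Oxygen internal:

```julia
router(req) = "response for $req"

history = String[]
metrics(handler) = req -> (push!(history, req); handler(req))        # records on the calling task
parallelizer(handler) = req -> fetch(Threads.@spawn handler(req))    # runs the inner handler on a spawned task

# Current ordering: the push! happens inside the spawned task, so concurrent
# requests need a lock around `history`.
handler_current = parallelizer(metrics(router))

# Proposed ordering: the push! happens before the work is handed off, so the
# worker tasks never touch `history`.
handler_proposed = metrics(parallelizer(router))
```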

@ndortega
Member Author

ndortega commented Mar 3, 2024

@JanisErdmanis

I liked the idea of dropping the ReentrantLock and writing it like metrics(parallelizer(router)). I refactored the code to not use any locks and to only exclude the docs and metrics middleware from getting wrapped with the parallel handler.

Now that the main thread is used exclusively for the docs & metrics middleware, I had to make the metrics endpoint asynchronous in order to prevent the app from halting when viewing the metrics under heavy load. Even after these changes, I was surprised to see a 2x slowdown in RPS on my laptop between this approach and the ReentrantLock approach.
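
For context, "making the metrics endpoint asynchronous" could look roughly like this; an illustrative sketch only, not the code on the experimental branch, with calculate_metrics and history as trivial placeholders:

```julia
history = NamedTuple[]                                   # placeholder for the shared history object
calculate_metrics(h) = (total_requests = length(h),)     # placeholder for the real (slow) calculation

# Run the potentially slow metrics summary on a worker task so the accepting
# thread keeps serving other requests while it runs.
function metrics_endpoint(req)
    task = Threads.@spawn calculate_metrics(history)
    return fetch(task)
end
```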

If you're curious and want to view this code, let me know and I'll push that branch.

@JanisErdmanis
Contributor

I was surprised to see a 2x slowdown

This is unexpected. I wonder if it could be due to tasks also being scheduled on the main thread, which could slow it down. Push the code to the branch and I may have a look.

@ndortega
Member Author

ndortega commented Mar 4, 2024

Sure thing, here's the branch I was working out of: https://github.com/OxygenFramework/Oxygen.jl/tree/feature/multi-threading-experimental-changes

@JanisErdmanis
Contributor

JanisErdmanis commented Mar 4, 2024

Yeah. I suspect that the performance drop is due to tasks being spawned on the primary thread. I think the additional computation for the metrics may have affected some fairness policy in the scheduler, so the main thread ends up hosting other threads' tasks.

You may find it interesting to explore ThreadPools.jl.

EDIT: The issue is that the metrics middleware waits for the response (response = handler(req)) and is thus blocked from accepting new requests.
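
Schematically, the blocking shape being described is something like the following; the history container and the response.status field are assumptions for illustration, not Oxygen's actual middleware:

```julia
history = NamedTuple[]   # placeholder for the shared history object

function metrics_middleware(handler)
    return function (req)
        start = time()
        response = handler(req)   # the accepting task waits here until the worker finishes
        push!(history, (duration = time() - start, status = response.status))
        return response
    end
end
```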

@ndortega ndortega deleted the feature/mulithreading-changes branch March 8, 2024 02:25
@ndortega
Member Author

ndortega commented Mar 8, 2024

Exactly, it won't be worth making this refactor until I speed up the metrics calculations. This shouldn't be too difficult since the helper utils are all very similar.

Out of the 7 metrics returned:

  • 3 of these metrics iterate over the entire history
  • 4 of these metrics iterate over the same filtered part of the history

Possible Next Steps:

  • Break these metrics into two groups, with one loop per group
  • Combine the two groups so a single loop calculates all of these metrics (roughly sketched below)
  • Implement a memo table to store previous calculations; if done well, I may not even need either of the points above
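
As a rough illustration of the single-loop idea: one pass over the history computes both the whole-history aggregates and the aggregates over the filtered portion. Field names like latency, status, and timestamp are assumptions here, and these are not the actual 7 metrics.

```julia
function calculate_metrics(history, window_start)
    total = 0; errors = 0; total_latency = 0.0
    recent = 0; recent_errors = 0; recent_latency = 0.0
    for entry in history
        total += 1
        total_latency += entry.latency
        entry.status >= 500 && (errors += 1)
        if entry.timestamp >= window_start            # the filtered portion of the history
            recent += 1
            recent_latency += entry.latency
            entry.status >= 500 && (recent_errors += 1)
        end
    end
    return (
        total_requests     = total,
        error_rate         = total  == 0 ? 0.0 : errors / total,
        avg_latency        = total  == 0 ? 0.0 : total_latency / total,
        recent_requests    = recent,
        recent_error_rate  = recent == 0 ? 0.0 : recent_errors / recent,
        recent_avg_latency = recent == 0 ? 0.0 : recent_latency / recent,
    )
end
```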

@JanisErdmanis
Contributor

Perhaps you could also consider using a channel into which requests are put by the main thread and then processed by individual threads, each returning an HTTPResponse object on another channel that is then routed back to users. Though I don't know how to do the latter. The benefit of this approach is that it would delegate the task scheduling to Julia. But then again, the throughput depends on a single history-recording object and on acquiring a lock for the channel...
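
A very rough sketch of the channel idea, leaving aside the part about routing responses back to clients; the handle function and the use of Any-typed channels are placeholders:

```julia
requests  = Channel{Any}(1024)   # the main thread puts incoming requests here
responses = Channel{Any}(1024)   # workers put finished responses here

handle(req) = "response for $req"   # stands in for the wrapped router

# One worker task per thread; Julia's scheduler decides where they run.
for _ in 1:Threads.nthreads()
    Threads.@spawn for req in requests
        put!(responses, handle(req))
    end
end
```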

I think the design decision here needs to be supported by benchmarks. One way is to check how many requests per second it takes to saturate HTTP without metrics and with metrics. If the saturation point with metrics is sufficient, then the history can be shared by every thread and used together with a lock as it is now. Otherwise, it needs to be put within each system thread's event loop to improve throughput.
