High CPU usage of the resource destructor #178
@varsill thanks for the detailed issue report and analysis. Deallocation taking a lot of CPU is certainly odd. I'll check this as high priority as soon as I get some time.
As of now, no. Is it possible to share some sample code so I can reproduce this easily? I'd also like to see the context in which these operations are performed (process involvement etc.). Also, a blind suggestion if you want an immediate workaround: try wrapping all the vips operations for each frame in a separate process and see if it helps? Something like:
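The snippet that originally followed "Something like" did not survive the page scrape. A hypothetical sketch of the suggested workaround (the `FrameWorker` module and `load_and_process/1` helper are made-up names, not from this thread) might look like:

```elixir
# Hypothetical sketch of the workaround, not the original snippet:
# run each frame's vips operations inside a short-lived process, so any
# resource destructors are triggered from that process instead of
# accumulating on one scheduler.
defmodule FrameWorker do
  def process(path) do
    fn -> load_and_process(path) end
    |> Task.async()
    |> Task.await(:timer.seconds(30))
  end

  # Placeholder for the actual per-frame Vix pipeline.
  defp load_and_process(path) do
    {:ok, image} = Vix.Vips.Image.new_from_file(path)
    Vix.Vips.Image.write_to_binary(image)
  end
end
```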
@akash-akya Thank you very much for a quick response!
Great, thanks in advance!
Indeed, this one seems to solve an issue with uneven work distribution among the normal schedulers. When I run
Sure, I will try to prepare a sample script that reproduces that behaviour.
Hi @varsill, any luck with the sample script? :)
Hello @akash-akya! To run it, you first need to:
However, when I run this script I am unable to observe the behaviour I encountered in my system. I kept on debugging my system and I was able to find out that:
Hi @akash-akya!
Hi @varsill, I tried the script you shared, but I couldn't reproduce the behavior; it might be tough to track down the exact issue. I am trying to understand the issue/concern better, so please help me out if these assumptions are correct:

CPU Utilization

Is the main issue here that we are not able to utilize all of the available CPU on the host system? When I tested locally, it always used all of the available CPU. High scheduler usage on its own does not imply there is an issue, because the image processing operation itself might be expensive (maybe we don't want libvips to use everything, but that is a different question). You might know this already, but all libvips operations happen in a separate thread, so you won't see anything in the thread trace, only NIF code in the normal scheduler.

From the first screenshot you shared, it seems you occasionally see

Thanks for starting the thread! At this point I am not clear about the root cause, so I can't attempt a solution yet. If you are interested, I can attempt to optimize the script you shared. Usually libvips works better when operations can be merged; if we do a lot of intermediate memory writes, then it won't be able to do that (similar to
Hello @akash-akya , thanks for the answer and explanation!
Oh, I am not sure it works like that - as stated here: https://erlangforums.com/t/how-to-deal-with-destructors-that-can-take-a-while-to-run-and-possibly-block-the-scheduler/4290/3: "if the resource destructor is triggered on a normal scheduler, it will run on that same scheduler. Otherwise, if it is triggered on a dirty scheduler, it will be rescheduled to run on the first scheduler."
The lowest number of concurrent libvips operations for which I was able to spot the excessive CPU usage was 5.
Well, the CPU usage was really high; it ate up 1000% of CPU. When I reduced the number of libvips worker threads to 8, the CPU usage dropped to 300% and, what is more, the throughput (the number of images I was able to compose in a fixed period of time) increased.
I used the configuration with 12 online schedulers (and the same number of dirty schedulers).
Oh, it would be great if you could provide me with some hints on how to merge those operations!
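For what it's worth, the general idea behind "merging" as libvips documents it: keep the pipeline a single chain of lazy operations and materialize the result only once at the end, instead of writing intermediate images to memory. A sketch (the operation names are illustrative, not the actual pipeline from this issue):

```elixir
# Sketch: chain operations so libvips can fuse them into a single pass.
# Materializing (write_to_binary / write_to_file) only once at the very
# end avoids the intermediate memory writes that prevent fusion.
{:ok, image} = Vix.Vips.Image.new_from_file("frame.jpg")
{:ok, resized} = Vix.Vips.Operation.resize(image, 0.5)
{:ok, flattened} = Vix.Vips.Operation.flatten(resized)
{:ok, binary} = Vix.Vips.Image.write_to_binary(flattened)
```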
@varsill thanks for sharing the details
Understood. This is definitely odd and looks like an issue, probably a coordination issue like you mentioned. I'll investigate further to reproduce and fix. Any pointers on reproducing this would be helpful :) I'll try to optimize the operations and see how far I can go.
@akash-akya thank you very much!
I believe the crucial thing would be to ensure that the resource destructor is triggered from a dirty NIF scheduler. When I run the script I provided a couple of answers above, I see that the destructors are run on normal schedulers (which implies that they were triggered from these normal schedulers), and I think this is the reason why the problem is not reproduced.
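One way to quantify which schedulers are doing the work, outside of observer, is scheduler wall-time statistics; a minimal sketch (the 5-second window is arbitrary, and entry ordering is normalized by sorting on scheduler id):

```elixir
# Sketch: measure per-scheduler utilization over a window using
# :erlang.statistics(:scheduler_wall_time). Each entry is a
# {scheduler_id, active_time, total_time} tuple; dirty schedulers get
# ids above the normal ones.
:erlang.system_flag(:scheduler_wall_time, true)
t0 = :erlang.statistics(:scheduler_wall_time) |> Enum.sort()
Process.sleep(5_000)
t1 = :erlang.statistics(:scheduler_wall_time) |> Enum.sort()

utilization =
  Enum.zip(t0, t1)
  |> Enum.map(fn {{id, a0, w0}, {id, a1, w1}} ->
    {id, Float.round((a1 - a0) / max(w1 - w0, 1), 3)}
  end)
```

A heavily skewed result (e.g. scheduler 1 near 1.0 while the rest sit near 0.0) would corroborate the observer screenshot.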
Hello!
Recently I've come across a problem with high CPU usage when processing images with Vix v0.31.1 in Elixir 1.16.3 with OTP 26.
My scenario looks as follows: I am generating a bunch of videos out of sequences of JPEG images in real time. That is why I need to load images from files and convert them to the YUV color space (so that they can later be encoded).
To be more precise, for each video I run the following sequence of operations multiple times per second (corresponding to the video's framerate):
Though this task is indeed quite computationally heavy, it looks to me as if it were using more resources than I would expect.
Apart from that, I am unable to scale the number of videos beyond 5 or 6, despite there still being some CPU resources left on my computer.
I did some profiling and noticed that normal scheduler 1 is pushed to its limits, with its utilization reaching 100%, in contrast to the other normal schedulers, which are almost unused. Below is a screenshot from Erlang's observer:
I thought it might have something to do with the fact that the dirty NIF resource destructor is always run on normal scheduler 1 (https://github.com/erlang/otp/blob/51548614b588c667a4230c17e2b187474dd70d63/erts/emulator/beam/erl_nif.c#L2841),
so I decided to see what operations are executed on that scheduler. I used `sample` to profile the `beam.smp` process, and it turns out that the majority of the time spent by the normal scheduler 1 thread is indeed spent in `run_resource_dtor`. Below is part of the output of my profiling results:
What is interesting, I see that the part starting with `g_signal_emit` repeats itself several thousand times, apparently decreasing some reference counter.

I've tried reducing the Vips concurrency, and after setting it to `1`, the amount of CPU used by `run_resource_dtor` is smaller but still significant (around 50% of the time obtained with Vips concurrency set to 8).

I have several questions concerning my issue:
Why is the "`g_signal_emit` cycle" run that many times?