Performance degenerates a lot when reading from multiple threads compared with a single thread (when running Clickhouse) #1985
Replies: 15 comments
-
Could you run
Moreover, increasing the metadata cache for a read-only workload might help; see: https://juicefs.com/docs/cloud/cache#metadata-cache.
-
@sighingnow You may also want to increase the memory buffer used (
-
Thanks for the guidance, @davies @SandyXSD. TL;DR: the root cause in our case is the high CPU usage of juicefs when the workload (ClickHouse) reads from 4 threads. Juicefs consumes about The
Our machine has 4 physical cores and 2 threads per core
When running the workload with 4 threads, juicefs and the workload itself (ClickHouse) contend for CPU resources, which yields bad latency numbers and thus poor performance. It can be verified that running ClickHouse with
It doesn't work.
-
It looks like the bandwidth of block cache reads is much greater than that of FUSE reads when running with 4 threads, which might be one reason for the higher CPU usage. We'll look into whether this behavior can be improved.
-
Thank you! I have also noticed that similar high CPU usage occurs on the benchmarking documentation page under a Looking forward to your insights!
-
@sighingnow Can you tell us how to reproduce this issue?
-
(We are using OSS as the underlying storage, but I think it doesn't matter, as we ensure all data is cached on local disk.)
(the cache directory
prepare the code,
prepare the benchmark suite (to save time, you could first modify the SQL file
The first run of
-
We also noticed similar performance degradation in other queries (e.g., queries 21, 23, 28, 29), but investigating query 29 alone should be enough to observe the high CPU usage of juicefs.
-
Also, I think a micro benchmark could reproduce the high CPU usage issue as well :)
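A minimal sketch of what such a micro benchmark might look like, assuming the goal is to mimic several threads reading one large file at different offsets through a single file descriptor. The function name `parallel_pread` and its parameters are hypothetical and not part of juicefs or ClickHouse; the file to read would be any large, already cached file under the mount point.

```python
import os
import threading
import time


def parallel_pread(path, num_threads, block_size=1 << 20):
    """Read `path` once, split into `num_threads` contiguous regions.

    Every thread reads its own region with os.pread() through one shared
    file descriptor, so the reads land at different offsets of the same
    open file. Returns (total_bytes_read, elapsed_seconds).
    """
    size = os.path.getsize(path)
    region = (size + num_threads - 1) // num_threads  # ceiling division
    counts = [0] * num_threads
    fd = os.open(path, os.O_RDONLY)

    def worker(idx):
        offset = idx * region
        end = min(offset + region, size)
        while offset < end:
            data = os.pread(fd, min(block_size, end - offset), offset)
            if not data:  # unexpected end of file
                break
            counts[idx] += len(data)
            offset += len(data)

    try:
        start = time.perf_counter()
        threads = [threading.Thread(target=worker, args=(i,))
                   for i in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return sum(counts), elapsed
```

Running it with `num_threads=1` and then `num_threads=4` against the same cached file, while watching the CPU usage of the juicefs process (for example with `top`), should show whether the multi-threaded case behaves differently. `os.pread` is only available on Unix-like systems.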
-
I will try the steps above.
-
The reason is that JuiceFS limits the number of random read threads for each opened file descriptor (to save memory). It looks like the current limit (2) is not enough for scenarios like this ClickHouse query, which concurrently reads a huge file at different offsets in the same process. Btw, the high CPU usage in the JuiceFS benchmark doc is caused by getting objects from storage and is not related to this issue.
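To illustrate the effect of a per-file-descriptor limit, here is a toy model in Python. It is not JuiceFS's actual implementation: the semaphore, the function name `peak_inflight`, and the sleep standing in for a read are all assumptions made for the sake of the example. It only shows that when 4 readers share a handle that admits 2 reads at a time, at most 2 reads are ever in flight and the rest wait.

```python
import threading
import time


def peak_inflight(num_readers, limit, reads_per_reader=5, read_seconds=0.002):
    """Toy model of a per-handle concurrency limit.

    `num_readers` threads issue reads against one handle that admits at
    most `limit` reads at a time. Returns (peak, elapsed_seconds), where
    `peak` is the highest number of reads observed in flight at once.
    """
    gate = threading.BoundedSemaphore(limit)  # hypothetical per-handle limit
    lock = threading.Lock()
    state = {"inflight": 0, "peak": 0}

    def reader():
        for _ in range(reads_per_reader):
            with gate:  # wait for one of the `limit` slots
                with lock:
                    state["inflight"] += 1
                    state["peak"] = max(state["peak"], state["inflight"])
                time.sleep(read_seconds)  # stand-in for the actual read
                with lock:
                    state["inflight"] -= 1

    start = time.perf_counter()
    threads = [threading.Thread(target=reader) for _ in range(num_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"], time.perf_counter() - start
```

Comparing `peak_inflight(4, 2)` with `peak_inflight(4, 4)` shows the queueing: with the lower limit the same number of reads takes longer in wall-clock time because half of the readers are always waiting for a slot.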
-
@SandyXSD Thanks for the quick investigation! It does work: the 4-thread run on juicefs improved from 56 seconds to 35 seconds, but there's still a huge gap compared with SSD (around 24 seconds), and I still noticed about
Do you folks have any further insights about the problem? Thanks!
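For a sense of scale, the timings quoted in this comment can be turned into ratios. This is only arithmetic on the three numbers above (56 s, 35 s, and roughly 24 s); the helper name `ratio` is made up for the example.

```python
def ratio(slower_seconds, faster_seconds):
    """How many times longer the slower run takes than the faster one."""
    return slower_seconds / faster_seconds


juicefs_before = 56.0  # 4 threads on juicefs, before the change
juicefs_after = 35.0   # 4 threads on juicefs, after the change
local_ssd = 24.0       # 4 threads on local SSD (approximate)

speedup = ratio(juicefs_before, juicefs_after)  # 1.6x faster than before
remaining_gap = ratio(juicefs_after, local_ssd)  # about 1.46x slower than SSD

print(f"speedup: {speedup:.2f}x, remaining gap vs SSD: {remaining_gap:.2f}x")
```

So the change removes a good part of the slowdown, but the juicefs run still takes roughly one and a half times as long as the local SSD run.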
-
Thanks for the clarification.
-
Yes, increasing The current result is expected. Because JuiceFS is a network file system built on FUSE, it consumes more CPU (for splitting buffers, copying data between kernel and userspace, etc.) than local kernel file systems, and it usually adds somewhat higher latency. We plan to improve performance after v1.0-GA is released, but for now it's not the main focus.
-
Copy that. Thanks for the information.
-
What happened:
Hi folks,
We are trying to run the ClickHouse benchmark on juicefs (with OSS as the underlying object storage). With juicefs having already cached the whole file on local disk, we notice a huge performance gap (compared with running the benchmark on local SSD) when executing ClickHouse with 4 threads, but no such degradation happens if we limit ClickHouse to 1 thread.
More specifically, we are running the ClickHouse benchmark with scale factor 1000 and running query 29 (the involved table Referer is around 24Gi in size, and the query is a full table scan), with clickhouse given 100Gi of local SSD as the cache directory. After several runs to make sure the involved files are fully cached locally by juicefs, we noticed the following performance numbers
You can see that juicefs suffers much more performance degradation when the workload runs with multiple threads. Is that behaviour expected for juicefs?
Thanks!
What you expected to happen:
The performance gap shouldn't be so large for the 4-thread setting.
How to reproduce it (as minimally and precisely as possible):
Run the ClickHouse benchmark inside a juicefs-mounted directory.
Anything else we need to know?
Environment:
JuiceFS version (juicefs --version) or Hadoop Java SDK version: juicefs version 1.0.0-beta2+2022-03-04T03:00:41Z.9e26080
OS (cat /etc/os-release): Ubuntu 20.04.3 LTS
Kernel (uname -a): Linux mk1 5.4.0-100-generic #113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux