-
I believe there have been a lot of performance improvements in this area on master, but I'm not sure if they landed in the 1.4.1 release. Which version are you using?
-
Anyway, I made another attempt with 1.4.0. I modified the Ozone config a bit based on the Cloudera tuning tips from this page: https://docs.cloudera.com/storage/latest/storage-options/topics/ozone-performance-tuning-for-ozone.html and gave a 31g heap to all daemons. In addition, on each datanode I created a new volume on the OS drive (SSD) to host hdds.container.ratis.datanode.storage.dir, so that every metadata store is effectively backed by SSD. The results are the same:
Here is the tweaked ozone-site.xml:
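A representative excerpt of the change (the /ssd0 mount point is an illustrative assumption, not the exact value from the file):

```xml
<!-- Sketch only: /ssd0 is a placeholder mount point for the new SSD-backed
     volume described above; the actual path will differ per deployment. -->
<property>
  <name>hdds.container.ratis.datanode.storage.dir</name>
  <value>/ssd0/ozone/ratis</value>
</property>
```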
And the process running:
-
If you are confident the new version would solve this, could you provide more details? I have not opened the source code, but from what I understand of the architecture, there seems to be room for a contention issue if we have a hot-spot container. How do you avoid that? How do you avoid having one container loaded with very small files that slows down the entire platform?
-
Performance increased after building from the master branch and updating; I got the following results. Before the update:
After the update from the master branch:
-
Thanks for sharing! As you would expect, I'm testing from a client perspective, and on my smallest clusters I observe a nominal load of ~1000 requests/s, which roughly triples under peak load. I tried to switch from version 1.4.0 to version 1.4.1 and I also observe a huge impact on performance. Error on one of the datanodes:
whereas there are no errors on the master nodes.
-
I also tried with my old configuration, without the performance tweaks, and Ozone 1.4.1 also crashes during the stress tests. It may not be clear from my posts, but version 1.4.1 is worse than 1.4.0: 1.4.1 improved performance on hybrid HDD/SSD datanodes, but it makes Ozone unreliable on full-HDD datanodes.
-
I did not remove the option ozone.default.bucket.layout=FILE_SYSTEM_OPTIMIZED from my ozone-site.xml, since the docs state that it is the default value.
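For reference, it appears in ozone-site.xml as the following property (this sketch just restates the key/value named above):

```xml
<property>
  <name>ozone.default.bucket.layout</name>
  <value>FILE_SYSTEM_OPTIMIZED</value>
</property>
```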
-
Has this issue been resolved? Is this version stable, and can it be used for a production trial?
-
Hello,
I am evaluating Ozone as an on-premises S3 data lake backed by HDDs.
I have a working HA setup using erasure coding.
The good news is that it works and that Ozone did not crash under stress.
However, performance is not very impressive compared to MinIO or Ceph on the same 6 datanodes.
More specifically, read and write throughput for large objects is OK, but for small objects performance is bad even for an object store.
The worst is my LIST benchmark: I was not even able to get results after several trials.
My test script uses minio-warp with the following settings:
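A representative warp invocation for these runs (the endpoint, credentials, and sizes are illustrative assumptions, not the exact settings used):

```sh
# Sketch only: host, credentials, and parameters are placeholders,
# not the exact settings from this benchmark.
# Small-object write run:
warp put \
  --host=ozone-s3g.example.com:9878 \
  --access-key=ACCESS_KEY --secret-key=SECRET_KEY \
  --obj.size=4KiB --concurrent=16 --duration=5m

# LIST run (the kind of test that failed, as described below):
warp list \
  --host=ozone-s3g.example.com:9878 \
  --access-key=ACCESS_KEY --secret-key=SECRET_KEY \
  --objects=10000 --concurrent=16
```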
The last 4 runs are LIST tests, and they all fail with errors.
My setup is as described in this graph:
I use 2 separate SSD drives for metadata on the SCM and OM.
I use 1 separate HDD drive for Ratis on each datanode.
My ozone-site.xml is as follows:
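A minimal sketch of the properties this layout implies, assuming placeholder paths and an rs-3-2-1024k EC scheme:

```xml
<!-- Sketch only: the paths and EC scheme below are illustrative
     assumptions, not the exact values from this cluster. -->
<property>
  <name>ozone.metadata.dirs</name>
  <value>/ssd1/ozone/metadata</value>
</property>
<property>
  <name>ozone.om.db.dirs</name>
  <value>/ssd2/ozone/om-db</value>
</property>
<property>
  <name>hdds.container.ratis.datanode.storage.dir</name>
  <value>/hdd1/ozone/ratis</value>
</property>
<property>
  <name>ozone.server.default.replication.type</name>
  <value>EC</value>
</property>
<property>
  <name>ozone.server.default.replication</name>
  <value>rs-3-2-1024k</value>
</property>
```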
Any hints?