
filanovskiy/catchSmallBlocks


To build the report, run the following command from an edge node or any Hadoop node:

$ sudo -u hdfs hadoop jar catchSmallBlocks.jar -minBlkSizeMB 32 -minBlkCount 10000 -path /

Where:

-minBlkSizeMB is the block-size threshold; blocks smaller than this are considered small. 32 MB is a good starting point. Don't begin with a large value such as 200 MB, or the report becomes too big to analyze.

-minBlkCount is the minimum number of small blocks a file must have to be reported. Files with only a few small blocks are not ideal, but not critical.

-path / is the HDFS subdirectory to analyze.
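For example, to limit the scan to a single subdirectory (the path below is hypothetical), run:

$ sudo -u hdfs hadoop jar catchSmallBlocks.jar -minBlkSizeMB 32 -minBlkCount 10000 -path /user/hive/warehouse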

To merge small files you can use a Spark job; check "MergeFilesExample" in this repo. A rough sketch of such a job follows.
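The sketch below is only a minimal illustration, assuming plain-text input; the paths and target partition count are hypothetical, and the actual "MergeFilesExample" in the repo may work differently:

// Minimal sketch: rewrite a directory of many small files into fewer, larger files.
// Paths and partition count are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object MergeSmallFiles {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MergeSmallFiles")
      .getOrCreate()

    // Read the directory that the report flagged as containing many small blocks
    val input = spark.read.text("/data/flagged/dir")   // hypothetical input path

    // coalesce() reduces the number of output partitions, so the rewrite
    // produces a few large files instead of many small ones
    input.coalesce(8)                                   // hypothetical target count
      .write
      .text("/data/flagged/dir_merged")                 // hypothetical output path

    spark.stop()
  }
}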
