To build the report, run the following command from an edge node or any other Hadoop node:

$ sudo -u hdfs hadoop jar catchSmallBlocks.jar -minBlkSizeMB 32 -minBlkCount 10000 -path /

Where:

-minBlkSizeMB is the block size threshold in MB; blocks smaller than this are considered small. 32 MB is a good starting point. Don't start with a large value such as 200 MB: the report will be large and hard to analyze.

-minBlkCount is the threshold for the number of small blocks. Some files may have only a few small blocks, which is not good but not critical.

-path / is the HDFS directory to analyze.
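To illustrate how the two thresholds interact, here is a minimal Python sketch. It is not the code inside catchSmallBlocks.jar. The function name `find_small_block_paths`, the grouping of blocks by path, and the strict "less than" comparison are all assumptions made for illustration.

```python
from collections import defaultdict

MB = 1024 * 1024


def find_small_block_paths(blocks, min_blk_size_mb, min_blk_count):
    """Return {path: small_block_count} for paths that reach min_blk_count.

    blocks is an iterable of (path, block_size_in_bytes) pairs.
    Hypothetical helper: the grouping unit and the comparison are assumptions,
    not the behaviour of the real tool.
    """
    threshold_bytes = min_blk_size_mb * MB
    counts = defaultdict(int)
    for path, size in blocks:
        # Assumption: a block is "small" when it is strictly below the threshold.
        if size < threshold_bytes:
            counts[path] += 1
    # Keep only paths with enough small blocks to be worth reporting.
    return {path: n for path, n in counts.items() if n >= min_blk_count}


# Made-up sample data: (path, block size in bytes).
sample = [
    ("/data/logs", 1 * MB),
    ("/data/logs", 2 * MB),
    ("/data/big", 128 * MB),
    ("/data/tmp", 5 * MB),
]

print(find_small_block_paths(sample, 32, 2))  # {'/data/logs': 2}
```

With a 200 MB threshold, the 128 MB block in the sample would also count as small, which shows how a large -minBlkSizeMB value inflates the report.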

To merge files, you can use a Spark job. See "MergeFilesExample" in this repo.