To build the report run from edge/any hadoop node command:

$ sudo -u hdfs hadoop jar catchSmallBlocks.jar -minBlkSizeMB 32 -minBlkCount 10000 -path /

Where: -minBlkSizeMB is the threshould for the block size after this blocks are considering like small. 32MB is good point for start, don't try to start with large numbers, like 200MB - you will have big report, which is hard to analyze. -minBlkCount is number of small blocks. You may have files, which have few small blocks, it's not good, but not critical. -path / - HDFS subdirectory for analyzing

For merge files you may use Spark Job. You could check "MergeFilesExample" in this repo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

readme.md

Files

readme.md

Latest commit

History

readme.md

File metadata and controls