
filanovskiy/catchSmallBlocks


To build the report, run the following command from an edge node or any Hadoop node:

$ sudo -u hdfs hadoop jar catchSmallBlocks.jar -minBlkSizeMB 32 -minBlkCount 10000 -path /

Where:

-minBlkSizeMB is the block-size threshold; blocks smaller than this are considered small. 32 MB is a good starting point. Don't begin with a large value such as 200 MB, or the report becomes too big to analyze.

-minBlkCount is the minimum number of small blocks a file must have to be reported. Files with only a few small blocks are not ideal, but not critical.

-path / is the HDFS subdirectory to analyze.
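For example, to limit the scan to a single subdirectory (the path below is hypothetical), run:

$ sudo -u hdfs hadoop jar catchSmallBlocks.jar -minBlkSizeMB 32 -minBlkCount 10000 -path /user/hive/warehouse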

To merge small files you can use a Spark job; check "MergeFilesExample" in this repo. A rough sketch of such a job follows.
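The sketch below is only a minimal illustration, assuming plain-text input; the paths and target partition count are hypothetical, and the actual "MergeFilesExample" in the repo may work differently:

// Minimal sketch: rewrite a directory of many small files into fewer, larger files.
// Paths and partition count are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object MergeSmallFiles {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MergeSmallFiles")
      .getOrCreate()

    // Read the directory that the report flagged as containing many small blocks
    val input = spark.read.text("/data/flagged/dir")   // hypothetical input path

    // coalesce() reduces the number of output partitions, so the rewrite
    // produces a few large files instead of many small ones
    input.coalesce(8)                                   // hypothetical target count
      .write
      .text("/data/flagged/dir_merged")                 // hypothetical output path

    spark.stop()
  }
}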
