Hadoop FSImage Analyzer (HFSA) complements the Apache Hadoop 'hadoop-hdfs' tool by providing the following for HDFS fsimage files:
- tooling that summarizes HDFS files and directories by user and group (answering 'who has how many/big/small files...')
- a library for fast, partly multithreaded fsimage processing with a file-, directory- and symlink-aware visitor API, derived from Apache HDFS FSImageLoader
- a helper FSImage file generator for creating synthetic test data
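To make the 'who has how many/big/small files' question concrete, here is a minimal, self-contained sketch of such a per-user summary. It does not use the HFSA API: `FileEntry`, `Stats`, `summarize` and the small-file threshold are hypothetical names chosen for illustration, and the input list stands in for the file inodes an fsimage would provide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-user summary sketch; not part of the HFSA API. */
final class UserSummarySketch {

    /** Hypothetical stand-in for one file inode: owner and size in bytes. */
    static final class FileEntry {
        final String user;
        final long sizeBytes;

        FileEntry(String user, long sizeBytes) {
            this.user = user;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Running totals for one user. */
    static final class Stats {
        long fileCount;
        long totalBytes;
        long smallFiles; // files strictly below the small-file threshold

        @Override
        public String toString() {
            return fileCount + " files, " + totalBytes + " bytes, " + smallFiles + " small";
        }
    }

    /** Groups entries by user and counts files, bytes and small files. */
    static Map<String, Stats> summarize(List<FileEntry> entries, long smallFileThresholdBytes) {
        Map<String, Stats> byUser = new TreeMap<>(); // sorted by user name
        for (FileEntry e : entries) {
            Stats s = byUser.computeIfAbsent(e.user, u -> new Stats());
            s.fileCount++;
            s.totalBytes += e.sizeBytes;
            if (e.sizeBytes < smallFileThresholdBytes) {
                s.smallFiles++;
            }
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<FileEntry> entries = Arrays.asList(
                new FileEntry("alice", 512L),
                new FileEntry("alice", 4_000_000L),
                new FileEntry("bob", 100L));
        // Assumed threshold: 1 MiB counts as the boundary for "small"
        summarize(entries, 1_048_576L)
                .forEach((user, stats) -> System.out.println(user + ": " + stats));
    }
}
```

The same aggregation could be keyed by group instead of user by swapping the map key.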
See FSImageLoaderTest.java for example usage.
The following lines visit all directory-, file- and symlink inodes:
import java.io.RandomAccessFile;
import org.apache.hadoop.fs.permission.PermissionStatus;
import org.apache.hadoop.hdfs.server.namenode.FsImageProto;
// FsImageData, FsImageLoader and FsVisitor are provided by the HFSA library

RandomAccessFile file = new RandomAccessFile("src/test/resources/fsi_small.img", "r");

// Load file into memory
FsImageData fsimageData = new FsImageLoader.Builder()
    .parallel().build()
    .load(file);

// Traverse file hierarchy
new FsVisitor.Builder()
    .parallel()
    .visit(fsimageData, new FsVisitor() {
        @Override
        public void onFile(FsImageProto.INodeSection.INode inode, String path) {
            // Build the absolute file name; avoid a doubled slash below the root
            final String fileName = ("/".equals(path) ? path : path + '/') + inode.getName().toStringUtf8();
            System.out.println(fileName);
            FsImageProto.INodeSection.INodeFile f = inode.getFile();
            // Note: `loader` is not defined in this snippet; it stands for the
            // object in scope that resolves permission IDs
            PermissionStatus p = loader.getPermissionStatus(f.getPermission());
            // ...
        }

        @Override
        public void onDirectory(FsImageProto.INodeSection.INode inode, String path) {
            // Build the absolute directory name the same way
            final String dirName = ("/".equals(path) ? path : path + '/') + inode.getName().toStringUtf8();
            System.out.println("Directory : " + dirName);
            FsImageProto.INodeSection.INodeDirectory d = inode.getDirectory();
            // Same assumption about `loader` as in onFile
            PermissionStatus p = loader.getPermissionStatus(d.getPermission());
            // ...
        }

        @Override
        public void onSymLink(FsImageProto.INodeSection.INode inode, String path) {
            // Do something
        }
    });
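When the visitor runs in parallel, callbacks that update shared state need to be thread-safe. The following self-contained sketch shows one way to count files and directories safely with `LongAdder`, and factors the path-joining rule used above into a helper. It assumes that a parallel visit may invoke callbacks from several threads; the class and method names (`ParallelCountSketch`, `join`, and the simplified `onFile`/`onDirectory` signatures) are hypothetical and not the HFSA API, and a parallel stream stands in for the real traversal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; none of these names belong to the HFSA API. */
final class ParallelCountSketch {
    // LongAdder tolerates concurrent increments without explicit locking
    private final LongAdder fileCount = new LongAdder();
    private final LongAdder dirCount = new LongAdder();

    /** Joins a parent path and a name without doubling the slash below the root. */
    static String join(String parentPath, String name) {
        return ("/".equals(parentPath) ? parentPath : parentPath + '/') + name;
    }

    /** Simplified stand-in for a file callback. */
    void onFile(String parentPath, String name) {
        fileCount.increment();
    }

    /** Simplified stand-in for a directory callback. */
    void onDirectory(String parentPath, String name) {
        dirCount.increment();
    }

    long files() {
        return fileCount.sum();
    }

    long directories() {
        return dirCount.sum();
    }

    public static void main(String[] args) {
        ParallelCountSketch counter = new ParallelCountSketch();
        List<String> names = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt");

        // Simulate concurrent callbacks with a parallel stream
        counter.onDirectory("/", "data");
        names.parallelStream().forEach(n -> counter.onFile("/data", n));

        System.out.println(join("/", "data"));
        System.out.println(join("/data", "a.txt"));
        System.out.println(counter.files() + " files, " + counter.directories() + " directories");
    }
}
```

Plain `long` fields or a non-concurrent `HashMap` updated from the callbacks would risk lost updates under the same assumption.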
- JDK 1.8 (11 recommended for running)
- Hadoop 2.x or 3.x fsimage
  Note: hfsa lib version 1.2+ has Hadoop 3.x dependencies but still works for Hadoop 2.x fsimages
- Maven 3.9.x (for building from source)
mvn clean install
- Configurable strategy for fast-but-memory-intensive or slow-but-memory-friendly fsimage loading
- Report and config options for topk/sorting/selection/...
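As an illustration of the top-k selection mentioned in the list above, here is a minimal sketch that keeps only the k largest items in a bounded min-heap, so memory stays proportional to k rather than to the number of inodes. The class and method names (`TopKSketch`, `topK`) are hypothetical and say nothing about how HFSA implements or will implement this option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-k helper; not part of the HFSA API. */
final class TopKSketch {

    /** Returns the k largest items according to cmp, largest first. */
    static <T> List<T> topK(Iterable<T> items, int k, Comparator<T> cmp) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest of the current top-k candidates sits on top
        PriorityQueue<T> heap = new PriorityQueue<>(k, cmp);
        for (T item : items) {
            if (heap.size() < k) {
                heap.add(item);
            } else if (cmp.compare(item, heap.peek()) > 0) {
                heap.poll(); // evict the smallest candidate
                heap.add(item);
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp.reversed()); // largest first
        return result;
    }

    public static void main(String[] args) {
        // Example: pick the two largest sizes from a list of byte counts
        List<Long> sizes = Arrays.asList(5L, 42L, 7L, 19L);
        System.out.println(topK(sizes, 2, Comparator.<Long>naturalOrder()));
    }
}
```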
HFSA is released under the Apache 2.0 license.
Copyright 2017-2023 Marcel May and project contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Contains work derived from Apache Hadoop.