A command-line tool to create metalink (RFC 5854) files (`.meta4`/`.metalink`) from a directory structure on local storage, with a choice of cryptographic hash functions. The latest binary release for Windows x86 and x64 can be found here.
This tool, `dir2ml`, creates a single UTF-8 metalink file from a supplied directory and base URL, with one `file` record per file, sorted case-sensitively (ASCIIbetically) by path to simplify diffs. Because the metalink file is XML, it can be rendered in a variety of ways using ordinary tools such as XML transformation utilities operating on XML stylesheets (`.xslt`). See an example of this file format below.
Later, when the original storage location of the files goes offline, the `.meta4` file can be used to identify and locate those files by hash on other servers or by P2P, and to reconstruct the original file structures.
Try it! Run `dir2ml.exe` on your own computer's download directory, open the `.meta4` file it generates, and search your favorite search engine for some of the hashes. You might have the best luck with MD5 hashes of `.tar.gz` files, but as metalinks become more ubiquitous, it will become easier to locate other lost files.
Another use case is to run `dir2ml` periodically to fingerprint your hard drive onto a USB flash drive (I recommend `--sparse-output --file-url` for that purpose). Then, if your hard drive starts to fail or you're hit by ransomware, you can use any ordinary diff tool to compare two `.meta4` files and easily determine which files have changed and need to be restored from backup. (You have a backup, right?)
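A minimal sketch of that workflow (drive letters, dates, and file names are hypothetical; `fc` is the built-in Windows file-compare utility, but any diff tool works):

```
dir2ml --directory D:\ --sparse-output --file-url --output E:\2024-01-01.meta4
dir2ml --directory D:\ --sparse-output --file-url --output E:\2024-02-01.meta4
fc E:\2024-01-01.meta4 E:\2024-02-01.meta4
```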
Also included are a schema file (`metalink4.xsd`, copied from here) and a stylesheet (`dfxml2meta4.xslt`) that converts a Digital Forensics XML file to `.meta4`.
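The stylesheet can be applied with any standard XSLT 1.0 processor; for example, with `xsltproc` (input and output file names are hypothetical):

```
xsltproc dfxml2meta4.xslt dfxml-report.xml > converted.meta4
```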
- Clone this repository:

  ```
  git clone --recursive https://github.com/hofmand/metalink-builder.git
  ```

  or use your web browser to download the `.zip` file, then unpack it to the directory of your choice.
- Compile `metalink-builder\src\metalink-tools.sln`.
- `dir2ml.exe` will be located in `metalink-builder\src\x64\Release` or `metalink-builder\src\x64\Debug`.
```
dir2ml --help
dir2ml -d path -o outfile { -f | --file-url | -u | --base-url | --ni-url }
```

For example:

```
dir2ml -d ./MyMirror -u ftp://ftp.example.com -o MyMirror.meta4
```
- `-d`, `--directory` *directory-path* - The directory path to process
- `-o`, `--output` *outfile* - Output filename (`.meta4` or `.metalink`)

At least one of `-u`/`--base-url`, `-f`/`--file-url`, or `--ni-url` must be supplied.
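For instance, each of the following supplies a different source type (paths and URLs are illustrative):

```
dir2ml -d ./MyMirror -u ftp://ftp.example.com -o MyMirror.meta4
dir2ml -d ./MyMirror --file-url -o MyMirror.meta4
dir2ml -d ./MyMirror --ni-url --hash-type sha256 -o MyMirror.meta4
```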
- `-c`, `--country` *country-code* - ISO 3166-1 alpha-2 two-letter country code of the server specified by *base-url* above
- `-h`, `--help` - Show this screen
- `--version` - Show version information
- `-s`, `--show-statistics` - Show statistics at the end of processing
- `-u`, `--base-url` *base-url* - The base/root URL of an online directory containing the files, for example `ftp://ftp.example.com`. `dir2ml` will append the relative path of each file to the *base-url*.
- `-f`, `--file-url` - Add a local source for each file, using the directory specified by `--directory` prefixed with `file://`. This is useful for fingerprinting a directory or hard drive.

  Note: on Windows, backslashes (`\`) in the *base-url* will be replaced by forward slashes (`/`).
- `-v`, `--verbose` - Verbose output to stdout
- `--hash-type` *hash-list* - Calculate and output all of the hashes specified by *hash-list* (comma-separated). Available hash functions are `md5`, `sha1`, `sha256`, and `all`. If none are specified, `sha256` is used.
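For example, to record both MD5 and SHA-256 hashes of every file (path and URL are illustrative):

```
dir2ml -d ./MyMirror -u ftp://ftp.example.com -o MyMirror.meta4 --hash-type md5,sha256
```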
- `--find-duplicates` - Add URLs from all duplicate files to each matching metalink `file` node. This preserves the directory structure (as does leaving both `--find-duplicates` and `--consolidate-duplicates` off) and allows the directory structure to be rebuilt using identical files from directories other than the original, but it produces larger `.meta4` files and takes more time than `--consolidate-duplicates`. Use `--ignore-file-dates` in conjunction with `--find-duplicates` to reduce the `.meta4` file size if preserving file dates is not important.
- `--consolidate-duplicates` - Add URLs from all duplicate files to the first matching metalink `file` node and remove the other matching `file` nodes. This does not perfectly preserve the directory structure. Running `dir2ml` with this flag takes more time than without it but yields the smallest possible `.meta4` files. Add `--ignore-file-dates` to consolidate duplicates further by ignoring file modification times.
- `--ignore-file-dates` - Ignore file "last modified" dates and times when finding or consolidating duplicates. File timestamps will not be preserved. Turn this flag on when keeping the `.meta4` file small matters more than preserving file dates. Requires `--find-duplicates` or `--consolidate-duplicates`.
- `--ignore-file-times` - Ignore file "last modified" times (but respect dates) when finding or consolidating duplicates. Requires `--find-duplicates` or `--consolidate-duplicates`.
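For example, the smallest possible fingerprint of a drive might be produced with (hypothetical path and output name):

```
dir2ml -d D:\ --file-url -o fingerprint.meta4 --consolidate-duplicates --ignore-file-dates
```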
- `--ni-url` - Output Named Information (RFC 6920) links (experimental). Requires `--hash-type sha256`.
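For background, RFC 6920 names generally take the form shown below (the exact markup `dir2ml` emits for these links is not documented here, so treat this as an illustration of the naming scheme only):

```
ni:///sha-256;<base64url-encoded SHA-256 digest>
```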
```xml
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="metalink4.xsd">
  <generator>dir2ml/0.1.0</generator>
  <published>2010-05-01T12:15:02Z</published>
  <file name="example.ext">
    <size>14471447</size>
    <hash type="sha-256">17bfc4a6058d2d7d82db859c8b0528c6ab48d832fed620ed49fb3385dbf1684d</hash>
    <url location="us" type="ftp">ftp://ftp.example.com/example.ext</url>
  </file>
  <file name="subdir/example2.ext">
    <size>14471447</size>
    <hash type="sha-256">f44bcce2a9c2aa3f73ddc853ad98f87cd8e7cee5b5c18719ebb220da3fd4dbc9</hash>
    <url location="us" type="ftp">ftp://ftp.example.com/subdir/example2.ext</url>
  </file>
</metalink>
```
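As a sketch of the XSLT-based rendering mentioned above, a minimal hypothetical stylesheet (distinct from the bundled `dfxml2meta4.xslt`) could list each file's SHA-256 hash and path as `sha256sum`-style text:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example: render a .meta4 file as plain "hash  name" lines. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ml="urn:ietf:params:xml:ns:metalink">
  <xsl:output method="text"/>
  <xsl:template match="/ml:metalink">
    <xsl:for-each select="ml:file">
      <!-- Two spaces between hash and name, matching sha256sum output -->
      <xsl:value-of select="concat(ml:hash[@type='sha-256'], '  ', @name, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied with, say, `xsltproc list-hashes.xslt MyMirror.meta4` (file names hypothetical), this would print one line per `file` node.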
- Windows-only, but the code uses only standard C/C++ (no MFC/.NET, etc.), so porting to other operating systems should not be a major task.
- Single-threaded, so `dir2ml` is CPU-bound, especially when the storage device is fast.
- `dir2ml.exe` may run out of memory when processing a directory containing millions of files because the XML file isn't written until the very end. If you run into this problem, please open an issue.
- Only one *base-url* can be specified.
- Only MD5, SHA-1, and SHA-256 hashes are currently supported. (Do we need any others?)
- Port the code to Linux/BSD
- Support WARC files (`warc2meta`?)
- Generate additional metadata from Image::ExifTool, uchardet, etc.
- Add `.torrent` files to `<metaurl>` subnodes
- Filter by file size, type, etc.
- Provide a means to merge and split `.meta4` files
- Multithreaded hashing
- Support multiple *country-code*/*base-url* pairs
- Deep-inspect archive files? (`.zip`, `.iso`, etc.)
- Output the `.meta4` file directly to a `.zip`/`.rar`/`.7z` container, in order to save storage space
- Convert QuickHash output files to metalink files if possible (unless the developer decides to output metalink files directly)
- Add xxHash and/or FarmHash algorithms
- Import from SFV, md5sum, and sha1sum formats
- Provide a means to verify hashes
- Wikipedia: Comparison of file verification software
- Corz Checksum - a Windows file-hashing application (call `checksum.exe crs1 directory-path` to get output similar to `dir2ml.exe --file-url --hash-type sha1 --directory directory-path --output outfile`)
  - NB: `checksum.exe` processes files before subdirectories.
- Hash Archive - a database of file hashes (Linux `.iso` files, etc.)
- HashDeep - a hashing utility that can output to Digital Forensics XML format (call `hashdeep64 -r -j0 -c sha256 -d directory-path > outfile` to get output similar to `dir2ml.exe --file-url --hash-type sha256 --directory directory-path --output outfile`)
  - NB: `hashdeep` sorts its output in a non-trivial manner.
- HashMyFiles - an application similar to `dir2ml`
- niemandsland - Named Information (NI, RFC 6920) exchange
- OpenTimestamps - a service to store hashes in the blockchain
- Redump.org - a database of hashes of computer/console game dumps
- RHash - another hashing utility
- Reddit comment: Best way to archive a website
- Built with Microsoft Visual Studio 2015; targeting x64, Unicode

Author: Derek Hofmann
This project is licensed under the MIT License - see the LICENSE file for details.