Implement backup file checksums #105

abg · 2014-10-31T18:51:27Z

There have been a few occasions where backup data has been corrupted after the fact, usually after being shuffled through various long-term storage services. Some of those services use deduplication or other complex transformations that don't always interact well with all file formats. It is often difficult to tell where the corruption was introduced - whether Holland did something wrong, the source data was already bad, or bit rot crept in while the data was being moved around.

To combat this, I would like to see checksums (optionally) generated after a successful backup so backup files can be easily verified later. This is relatively cheap to calculate even for large backups and can be disabled if problematic.

I propose one new option to support this feature:

[holland:backup]
checksum-algorithm = option(none, md5, sha1, sha256, ..., default="md5")

Here "none" disables the feature entirely. md5 is likely to be the fastest option and is probably good enough for detecting accidental corruption, but perhaps Holland should default to one of the more robust SHA variants.

This feature should write a CHECKSUMS file to the backup directory after successful backup by a plugin. Paths should be relative to the backup directory so one can simply verify via:

$ cd path
$ md5sum -c < CHECKSUMS
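As a rough illustration of the proposal above, here is a minimal Python sketch of writing such a file. The names `file_digest` and `write_checksums` are hypothetical and not part of Holland; the output uses the `digest  relative/path` layout that `md5sum -c` accepts, and file names containing newlines or backslashes are not handled.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fileobj:
        for chunk in iter(lambda: fileobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(backup_dir, algorithm="md5", name="CHECKSUMS"):
    """Write a coreutils-style checksum file covering backup_dir.

    Hypothetical helper: returns the path of the checksum file,
    or None when the algorithm is "none" (feature disabled).
    """
    if algorithm == "none":
        return None
    lines = []
    for root, dirs, files in os.walk(backup_dir):
        dirs.sort()  # walk subdirectories in a stable order
        for filename in sorted(files):
            path = os.path.join(root, filename)
            # Paths are stored relative to the backup directory.
            relpath = os.path.relpath(path, backup_dir)
            if relpath == name:
                continue  # never checksum the checksum file itself
            lines.append("%s  %s\n" % (file_digest(path, algorithm), relpath))
    checksum_path = os.path.join(backup_dir, name)
    with open(checksum_path, "w") as fileobj:
        fileobj.writelines(lines)
    return checksum_path
```

Calling `write_checksums(backup_dir, "sha256")` would produce a file to be checked with `sha256sum -c` instead of `md5sum -c`.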


m00dawg · 2014-10-31T19:06:04Z

That's an amazingly good idea! 🍻

Makes me also think a helper utility might be good for this, that just runs through the checksums and complains if they don't match. That could even help "auto-healing" file-systems like btrfs by forcing reads so they can validate the data as well. That doesn't need to be part of Holland Core though.
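A helper like that could look roughly like the sketch below. It is only an illustration: `verify_checksums` is a hypothetical name, the file is assumed to be called CHECKSUMS and to use the `md5sum`-style `digest  path` layout, and escaped file names are not handled. Every listed file is read in full, which is what would force the file system to touch the data.

```python
import hashlib
import os


def verify_checksums(backup_dir, algorithm="md5", name="CHECKSUMS"):
    """Return a list of (relative_path, problem) tuples; empty means all OK."""
    problems = []
    with open(os.path.join(backup_dir, name)) as checksum_file:
        for line in checksum_file:
            line = line.rstrip("\n")
            if not line:
                continue
            # Lines look like "<digest>  <path>" or "<digest> *<path>".
            expected, _, relpath = line.partition(" ")
            if relpath[:1] in (" ", "*"):
                relpath = relpath[1:]
            path = os.path.join(backup_dir, relpath)
            if not os.path.isfile(path):
                problems.append((relpath, "missing"))
                continue
            digest = hashlib.new(algorithm)
            with open(path, "rb") as fileobj:
                # Read the whole file in chunks to recompute its digest.
                for chunk in iter(lambda: fileobj.read(1024 * 1024), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                problems.append((relpath, "checksum mismatch"))
    return problems
```

A small command-line wrapper could print each problem and exit non-zero when the returned list is not empty.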

soulen3 · 2019-02-20T17:15:34Z

Checksums are implemented in the compression library of the following branch:
https://github.com/holland-backup/holland/tree/checksum

# ls
CHECKSUM  MANIFEST.txt  mysql.sql.gz  test.sql.gz
# cat CHECKSUM
2216654173f6667ff0929b2a0a54491f MANIFEST.txt
cc85867750f22d7dde25fe00bd66f055 mysql.sql.gz
342812cf11b5cf1b39cb658f0a5ab7b7 test.sql.gz
# md5sum -c < CHECKSUM
MANIFEST.txt: OK
mysql.sql.gz: OK
test.sql.gz: OK

With the current implementation, Holland has to reread each file to create the checksum. I don't think this is ideal, so I'm trying to think of a better way to do this that isn't overly complicated.
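One possible way to avoid the second read, sketched here under the assumption that the compressed bytes pass through a Python file object (which may not hold where compression is delegated to an external command), is to hash the data as it is written. `HashingWriter` and `write_compressed` are hypothetical names and are not taken from the branch above.

```python
import gzip
import hashlib


class HashingWriter(object):
    """Binary file wrapper that hashes every byte written through it."""

    def __init__(self, fileobj, algorithm="md5"):
        self._fileobj = fileobj
        self._digest = hashlib.new(algorithm)

    def write(self, data):
        # Update the running digest, then pass the bytes through unchanged.
        self._digest.update(data)
        return self._fileobj.write(data)

    def flush(self):
        self._fileobj.flush()

    def hexdigest(self):
        return self._digest.hexdigest()


def write_compressed(path, chunks, algorithm="md5"):
    """Gzip chunks into path and return the checksum of the file on disk."""
    with open(path, "wb") as raw:
        hashed = HashingWriter(raw, algorithm)
        # GzipFile writes its compressed output to the wrapper, so the
        # digest covers the compressed bytes, not the original input.
        with gzip.GzipFile(fileobj=hashed, mode="wb") as compressed:
            for chunk in chunks:
                compressed.write(chunk)
        return hashed.hexdigest()
```

Because the wrapper sits below the gzip layer, the digest it reports should match what `md5sum` later computes for the `.gz` file, without the file being read back.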

abg added the enhancement label Oct 31, 2014
