Implement backup file checksums #105

Open
abg opened this issue Oct 31, 2014 · 2 comments

Comments

@abg
Contributor

abg commented Oct 31, 2014

There have been a few occasions where backup data has been corrupted after the fact, usually after shuffling through various long-term storage services. Some of those services use deduplication or other complex transformations that don't always interact well with all file formats. It is often difficult to tell where the corruption was introduced: whether holland did something wrong, the source data was already bad, or bit rot crept in while the data was being moved around.

To combat this, I would like to see checksums (optionally) generated after a successful backup so backup files can be easily verified later. This is relatively cheap to calculate even for large backups and can be disabled if problematic.

I propose one new option to support this feature:

[holland:backup]
checksum-algorithm = option(none, md5, sha1, sha256, ..., default="md5")

Here "none" disables the feature entirely. md5 is going to be the fastest and is probably good enough here, but perhaps holland should default to one of the more robust SHA variants.
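A minimal sketch of how the option value could map onto Python's `hashlib`, with "none" switching the feature off. The function name `new_hasher` is hypothetical and not part of holland; only `hashlib.new` and `hashlib.algorithms_available` are real standard-library names.

```python
import hashlib


def new_hasher(algorithm):
    """Return a hashlib object for the configured algorithm, or None if disabled.

    `algorithm` is the value of the proposed checksum-algorithm option.
    """
    if algorithm == "none":
        return None  # feature disabled, caller skips checksum generation
    if algorithm not in hashlib.algorithms_available:
        raise ValueError("unsupported checksum-algorithm: %s" % algorithm)
    return hashlib.new(algorithm)
```

Validating against `hashlib.algorithms_available` means the accepted values follow whatever the local Python build supports rather than a hard-coded list.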

This feature should write a CHECKSUMS file to the backup directory after a plugin completes a successful backup. Paths should be relative to the backup directory so one can simply verify via:

$ cd path
$ md5sum -c < CHECKSUMS
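A rough sketch of what writing that file could look like, assuming it runs after the plugin has finished. The names `file_digest` and `write_checksums` are hypothetical. Each line is the hex digest, two spaces, then the path relative to the backup directory, which is the layout `md5sum -c` (or `sha256sum -c` for sha256) reads.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large backups never sit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def write_checksums(backup_dir, algorithm="md5", name="CHECKSUMS"):
    """Write '<digest>  <relative path>' lines for every file under backup_dir."""
    entries = []
    for root, _dirs, files in os.walk(backup_dir):
        for filename in files:
            full_path = os.path.join(root, filename)
            rel_path = os.path.relpath(full_path, backup_dir)
            if rel_path == name:
                continue  # never checksum the checksum file itself
            entries.append((rel_path, file_digest(full_path, algorithm)))
    with open(os.path.join(backup_dir, name), "w") as out:
        for rel_path, digest in sorted(entries):
            out.write("%s  %s\n" % (digest, rel_path))
    return len(entries)
```

Sorting the entries keeps the file stable between runs, which makes two CHECKSUMS files easy to diff.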
@m00dawg
Contributor

m00dawg commented Oct 31, 2014

That's an amazingly good idea! 🍻

Makes me also think a helper utility might be good for this, that just runs through the checksums and complains if they don't match. That could even help "auto-healing" file-systems like btrfs by forcing reads so they can validate the data as well. That doesn't need to be part of Holland Core though.
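Such a helper could be sketched roughly as below. This is an illustration only: `verify_checksums` is a hypothetical name, and the parsing assumes one `<digest> <path>` entry per line, tolerating both the single-space and the `md5sum`-style two-character separator.

```python
import hashlib
import os


def verify_checksums(backup_dir, algorithm="md5", name="CHECKSUMS"):
    """Return a list of (relative_path, problem) for files that fail verification."""
    failures = []
    with open(os.path.join(backup_dir, name)) as listing:
        for line in listing:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, _, rel_path = line.partition(" ")
            # md5sum separates with "  " (text) or " *" (binary): drop the marker
            if rel_path[:1] in (" ", "*"):
                rel_path = rel_path[1:]
            hasher = hashlib.new(algorithm)
            try:
                # Reading every byte is also what forces the filesystem to
                # touch (and possibly validate) the underlying blocks.
                with open(os.path.join(backup_dir, rel_path), "rb") as handle:
                    for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                        hasher.update(chunk)
            except OSError as exc:
                failures.append((rel_path, "unreadable: %s" % exc))
                continue
            if hasher.hexdigest() != expected.lower():
                failures.append((rel_path, "checksum mismatch"))
    return failures
```

An empty list means every listed file matched; a cron job could call this and alert on anything else.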

@soulen3
Contributor

soulen3 commented Feb 20, 2019

Checksums are implemented in the compression library on the following branch:
https://github.com/holland-backup/holland/tree/checksum

# ls
CHECKSUM  MANIFEST.txt  mysql.sql.gz  test.sql.gz
# cat CHECKSUM
2216654173f6667ff0929b2a0a54491f MANIFEST.txt
cc85867750f22d7dde25fe00bd66f055 mysql.sql.gz
342812cf11b5cf1b39cb658f0a5ab7b7 test.sql.gz
# md5sum -c < CHECKSUM
MANIFEST.txt: OK
mysql.sql.gz: OK
test.sql.gz: OK

The current implementation has to reread each file after it is written in order to create the checksum. I don't think this is ideal, so I'm trying to think of a better way to do this that isn't overly complicated.
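One possible way to avoid the second read is to hash the bytes as they are written. The sketch below is not taken from the branch: it assumes the compressed output passes through a Python file object, and `HashingWriter` and `write_compressed` are hypothetical names. The wrapper sits between `gzip.GzipFile` and the file on disk, so the digest covers exactly the compressed bytes that land in the file.

```python
import gzip
import hashlib


class HashingWriter(object):
    """Wrap a binary file object and hash every byte written through it."""

    def __init__(self, fileobj, algorithm="md5"):
        self.fileobj = fileobj
        self.hasher = hashlib.new(algorithm)

    def write(self, data):
        self.hasher.update(data)
        return self.fileobj.write(data)

    def flush(self):
        self.fileobj.flush()

    def hexdigest(self):
        return self.hasher.hexdigest()


def write_compressed(path, chunks, algorithm="md5"):
    """Gzip `chunks` into `path` and return the digest of the compressed file."""
    with open(path, "wb") as raw:
        tee = HashingWriter(raw, algorithm)
        # GzipFile writes header, data and trailer sequentially to `tee`
        with gzip.GzipFile(fileobj=tee, mode="wb") as gz:
            for chunk in chunks:
                gz.write(chunk)
    return tee.hexdigest()
```

This only works when the writer never seeks backwards; if compression is delegated to an external command writing straight to disk, the bytes never pass through Python and a different hook would be needed.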
