There have been a few occasions where backup data has been corrupted after the fact, usually after shuffling through various long-term storage services. Sometimes those use deduplication features or other complex transformations that don't always interact well with all file formats. It is often difficult to tell where the corruption was introduced: whether Holland did something wrong, the source data was already bad, or bit rot crept in while the data was being shuffled around.
To combat this, I would like to see checksums (optionally) generated after a successful backup so backup files can be easily verified later. This is relatively cheap to calculate even for large backups and can be disabled if problematic.
I propose one new option to support this feature, where "none" disables the feature entirely. md5 is going to be the fastest and probably good enough here, but perhaps Holland should default to one of the more robust sha variants.
This feature should write a CHECKSUMS file to the backup directory after successful backup by a plugin. Paths should be relative to the backup directory so one can simply verify via:
$ cd path
$ md5sum -c < CHECKSUMS
This also makes me think a helper utility might be good for this: one that just runs through the checksums and complains if they don't match. That could even help "auto-healing" file systems like btrfs by forcing reads so they can validate the data as well. That doesn't need to be part of Holland Core though.
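A rough sketch of what such a helper might look like, kept outside Holland Core. `verify_checksums` is a hypothetical name, and the parser assumes md5sum-style lines (digest, then the path, optionally preceded by a second space or a `*`). It reads every listed file in full, which is also what would force the file system to touch the data.

```python
import hashlib
import os


def verify_checksums(backup_dir, name="CHECKSUMS", algorithm="md5"):
    """Return {relative path: "OK" | "FAILED" | "MISSING"} for each listed file.

    Hypothetical helper: names, defaults, and status strings are assumptions.
    """
    results = {}
    with open(os.path.join(backup_dir, name)) as listing:
        for line in listing:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, _, rest = line.partition(" ")
            # md5sum writes a second space (text mode) or '*' (binary mode)
            # before the path; drop that marker if it is present.
            rel = rest[1:] if rest[:1] in (" ", "*") else rest
            path = os.path.join(backup_dir, rel)
            if not os.path.isfile(path):
                results[rel] = "MISSING"
                continue
            digest = hashlib.new(algorithm)
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large dumps are not loaded whole.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            matches = digest.hexdigest() == expected.lower()
            results[rel] = "OK" if matches else "FAILED"
    return results
```

A wrapper script could print anything that is not "OK" and exit non-zero, which would make it easy to run from cron after data has been moved between storage services.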
# ls
CHECKSUM MANIFEST.txt mysql.sql.gz test.sql.gz
# cat CHECKSUM
2216654173f6667ff0929b2a0a54491f MANIFEST.txt
cc85867750f22d7dde25fe00bd66f055 mysql.sql.gz
342812cf11b5cf1b39cb658f0a5ab7b7 test.sql.gz
# md5sum -c < CHECKSUM
MANIFEST.txt: OK
mysql.sql.gz: OK
test.sql.gz: OK
With the current implementation, each file has to be reread after the backup to create its checksum. I don't think this is ideal, so I'm trying to think of a better way to do this that isn't overly complicated.
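One possible way to avoid the second read, sketched here purely as an idea and not as what Holland does: wrap the output file in an object that updates the hash on every write, so the digest is ready as soon as the backup finishes. `HashingWriter` and `write_gzip_with_checksum` are hypothetical names. The wrapper sits below the compressor so the digest covers the compressed bytes that end up on disk, which is what `md5sum -c` will later check.

```python
import gzip
import hashlib


class HashingWriter:
    """File-like wrapper that hashes every chunk on its way to the real file."""

    def __init__(self, raw, algorithm="md5"):
        self._raw = raw
        self._digest = hashlib.new(algorithm)

    def write(self, data):
        self._digest.update(data)
        return self._raw.write(data)

    def flush(self):
        self._raw.flush()

    def hexdigest(self):
        return self._digest.hexdigest()


def write_gzip_with_checksum(path, chunks, algorithm="md5"):
    """Compress chunks into path and return the digest of the bytes written."""
    with open(path, "wb") as raw:
        hashed = HashingWriter(raw, algorithm)
        # GzipFile writes its compressed output through hashed.write(), so the
        # hash sees exactly the bytes handed to the file, in order.
        with gzip.GzipFile(fileobj=hashed, mode="wb") as compressor:
            for chunk in chunks:
                compressor.write(chunk)
    return hashed.hexdigest()
```

The trade-off is that this records what was handed to the operating system, not what a later read returns, so it would not catch corruption introduced by the disk during the write itself. It also only works where the plugin controls the output stream; files produced by an external tool would still need to be reread.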