Perl Multithreaded multipart sync to Amazon Glacier service.
Amazon Glacier is an archive/backup service with a very low storage price, but with some caveats in usage and in archive retrieval pricing. Read more about Amazon Glacier on the AWS website.
mt-aws-glacier is a client application for Glacier.
- Version 0.82 beta (See ChangeLog)
- Does not use any existing Amazon Glacier library, so can be flexible in implementing advanced features
- Glacier Multipart upload
- Multithreaded upload
- Multipart+Multithreaded upload
- Multithreaded retrieval, deletion and download
- Tracking of all uploaded files with a local journal file (opened for write in append mode only)
- Checking integrity of local files using journal
- Ability to limit number of archives to retrieve
- File name and modification times are stored as Glacier metadata
- Ability to re-create journal file from Amazon Glacier metadata
- UTF-8 support
- Multipart download (using HTTP Range header)
- Use journal file as flock() mutex
- Checking integrity of remote files
- Upload from STDIN
- Some integration with external world, ability to read SNS topics
- Simplified distribution for Debian/RedHat
- Split code to re-usable modules, publishing on CPAN (Currently there are great existing Glacier modules on CPAN - see Net::Amazon::Glacier by Tim Nordenfur)
- Create/Delete vault functions
- Amazon S3 support
- Zero length files are ignored
- Only multipart upload implemented, no plain upload
- No way to specify SNS topic
- HTTP only, no way to configure HTTPS yet (however it works fine in HTTPS mode)
- Not recommended to use in production until first "Release" version. Currently Beta.
Script is made for Linux OS. Tested under Ubuntu and Debian. Should work under other Linux distributions. Not tested under Mac OS X. Should NOT work under Windows.
- Install the following CPAN modules:
    - LWP::UserAgent (or Debian package libwww-perl or RPM package perl-libwww-perl)
    - JSON::XS (or Debian package libjson-xs-perl or RPM package perl-JSON-XS)
- Install mt-aws-glacier
    git clone https://github.com/vsespb/mt-aws-glacier.git
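Before running the script, it can help to confirm that both CPAN modules actually load. The following is a minimal sketch, not part of mt-aws-glacier: `check_perl_module` is a hypothetical helper name, and it relies only on Perl's standard `-M` switch.

```shell
#!/bin/sh
# check_perl_module is a hypothetical helper, not part of mt-aws-glacier.
# It succeeds only if Perl can load the named module.
check_perl_module() {
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# "|| true" keeps the loop going when a module is missing.
for module in LWP::UserAgent JSON::XS; do
    check_perl_module "$module" || true
done
```

A "missing" line means the corresponding CPAN module or distribution package from the list above still needs to be installed.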
- When playing with Glacier, make sure you will be able to delete all your archives: it is currently impossible to delete an archive or a non-empty vault in the Amazon console. Also make sure you have read all of the Amazon Glacier pricing/FAQ.
- Read their pricing FAQ again, really. Beware of the retrieval fee.
- With a low "partsize" option you pay a bit more (Amazon charges for each upload request).
- With a high partsize*concurrency there is a risk of getting network timeouts (HTTP 408/500).
- Memory usage (for 'sync') formula is ~ min(NUMBER_OF_FILES_TO_SYNC, max-number-of-files) + partsize*concurrency
- For backups created with older versions (0.7x) of mt-aws-glacier, the journal file is required to restore the backup.
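To get a feel for the partsize*concurrency term that appears in both the timeout warning and the memory formula, here is a small worked sketch. `buffer_mb` is a hypothetical helper, and reading the product as megabytes is an assumption based on partsize being given in megabytes. The file-list term of the formula is left out because its unit is not stated.

```shell
#!/bin/sh
# buffer_mb is a hypothetical helper: the partsize*concurrency term of the
# memory formula, assumed to be in megabytes (partsize is given in megabytes).
buffer_mb() {
    partsize_mb=$1
    concurrency=$2
    echo $((partsize_mb * concurrency))
}

buffer_mb 16 4    # defaults (partsize=16, concurrency=4): prints 64
buffer_mb 64 8    # larger parts and more streams: prints 512
```

Raising either option scales this term linearly, so it is worth checking the product before increasing both at once.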
- Create a directory containing files to backup. Example:
    /data/backup
- Create a config file, say, glacier.cfg
    key=YOURKEY
    secret=YOURSECRET
    # region: eu-west-1, us-east-1 etc
    region=us-east-1
- Create a vault in the specified region, using Amazon Console (myvault)
- Choose a filename for the journal, for example, journal.log
- Sync your files
    ./mtglacier.pl sync --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --concurrency=3
- Add more files and sync again
- Check that your local files have not been modified since the last sync
    ./mtglacier.pl check-local-hash --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
- Delete some files from your backup location
- Initiate archive restore job on Amazon side
    ./mtglacier.pl restore --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --max-number-of-files=10
- Wait 4+ hours for Amazon Glacier to complete archive retrieval
- Download restored files back to backup location
    ./mtglacier.pl restore-completed --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
- Delete all your files from vault
    ./mtglacier.pl purge-vault --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
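The commands in the walkthrough above all repeat the same four options. One way to avoid retyping them is a small wrapper, sketched below. `run_glacier` and the `DRY_RUN` variable are hypothetical names for illustration and are not part of mt-aws-glacier. The option values are the example values used in this README.

```shell
#!/bin/sh
# Example values copied from the walkthrough; adjust for your setup.
CONFIG=glacier.cfg
BACKUP_DIR=/data/backup
VAULT=myvault
JOURNAL=journal.log

# run_glacier is a hypothetical helper, not part of mt-aws-glacier.
# It adds the four options shared by sync, check-local-hash, restore,
# restore-completed and purge-vault, then appends any extra arguments.
# With DRY_RUN set to a non-empty value it only prints the command.
run_glacier() {
    subcommand=$1
    shift
    ${DRY_RUN:+echo} ./mtglacier.pl "$subcommand" \
        --config="$CONFIG" --from-dir "$BACKUP_DIR" \
        --to-vault="$VAULT" --journal="$JOURNAL" "$@"
}

# Print (do not run) the sync command from the walkthrough.
DRY_RUN=1 run_glacier sync --concurrency=3
```

Dropping `DRY_RUN=1` runs the command for real, for example `run_glacier restore --max-number-of-files=10`.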
In case you lost your journal file, you can restore it from Amazon Glacier metadata
- Run the retrieve-inventory command. This will request Amazon Glacier to prepare a vault inventory.
    ./mtglacier.pl retrieve-inventory --config=glacier.cfg --vault=myvault
- Wait 4+ hours for Amazon Glacier to complete inventory retrieval (also note that you will only get an inventory that is ~24h old)
- Download the inventory and export it to a new journal (this can sometimes be pretty slow even if the inventory is small, wait a few minutes):
    ./mtglacier.pl download-inventory --config=glacier.cfg --vault=myvault --new-journal=new-journal.log
For files created by mt-aws-glacier version 0.8x and higher, original filenames will be restored. For other files, archive_id will be used as the filename. For the metadata format, see "Amazon Glacier metadata format used by mt-aws glacier".
- "concurrency" (with 'sync' command) - number of parallel upload streams to run (default 4).
    --concurrency=4
- "partsize" (with 'sync' command) - size of file chunk to upload at once, in megabytes (default 16).
    --partsize=16
- "max-number-of-files" (with 'sync' or 'restore' commands) - limit the number of files to sync/restore. The program finishes when it reaches this limit.
    --max-number-of-files=100
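Because every chunk is uploaded as a separate part, the "partsize" value determines how many part uploads a file needs. A rough sketch of that arithmetic follows. `parts_for_file` is a hypothetical helper, sizes are whole megabytes, and the count ignores the requests that start and finish a multipart upload.

```shell
#!/bin/sh
# parts_for_file is a hypothetical helper: number of parts a file of
# size_mb megabytes is split into when uploaded with the given partsize.
parts_for_file() {
    size_mb=$1
    partsize_mb=$2
    # Integer ceiling division: a trailing partial chunk still needs its own part.
    echo $(( (size_mb + partsize_mb - 1) / partsize_mb ))
}

parts_for_file 1024 16   # 1 GB file, default partsize: prints 64
parts_for_file 1024 1    # same file, partsize=1: prints 1024
parts_for_file 100 16    # last part is only 4 MB: prints 7
```

A smaller partsize means more parts for the same file, which is why a low value costs a bit more in request charges.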
- Create empty dir MYDIR
- Set vault name inside cycletest.sh
- Run
    ./cycletest.sh init MYDIR
    ./cycletest.sh retrieve MYDIR
    ./cycletest.sh restore MYDIR
  OR
    ./cycletest.sh init MYDIR
    ./cycletest.sh purge MYDIR
- If you are using it and like it, please "Star" it on GitHub; this way you'll help promote the project
- Please report any bugs or issues (using GitHub issues). Any feedback is welcome.
- If you want to contribute to the source code, please contact me first and describe what you want to do
The AWS access policy for the key used by mt-aws-glacier should look something like this:
{
  "Statement": [
    {
      "Effect": "Allow",
      "Resource":["arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test1",
        "arn:aws:glacier:us-east-1:XXXXXXXXXXXX:vaults/test1",
        "arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test2",
        "arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test3"],
      "Action":["glacier:UploadArchive",
        "glacier:InitiateMultipartUpload",
        "glacier:UploadMultipartPart",
        "glacier:UploadPart",
        "glacier:DeleteArchive",
        "glacier:ListParts",
        "glacier:InitiateJob",
        "glacier:ListJobs",
        "glacier:GetJobOutput",
        "glacier:ListMultipartUploads",
        "glacier:CompleteMultipartUpload"]
    }
  ]
}
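Each "Resource" entry in the policy follows the same pattern: region, account ID, then vault name. The sketch below builds such an entry. `vault_arn` is a hypothetical helper for illustration, and the account ID is left as the same XXXXXXXXXXXX placeholder used in the policy.

```shell
#!/bin/sh
# vault_arn is a hypothetical helper: builds a "Resource" entry in the
# arn:aws:glacier:REGION:ACCOUNT_ID:vaults/VAULT_NAME form used in the policy.
vault_arn() {
    region=$1
    account_id=$2
    vault=$3
    echo "arn:aws:glacier:${region}:${account_id}:vaults/${vault}"
}

# Entry for the example vault and region used earlier in this README.
vault_arn us-east-1 XXXXXXXXXXXX myvault
```

The region in the entry has to match the region of the vault, which is also the region set in the config file.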