mt-aws-glacier

Perl Multithreaded multipart sync to Amazon Glacier service.

Intro

Amazon Glacier is an archive/backup service with a very low storage price, but with some caveats in usage and in archive retrieval pricing. Read more about Amazon Glacier.

mt-aws-glacier is a client application for Glacier.

Version

Version 0.82 beta (See ChangeLog)

Features

Does not use any existing Amazon Glacier library, so can be flexible in implementing advanced features
Glacier Multipart upload
Multithreaded upload
Multipart+Multithreaded upload
Multithreaded retrieval, deletion and download
Tracking of all uploaded files with a local journal file (opened for write in append mode only)
Checking integrity of local files using journal
Ability to limit number of archives to retrieve
File name and modification times are stored as Glacier metadata
Ability to re-create journal file from Amazon Glacier metadata
UTF-8 support

Coming-soon features

Multipart download (using HTTP Range header)
Use journal file as flock() mutex
Checking integrity of remote files
Upload from STDIN
Some integration with external world, ability to read SNS topics
Simplified distribution for Debian/RedHat
Split code into reusable modules and publish them on CPAN (there are already great Glacier modules on CPAN, see Net::Amazon::Glacier by Tim Nordenfur)
Create/Delete vault functions

Planned next version features

Amazon S3 support

Important bugs/missed features

Zero length files are ignored
Only multipart upload implemented, no plain upload
No way to specify SNS topic
HTTP only, no way to configure HTTPS yet (however it works fine in HTTPS mode)

Production ready

Not recommended for production use until the first "Release" version. Currently Beta.

Installation/System requirements

The script is written for Linux. It is tested under Ubuntu and Debian and should work under other Linux distributions. It is not tested under Mac OS X and should NOT work under Windows.

Install the following CPAN modules:
- LWP::UserAgent (or Debian package libwww-perl or RPM package perl-libwww-perl)
- JSON::XS (or Debian package libjson-xs-perl or RPM package perl-JSON-XS)

Install mt-aws-glacier

  git clone https://github.com/vsespb/mt-aws-glacier.git

Warnings ( MUST READ )

When playing with Glacier, make sure you will be able to delete all your archives: it is currently impossible to delete an archive or a non-empty vault in the Amazon console. Also make sure you have read all of the Amazon Glacier pricing/FAQ.
Read their pricing FAQ again, really. Beware of the retrieval fee.
With a low "partsize" option you pay a bit more (Amazon charges for each upload request).
With a high partsize*concurrency there is a risk of network timeouts (HTTP 408/500).
Memory usage (for 'sync') is roughly min(NUMBER_OF_FILES_TO_SYNC, max-number-of-files) + partsize*concurrency.
For backups created with older versions (0.7x) of mt-aws-glacier, the journal file is required to restore the backup.
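
The partsize trade-off described in the warnings above can be made concrete with a little arithmetic. The sketch below is not part of mt-aws-glacier: the function names `upload_part_requests` and `buffer_megabytes` are hypothetical, and it assumes that every part costs exactly one upload request and that partsize is counted in units of 2**20 bytes. It ignores the requests that start and finish a multipart upload, and it covers only the partsize*concurrency term of the memory formula.

```python
MB = 1024 * 1024  # assumption: one "Megabyte" of partsize is 2**20 bytes


def upload_part_requests(file_size_bytes: int, partsize_mb: int) -> int:
    """Part-upload requests for one file, assuming one request per part."""
    if file_size_bytes <= 0 or partsize_mb <= 0:
        raise ValueError("file size and partsize must be positive")
    part_bytes = partsize_mb * MB
    # Integer ceiling division: the last part may be shorter than the rest.
    return -(-file_size_bytes // part_bytes)


def buffer_megabytes(partsize_mb: int, concurrency: int) -> int:
    """The partsize*concurrency term of the memory formula, in megabytes."""
    return partsize_mb * concurrency


if __name__ == "__main__":
    one_gib = 1024 * MB
    for partsize in (1, 16, 64):
        requests = upload_part_requests(one_gib, partsize)
        print(f"partsize={partsize}: {requests} part uploads, "
              f"{buffer_megabytes(partsize, 4)} MB of part buffers at concurrency=4")
```

Under these assumptions, a 1 GiB file needs 1024 part uploads at partsize 1 but only 16 at partsize 64, while the part buffers at concurrency 4 grow from 4 MB to 256 MB.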

Usage

Create a directory containing the files to back up, for example /data/backup

Create a config file, say glacier.cfg

 key=YOURKEY
 secret=YOURSECRET
 # region: eu-west-1, us-east-1 etc
 region=us-east-1

Create a vault in the specified region using the Amazon Console (for example, myvault)
Choose a filename for the journal, for example journal.log

Sync your files

 ./mtglacier.pl sync --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --concurrency=3

Add more files and sync again

Check that your local files have not been modified since the last sync

 ./mtglacier.pl check-local-hash --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log

Delete some files from your backup location

Initiate an archive restore job on the Amazon side

 ./mtglacier.pl restore --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --max-number-of-files=10

Wait 4+ hours for Amazon Glacier to complete archive retrieval

Download restored files back to backup location

 ./mtglacier.pl restore-completed --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log

Delete all your files from vault

 ./mtglacier.pl purge-vault --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
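
Every step above passes the same four arguments (config, directory, vault, journal) and differs only in the command name and a few extra options. If you script the cycle, building the argument list in one place avoids typos between steps. The following is a minimal sketch, not something shipped with mt-aws-glacier: `build_command` is a hypothetical helper, and it uses only the commands and options shown in this README.

```python
import shlex

SCRIPT = "./mtglacier.pl"  # path to the script, as used in the examples above


def build_command(action, config, from_dir, vault, journal, **options):
    """Return the argument list for one mtglacier.pl step.

    Extra keyword options are rendered as --name=value, with underscores
    turned into dashes (max_number_of_files -> --max-number-of-files).
    """
    argv = [
        SCRIPT,
        action,
        f"--config={config}",
        "--from-dir", from_dir,
        f"--to-vault={vault}",
        f"--journal={journal}",
    ]
    for name, value in options.items():
        argv.append(f"--{name.replace('_', '-')}={value}")
    return argv


if __name__ == "__main__":
    common = ("glacier.cfg", "/data/backup", "myvault", "journal.log")
    steps = [
        build_command("sync", *common, concurrency=3),
        build_command("check-local-hash", *common),
        build_command("restore", *common, max_number_of_files=10),
    ]
    for argv in steps:
        # Print instead of executing; subprocess.run(argv, check=True) would run it.
        print(shlex.join(argv))
```

The sketch only prints the command lines. Retrieval and purge cost money or data, so it is safer to review the output before passing any of it to `subprocess.run`.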

Restoring journal

If you lose your journal file, you can restore it from Amazon Glacier metadata.

Run the retrieve-inventory command. This asks Amazon Glacier to prepare a vault inventory.
```
 ./mtglacier.pl retrieve-inventory --config=glacier.cfg --vault=myvault
```
Wait 4+ hours for Amazon Glacier to complete the inventory retrieval (note that the inventory you get is about 24 hours old).
Download the inventory and export it to a new journal (this can sometimes be slow even if the inventory is small; wait a few minutes):
```
 ./mtglacier.pl download-inventory --config=glacier.cfg --vault=myvault --new-journal=new-journal.log
```

For files created by mt-aws-glacier version 0.8x and higher, the original filenames will be restored. For other files, archive_id will be used as the filename. The metadata format is described in "Amazon Glacier metadata format used by mt-aws glacier".

Additional command line options

"concurrency" (with 'sync' command) - number of parallel upload streams to run. (default 4)
```
 --concurrency=4
```
"partsize" (with 'sync' command) - size of file chunk to upload at once, in Megabytes. (default 16)
```
 --partsize=16
```
"max-number-of-files" (with 'sync' or 'restore' commands) - limit on the number of files to sync/restore. The program exits when it reaches this limit.
```
 --max-number-of-files=100
```

Test/Play with it

Create an empty directory MYDIR
Set the vault name inside cycletest.sh

Run

 ./cycletest.sh init MYDIR
 ./cycletest.sh retrieve MYDIR
 ./cycletest.sh restore MYDIR

OR

 ./cycletest.sh init MYDIR
 ./cycletest.sh purge MYDIR

Help/contribute this project

If you are using it and like it, please "Star" it on GitHub; this helps promote the project.
Please report any bugs or issues (using GitHub issues). Any feedback is welcome.
If you want to contribute to the source code, please contact me first and describe what you want to do.

Minimum Amazon Glacier permissions:

Something like this:

	{
		"Statement": [
			{
				"Effect": "Allow",
				"Resource": [
					"arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test1",
					"arn:aws:glacier:us-east-1:XXXXXXXXXXXX:vaults/test1",
					"arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test2",
					"arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test3"
				],
				"Action": [
					"glacier:UploadArchive",
					"glacier:InitiateMultipartUpload",
					"glacier:UploadMultipartPart",
					"glacier:UploadPart",
					"glacier:DeleteArchive",
					"glacier:ListParts",
					"glacier:InitiateJob",
					"glacier:ListJobs",
					"glacier:GetJobOutput",
					"glacier:ListMultipartUploads",
					"glacier:CompleteMultipartUpload"
				]
			}
		]
	}
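
If you manage several vaults, the policy above can be generated rather than edited by hand. The sketch below is only an illustration: `vault_arn` and `build_policy` are hypothetical helper names, the action list is copied verbatim from the example policy, and the ARN layout is taken from its "Resource" entries. Replace the XXXXXXXXXXXX placeholder with your own account ID.

```python
import json

# Actions copied verbatim from the example policy above.
ACTIONS = [
    "glacier:UploadArchive",
    "glacier:InitiateMultipartUpload",
    "glacier:UploadMultipartPart",
    "glacier:UploadPart",
    "glacier:DeleteArchive",
    "glacier:ListParts",
    "glacier:InitiateJob",
    "glacier:ListJobs",
    "glacier:GetJobOutput",
    "glacier:ListMultipartUploads",
    "glacier:CompleteMultipartUpload",
]


def vault_arn(region, account_id, vault):
    """Vault ARN in the layout used by the example policy."""
    return f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}"


def build_policy(account_id, vaults):
    """Build the policy document; vaults is a list of (region, name) pairs."""
    return {
        "Statement": [
            {
                "Effect": "Allow",
                "Resource": [vault_arn(region, account_id, name)
                             for region, name in vaults],
                "Action": list(ACTIONS),
            }
        ]
    }


if __name__ == "__main__":
    policy = build_policy("XXXXXXXXXXXX",
                          [("eu-west-1", "test1"), ("us-east-1", "test1")])
    print(json.dumps(policy, indent=2))
```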
