Wednesday, 11 April 2018

Archiving data to AWS Glacier

Archiving data to AWS Glacier

For archiving your data at AWS you can use AWS Glacier. This service offers cheaper storage than the S3 service but the downside is that your data won’t be accessible right away as it is with S3. It may take a few hours to restore the data but if you are using this service for real archiving purposes this shouldn’t be a real issue. Lets setup a Glacier Vault by using the Management Console. Select the Glacier service in the Management Console:
Screen Shot 2013-05-14 at 14.11.08
Click ‘Create Vault’ and give it a name:
Screen Shot 2013-05-14 at 14.11.40
Set up a new SNS topic so we can receive messages when archiving actions are finished.
Screen Shot 2013-05-14 at 14.12.44
If all went well the Vault is created:
Screen Shot 2013-05-14 at 14.12.58
Together with the vault also a new SNS is created. To receive the messages posted on the SNS lets subscribe to it. Select the SNS service in the console:
Screen Shot 2013-05-14 at 14.25.22
Select the SNS topic that is just created:
Screen Shot 2013-05-14 at 14.26.25
Now click on the button to create a subscription to this topic. I simply create an email subscription so every message posted on the topic is send to the supplied email address:
Screen Shot 2013-05-14 at 14.58.47
The Glacier service in the Management Console doesn’t come with the functionality to archive/restore files to and from Glacier (like it does with S3). But there is of course an API to use and several SDK’s to transfer files. And of course there is the community which jumped into this and created a lot of tools both GUI and CLI based. I just picked one of them and installed it.
After installing and configuring the Glacier-cli I can check the created vault is found:
pascal$ ./glacier.py --region eu-west-1 vault list
PascalBackuPVault
Lets put some data into the vault. I use the example of the CLI tool as inspiration. Create a local file:
pascal$ echo 42 > example.txt
Upload the file into the vault created earlier:
pascal$ ./glacier.py --region eu-west-1 archive upload PascalBackuPVault example.txt
Lets check the file is in the vault:
pascal$ ./glacier.py --region eu-west-1 archive list PascalBackuPVault
example.txt
Now remove the local file and restore the archived one:
pascal$ rm example.txt
pascal$ ./glacier.py --region eu-west-1 archive retrieve PascalBackuPVault example.txt
glacier: queued retrieval job for archive 'example.txt'
pascal$ ./glacier.py --region eu-west-1 archive retrieve PascalBackuPVault example.txt
glacier: job still pending for archive 'example.txt'
pascal$ ./glacier.py --region eu-west-1 job list
a/p 2013-05-20T18:40:25.107Z PascalBackuPVault example.txt
pascal$ ./glacier.py --region eu-west-1 archive retrieve --wait PascalBackuPVault example.txt
After a few hours the job is finished and we can access the local file ‘example.txt’ again.
pascal$ cat example.txt
42
And in the mail I find the following notification telling me the archive file is ready for retrieval:
{"Action":"ArchiveRetrieval"
,"ArchiveId":"CggVcXvaWKfRn5tDR_UKna0GsYyXyZzlALPvjEFkcLdRq4NRBXra36m7hBOJSNCbOmEkQ04VoyTQyMt_pXdrNggms13e3vjUqwW3tZwps8BiA1gprQQZyUQPDwwWkuKAFZoqahzA-g"
,"ArchiveSHA256TreeHash":"084c799cd551dd332d5c5f9a5d593b2e931f5e36122ee5c793c1d08a19839cc0"
,"ArchiveSizeInBytes":3
,"Completed":true
,"CompletionDate":"2013-05-20T22:40:31.040Z"
,"CreationDate":"2013-05-20T18:40:25.107Z"
,"InventorySizeInBytes":null
,"JobDescription":null
,"JobId":"rxRUKT0QVWyOEMu4VYW_zrhXXYZC0ZrVo63sCtQJDBpFyhO-pPRJ7Z_Af02Hvn-bge-yGrKzRw78xG9d-Nvxjv2LcQho"
,"RetrievalByteRange":"0-2"
,"SHA256TreeHash":"084c799cd551dd1d6e535f9a5d593b2e931f5e36122ee5c793c1d08a19839cc0"
,"SNSTopic":null
,"StatusCode":"Succeeded"
,"StatusMessage":"Succeeded"
,"VaultARN":"arn:aws:glacier:eu-west-1:678658091597:vaults/PascalBackuPVault"}
--
...
There is of course a lot more to tell about AWS Glacier. This page will be a good next step to get familiair with Glacier.