CloudSpring | Cloud Slice: Amazon Glacier

Amazon Glacier is a storage service launched by Amazon last month. In this edition of Cloud Slice, we will go through the basics of Amazon Glacier.

When to use Amazon Glacier?

Amazon Glacier is meant for data that you want to keep but don’t need to access frequently. You don’t want to lose the data, and it should be reliably accessible when you do need it. If slow data retrieval is not a concern, Amazon Glacier is a good choice for archiving your data.

Frequency of retrieval is also an important factor to consider. Amazon caps how often and how much data you can retrieve free of charge, with the free allowance set as a percentage of your stored data volume. In short, the total cost is the storage cost plus the cost of any retrieval beyond the free quota.
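To make the cost structure concrete, here is a rough sketch of that calculation. The function name, the parameters, and every rate passed to it are hypothetical placeholders, not Amazon's published prices, and the real retrieval fee may be computed differently; the sketch only shows the "storage plus retrieval above the free quota" idea.

```python
def estimate_monthly_cost(stored_gb, retrieved_gb,
                          storage_rate_per_gb, retrieval_rate_per_gb,
                          free_retrieval_fraction):
    """Simplified model: storage cost plus retrieval beyond the free quota.

    All rates and the free fraction are caller-supplied assumptions,
    not actual Amazon Glacier pricing.
    """
    # Free quota is a percentage of the stored data volume.
    free_gb = stored_gb * free_retrieval_fraction
    # Only retrieval above the free quota is billed.
    billable_gb = max(0.0, retrieved_gb - free_gb)
    storage_cost = stored_gb * storage_rate_per_gb
    retrieval_cost = billable_gb * retrieval_rate_per_gb
    return storage_cost + retrieval_cost
```

For example, with 1000 GB stored and a hypothetical free fraction of 0.05, the first 50 GB retrieved in the month would add nothing to the bill, and only the data beyond that would be charged at the retrieval rate.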

How does Amazon Glacier work?

Before we go into details, let’s understand the terminology used. At the highest level, data is organized into “vaults”. A vault holds archives, and an archive can be a single large file or a combination of multiple files. Security permissions and notifications can be set on a vault.

Uploading and downloading data

Vaults can be created in the AWS Management Console, but uploading and retrieving data can be done only through the APIs. For uploads larger than 100 MB, a multipart upload is suggested as the more reliable operation. To retrieve data, you make a request, which creates a retrieval job. The job typically completes in 3-5 hours, after which the data is available for download for up to 24 hours. You can check the status of the job through the API or be notified when it completes. For moving large amounts of data, AWS Import/Export is a better option. AWS Import/Export bypasses the internet and uploads data directly to the Amazon cloud, and it can be cost effective for large amounts of data.
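The request, wait, and download steps can be sketched in Python. This is a minimal illustration, not an official recipe: it assumes a Glacier client such as the one returned by `boto3.client("glacier")` is passed in, and the helper names `upload_small_archive` and `retrieve_archive`, the polling interval, and the error handling are all choices made for this example. It covers single-request uploads only, not multipart.

```python
import time


def upload_small_archive(client, vault_name, data, description=""):
    """Upload bytes as one archive in a single request; returns the archive ID.

    Intended for small payloads; larger uploads should use multipart instead.
    """
    response = client.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    return response["archiveId"]


def retrieve_archive(client, vault_name, archive_id,
                     poll_seconds=900, sleep=time.sleep):
    """Start a retrieval job, poll until it finishes, then download the data."""
    # Step 1: request the archive; this creates a retrieval job.
    job = client.initiate_job(
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # Step 2: poll the job status (a job can take hours to complete).
    while True:
        status = client.describe_job(vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError("Retrieval job did not succeed: %s"
                           % status["StatusCode"])

    # Step 3: download the output while it is still available.
    output = client.get_job_output(vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

Polling every 15 minutes is an arbitrary default here. Since a job usually takes hours, setting a notification on the vault avoids polling altogether and is likely the better fit for production use.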

Conclusion

Amazon Glacier is a reliable and secure storage service for data that needs to be archived and is retrieved infrequently and without urgency. It frees you from maintaining servers and from the associated costs of security and backups.