
Tuesday, November 23, 2010

The Cloud and Archival Storage

Price.  Availability.  Services.  Security.

These are the four parameters that I use when deciding where to store one of our archival storage copies.

For me the cloud is just another storage container.  Fundamentally it is no different from a physical storage location except in how it differs across these four dimensions.  In fact, I can conceptualize my "non-cloud" storage locations as storage-as-a-service cloud providers, just ones where the provider is a lot more local than the big names in "cloud" today:

ICPSR Cloud:  This is the portion of the EMC NS-120 NAS that I use for a local copy of archival storage.  It is very expensive, with a reasonably high level of availability.  It provides very few services; if I want to perform a fixity check of the objects I have stored here, I have to create and schedule that myself.  Because I have physical control over the ICPSR Cloud, I have an irrational belief that it is probably secure, even though I know that ICPSR isn't as physically secure as many other companies at which I have worked.  Certainly ICPSR does not make any statements or guarantees about ISO 27001 compliance.

UMich Cloud:  This is a multi-TB chunk of NFS file storage that I rent from the University of Michigan's Information Technology Services (ITS) organization.  They call it ITS Value Storage.  The price here is excellent, but the level of availability is just a hair lower.  I don't notice the lower level of availability most of the time, but I do perceive it when running long-lived, I/O-intensive applications.  Like my own cloud, this one has no services unless I deploy them myself.  Because I do not have physical control over the equipment, or even know exactly where the equipment is (beyond a given data center), it feels like there is less control.  ITS makes no promises about ISO 27001 compliance (or promises about other standards), but my sense is that their controls and physical security and IT management processes must be at least as good as mine.  After all, they are managing many, many TBs for many different university departments and organizations, including themselves.

Amazon Cloud:  This is a multi-TB chunk of Elastic Block Store (EBS) storage that I rent from Amazon Web Services.  I use EBS rather than the Simple Storage Service (S3) because I want the semantics of a filesystem so that I don't have to worry about things like files that are large or that have funny characters in their names.  The price here is good, better than my EMC NAS, but not as good as the ITS Value Storage.  The availability is quite good overall, but, of course, the network throughput between ICPSR and AWS is nowhere near as good as intra-campus networking, and it is even worse for the AWS EU location.  The services are no better and no worse than my own cloud or the UMich cloud.  As with the ITS Value Storage service, I have no control over the physical systems, and I know even less about their physical location.  Amazon says that it passed a SAS 70 audit, and recently received an ISO 27001 certification.  This seems to be a better security story than anyone else so far.

DuraCloud:  Unlike the other clouds, I'm not using this one for archival storage; it is still in a pilot phase.  The availability is similar to plain old AWS (which hosts the main DuraCloud service), and the price is still under discussion.  My expectation is that the level of security is no better (and no worse) than the underlying cloud provider(s), and so depending upon which storage provider one selects, one's mileage may vary.  However, the really interesting thing about DuraCloud is the idea of the services.  If DuraCloud can execute and deliver useful, robust services on top of basic storage clouds, that will be a true value-add, and will make this a very compelling platform for archival storage.

Chronopolis:  Like DuraCloud, this too is not in production yet, and is being groomed (I think) as a future for-fee, production service.  I don't have as much visibility here with regard to availability since I am not actively moving content in and out of Chronopolis; most of the action seems to be taking place under the hood between the storage partner locations.  My sense is that the level of security is probably similar to the UMich Cloud since the lead organization, the San Diego Supercomputer Center, is in the world of higher education, like UMich, but it may well be the case that they have a stronger security story to tell, and I just don't know it.  And like DuraCloud, my sense is that it will come down to services:  If Chronopolis builds great services that facilitate archival storage, that will make it an interesting choice.
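
To make the "services" dimension a bit more concrete, here is a minimal sketch of the kind of do-it-yourself fixity check mentioned above for the ICPSR Cloud.  It is an illustration only, not the script ICPSR actually runs: the function names, the choice of SHA-256, and the idea of keeping the manifest as a simple path-to-checksum mapping are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative POSIX path) to its checksum.

    Hypothetical helper: run once at deposit time and store the result
    somewhere other than the storage location being checked.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Compare the files under root against a stored manifest.

    Returns a dict of relative path -> problem description; an empty
    dict means every object in the manifest is present and unchanged.
    """
    root = Path(root)
    problems = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_of(target) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

Because EBS and the two NFS-style stores all present a filesystem, the same check could in principle be pointed at any of them by changing the root path; the scheduling (cron or similar) and the reporting would still be up to the person running it, which is exactly the gap a service layer would fill.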
