Monday, October 5, 2009
I attended a webinar on DuraSpace last Wednesday. As a big fan of "the cloud," I was very interested to hear about what's been built, how it could be used, and the roadmap for the future. I learned a little bit about all three topics during the webinar.
Gina Jones from the Library of Congress hosted the meeting, and the main speaker was Michele Kimpton.
DuraCloud is being built as an OSGi container sitting on top of cloud storage providers. Customers can view DuraCloud both as a buying club for lower prices and as a way to ease the burden of learning each cloud provider's administrative and software interfaces.
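To make the "one layer on top of many providers" idea concrete, here is a minimal Python sketch of how such a layer could look. It is purely illustrative and is not DuraCloud's actual design or API: the names `StorageProvider`, `InMemoryProvider`, and `StorageBroker` are my own assumptions, and the in-memory class simply stands in for a real adapter to a commercial storage provider.

```python
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Hypothetical common interface that every provider adapter implements."""

    @abstractmethod
    def put(self, key, data):
        """Store the bytes in `data` under `key`."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under `key`, or raise KeyError."""


class InMemoryProvider(StorageProvider):
    """Stand-in for a real provider adapter (assumption, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class StorageBroker:
    """Single entry point: writes to every provider, reads from the first that has the key."""

    def __init__(self, providers):
        self.providers = list(providers)

    def put(self, key, data):
        # Copy the object to each configured provider.
        for provider in self.providers:
            provider.put(key, data)

    def get(self, key):
        # Try providers in order and fall through to the next on a miss.
        for provider in self.providers:
            try:
                return provider.get(key)
            except KeyError:
                continue
        raise KeyError(key)
```

The customer-facing code only ever talks to `StorageBroker`, so adding or swapping a provider means writing one new adapter class rather than changing every caller.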
DuraCloud is starting a pilot project with four cloud providers: (1) Amazon, (2) EMC, (3) Rackspace, and (4) Sun. The team is also working actively to add Microsoft as a fifth cloud provider. They have two content providers signed up for the pilot: the New York Public Library (NYPL) and the Biodiversity Heritage Library (BHL).
The NYPL has 800k objects and 50TB of content. They'd like to use DuraCloud to make a copy of their materials, and to transform content from TIFF format to JPEG2000. The JPEG2000 images would then be pulled back out of the cloud to local storage at the NYPL.
The BHL has 40TB of content, and is hoping to use DuraCloud to distribute its content across multiple locations (US, EU), and as a platform for hosting computationally intensive data mining.
The pilot is running through the end of the calendar year. DuraSpace intends to have a pricing model in place by Q2 2010 and to launch a production service in Q3.
In response to a question from a participant, Michele indicated that the focus was NOT on securing sensitive data, but rather on hosting public data with open access. So DuraCloud might be a good bet for some of the content ICPSR delivers on its web site, for example, but not for medical records, confidential data, etc.