Friday, September 30, 2011

Designing Storage Architectures for Digital Preservation

This is the second part of a two-part post about the 2011 Designing Storage Architectures for Digital Preservation meeting hosted by the Library of Congress.  The first part can be found in this post.


The second day began with a second session on Power-aware Storage Technologies.

Tim Murphy (SeaMicro) spoke about his lower-power server offering, noting that "Google spends 30% of its operating expenses on power" and that it "costs more to power equipment than to buy it."  Dale Wickizer (NetApp) gave a talk on how Big Data, rather than enterprise applications or decision support, is now driving enterprise IT.  Ethan Miller (Pure Storage) described his lower-power, flash-based storage hardware, and how a combination of de-dupe and compression makes its cost comparable to that of enterprise hard disk drives (HDD).  Dave Anderson (Seagate) spoke about HDD security and how new technology aimed at encryption may make sense for digital preservation applications too.
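
To see why data reduction matters so much to the flash-versus-HDD economics, here is a back-of-the-envelope sketch.  All of the prices and the reduction ratio below are made-up numbers for illustration, not Pure Storage's figures:

    # Made-up numbers, just to show the arithmetic behind the claim.
    raw_flash_cost_per_gb = 5.00    # hypothetical raw flash $/GB
    raw_hdd_cost_per_gb = 1.00      # hypothetical enterprise HDD $/GB
    data_reduction_ratio = 5.0      # assumed combined de-dupe + compression ratio

    effective_flash_cost = raw_flash_cost_per_gb / data_reduction_ratio
    print("Effective flash cost: $%.2f/GB vs. HDD at $%.2f/GB"
          % (effective_flash_cost, raw_hdd_cost_per_gb))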

The theme of the next session was New Innovative Storage Technologies.

David Schissel (General Atomics) presented an overview of their enhanced version of the old Storage Resource Broker (SRB) technology, which they call Nirvana.  Bob [did not catch his full name] (Nimbus Data) described his flash-based storage array, and how it applied the same techniques as conventional disk-based storage arrays, but with flash instead.  John Topp (Splunk) described his product, which struck me as a giant indexer and aggregator of log file content.  Sam Thompson (IBM) spoke about BigSheets, which layers a spreadsheet metaphor on top of technologies like Nutch, MapReduce, etc.

This theme continued into the next session.

Chad Thibodeau (Cleversafe) described his technology for authenticating to cloud storage in a more secure manner by distributing credentials across a series of systems.  Jacob Farmer (Cambridge Computer) proposed adding middleware between content management systems and raw storage to make it easier to manage and migrate content.  R B Hooks (Oracle) presented an overview of trends in storage technology, and noted that the consumer market, not IT, will drive flash technology.  Marshall Presser (Greenplum) spoke about I/O considerations in data analytics.
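
Chad did not go into implementation details, but the general idea of spreading a credential across several systems can be illustrated with a toy XOR split, where no single system holds the whole secret and all of the pieces are needed to rebuild it.  This is only a sketch of the concept, not Cleversafe's actual mechanism, and the key below is invented:

    import os
    from functools import reduce

    def split_secret(secret, n_shares):
        """Split a credential into n XOR shares; all shares are needed to rebuild it."""
        shares = [os.urandom(len(secret)) for _ in range(n_shares - 1)]
        last = secret
        for s in shares:
            last = bytes(a ^ b for a, b in zip(last, s))
        return shares + [last]

    def combine_shares(shares):
        """Recombine the shares by XOR-ing them back together."""
        return reduce(lambda acc, s: bytes(a ^ b for a, b in zip(acc, s)), shares)

    key = b"cloud-storage-api-key"          # a made-up credential
    pieces = split_secret(key, 3)           # store each piece on a different system
    assert combine_shares(pieces) == key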

The day ended with two closing talks.

Ethan Miller (UC Santa Cruz this time) spoke about the need to conduct research into how archival storage systems are actually used.  He described results from a pair of initial studies.  In the first, access was nearly non-existent except for a single day when Google crawled the storage; that one day accounted for 70% of all access over the entire study period.  In the second, 99% of the access was fixity checking.  [I think this is how ICPSR archival storage would look.]
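
For reference, a fixity check is just re-computing a checksum and comparing it to the value recorded at ingest.  A minimal sketch (the manifest name in the comment is hypothetical):

    import hashlib

    def fixity_check(path, expected_sha256):
        """Recompute a file's SHA-256 and compare it to the stored value."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == expected_sha256

    # 'manifest' (hypothetical) maps file paths to checksums recorded at ingest:
    # failures = [p for p, digest in manifest.items() if not fixity_check(p, digest)]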

David Rosenthal (LOCKSS Project, Stanford University) presented a still-evolving model of how one computes the long-term cost of storage for digital preservation.  The idea is that the model could be used to answer questions about whether to buy or rent (cloud) storage, when to upgrade technologies, and so on.  You can find the full description of the model at David's blog here.
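
As a very rough illustration of the kind of question such a model answers, here is a toy net-present-value comparison of owning versus renting storage.  This is simplified far beyond David's actual model, and every number below is invented:

    def npv_of_costs(annual_costs, discount_rate=0.05):
        """Net present value of a stream of yearly costs."""
        return sum(cost / (1 + discount_rate) ** year
                   for year, cost in enumerate(annual_costs))

    years = 12
    # "Own": buy hardware every 4 years at a price that halves per generation
    # (a crude Kryder's-law stand-in), plus yearly power and admin costs.
    own = [(20000 * 0.5 ** (y // 4) if y % 4 == 0 else 0) + 3000
           for y in range(years)]
    # "Rent": a flat yearly cloud storage bill.
    rent = [9000] * years

    print("Own :", round(npv_of_costs(own)))
    print("Rent:", round(npv_of_costs(rent)))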


