Technology at ICPSR: Designing Storage Architectures for Digital Preservation

Thursday, September 24, 2009

Designing Storage Architectures for Digital Preservation - Day Two

[ Due to a combination of my own stupidity and the way in which Blogger does (or doesn't!) do auto-save, many of my notes for the second day disappeared sometime between DCA and DTW. So a very abbreviated set of notes for Day Two. ]

The first session, Data Integrity, began at 9:00am with a series of vendor presentations.

Henry Newman, Instrumental: small market for digital preservation-quality systems. Disk density and transfer rates have outpaced disk reliability. Still need tapes due to low power, high capacity, and high reliability. HSM lacks broad market acceptance. Asserts that loss of a single bit is catastrophic. Mismatch between preservation requirements and main market requirements.
David Rosenthal, LOCKSS. Different usage patterns between content going into archives, and content retrieved on a regular basis. Designs and thinking need to take account of this.
Ray Clarke, Sun Microsystems. Draws distinction between backups v. archiving (preservation). Content growth exponential. Most data in archives is used infrequently. Asserts that tape "continues to make sense" for preservation: power, portability, etc. Humans introduce most errors, and so need to future-proof.
Mike Mott, IBM, spoke about how some loss in some contexts is acceptable. Need to hit the "utility" number. Shared some stories from the past about needing to solve error detection and correction problems in the end-to-end system, not just within each component.
Tim Harder, EMC, High-Assurance and Integrity Layer. "Law of Large Numbers is not on your side." Described approach similar to LOCKSS and DuraCloud. Use sampling to validate correctness of data.
Paul Rutherford, Isilon, failures = disk drive, controller/node, human. Do not trust storage. Do not trust yourself. Need to recover from failures fast enough. Overall system must be available in face of failure in large components. "RAID is dead." Not good enough. "We called it 'grid' before 'cloud'."
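
To make the "use sampling to validate correctness of data" idea from the EMC talk a bit more concrete, here is a minimal sketch of a sampled fixity check. It is my own illustration, not anything a presenter showed: the function names (sha256_of, audit_sample), the manifest layout (file path mapped to an expected SHA-256 digest), and the choice of SHA-256 are all assumptions.

```python
import hashlib
import random
from pathlib import Path
from typing import Dict, List, Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_sample(manifest: Dict[str, str], sample_size: int,
                 seed: Optional[int] = None) -> List[str]:
    """Re-hash a random sample of files and return the paths that fail.

    manifest is a hypothetical mapping of file path -> expected digest,
    recorded when the content was ingested. A file fails the audit if it
    is missing or if its current digest differs from the recorded one.
    """
    rng = random.Random(seed)
    names = sorted(manifest)
    chosen = rng.sample(names, min(sample_size, len(names)))
    return [
        name for name in chosen
        if not Path(name).is_file() or sha256_of(Path(name)) != manifest[name]
    ]
```

Each run only touches sample_size files, so an archive could spread a full audit over many runs instead of re-reading everything at once. The trade-off is that a damaged file goes unnoticed until a sample happens to include it.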
