News and commentary about new technology-related projects under development at ICPSR
Wednesday, September 29, 2010
Designing Storage Architectures for Digital Preservation - Day One, Part One
I attended an event on Monday and Tuesday of last week that was hosted by the Library of Congress: Designing Storage Architectures for Digital Preservation. This was my second time attending; I also went last year.
Like last time, there were many speakers, each giving a five-minute presentation. Unlike a TED talk, where the presentation materials are built specifically to fit within five minutes, many speakers brought conventional slide decks and raced through them. Those tended to be the weaker talks, since the scope of the material was far too broad for the time allotted. After each series of presentations there would be 15-30 minutes of group discussion, which ran the gamut from interesting and provocative observations to chasing down rabbit holes.
I know the LoC will post complete information about the event, but here is my abbreviated version. I've tried to hit what I considered to be the highlights, and so the reader should know that this report isn't complete.
The session opened with a video arguing that the Internet gives us more opportunity to innovate, since it lowers the barrier for one person's "hunches" to "collide" with those of another, and that innovation occurs when two or more good ideas come together. Henry Newman then gave a framing overview for the meeting that included these interesting points:
IT components are changing/improving at different rates; for example, processors are getting faster more quickly than buses are getting faster
The preservation community and the IT community use different language to talk about archival storage
Preservation TCO is not well understood
The consumer market is driving the storage industry, not the enterprise market
The first of two sessions featured "heavy users" who spoke about some of the challenges they faced. The speakers included Ian Soboroff (NIST), Mark Phillips (University of North Texas), Andy Maltz (Academy of Motion Picture Arts and Sciences), Ethan Miller (University of California - Santa Cruz), Arcot "Raja" Rajasekar (San Diego Supercomputer Center), Tom Garnett (Biodiversity Heritage Library), Barbara Taranto (New York Public Library), Martin Kalfatovic (Smithsonian Institution), and Tab Butler (Major League Baseball Network). Highlights of their presentations and the follow-on discussion:
Experienced a recent sea change in which it was no longer possible to forecast storage needs at all
"Archival storage... whatever that is."
Pergamum tome technology looks very interesting for smart, low-power storage
iRODS main components: data server cloud, metadata catalog, and the rule engine
"Open access is a form of preservation."
If one needs N amount of space for one copy of archival storage, one also needs 2 x N or 3 x N for the ingest process
The "long now"
The MLB Network data archive will consume 9000 LTO-4 tapes for storage in 2010.
"Digital preservation sounds like hoarding."
"After our content was indexed by Google, usage went up 10x."
Data recovery from corrupted files is a digital preservation concern.
Forensics of a format migration is an effective tool for finding problems in a repository.
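Two of the highlights above involve simple arithmetic: the ingest headroom rule (N for the archive copy, plus 2 x N or 3 x N for ingest) and the MLB Network tape count. Here is a minimal back-of-the-envelope sketch in Python. The function names are my own, I am reading "also needs" as space in addition to the archive copy, and I am using the LTO-4 native (uncompressed) capacity of 800 GB per cartridge; the speakers did not give these calculations.

```python
# Rough storage sizing sketch; names and interpretation are assumptions,
# not something presented at the meeting.

LTO4_NATIVE_GB = 800  # LTO-4 native (uncompressed) capacity per cartridge


def total_space_gb(archive_gb: int, ingest_multiplier: int) -> int:
    """Space for one archival copy plus ingest working space.

    Assumes the ingest space (2 x N or 3 x N) is needed in addition
    to the N required for the archival copy itself.
    """
    return archive_gb + ingest_multiplier * archive_gb


def tapes_to_gb(tape_count: int, gb_per_tape: int = LTO4_NATIVE_GB) -> int:
    """Native capacity represented by a number of tape cartridges."""
    return tape_count * gb_per_tape


if __name__ == "__main__":
    archive_gb = 100_000  # hypothetical 100 TB archive
    for multiplier in (2, 3):
        total = total_space_gb(archive_gb, multiplier)
        print(f"ingest at {multiplier} x N: {total} GB total")

    print(f"9000 LTO-4 tapes: {tapes_to_gb(9000)} GB native")
```

Under these assumptions, a 100 TB archive would need 300 to 400 TB in total, and 9000 LTO-4 tapes come to roughly 7.2 PB of native capacity.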