Wednesday, September 28, 2011

Designing Storage Architectures for Digital Preservation

I attended the 2011 edition of the Library of Congress' Designing Storage Architectures for Digital Preservation meeting (link to 2010 meeting).  Like previous events, this meeting was scheduled over two days, and featured attendees and speakers from industry, higher education, and the US government.  This post will summarize the first day of the meeting, and I'll post a summary of the second day later this week.

The meeting was held in the ballroom of The Fairfax on Embassy Row in Washington, DC.  About 100 people attended the event, which began at noon on Monday, September 26, 2011.  As at past meetings, the first hour was devoted to registration and a buffet lunch.

The program began at 1:00pm with a welcome from Martha Anderson, who leads the National Digital Information Infrastructure and Preservation Program (NDIIPP) for the Library of Congress (LC).  She noted that since its inception in 2000, the program has funded 70 projects spread across 200 organizations, and that it is valuable for people to be able to step out of the office for a short time to step back, see the big picture in digital preservation, and get fresh perspectives.  She described how change is the driving force in digital preservation, and characterized one big change as a shift from indexing content to processing content.

Two "stage setting" presentations followed.

Carl Watts (LC) described a massive migration underway at the LC, where 500TB of content is moving from one storage platform to another.  Henry Newman (Instrumental) described challenges facing the digital preservation community: data growth is greatly exceeding growth in hardware speed and capacity; POSIX has not changed in many years; nomenclature is not used consistently between digital preservation practitioners and vendors; and the total cost of ownership for digital preservation is not well understood.

The theme of the first session was Case Studies from Community Storage Users and Providers.

Scott Rife (LC) described the video processing routine used at the LC Packard Campus, which handles over 1m videos and 7m audio files, 7TB/day of content, and 2GB/s of disk access.  Jim Snyder (LC) also spoke about the Packard Campus, noting that he is trying to "engineer for centuries," where one generation hands off to the next.  Cory Snavely (University of Michigan) and Trevor Owens (LC) gave an overview of the National Digital Stewardship Alliance (NDSA) and a more detailed report of what has been happening in the Infrastructure Working Group [note - I am a member of that WG], including preliminary results from a survey of members.  Highlights: 87% of respondents intend to keep content indefinitely; 76% anticipate an infrastructure change within three years; 72% want to host content themselves; 50% want to outsource hosting (!); 57% are using or considering use of "the cloud"; and 60% intend to work through the TRAC process.

Steve Abrams (California Digital Library) spoke about a "neighborhood watch" metaphor for assessing digital preservation success.  Tab Butler (Major League Baseball Network) updated the audience on the staggering amount of video he manages (2,500 hours of HD video each week, with multiple copies/versions of many of those hours).  Barbara Taranto (New York Public Library) described a migration in which the content doesn't move; only its address changes (in a Fedora repository).  Cory Snavely (HathiTrust this time) gave a second talk, updating the audience on text searching at HathiTrust; the takeaway was that more memory delivers better performance.  Andrew Woods (DuraSpace) described some of the challenges his team has faced building storage services across disparate cloud storage providers.

The theme of the next session was Power-aware Storage Technologies.

Hal Woods (HP) forecast a shift to solid state drives (SSD) in the next 2-4 years, and speculated that tape might outlive hard disk drives (HDD).  Bob Fernander (Pivot3) described video as the "new baseline" for content, and warned that we need to stop building Heathkit-style solutions to problems.  Dave Fellinger (DataDirect Networks) advised that the building blocks of digital preservation solutions need to be bigger, and that building with right-sized blocks would make it easier to solve problems.  Mark Flournoy (STEC) gave a very nice overview of different SSD market segments, costs, and performance metrics.

Each session included a lengthy question, answer, and comment period, and sometimes lively debate among the audience.

The first day wrapped up a bit after 5pm.
