Friday, May 20, 2011

TRAC: B4.2: Storage and migration strategies

B4.2 Repository implements/responds to strategies for archival object (i.e., AIP) storage and migration.

At least two aspects of the strategy must be acted upon: that which pertains to how AIPs are currently stored (including physical requirements, media requirements, location of copies, formats and metadata) and that which may require AIP migration of any form. For example, AIP migrations that result in transformations of content need to be tracked to allow subsequent users to understand the repository’s processing implications.

If a repository has not yet needed to carry out any sort of preservation strategy on AIP(s), it must demonstrate that its policy has not required it yet.

Evidence: Institutional technology and standards watch; demonstration of objects on which a preservation strategy has been performed; demonstration of appropriate preservation metadata for digital objects.



Perhaps the biggest AIP migration so far at ICPSR has been the move from magnetic tape as the storage medium to "spinning disk."  Here's part of the story.

In 2005 ICPSR leased and managed three off-site storage locations.  Two of the locations were small, and contained older magnetic tape formats, such as IBM 3480 cartridges.  One of the locations was quite large ("the warehouse"), and that location had an assortment of tapes (IBM cartridge, 9-track, and more modern DLT) and paper.  The paper was a mix of old copies of content ICPSR used to distribute via post (e.g., codebooks from the 1980s and earlier) and backup material related to the born-digital content (e.g., letters from researchers about their datasets).


A member of my team (Asmat Noori, who manages IT operations) led an effort with four goals:
  1. Move digital content from tape to disk
  2. Discard old distribution paper content (the old codebooks)
  3. Transfer archival paper to Iron Mountain for safe-keeping
  4. Close the three off-site locations
Asmat's team consisted of a handful of full-time ICPSR staff, a few student temps, and part of a software developer's time.  The developer built tracking systems for matching boxes of content to Iron Mountain locations and for matching old media to new.

The team worked through two projects simultaneously.  The first was to clear out all of the superfluous paper from the warehouse location.  There was an enormous quantity of paper to discard, and we worked with a local company to recycle as much as we could.  The second was to retrieve the magnetic tapes in batches so that they could be copied to disk, verified, and then discarded.  For each tape we captured both its table of contents and its content.  We performed essential sanity-checking on the restored content, and for each file, recorded the source (e.g., file X came from tape Y).  The entire process took a bit under two years.
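For readers who want to picture the "file X came from tape Y" record-keeping, here is a minimal sketch in Python.  It is not the tracking system we actually built; the function names, the CSV layout, and the choice of SHA-256 are all assumptions made for illustration.  It walks a directory holding the content restored from one tape and writes one manifest row per file, noting the tape it came from, its size, and a checksum.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical manifest columns; the real tracking system's fields are not described here.
FIELDS = ["tape_id", "file", "bytes", "sha256"]


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(restore_dir, tape_id):
    """List every file restored from one tape, with its source tape and checksum."""
    root = Path(restore_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "tape_id": tape_id,  # provenance: which tape this file came from
                "file": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return rows


def write_manifest(rows, manifest_path):
    """Save the manifest as CSV so it can be kept alongside the restored content."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_manifest(build_manifest("restored/tape-0001", "tape-0001"), "tape-0001.csv")` (both names made up) would leave a per-tape record that can later be compared against the tape's own table of contents.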

After moving the digital content to disk, we started two new activities:  weekly fixity checks of each item, and daily copying of content to remote locations.  These new activities gave us more copies, in more locations across the United States, with greater confidence that each copy is in good order.
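A fixity check of this kind boils down to recomputing a checksum for each item and comparing it with the value recorded earlier.  The sketch below shows the idea in Python; it is an illustration rather than our production job, and the function names, the status labels, and the use of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(root, expected):
    """Compare stored checksums against the files currently on disk.

    `expected` maps a path relative to `root` to its recorded SHA-256 digest.
    Returns a dict mapping each path to "ok", "mismatch", or "missing".
    """
    results = {}
    for rel_path, recorded in expected.items():
        path = Path(root) / rel_path
        if not path.is_file():
            results[rel_path] = "missing"
        elif sha256_of(path) != recorded:
            results[rel_path] = "mismatch"
        else:
            results[rel_path] = "ok"
    return results
```

Run on a schedule against each copy, anything other than "ok" is a signal to repair that item from one of the other copies.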

The printed material still gets some use, but not that often.  Storing the material with Iron Mountain, and paying for occasional retrieval, costs much, much less than operating a warehouse and paying a staff member (even a part-time temp) to move content between a warehouse and ICPSR.

And there's the evidence of implementing a storage and migration strategy.
