Friday, October 22, 2010

TRAC: B2.2: Are our AIPs adequate to the job?

B2.2 Repository has a definition of each AIP (or class) that is adequate to fit long-term preservation needs.

In many cases, if the definitions required by B2.1 exist, this requirement is also satisfied, but it may also be necessary for the definitions to say something about the semantics or intended use of the AIPs if this could affect long-term preservation decisions. For example, say two repositories both only preserve digital still images, both using multi-image TIFF files as their preservation format. Repository 1 consists entirely of real-world photographic images intended for viewing by people and has a single definition covering all of its AIPs. (The definition may refer to a local or external definition of the TIFF format.) Repository 2 contains some images, such as medical x-rays, that are intended for computer analysis rather than viewing by the human eye, and other images that are like those in Repository 1. Repository 2 should perhaps define two classes of AIPs, even though it only uses one storage format for both. A future preservation action may depend on the intended use of the image—an action that changes the bit-depth of Trustworthy Repositories Audit and Certification: Criteria and Checklist the image in a way that is not perceivable to the human eye may be satisfactory for real-world photographs but not for medical images, for example.

Evidence: Documentation that relates the AIP component’s contents to the related preservation needs of the repository, with enough detail for the repository's providers and consumers to be confident that the significant properties of AIPs will be preserved.

This item is somewhat difficult to discuss without a crisp set of definitions for AIPs.  However, given the longevity of ICPSR as a digital repository, and given the track record (nearly fifty years) of preserving and delivering content, the empirical evidence would seem to indicate that the models and containers we are using for our content are a good fit for our long-term preservation needs.

In some ways ICPSR has a relatively easy task here since our content is pretty homogeneous (survey data and documentation) and we are able to normalize it into very durable formats like plain text and TIFF.  And because our content is all "born digital" and delivered digitally, there's fewer opportunities for things to go really awry.

We also create a great deal of descriptive metadata that we bundle with out content, and our content is highly curated compared to, say, an enormous stream of data coming from highway sensors or satellites.  In addition to making items easier to find and to use, it may also help keep them more durable as a side-effect.

As part of an NSF Interop/EAGER grant we're defining Fedora Objects for our most common content types, and for each object, we are also working through the specifications of the AIP.  My sense is that this will help us formalize some of our current practices, and will help illuminate any gaps where we should be collecting and saving metadata, but aren't today.  And that will help further inform the response to this TRAC item.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.