Technology at ICPSR: TRAC: B2.1: AIPs

B2.1 Repository has an identifiable, written definition for each AIP or class of information preserved by the repository.

An AIP contains these key components: the primary data object to be preserved, its supporting Representation Information (format and meaning of the format elements), and the various categories of Preservation Description Information (PDI) that also need to be associated with the primary data object: Fixity, Provenance, Context, and Reference. There should be a definition of how these categories of information are bound together and/or related in such a way that they can always be found and managed within the archive.

It is merely necessary that definitions exist for each AIP, or class of AIP if there are many instances of the same type. Repositories that store a wide variety of object types may need a specific definition for each AIP they hold, but it is expected that most repositories will establish class descriptions that apply to many AIPs. It must be possible to determine which definition applies to which AIP.

While this requirement is primarily concerned with issues of identifying and binding key components of the AIP, B2.2 places more stringent conditions on the content of the key components to ensure that they are fit for the intended purpose. Separating the two criteria is important, particularly if a repository does not satisfy one of them. It is important to know whether some or all AIPs are not defined, or that the definitions exist but are not adequate.

Evidence: Documentation identifying each class of AIP and describing how each is implemented within the repository. Implementations may, for example, involve some combination of files, databases, and/or documents.

Does anyone have written definitions for their AIPs?

Via a Google search, I found a preliminary design document at the Library of Congress with a very long, very complete description of a proposed AIP for image-type content. But in general it seems hard to find real-world examples of AIPs that are in use at working archives. Perhaps they are out there, but published in a way that makes them difficult to discover?

Here is my strawman stab at defining an AIP for the bulk of ICPSR's content: social science research data and documentation. This is very much a work in progress and should not be read as any sort of official document. Here goes:

Definition of an Archival Information Package (AIP) for a Social Science Study

We define an AIP for a social science study as a list of files where each file has supporting representation information in the form of:

a role (data, codebook, survey instrument, etc.)
a format (we use MIME type)

and has the following Preservation Description Information:

Provenance. We link processed studies to initial deposits at the aggregation level, and we also collect processing history in our internal Study Tracking System, which records who performed actions on the content and the major milestones in its lifecycle at ICPSR.
Context. We store related content together in the filesystem, and a good deal of the context is embedded both in the name of each file and in a relational database. We are also evaluating RDF/XML as a method for recording and exposing contextual information, although it is not yet in production.
Reference. Each file has a unique ID.
Fixity. We use an MD5 hash at the file level to capture and check integrity.
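
To make the strawman a bit more concrete, here is a rough Python sketch of how the definition above might be expressed as a data structure, along with a file-level MD5 fixity check. This is only an illustration, not how ICPSR's systems are actually built: the class names (ArchivedFile, StudyAIP), the field names, the ID formats, and the helper functions are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArchivedFile:
    """One file in the AIP (hypothetical structure, for illustration only)."""
    file_id: str    # Reference: unique ID for the file
    path: str       # Context: where the file lives in the filesystem
    role: str       # Representation Information: data, codebook, etc.
    mime_type: str  # Representation Information: format as a MIME type
    md5: str        # Fixity: hex MD5 digest recorded when the file was stored


@dataclass
class StudyAIP:
    """An AIP for one study: a list of files plus a link to the deposit."""
    study_id: str
    deposit_id: str  # Provenance: link to the initial deposit (aggregation level)
    files: List[ArchivedFile] = field(default_factory=list)


def md5_of(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(aip):
    """Return the IDs of files whose current MD5 differs from the recorded one."""
    return [f.file_id for f in aip.files if md5_of(f.path) != f.md5]
```

A periodic integrity audit could then call failed_fixity on each AIP and flag any non-empty result for investigation. Processing history and the richer context held in the relational database are left out of the sketch on purpose.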

So there's the strawman. To help guide my description of the PDI, I used these definitions from the Open Archival Information System (OAIS) specification:

– Provenance describes the source of the Content Information, who has had custody of it since its origination, and its history (including processing history).
– Context describes how the Content Information relates to other information outside the Information Package. For example, it would describe why the Content Information was produced, and it may include a description of how it relates to another Content Information object that is available.
– Reference provides one or more identifiers, or systems of identifiers, by which the Content Information may be uniquely identified. Examples include an ISBN number for a book, or a set of attributes that distinguish one instance of Content Information from another.
– Fixity provides a wrapper, or protective shield, that protects the Content Information from undocumented alteration. For example, it may involve a check sum over the Content Information of a digital Information Package.

Friday, October 15, 2010
