Friday, February 18, 2011

TRAC: B2.12: Providing an audit capability

B2.12 Repository provides an independent mechanism for audit of the integrity of the repository collection/content.

In general, it is likely that a repository that meets all the previous criteria will satisfy this one without needing to demonstrate anything more. As a separate requirement, it demonstrates the importance of being able to audit the integrity of the collection as a whole.

For example, if a repository claims to have all e-mail sent or received by The Yoyodyne Corporation between 1985 and 2005, it has been required to show that:
  • The content it holds came from Yoyodyne’s e-mail servers.
  • It is all correctly transformed into a preservation format.
  • Each monthly SIP of e-mail has been correctly preserved, including original unique identifiers such as Message-IDs.

However, it may still have no way of showing whether this really represents all of Yoyodyne’s e-mail. For example, if there is a three-day period with no messages in the repository, is this because Yoyodyne was shut down for those three days, or was the e-mail lost before the SIP was constructed? This case could be resolved by the repository amending its description of the collection, but other cases may not be so straightforward.

A familiar mechanism from the world of traditional materials in libraries and archives is an accessions or acquisitions register that is independent of other catalog metadata. A repository should be able to show, for each item in its accessions register, which AIP(s) contain content from that item. Alternatively, it may need to show that there is no AIP for an item, either because ingest is still in progress, or because the item was rejected for some reason. Conversely, any AIP should be able to be related to an entry in the acquisitions register.
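The two-way check described above can be sketched in a few lines of Python. This is only an illustration under assumed data structures: the `RegisterEntry` class, its `status` values, and the `aips` mapping of AIP identifiers to item identifiers are hypothetical and are not taken from TRAC or from any particular repository system.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    """One line of a hypothetical accessions register."""
    item_id: str
    status: str  # assumed values: "ingested", "in_progress", "rejected"


def reconcile(register, aips):
    """Compare a register against AIPs, in both directions.

    register: list of RegisterEntry
    aips: dict mapping an AIP identifier to the set of item_ids it contains
    """
    # Forward direction: which AIP(s) hold content from each item?
    item_to_aips = {}
    for aip_id, item_ids in aips.items():
        for item_id in item_ids:
            item_to_aips.setdefault(item_id, []).append(aip_id)

    # Items with no AIP are acceptable only if ingest is still in
    # progress or the item was rejected; anything else needs explaining.
    unexplained = [
        entry.item_id
        for entry in register
        if entry.item_id not in item_to_aips
        and entry.status not in ("in_progress", "rejected")
    ]

    # Reverse direction: every AIP should relate to at least one entry.
    known_items = {entry.item_id for entry in register}
    orphan_aips = sorted(
        aip_id for aip_id, item_ids in aips.items()
        if not (item_ids & known_items)
    )

    return {
        "item_to_aips": item_to_aips,
        "unexplained_items": unexplained,
        "orphan_aips": orphan_aips,
    }
```

An empty `unexplained_items` list and an empty `orphan_aips` list would be the evidence that the register and the archival holdings agree.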

Evidence: Documentation provided for B2.1 through B2.6; documented agreements negotiated between the producer and the repository (see B1.1-B1.9); logs of material received and associated action (receipt, action, etc.) dates; logs of periodic checks.

ICPSR meets this requirement by maintaining an accession register, which is a very long (and always growing) list of the files that we preserve. A weekly automated job takes this list as input and checks, for each item, that it is still available in archival storage and that it is intact (i.e., its digital signature has not changed).
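A periodic check of this kind might look roughly like the following Python sketch. It is not the actual ICPSR job: the register format (a mapping from file path to an expected digest), the choice of SHA-256 as the "digital signature", and the function names `sha256_of` and `audit` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(register):
    """Check every registered file for presence and integrity.

    register: dict mapping a file path to its expected SHA-256 hex
    digest (an assumed format, not a real register layout).
    """
    report = {"ok": [], "missing": [], "changed": []}
    for path, expected in register.items():
        if not Path(path).is_file():
            # The item is no longer available in archival storage.
            report["missing"].append(path)
        elif sha256_of(path) != expected:
            # The item is present but its digest no longer matches.
            report["changed"].append(path)
        else:
            report["ok"].append(path)
    return report
```

A scheduler such as cron could run `audit` once a week and log the report, which would also supply the "logs of periodic checks" named as evidence above.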

