Technology at ICPSR: TRAC: B2.11: AIP Verification

B2.11 Repository verifies each AIP for completeness and correctness at the point it is generated.

If the repository has a standard process to verify SIPs for either or both completeness and correctness and a demonstrably correct process for transforming SIPs into AIPs, then it simply needs to demonstrate that the initial checks were carried out successfully and that the transformation process was carried out without indicating errors. Repositories that must create unique processes for many of their AIPs will also need to generate unique methods for validating the completeness and correctness of AIPs. This may include performing tests of some sort on the content of the AIP that can be compared with tests on the SIP. Such tests might be simple (counting the number of records in a file, or performing some simple statistical measure such as calculating the brightness histogram of an original and preserved image), but they might be complex or contain some subjective elements.
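As an illustration of the "simple" end of that spectrum, here is a minimal sketch of a record-count comparison between a file as submitted and the same file as preserved. The function names and the assumption that one line equals one record are hypothetical, chosen only for the example; this is not a description of any particular repository's tooling.

```python
from pathlib import Path


def count_records(path: Path) -> int:
    """Count newline-delimited records, ignoring blank lines.

    Assumption: one non-blank line is one record.
    """
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def record_counts_match(sip_file: Path, aip_file: Path) -> bool:
    """Return True when the submitted and preserved files hold the same number of records."""
    return count_records(sip_file) == count_records(aip_file)
```

A matching count does not prove the content is identical; it is a cheap sanity check that a transformation did not silently drop or duplicate records.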

Documentation should describe how completeness and correctness of SIPs and AIPs are ensured, starting with ensuring receipt from the producer and continuing through AIP creation and supporting long-term preservation. Example approaches include the use of checksums, testing that checksums are still correct at various points during ingest and preservation, logs that such checks have been made, and any special tests that may be required for a particular SIP/AIP instance or class.
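The checksum approach mentioned above can be sketched roughly as follows: compute a manifest of checksums once, then re-verify it at later points and log each result. This is only a sketch under stated assumptions. SHA-256, the logger name `fixity`, and the function names are all choices made for the example, not something the requirement prescribes.

```python
import hashlib
import logging
from pathlib import Path

# Hypothetical logger name; the log output serves as evidence that checks ran.
logger = logging.getLogger("fixity")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-check every manifest entry; return the relative paths that failed."""
    failures = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            logger.error("missing file: %s", rel)
            failures.append(rel)
        elif sha256_of(target) != expected:
            logger.error("checksum mismatch: %s", rel)
            failures.append(rel)
        else:
            logger.info("checksum ok: %s", rel)
    return failures
```

The manifest would be built when content is first received and `verify_manifest` run again at each later stage; an empty return value means every file is still present and unchanged.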

Evidence: Description of the procedure that verifies completeness and correctness; logs of the procedure.

A few of my earlier posts have described the deposit system at ICPSR, and so with this post I would like to focus on the AIP. My sense is that the ICPSR package that is closest to the AIP is what ICPSR insiders would call "the turnover directory."

At least half of the ICPSR staff fall into a category called "data processors" or "data managers." These are the folks who take the deposits we receive and turn them into content that we can preserve and content that we deliver on our web site. They work in different teams and are funded through a variety of mechanisms: membership dues, long-standing contracts with federal agencies, and even inter-agency agreements. Some of them work on a large number of collections each year, and others work on a very small number. But all of them perform a series of work processes that end with a collection of content loaded into a single directory. This is the turnover directory.

Once the content has been pulled into this single location, the data manager runs a tool that performs a broad variety of jobs but boils down to two essential tasks: conformance checking and ingest.

The conformance checking is at the heart of the TRAC requirement. This is where the content that we are about to ingest goes through a variety of checks; these checks implement (in software) a laundry list of business rules and requirements that are documented on ICPSR's intranet and managed by a committee.
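To give a feel for what "business rules implemented in software" can look like, here is a minimal sketch in which each rule is a small function that reports problems for a directory, and ingest proceeds only when no rule complains. The two rules shown (no empty files, no whitespace in file names) are hypothetical stand-ins invented for the example; they are not ICPSR's actual rules, and the function names are likewise assumptions.

```python
from pathlib import Path
from typing import Callable, Iterator

# A rule inspects a directory and yields one message per problem found.
Rule = Callable[[Path], Iterator[str]]


def no_empty_files(directory: Path) -> Iterator[str]:
    """Hypothetical rule: every file must contain at least one byte."""
    for p in sorted(directory.rglob("*")):
        if p.is_file() and p.stat().st_size == 0:
            yield f"empty file: {p.name}"


def no_whitespace_in_names(directory: Path) -> Iterator[str]:
    """Hypothetical rule: file names must not contain whitespace."""
    for p in sorted(directory.rglob("*")):
        if p.is_file() and any(ch.isspace() for ch in p.name):
            yield f"whitespace in name: {p.name}"


RULES: list[Rule] = [no_empty_files, no_whitespace_in_names]


def check_conformance(directory: Path, rules: list[Rule] = RULES) -> list[str]:
    """Run every rule and collect all reported problems."""
    problems: list[str] = []
    for rule in rules:
        problems.extend(rule(directory))
    return problems


def ready_for_ingest(directory: Path) -> bool:
    """Content may be ingested only if no rule reported a problem."""
    return not check_conformance(directory)
```

Keeping each rule as its own function means the list can grow or change as the documented requirements change, without touching the code that runs the checks.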

In addition to the explicit data management checks, the system also records critical preservation metadata such as fixity, provenance, and context information.

Friday, February 4, 2011
