The repository must develop or adapt appropriate measures for ensuring the integrity of its holdings. The mechanisms to measure integrity will evolve as technology evolves, but currently include examples such as the use of checksums at ingest and throughout the preservation process. The chain of custody for all of its digital content from the point of deposit forward must be explicit, complete, correct, and current. The repository must demonstrate that the content it has matches the content it received, e.g., with an implemented registry function that documents content from submission onward. Losses associated with migration and other preservation actions should also be documented and made available to relevant stakeholders. (See C1.5 and C1.6.)
If protocols, rules, and mechanisms are embedded in the repository software, there should be some way to demonstrate the implementation of integrity measurements.
Evidence: An implemented registry system; a definition of the repository’s integrity measurements; documentation of the procedures and mechanisms for integrity measurements; an audit system for collecting, tracking, and presenting integrity measurements; procedures for responding to results of integrity measurements that indicate digital content is at risk; policy and workflow documentation.
ICPSR operates very differently than a conventional archive, and it really shows when one looks at this TRAC requirement.
A typical workflow for us looks like this:
- Receive some content in formats like SAS and Word
- Preserve that content "as is" at the bit-level
- Completely re-do all of the data and documentation, preserving the intellectual content (modulo disclosure concerns), but reorganizing it all
- Produce normalized and ready-to-use content based on the re-do
- Preserve the normalized content forever
So at the file level we track all of the original deposits and all of the content we produce, and we test the integrity of each file every week. Since my team inherited responsibility for managing archival storage in 2006, every problem I've seen has traced back to a transient error that took place as content was being copied into archival storage, and each one was resolved when an ICPSR staff member re-ran the copy.
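To make the weekly file-level test concrete, here is a minimal sketch of a checksum-based fixity check. It is an illustration under assumptions, not a description of ICPSR's actual tooling: the choice of SHA-256, the in-memory manifest, and the function names `sha256_of`, `record_fixity`, and `check_fixity` are all hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths) -> dict:
    """Build a manifest (hypothetical format) mapping each path to its digest.

    This would run once, when content is copied into archival storage.
    """
    return {str(p): sha256_of(Path(p)) for p in paths}


def check_fixity(manifest: dict) -> dict:
    """Re-hash every file in the manifest and report the ones that fail.

    Returns {path: "missing" | "mismatch"}; an empty dict means every
    file still matches the digest recorded at ingest.
    """
    problems = {}
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file():
            problems[name] = "missing"
        elif sha256_of(path) != expected:
            problems[name] = "mismatch"
    return problems
```

A scheduled job could call `check_fixity` once a week and alert a staff member on any non-empty result. In the failure mode described above, the response would be to re-run the copy and record a fresh digest.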
We also track the chain of custody at the aggregate level, assigning each "deposit" and "study" to both a workgroup and an individual, and linking deposits to studies (and vice versa). We have internal systems to manage both deposits and studies, and they include mechanisms whereby a data manager can edit metadata, assign key dates, and enter diary entries, not unlike a trouble ticket or help desk system.
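The aggregate-level tracking can be pictured as a small data model. The sketch below is only an assumption about how such records might be shaped; the class names `Record` and `DiaryEntry`, the field names, and the `link` helper are hypothetical and do not describe ICPSR's internal systems.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DiaryEntry:
    """A dated note, similar to a comment on a help desk ticket."""
    when: date
    author: str
    note: str


@dataclass
class Record:
    """A deposit or a study; both carry the same custody fields in this sketch."""
    identifier: str
    workgroup: str   # group currently responsible for the record
    assignee: str    # individual currently responsible for the record
    diary: list = field(default_factory=list)
    linked: set = field(default_factory=set)  # identifiers of linked records

    def log(self, when: date, author: str, note: str) -> None:
        """Append a diary entry describing an action taken on this record."""
        self.diary.append(DiaryEntry(when, author, note))


def link(deposit: Record, study: Record) -> None:
    """Link a deposit and a study in both directions."""
    deposit.linked.add(study.identifier)
    study.linked.add(deposit.identifier)
```

Storing the link on both records means the chain can be walked from a deposit to the studies built from it, or from a study back to its source deposits.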