C1.5 Repository has effective mechanisms to detect bit corruption or loss.
The repository must detect data loss accurately to ensure that any losses fall within the tolerances established by policy (see A3.6). Data losses must be detected and detectable regardless of the source of the loss. This applies to all forms and scope of data corruption, including missing objects and corrupt or incorrect or imposter objects, corruption within an object, and copying errors during data migration or synchronization of copies. Ideally, the repository will demonstrate that it has all the AIPs it is supposed to have and no others, and that they and their metadata are uncorrupted.
The approach must be documented and justified, and it must include mechanisms for mitigating such common hazards as hardware failure, human error, and malicious action. Repositories that use well-recognized mechanisms such as MD5 checksums need only document their effectiveness and role within the overall approach. To the extent that a repository relies on homegrown schemes, however, it must provide convincing justification that data loss and corruption are detected within the tolerances established by policy.
Data losses must be detected promptly enough that routine systemic sources of failure, such as hardware failures, are unlikely to accumulate and cause data loss beyond the tolerances established by the repository’s policy or specified in any relevant deposit agreement. For example, consider a repository that maintains a collection on identical primary and backup copies with no other data redundancy mechanism. If the media of the two copies have a measured failure rate of 1% per year and failures are independent, then there is a 0.01% chance that both copies will fail in the same year. If a repository’s policy limits loss to no more than 0.001% of the collection per year, with a goal, of course, of losing nothing, then the repository would need to confirm media integrity at least every 72 days to achieve an average time to detect of 36 days, or about one tenth of a year. This simplified example illustrates the kind of issues a repository should consider, but the objective is a comprehensive treatment of the sources of data loss and their real-world complexity. Any data that is (temporarily) lost should be recoverable from backups.
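The arithmetic behind this example can be sketched as follows. This is a deliberately simplified model, assuming independent failures and an average detection delay of half the check interval:

```python
# Simplified model of the two-copy example above:
# loss occurs when one copy fails and the surviving copy also fails
# before the first failure is detected and repaired.
annual_failure_rate = 0.01            # 1% per copy per year
check_interval_days = 72
avg_time_to_detect = check_interval_days / 2   # 36 days on average
exposure_years = avg_time_to_detect / 365      # ~0.1 year at risk

annual_loss_prob = annual_failure_rate * (annual_failure_rate * exposure_years)
print(f"{annual_loss_prob:.6%}")      # ~0.000986%, just under the 0.001% limit
```

A more thorough risk analysis would also account for correlated failures (shared power, firmware, or operator errors), which this independence assumption ignores.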
Evidence: Documents that specify bit error detection and correction mechanisms used; risk analysis; error reports; threat analyses.
For each object in Archival Storage, ICPSR computes an MD5 hash. This "fingerprint" is then stored as metadata for each object.
Automated jobs "prowl" Archival Storage on a regular basis, computing the current MD5 hash of each object and comparing it to the stored value. When the hashes differ, an exception is generated and reported to the appropriate staff for diagnosis and correction.
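The core of such a fixity check can be sketched in a few lines; the function names here are illustrative, not ICPSR's actual tooling:

```python
import hashlib
from pathlib import Path

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading in 1 MiB chunks
    so that arbitrarily large objects can be hashed."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit(path: Path, stored_md5: str) -> bool:
    """Return True if the current hash matches the stored fingerprint.
    A False result corresponds to the 'exception' reported to staff."""
    return md5_of(path) == stored_md5
```

A periodic job would call `audit` for every object in Archival Storage and queue any mismatches for human review.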
In practice we see very few such exceptions, and the most common cause is a blend of human error and software that fails to handle the error gracefully.
Recovery is quick. If the problem was caused by human error and the object's mtime (last-modified) timestamp has also changed, then any copies managed via rsync may also be damaged, since rsync's default quick check keys on size and modification time; in that case we instead fetch the original object from a different source (e.g., tape or a copy managed via SRB's Srsync). If the corruption occurred without the mtime changing, then we also have the option of fetching an intact copy from one of our rsync-managed copies.
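The recovery decision described above can be sketched roughly as follows; the function and the source labels are hypothetical illustrations, not ICPSR's actual tooling:

```python
def choose_recovery_source(mtime_changed: bool) -> str:
    """Pick where to restore a corrupted object from, based on whether
    its modification timestamp changed along with its contents."""
    if mtime_changed:
        # rsync's default quick check compares size and mtime, so
        # rsync-managed replicas may have inherited the corruption;
        # fall back to an independent copy (tape or an SRB-managed one).
        return "tape-or-srb"
    # mtime unchanged: rsync would not have propagated the bad bytes,
    # so an rsync-managed replica is a safe (and faster) source.
    return "rsync-replica"
```

The underlying point is that replication tools which trust timestamps can silently propagate corruption, so at least one copy must be maintained by an independent mechanism.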