Wednesday, May 4, 2011

TRAC: B4.1: Employing documented preservation strategies

B4.1 Repository employs documented preservation strategies.

Documented preservation strategies include evidence of planning for strategies not yet employed against the repository’s digital objects. A repository is likely to employ multiple strategies. Different strategies may be employed by class (type) of digital object, and/or multiple strategies may be employed on a single object class. This will depend upon local repository policies and practices, though any such strategy decisions should be documented and should be based on sound community practice.

Minimally, documentation of preservation strategies must be included in repository policies and practices. Good repository practice also requires that preservation strategies employed against digital objects are recorded in the object’s preservation metadata. (See also B3.3.)

Evidence: Documentation of strategies and their appropriateness to repository objects; evidence of application (e.g., in preservation metadata); see B3.3.

ICPSR tends to use a single preservation strategy (normalization) for its content, perhaps because that content is relatively uniform: survey data and documentation. There's a nice explanation of this strategy on the web site, which also defines the term.

I wasn't able to find a document that maps each content type to a specific, normalized format, so to make our documentation a bit more complete, I'll offer such a table here:

| File type | Specific strategy |
|---|---|
| Survey data (original format varies, but is often Excel, SAS, Stata, or SPSS) | Normalize to plain character data (ASCII) + "setup" files, one for each of the major statistical analysis packages |
| Study-level and dataset-level metadata | DDI v2 XML |
| Technical documentation | Both PDF and TIFF; would like to transition to PDF/A where possible |
| Other textual artifacts, such as a user guide or questionnaire | Plain text or PDF or TIFF |
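
For anyone who wants the mapping above in machine-readable form, here is a rough sketch of how it might be expressed as a lookup table, along with a helper that builds the kind of minimal record one could store in an object's preservation metadata. This is only an illustration: the names `TARGET_FORMATS` and `preservation_event` and the record fields are hypothetical, not an actual ICPSR schema or system.

```python
from typing import Dict, List

# Hypothetical lookup table that mirrors the table above.
# Keys and values paraphrase the post; this is not an official ICPSR schema.
TARGET_FORMATS: Dict[str, List[str]] = {
    "survey data": [
        "plain character data (ASCII)",
        "setup files, one per major statistical analysis package",
    ],
    "study-level and dataset-level metadata": ["DDI v2 XML"],
    # Technical documentation is kept in both formats.
    "technical documentation": ["PDF", "TIFF"],
    # For other textual artifacts these are alternatives, not all required.
    "other textual artifacts": ["plain text", "PDF", "TIFF"],
}


def preservation_event(content_type: str) -> Dict[str, object]:
    """Build a minimal, illustrative metadata record for a normalization event."""
    key = content_type.strip().lower()
    if key not in TARGET_FORMATS:
        # New content types (e.g. video) have no documented strategy yet.
        raise KeyError(f"no documented preservation strategy for {content_type!r}")
    return {
        "content_type": key,
        "strategy": "normalization",
        "target_formats": list(TARGET_FORMATS[key]),
    }


if __name__ == "__main__":
    print(preservation_event("Survey data"))
```

Raising an error for an unknown content type is a deliberate choice in this sketch: a type with no documented strategy should be flagged rather than silently normalized.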

If and when ICPSR really jumps into new types of content, such as video or still images, clearly those types of content will need different strategies.

