Friday, December 3, 2010

TRAC: B2.8: Capturing Representation Information

B2.8 Repository records/registers Representation Information (including formats) ingested.

When international standards for the associated Representation Information are not available, the repository needs to capture such information and register it so that it is readily findable and reusable. Some of it may be incorporated into software. The Representation Information is critical to the ability to turn bits into usable information and must be permanently associated with the Content Information.

Evidence: Viewable records in local format registry (with persistent links to digital objects); local metadata registry(ies); database records that include Representation Information and a persistent link to relevant digital objects.

As noted in last week's post, we capture representation information both as an IANA MIME type and in a more human-readable form. We are also looking at adding another piece of representation information to the metadata that surrounds the files deposited at ICPSR: file type or file role.

In brief, the idea is to capture the high-level role the file plays in research. For example, it is useful to know that a given file is an Excel workbook, but it is also important to know whether the file contains data, documentation, a database of sorts, or some combination of these. An Excel file that contains nothing but columns of numbers and text might be normalized quite easily into a more durable format. An Excel file that contains nothing but text, images, and descriptions of a data file might instead be converted to PDF/A, TIFF, or some other format.
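To make the idea concrete, here is a rough sketch of what such a record might look like. It is only an illustration, not ICPSR's actual schema: the names FileRole, RepresentationInfo, and suggest_normalization are hypothetical, and the choice of CSV as the "more durable format" for data is an assumption on my part.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FileRole(Enum):
    """Hypothetical role vocabulary, taken from the examples in the text."""
    DATA = "data"
    DOCUMENTATION = "documentation"
    DATABASE = "database"
    MIXED = "mixed"


@dataclass(frozen=True)
class RepresentationInfo:
    """The two existing pieces of representation information plus the role."""
    mime_type: str      # IANA MIME type form
    format_name: str    # human-readable form
    role: FileRole      # the proposed additional piece


# Assumed mapping from role to a preservation target. PDF/A comes from the
# text above; CSV is an illustrative stand-in for "a more durable format".
NORMALIZATION_TARGETS = {
    FileRole.DATA: "CSV",
    FileRole.DOCUMENTATION: "PDF/A",
}


def suggest_normalization(info: RepresentationInfo) -> Optional[str]:
    """Return a candidate target format, or None if the role needs review."""
    return NORMALIZATION_TARGETS.get(info.role)


# Two Excel workbooks with the same format but different roles.
codebook = RepresentationInfo(
    "application/vnd.ms-excel", "Microsoft Excel workbook", FileRole.DOCUMENTATION
)
survey = RepresentationInfo(
    "application/vnd.ms-excel", "Microsoft Excel workbook", FileRole.DATA
)
```

The point of the sketch is that the MIME type alone cannot distinguish the two workbooks; only the role tells us that one is a candidate for PDF/A and the other for a tabular format. Roles without an obvious target, such as a file that mixes data and documentation, return None and would presumably need a person to look at them.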

ICPSR has long used this idea for the derived content it produces, but is only now exploring the business-process changes required to do the same for deposited files. More as this story develops....
