Friday, October 16, 2009

Back to the Fedora: Part 2


To go along with our survey data object, we'll also need a survey documentation object. We'll relate the objects via RDF in the RELS-EXT Datastream, and we'll also relate the documentation object to the higher-level, aggregate object, "social science study." The image to the left is clickable, and will take one to the "home page" for this Content Model object in the ICPSR Fedora test server.

Note that the name of this Content Model object is something of a misnomer. Even though a common use-case is survey data, we may use the same type of object for other social science data that are not survey data, such as government-generated summary statistics about health, crime, demographics, or all sorts of other things.

The heart of the Content Model is the DS-COMPOSITE-MODEL Datastream, where we require a large number of Datastreams: a "setups" Datastream for each of the common statistical packages; a DDI XML Datastream that documents the associated survey data object; and a pair of Datastreams for the human-readable technical documentation (the "codebook"). A future refinement might be to replace the pair (one PDF, one TIFF) with a single Datastream that is durable enough for preservation purposes but also allows rich display of the information (PDF/A?).
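To make the idea of "required Datastreams" concrete, here is a minimal sketch in Python of how one might check an object against a DS-COMPOSITE-MODEL. The Datastream IDs (SETUP-SAS, DDI, CODEBOOK-PDF, and so on) are invented for illustration and are not the IDs used on the ICPSR test server; the XML layout follows my understanding of the Fedora 3.x composite model format.

```python
import xml.etree.ElementTree as ET

# Namespace used by Fedora 3.x DS-COMPOSITE-MODEL documents.
DSCM_NS = "info:fedora/fedora-system:def/dsCompositeModel#"

# Hypothetical composite model; every Datastream ID below is an assumption.
COMPOSITE_MODEL = """\
<dsCompositeModel xmlns="info:fedora/fedora-system:def/dsCompositeModel#">
  <dsTypeModel ID="SETUP-SAS"><form MIME="text/plain"/></dsTypeModel>
  <dsTypeModel ID="SETUP-SPSS"><form MIME="text/plain"/></dsTypeModel>
  <dsTypeModel ID="SETUP-STATA"><form MIME="text/plain"/></dsTypeModel>
  <dsTypeModel ID="DDI"><form MIME="text/xml"/></dsTypeModel>
  <dsTypeModel ID="CODEBOOK-PDF"><form MIME="application/pdf"/></dsTypeModel>
  <dsTypeModel ID="CODEBOOK-TIFF"><form MIME="image/tiff"/></dsTypeModel>
</dsCompositeModel>
"""


def required_datastreams(composite_xml):
    """Return the Datastream IDs the composite model requires, in document order."""
    root = ET.fromstring(composite_xml)
    return [model.get("ID") for model in root.findall(f"{{{DSCM_NS}}}dsTypeModel")]


def missing_datastreams(composite_xml, object_dsids):
    """Return required Datastream IDs that the object does not carry."""
    present = set(object_dsids)
    return [dsid for dsid in required_datastreams(composite_xml) if dsid not in present]
```

Calling `missing_datastreams(COMPOSITE_MODEL, ids_on_the_object)` returns an empty list for a conforming object. Datastreams that the object carries beyond the required set are simply ignored by this check.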



At the right we have a data object that conforms to the Content Model object above. Of course, it contains all of the required Datastreams, most of which are stored as simple text files. The DDI is a very large piece of XML that is currently stored in a separate file rather than as in-line XML (i.e., Control Group M, managed content, rather than Control Group X, in-line XML, in the FOXML).

The relationships in the RELS-EXT Datastream mirror those in the associated survey data object. Both assert a hasModel relationship to the applicable Content Model, and both assert an isMemberOf relationship to the higher-level object that "contains" them. Here, though, we use the isDescriptionOf relationship to show that this documentation object is a description of its related survey data object; in that object we asserted a hasDescription relationship pointing back to this one.
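For readers who have not looked inside a RELS-EXT Datastream, the three assertions described above might be generated along these lines. This is only a sketch: the PIDs are hypothetical placeholders, and I am assuming the standard Fedora 3.x namespaces for the model and relations-external ontologies.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODEL_NS = "info:fedora/fedora-system:def/model#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(pid, content_model_pid, parent_pid, described_pid):
    """Build RELS-EXT RDF/XML for a documentation object (all PIDs are placeholders)."""
    # Register readable prefixes so the serialized XML looks like typical RELS-EXT.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("fedora-model", MODEL_NS)
    ET.register_namespace("rel", REL_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"},
    )
    assertions = [
        (MODEL_NS, "hasModel", content_model_pid),   # conforms to the Content Model
        (REL_NS, "isMemberOf", parent_pid),          # belongs to the aggregate study
        (REL_NS, "isDescriptionOf", described_pid),  # describes the survey data object
    ]
    for namespace, predicate, target in assertions:
        ET.SubElement(
            description,
            f"{{{namespace}}}{predicate}",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{target}"},
        )
    return ET.tostring(root, encoding="unicode")
```

For example, `build_rels_ext("demo:doc-1", "demo:cm-documentation", "demo:study-1", "demo:data-1")` yields one `rdf:Description` about `info:fedora/demo:doc-1` with the three relationships as child elements. The survey data object would carry the inverse assertion, hasDescription, pointing back at `demo:doc-1`.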

Of course, there is nothing preventing us from adding additional Datastreams to an object like this when they are available, such as unstructured notes from the original data collector. However, since that content isn't always available, we don't make it a required Datastream in the Content Model.

Clicking the image to the right will take one to its "home page" on the ICPSR Fedora test server. All of the Datastreams are identical to those on the ICPSR web site, except for the TIFF codebook and variable-level DDI, which we usually do not make available.
