Technology at ICPSR: Back to the Fedora: Part 1

Friday, October 9, 2009

Back to the Fedora: Part 1

Now that the NSF EAGER grant has arrived, it's time to get restarted on Fedora. We'll start this iteration with a trio of Content Model objects, and kick it off with the first one in this post.

The first - displayed in a clickable, linked, visual format to the left - is a Content Model object for social science survey data. In addition to the objectProperties and the required Datastreams (AUDIT, DC, RELS-EXT), there is also the standard DS-COMPOSITE-MODEL Datastream found in Content Model objects.

For our purposes, each object that purports to conform to the social science survey data Content Model must carry three Datastreams: ORIGINAL, for the original survey data supplied by the depositor; NORMALIZED, for a plain text version of the file that the repository prepares; and TRANSFORM, a record that describes how the ORIGINAL became the NORMALIZED. This last Datastream is typically constructed as an SPSS Setups file at ICPSR, and internally it is often referred to as the "processing history" file. It contains the roadmap of how to move between the two versions of the data.
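To make the three-Datastream requirement concrete, here is a minimal sketch in Python of what the DS-COMPOSITE-MODEL content might look like. It assumes the Fedora 3 dsCompositeModel namespace and the usual dsTypeModel element with an ID attribute; it leaves out any MIME type or format constraints, since the right values for each Datastream are not settled here. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Fedora 3 for DS-COMPOSITE-MODEL content (assumed here).
DSCM_NS = "info:fedora/fedora-system:def/dsCompositeModel#"

# The three Datastreams our survey data Content Model requires.
REQUIRED_DATASTREAMS = ("ORIGINAL", "NORMALIZED", "TRANSFORM")


def build_composite_model(datastream_ids):
    """Return DS-COMPOSITE-MODEL XML with one dsTypeModel per Datastream ID.

    Hypothetical helper: no <form> constraints (MIME type, format URI)
    are emitted, so this only states which Datastreams must exist.
    """
    ET.register_namespace("", DSCM_NS)
    root = ET.Element(f"{{{DSCM_NS}}}dsCompositeModel")
    for ds_id in datastream_ids:
        ET.SubElement(root, f"{{{DSCM_NS}}}dsTypeModel", {"ID": ds_id})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_composite_model(REQUIRED_DATASTREAMS))
```

The output would be pasted in as the content of the DS-COMPOSITE-MODEL Datastream on the Content Model object; constraints on each Datastream's format could be added later if we decide we need them.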

It may also be the case that we have other Datastreams, perhaps items that will only receive bitwise digital preservation, such as original deposits in SAS or SPSS format. And, in practice, we might want to use Fedora's XACML mechanism to restrict access to the ORIGINAL Datastream since it could contain confidential information.

To the right we have a sample Fedora data object that asserts conformance with our Content Model object above. Like the one above, it is also clickable and will take you to the Fedora repository server ICPSR is using for testing.

In addition to the hasModel relationship, this object also asserts that it is a member of a higher-level object (ICPSR Study 25041), and is described by another object (which we'll look at in the next post).
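These assertions live in the object's RELS-EXT Datastream as RDF. The sketch below, in Python, builds such a Datastream for the hasModel and isMemberOf relationships, using the Fedora 3 model and relations-external namespaces. All of the PIDs are placeholders made up for illustration, not the ones on our test server, and the "described by" relationship is left out because its predicate would need to be chosen first; it would be added as one more tuple in the same way.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODEL_NS = "info:fedora/fedora-system:def/model#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(subject_pid, relationships):
    """Return RELS-EXT RDF/XML for one object.

    relationships is a list of (namespace, predicate, target_pid) tuples.
    Hypothetical helper; PIDs are turned into info:fedora/ URIs.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("fedora-model", MODEL_NS)
    ET.register_namespace("rel", REL_NS)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{subject_pid}"},
    )
    for namespace, predicate, target_pid in relationships:
        ET.SubElement(
            description,
            f"{{{namespace}}}{predicate}",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{target_pid}"},
        )
    return ET.tostring(rdf, encoding="unicode")


if __name__ == "__main__":
    # Placeholder PIDs, invented for this example.
    print(build_rels_ext("demo:survey-data-1", [
        (MODEL_NS, "hasModel", "demo:cmodel-survey-data"),
        (REL_NS, "isMemberOf", "demo:study-25041"),
    ]))
```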

As required to validate against the Content Model, it has the three required Datastreams. In this particular case, rather than including the original data and processing history transform, I've simply copied the NORMALIZED Datastream content verbatim into the other two Datastreams.
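The check itself is simple enough to sketch. The Python snippet below is not Fedora's own validation logic, just an illustration of the rule: given the list of Datastream IDs on an object, it reports which of the three required ones are missing. The function name and the sample ID lists are made up for the example.

```python
# Datastreams required by our survey data Content Model.
REQUIRED_DATASTREAMS = {"ORIGINAL", "NORMALIZED", "TRANSFORM"}


def missing_datastreams(object_datastream_ids, required=REQUIRED_DATASTREAMS):
    """Return the required Datastream IDs absent from the object, sorted."""
    return sorted(set(required) - set(object_datastream_ids))


if __name__ == "__main__":
    # An object like the sample one: reserved Datastreams plus our three.
    complete = ["AUDIT", "DC", "RELS-EXT", "ORIGINAL", "NORMALIZED", "TRANSFORM"]
    print(missing_datastreams(complete))    # []

    # An object holding only the normalized file would not conform.
    partial = ["DC", "RELS-EXT", "NORMALIZED"]
    print(missing_datastreams(partial))     # ['ORIGINAL', 'TRANSFORM']
```

Note that the check only looks at Datastream IDs, so an object like the sample one, with the same content copied into all three, passes just as well as one with real original data and a real processing history.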

Not shown in the schematic to the right are other possible, optional Datastreams we could include. For instance, it looks like this object was derived from a deposit that began its life at ICPSR as a SAS Transport file. It would certainly be possible to include that as another Datastream that would have value for a limited period of time. Or, another approach would be to collect the deposited items in their own set of Fedora objects, and then assert a relationship to them in the RELS-EXT section.

Next up in this series: the Content Model for technical documentation.
