Wednesday, November 17, 2010

Fedora objects for deposits

Researchers and government agencies (and their proxies at ICPSR) use a web portal called the Data Deposit Form to transfer content to ICPSR.  The form contains many opportunities for a depositor to enter metadata about the transfer, but only a few are required:  the name of the depositor and a name for the deposit.

A deposit may have an arbitrary number of files, and those files may be uploaded individually or as a single "archive" file, such as a Zip or GNU Zip archive.  When the depositor uploads an archive file, ICPSR unpacks it to extract the actual content.  And if the archive file itself contains an archive file, ICPSR systems continue unpacking recursively.
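To make the recursive unpacking concrete, here is a minimal Python sketch.  It is not ICPSR's actual code; the function name `unpack_recursively`, the reliance on file extensions to spot archives, and the use of scratch directories are all assumptions made for illustration.

```python
import gzip
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_recursively(path: Path, workdir: Path) -> list[Path]:
    """Return the non-archive files reachable from `path`.

    Hypothetical helper: archives are recognized by extension, which is an
    assumption. Checking the extension (rather than content alone) avoids
    unpacking formats such as .docx that happen to be Zip containers.
    """
    suffix = path.suffix.lower()

    if suffix == ".zip" and zipfile.is_zipfile(path):
        # Extract into a fresh scratch directory, then recurse into each member.
        target = Path(tempfile.mkdtemp(dir=workdir))
        with zipfile.ZipFile(path) as zf:
            zf.extractall(target)
        found: list[Path] = []
        for member in sorted(p for p in target.rglob("*") if p.is_file()):
            found.extend(unpack_recursively(member, workdir))
        return found

    if suffix == ".gz":
        # A GNU Zip file wraps exactly one file; decompress it and recurse,
        # since the payload may itself be an archive.
        target = Path(tempfile.mkdtemp(dir=workdir)) / path.stem
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return unpack_recursively(target, workdir)

    # Anything else is treated as actual content (tar files are not handled here).
    return [path]
```

Each path returned by a function like this would correspond to one deposited file, and so to one Fedora object.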

Our intention is to put each of the deposited files (unpacked, if necessary) in its own Fedora object.  This object will be an off-the-shelf object without any special Content Model.  Here is an example:

(Note that all of the images are also hyperlinks to Fedora Objects in our public Fedora Commons repository.)

This is a standard Fedora Object, conforming only to the Content Model for all objects.

Each object has a unique ID, captured in its PID, along with the usual, minimal Fedora object properties.

We also enable the Audit Datastream to record any changes to the object, and use the DC (Dublin Core) Datastream to capture some of the metadata we collect via our Data Deposit Form.
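As a rough illustration of what might go into the DC Datastream, the Python sketch below builds a small `oai_dc` record.  The mapping of the deposit name to `dc:title` and the depositor name to `dc:creator` is an assumption, as are the function name `build_dc` and the sample values.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for an oai_dc (unqualified Dublin Core) record.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_dc(fields: dict[str, str]) -> str:
    """Serialize a dict of Dublin Core element names and values as oai_dc XML."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values standing in for the two required form fields.
dc_xml = build_dc({"title": "Example deposit", "creator": "Example Depositor"})
print(dc_xml)
```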

We use a relationship expressed in the RELS-EXT Datastream to point to a parent-level object.  The parent object links the files within a single deposit and captures any metadata that applies to the entire deposit rather than to the individual files.
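For readers who have not seen a RELS-EXT Datastream, here is a small Python sketch that generates one.  The PIDs are made up, and the choice of the `isMemberOf` predicate from Fedora's relationship ontology is an assumption; this post does not say which predicate our objects actually use.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace of the Fedora relationship ontology.
REL = "info:fedora/fedora-system:def/relations-external#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rel", REL)


def build_rels_ext(file_pid: str, deposit_pid: str) -> str:
    """Return RDF/XML stating that the file object belongs to the deposit object."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF}}}Description",
        {f"{{{RDF}}}about": f"info:fedora/{file_pid}"},
    )
    # isMemberOf is an assumed predicate; another relationship could be used.
    ET.SubElement(
        description,
        f"{{{REL}}}isMemberOf",
        {f"{{{RDF}}}resource": f"info:fedora/{deposit_pid}"},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical PIDs for one file object and its parent deposit object.
print(build_rels_ext("demo:file-1", "demo:deposit-1"))
```

Because every file object points at the same parent, the files in a deposit can be found by asking which objects are related to that parent.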

The content is highly variable.  In addition to receiving survey data in plain text, ICPSR also receives data in a variety of proprietary formats (e.g., SAS) and related documentation in a wide array of formats (word processor output, plain text documents, and many others).

To illustrate the example further, we created a Fedora Object for each of the files found in one of our recent deposits.  We selected this deposit because the content is entirely public-use, and is readily available from a public web site.  The deposit is also a nice size (only four files).  To keep this blog post at a reasonable size, I'll save the example for tomorrow's post.
