Here's a first cut at how one might model social science data content (like what we have at ICPSR) in Fedora.
My sense is that the "dataset" is the atomic object of interest; heading down to the variable level feels too fine-grained for a system like Fedora. This led me to create two pretty simple Content Model objects:
- Dataset - this has one datastream, DATA, with MIME type text/plain
- Docset - this has one datastream, DOCS, with MIME type text/xml, which would hold DDI-format metadata at the dataset level
I thought I would also need a Content Model for the basic unit of dissemination we use:
- Study - this has one datastream, DOCS, with MIME type text/xml, which would hold DDI-format metadata at the study level
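To make the three Content Models concrete, here is a minimal Python sketch that records each model's required datastream and MIME type, and checks a data object against them. The class and function names are my own illustration, not anything Fedora provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentModel:
    """One Content Model: a name plus the single datastream it requires."""
    name: str
    datastream_id: str
    mime_type: str


# The three models described above; the Python names are illustrative only.
DATASET = ContentModel("Dataset", "DATA", "text/plain")
DOCSET = ContentModel("Docset", "DOCS", "text/xml")
STUDY = ContentModel("Study", "DOCS", "text/xml")


def missing_datastreams(datastreams, models):
    """Return datastream IDs that are absent or carry the wrong MIME type.

    `datastreams` maps datastream ID -> MIME type for one data object.
    """
    missing = []
    for model in models:
        if datastreams.get(model.datastream_id) != model.mime_type:
            missing.append(model.datastream_id)
    return missing


# An object with DATA and DOCS satisfies both the Dataset and Docset models.
complete = {"DATA": "text/plain", "DOCS": "text/xml"}
print(missing_datastreams(complete, [DATASET, DOCSET]))
```

Keeping the models as plain data makes it easy to add a fourth model later without touching the check.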
A data object following the Study Content Model would then look like:
ICPSR-8475 (DC, RELS-EXT, AUDIT, DOCS)
The RELS-EXT datastream would express relationships to show that it contains two members: a data object following the Dataset Content Model and a data object following the Docset Content Model. It would also assert a hasModel relationship to the Study Content Model. We also need a mechanism for storing access controls and license terms; I'm still learning about XACML, which might be a good way to do this. It may also make sense to have one datastream for DDI 2.x metadata and another for DDI 3.x metadata rather than just a single one.
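As a rough sketch of what that RELS-EXT datastream might contain, the Python below serializes the relationships as RDF/XML using only the standard library. The PIDs (icpsr:8475, icpsr:StudyCModel, icpsr:ANES-1948-2004) are assumptions for illustration, since I haven't settled on a PID scheme, and only one member is shown; each additional member would be one more hasMember entry.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FEDORA_MODEL = "info:fedora/fedora-system:def/model#"
FEDORA_REL = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(pid, relationships):
    """Serialize (namespace, predicate, target PID) triples as RELS-EXT RDF/XML."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("fedora-model", FEDORA_MODEL)
    ET.register_namespace("rel", FEDORA_REL)
    root = ET.Element(f"{{{RDF}}}RDF")
    # RELS-EXT describes exactly one subject: the object that owns the datastream.
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": f"info:fedora/{pid}"})
    for namespace, predicate, target in relationships:
        ET.SubElement(desc, f"{{{namespace}}}{predicate}",
                      {f"{{{RDF}}}resource": f"info:fedora/{target}"})
    return ET.tostring(root, encoding="unicode")


# Hypothetical PIDs: the study asserts its model and one member object.
study_rels = build_rels_ext("icpsr:8475", [
    (FEDORA_MODEL, "hasModel", "icpsr:StudyCModel"),
    (FEDORA_REL, "hasMember", "icpsr:ANES-1948-2004"),
])
print(study_rels)
```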
Likewise, we would also have:
ICPSR-ANES-1948-2004 (DC, RELS-EXT, AUDIT, DATA, DOCS)
The RELS-EXT datastream would express a memberOf relationship to ICPSR-8475, a hasModel relationship to the Dataset Content Model, and a hasModel relationship to the Docset Content Model. As with the Study Content Model data object above, I still need to sort out the mechanism for storing access controls and license terms, and whether two DDI datastreams rather than one would be appropriate.
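One way to sanity-check these relationships is to write them down as plain triples and confirm that every membership is asserted from both sides. In this sketch the PIDs and Content Model names are hypothetical, and I use isMemberOf, the predicate name in Fedora's relationship ontology, for what I call memberOf above.

```python
# Relationship triples as (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = [
    ("icpsr:8475", "hasModel", "icpsr:StudyCModel"),
    ("icpsr:8475", "hasMember", "icpsr:ANES-1948-2004"),
    ("icpsr:ANES-1948-2004", "isMemberOf", "icpsr:8475"),
    ("icpsr:ANES-1948-2004", "hasModel", "icpsr:DatasetCModel"),
    ("icpsr:ANES-1948-2004", "hasModel", "icpsr:DocsetCModel"),
]


def models_of(pid, triples):
    """Return the Content Models that one object asserts, in listed order."""
    return [o for s, p, o in triples if s == pid and p == "hasModel"]


def unmatched_memberships(triples):
    """Return isMemberOf assertions with no matching hasMember on the parent."""
    has_member = {(s, o) for s, p, o in triples if p == "hasMember"}
    return [(s, o) for s, p, o in triples
            if p == "isMemberOf" and (o, s) not in has_member]


print(models_of("icpsr:ANES-1948-2004", TRIPLES))
print(unmatched_memberships(TRIPLES))
```

An empty result from unmatched_memberships means the study and its member agree about the relationship.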
This assumes that we would create the dissemination formats, like SPSS Export, on the fly, since there isn't a datastream for them. That might work, or we could add datastreams to the dataset that point to the dissemination formats. We might want these to use the Externally Referenced Content control group if we consider the dissemination formats to be somewhat ephemeral; the alternative would be Managed Content if we wanted to manage the content in Fedora, perhaps allowing one to roll back to previous versions.
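The choice between the two control groups can be written down as a small rule. This is only a sketch of the decision described above; the function name and its single flag are my own, while the one-letter codes are the ones Fedora uses for datastream control groups.

```python
from enum import Enum


class ControlGroup(Enum):
    """Fedora datastream control group codes."""
    INLINE_XML = "X"
    MANAGED = "M"
    EXTERNAL = "E"
    REDIRECT = "R"


def control_group_for_dissemination(ephemeral):
    """Pick a control group for a dissemination-format datastream.

    Ephemeral formats are only referenced; formats we want to manage and
    version inside Fedora are stored as Managed Content.
    """
    return ControlGroup.EXTERNAL if ephemeral else ControlGroup.MANAGED


print(control_group_for_dissemination(ephemeral=True).value)
```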
Next: So why two different Content Models, one for data and one for documentation, rather than just a single one?