Wednesday, December 28, 2011

Starting the FLAME

In an earlier post I described a major new project at ICPSR called FLAME.  FLAME is the File-Level Archival Management Engine, and will become the new repository technology platform ICPSR uses to curate and preserve content.  As the name implies, the main molecule of information upon which FLAME will operate is a "file," which is different from the main molecules used at ICPSR today, the "deposit" and the "study."  In the big picture the activities at ICPSR will not change much:  we will still collect social science research data, curate them, preserve them, and make them available in a wide variety of formats and modes.  But when one looks at the details, an awful lot will change.

So when one is going to change everything, where does one start?

Fortunately we have a ready-made starting point with the Open Archival Information System (OAIS) reference model.  While this does not give us a blueprint of what to build, it does give us a model to use as we construct our blueprints.  I believe this is very much what the folks at Archivematica have done.

So the question becomes:  How do we translate a high-level reference model that contains functions such as Receive Submission into the low-level blueprints one needs to reconfigure process and build software?  What kind of web applications do I need for Receive Submission?  What should they do?  Should that box that contains the submitter's identity be an email address?  A text string?  An ORCID?

So how to start?

One of my colleagues, Nancy McGovern, suggested we brainstorm 6-12 medium-level statements for each of the functions in the OAIS reference model.  We started with Receive Submission, and indeed generated 12 statements.  (The analogue at Archivematica is Receipt of SIP.)  One example is:

The producer provided basic provenance information at deposit

If the metaphor for building FLAME is building a house, then OAIS plays the role of high-level best practices.  The statements (like the one above) play the role of floor plans and elevations: the things most people can relate to and use to make decisions.  So this is moving in the right direction, but we're still lacking the blueprints.

The next step is to take a statement like the one above and turn it into requirements for software (and for process).  One example requirement that flows from the statement above is:


FLAME should capture the following provenance information from the files after each content transfer:

i. Date and time at which each file is received
ii. Checksum of each file
iii. MIME type of each file
iv. Original name of each file
v. Packaging information (e.g., file was part of a Zip archive)
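To make that requirement a little more concrete, here is a minimal Python sketch of what capturing those five items for a single received file might look like.  This is only an illustration, not FLAME's actual design:  the names FileProvenance and capture_provenance are hypothetical, SHA-256 is just a placeholder default since the choice of checksum is still open, and the MIME type is guessed from the original file name rather than from the file's content.

```python
import hashlib
import mimetypes
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class FileProvenance:
    """Hypothetical record holding the five provenance items for one file."""
    received_at: datetime        # i.   date and time the file was received
    checksum: str                # ii.  hex digest of the file's bytes
    checksum_algorithm: str      #      which algorithm produced the digest
    mime_type: Optional[str]     # iii. None when the type cannot be guessed
    original_name: str           # iv.  name the producer gave the file
    packaging: Optional[str]     # v.   e.g. "member of upload.zip"


def capture_provenance(path, original_name, packaging=None, algorithm="sha256"):
    """Build a provenance record for the file stored at `path`.

    Assumption: `algorithm` is any name accepted by hashlib.new(),
    such as "md5", "sha1", or "sha256".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large data files never sit fully in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)

    # Guess from the file name's extension only; content sniffing
    # would need a separate tool and is out of scope for this sketch.
    mime_type, _encoding = mimetypes.guess_type(original_name)

    return FileProvenance(
        received_at=datetime.now(timezone.utc),
        checksum=digest.hexdigest(),
        checksum_algorithm=algorithm,
        mime_type=mime_type,
        original_name=original_name,
        packaging=packaging,
    )
```

A call such as capture_provenance("/tmp/upload-0001", "survey.csv", packaging="member of deposit.zip") would return one record per file.  Storing the algorithm name next to the digest means the MD5-versus-SHA-1 question can be settled later without making earlier records ambiguous.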

We can then discuss these low-level requirements with stakeholders, such as the acquisitions team, and with the technology team, such as a software developer who may have additional questions (e.g., "Well, what sort of checksum do you want - MD5, SHA-1, or something else?").

Right now we are working through the details of Receive Submission, and the next few stops on the roadmap will likely be in Ingest as well.  We're documenting both the high-level statements and the low-level requirements in a Drupal CMS that we use as our Intranet.
