Monday, August 8, 2011

ICPSR's Secure Data Environment (SDE) - The Workflow

One of the most challenging aspects of building our Secure Data Environment (SDE) for managing social science research data was redesigning our workflow.

In the past our environment was quite open, so the workflow did not need to account for certain aspects of access control.  For example, if the workflow required an ICPSR data manager to send an email to the original depositor, and the data manager wanted to include content cut and pasted from the dataset, they could do that at any point in the process without any special steps.  In the SDE, however, we have disabled email and we tightly control how material leaves the environment, so this sort of free access is no longer available.  And so the workflow had to change.

Content arrives at ICPSR through our Deposit Form web application.  In brief, this is how a depositor transfers content to us and grants us non-exclusive access to manage and share it.  (Depositors can also add descriptive metadata to the content.)  The Deposit Form runs on our public-facing web server, which, of course, does not reside within our SDE.

One change we made, therefore, was to encrypt all content as it arrives.  This means that the content isn't available in the clear - even accidentally - on our public web server.  Next, an automated job runs on a regular, frequent basis, "sweeping" content from the public web server to the SDE.  Once it arrives within the SDE we decrypt the content so that the data manager has easy access to the materials.
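The sweep step described above can be sketched roughly as follows.  This is only an illustration, not our production job: the directory layout, the ".enc" suffix, the function names, and the toy XOR "cipher" are all assumptions made for the example, and a real deployment would use a vetted encryption library instead.

```python
from pathlib import Path
from typing import Callable, List


def sweep(inbox: Path, sde_store: Path,
          decrypt: Callable[[bytes], bytes]) -> List[str]:
    """Move every encrypted deposit from inbox into sde_store, decrypting on arrival."""
    sde_store.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(inbox.glob("*.enc")):
        dest = sde_store / src.stem            # "survey.csv.enc" -> "survey.csv"
        dest.write_bytes(decrypt(src.read_bytes()))
        src.unlink()                           # nothing lingers on the public server
        moved.append(dest.name)
    return moved


def toy_decrypt(blob: bytes) -> bytes:
    # Hypothetical stand-in only: XOR is NOT real encryption.
    return bytes(b ^ 0x5A for b in blob)
```

A scheduler (cron, for instance) would call something like sweep() at a short, regular interval.  Passing the decrypt function in as a parameter keeps the key handling entirely on the SDE side of the boundary.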

Data managers use another web application called the Deposit Viewer to view and manage deposits, and while they can view metadata about the deposit from the desktop and the SDE, they can only download the deposited files from within the SDE.  This gives them the convenience of checking on deposit status, for example, from either environment, but ensures that the files do not leave the secure environment accidentally.
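The access rule above is small enough to write down as a policy check.  The sketch below assumes hypothetical names (Origin, Action, is_allowed) and is not the Deposit Viewer's actual code; it only illustrates the rule that metadata is visible from both environments while file downloads are limited to the SDE.

```python
from enum import Enum


class Origin(Enum):
    DESKTOP = "desktop"
    SDE = "sde"


class Action(Enum):
    VIEW_METADATA = "view_metadata"
    DOWNLOAD_FILES = "download_files"


def is_allowed(action: Action, origin: Origin) -> bool:
    """Metadata is viewable anywhere; deposited files only from inside the SDE."""
    if action is Action.VIEW_METADATA:
        return True
    return origin is Origin.SDE
```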

All data management functions take place within the SDE.  A data manager may move content from the SDE to the outside world, but the transfer takes place via a software airlock.  The airlock tracks what has been moved and who moved it, and it requires a supervisor to approve the transfer.
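To make the airlock idea concrete, here is a minimal sketch of the request, approve, release sequence with an audit trail.  The class and method names are invented for illustration and the log is kept in memory; the real airlock presumably persists its records and actually copies the files, neither of which is shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Transfer:
    filename: str
    requested_by: str
    approved_by: Optional[str] = None


class Airlock:
    """Hypothetical airlock: records who moved what, and gates release on approval."""

    def __init__(self, supervisors):
        self.supervisors = set(supervisors)
        self.audit_log = []   # (timestamp, action, user, filename)

    def _record(self, action, user, filename):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, user, filename))

    def request(self, filename, user):
        self._record("requested", user, filename)
        return Transfer(filename, user)

    def approve(self, transfer, supervisor):
        if supervisor not in self.supervisors:
            raise PermissionError(f"{supervisor} is not a supervisor")
        transfer.approved_by = supervisor
        self._record("approved", supervisor, transfer.filename)

    def release(self, transfer):
        if transfer.approved_by is None:
            raise PermissionError("transfer has not been approved")
        # The actual copy out of the SDE would happen here.
        self._record("released", transfer.requested_by, transfer.filename)
```

In use, a data manager calls request(), a supervisor calls approve(), and only then does release() succeed; every step leaves a timestamped entry in the log.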

Once the data manager has completed all data processing and quality control, s/he uses a set of utilities to generate the "ready to go" formats that we distribute via our web site, and to release the materials both to the web site and to our archival storage fabric.  Like the airlock process above, this step tracks who did what, when, and where, and it also requires management approval.  The archival copies remain within the SDE, and the public-use, ready-to-go files move to our web site.  Giving key software systems the ability to push files out of the SDE, while ensuring that staff cannot, also required a few changes to our workflow.
