How is ICPSR using "the cloud"?
I've been getting this question a lot lately, and it feels like it's time to answer it in a blog post.
From a functional standpoint, ICPSR is using the cloud for identity and authentication, content delivery, archival storage, and data producer relationship management. And if I include services based at the University of Michigan, I might also include data curation and customer relationship management.
From a vendor standpoint, here's a roster of some of the organizations with whom we're doing business, and how their piece of the cloud helps us run our business.
Given that the ICPSR-specific identities are weak (i.e., web site visitors create them by entering an arbitrary email address and password), and given that the identity is often used only once, it seemed like a good idea to eliminate the need to create such an identity. We don't need strong identities, but we do need identities that are available to anyone. Technologies like OpenID, Facebook Connect, and the like seemed promising, but who wants to build infrastructure that talks to all of them?
We use Janrain Engage as one part of our identity and authentication strategy. Janrain acts as a third party between the content provider (ICPSR) and the identity providers. And so when someone needs to log in to ICPSR's portal, they see a screen that looks something like this:
So there's no need to create an account and password at ICPSR. And if someone does return later, they don't have to log in to our site if they've already logged in to their identity provider's site. (This is Single Sign-On or SSO.)
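To make the broker idea a bit more concrete, here is a minimal sketch of how a site might turn a profile handed back by a third-party broker into a local account without ever storing a password. This is an illustration only, not ICPSR's code and not Janrain's API: the `BrokeredProfile` fields, the `local_account_key` function, and the in-memory `AccountStore` are all hypothetical names made up for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class BrokeredProfile:
    """Hypothetical shape of the profile a broker hands back after login."""
    provider: str                 # e.g. "google" or "facebook" (illustrative values)
    identifier: str               # the provider's stable ID for the user
    email: Optional[str] = None   # optional; not used to build the key


def local_account_key(profile: BrokeredProfile) -> str:
    """Derive a stable local key from the provider name and provider-side ID.

    No password is kept locally; the identity provider vouches for the user.
    """
    if not profile.provider or not profile.identifier:
        raise ValueError("profile must include a provider and an identifier")
    raw = f"{profile.provider.lower()}|{profile.identifier}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class AccountStore:
    """In-memory stand-in for whatever user table a real site would use."""

    def __init__(self) -> None:
        self._accounts: Dict[str, BrokeredProfile] = {}

    def sign_in(self, profile: BrokeredProfile) -> Tuple[str, bool]:
        """Return (account key, created) for a first-time or returning visitor."""
        key = local_account_key(profile)
        created = key not in self._accounts
        if created:
            self._accounts[key] = profile
        return key, created
```

In this sketch a first visit creates the account and a later visit with the same provider and identifier maps back to the same key, which is the behavior we want from a weak but universally available identity.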
We also host a replica of our on-site delivery system in Amazon's cloud for disaster recovery (DR) purposes. We find that we have the opportunity to "test" this replica at least once per year when ICPSR's headquarters loses power for several hours due to high winds, ice storms, or other acts of nature.
The Amazon service has been very reliable overall (despite a few highly publicized events), and certainly more reliable than our own on-site facilities. We also like that we can scale resources up and down very quickly, and that we have clear costs associated with the infrastructure. (Anyone at an institution of higher learning who has tried to calculate the actual cost of electricity used knows what I mean.)
We had been using a home-built application to manage this content, but we found it to be a losing battle. There was never enough money or time to build the types of relationship management reporting systems that the acquisition team wanted. And so rather than trying to build a better mousetrap, we decided to rent a better mousetrap by moving the content into a professional contact/customer relationship management (CRM) system. Like Salesforce.
The University of Michigan central IT organization (ITS) also delivers a handful of services that I would consider "the cloud" even though ITS does not package and market them that way. File storage, trouble ticketing, and Drupal hosting are all available from ITS, and they all look like cloud services to us because we pay for only what we use, we can scale them up and down reasonably quickly, and we do not have to deploy any local hardware or software to use them.