
Wednesday, April 25, 2012

Disaster Recovery at ICPSR - Part 2

Part 1 ended with ICPSR embarking on a project to build an off-site replica of its delivery system.

Amazon Web Services

I had been exploring Amazon Web Services (AWS) in late 2008, and had found it to be a very quick and easy way to stand up technical infrastructure. In contrast to the process we had been using to try to locate equipment at a University of Michigan data center, locating (virtual) equipment in AWS was astonishingly easy. I needed only a credit card and a Firefox plug-in to get started, and by using the excellent AWS-supplied tutorials I had soon deployed a stealth slave DNS server for icpsr.umich.edu in AWS. (A stealth server does not appear in the NS records for a domain.)

AWS also made it easy to grow into the cloud a little bit at a time. Is a "small" virtual server under-powered for a replica of our production web server? No problem: just terminate that virtual machine and relaunch the image on a "medium" virtual server. Likewise, we could add storage space when we needed it rather than investing in a storage array that would be obsolete within two years.

We soon built enough infrastructure in AWS to serve as a replica, and it looks like this:

[Figure: diagram of the replica infrastructure in AWS]

Touring the replica

In addition to the slave DNS server, we stood up three more servers in AWS.

One, a replica Oracle database server.  This is what AWS calls a c1.medium-sized instance, and mirrors the content we store in our production database.  We export content from the production database each morning, copy it to AWS, and then import it into the replica.

Two, a replica of our Child Care and Early Education Research Connections (CCEERC) web portal. In production this portal runs on a virtual interface on the main web server, but it isn't so easy to add virtual interfaces to AWS instances, so the replica gets a server of its own. It runs on what AWS calls an m1.small-sized instance, and provides the same basic content and functionality as www.cceerc.org. We use rsync over ssh twice each day to keep content and web applications up to date.

Three, a replica of our main web portal. This runs on what AWS calls an m1.large-sized instance since it bears the largest burden of any component. As with the CCEERC replica, we synchronize content here twice daily. We also disable certain web applications, like the Deposit System, so that we do not introduce potentially sensitive content to the cloud. However, common services like search, browse, download, and analyze online are all available.
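
For readers who want a feel for the twice-daily synchronization, here is a minimal sketch of how an rsync-over-ssh push could be assembled from a small Python wrapper. It is not the script we actually run: the function names, the host name replica.example.org, the paths, and the optional --delete behavior are all assumptions made for illustration.

```python
import shlex
import subprocess


def build_rsync_command(source_dir, remote_host, remote_dir,
                        ssh_key=None, delete=False, dry_run=False):
    """Return the argv list for pushing source_dir to a replica over ssh.

    All arguments are hypothetical examples; nothing here is ICPSR's
    actual configuration.
    """
    # rsync's -e option names the remote shell; here that is ssh,
    # optionally with a dedicated key file.
    ssh_cmd = "ssh"
    if ssh_key:
        ssh_cmd += " -i " + shlex.quote(ssh_key)

    # -a preserves permissions, times, and symlinks; -z compresses in transit.
    cmd = ["rsync", "-az", "-e", ssh_cmd]
    if delete:
        # Assumption: remove files on the replica that no longer exist
        # in production. Leave this off if the replica should only grow.
        cmd.append("--delete")
    if dry_run:
        cmd.append("--dry-run")

    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{remote_host}:{remote_dir}")
    return cmd


def sync(source_dir, remote_host, remote_dir, **options):
    """Run the transfer and raise CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source_dir, remote_host, remote_dir,
                                       **options), check=True)
```

Calling sync("/var/www/html", "replica.example.org", "/var/www/html") from a job scheduled twice a day would approximate the schedule described above. Passing dry_run=True first is a cheap way to see what would change.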

Each replica has a list of little white lies inside /etc/hosts that lead each machine to believe that www.icpsr.umich.edu and db.icpsr.umich.edu really do reside in AWS. This trick allows us to run the same apps in the cloud without resorting to fragile, high-maintenance software modifications that try to distinguish between systems in the cloud and systems in ICPSR's machine room.
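
To make the /etc/hosts trick concrete, here is a small sketch that renders such override lines from a mapping of host names to addresses. The helper name and the addresses are assumptions: the 192.0.2.x values come from a range reserved for documentation and stand in for whatever addresses the AWS instances really have.

```python
def render_hosts_overrides(overrides):
    """Render a {hostname: ip} mapping as /etc/hosts lines.

    Host names that share an address are placed on the same line,
    which is the usual /etc/hosts layout: address first, then names.
    """
    by_ip = {}
    for hostname, ip in overrides.items():
        by_ip.setdefault(ip, []).append(hostname)
    lines = [f"{ip}\t{' '.join(names)}" for ip, names in by_ip.items()]
    return "\n".join(lines) + "\n"


# Hypothetical addresses; the real replica addresses are not given here.
REPLICA_OVERRIDES = {
    "www.icpsr.umich.edu": "192.0.2.10",
    "db.icpsr.umich.edu": "192.0.2.20",
}

print(render_hosts_overrides(REPLICA_OVERRIDES), end="")
```

Because name resolution on each replica normally consults /etc/hosts before DNS, the applications can keep using the production host names unchanged while actually talking to the servers in AWS.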

Next up: Part 3: Using the replica
