Two weeks ago, Amazon released a nice set of icons for use in common drawing and presentation software. The set contains an icon for each Amazon Web Services (AWS) service and type of infrastructure, as well as generic gray icons for non-AWS elements. I used the icons to create the schematic above.
The diagram is based on one of the examples Amazon includes in the PPTX-format set of icons. I deleted a few services and servers that we don't use (e.g., Route 53 for DNS). The diagram shows the ICPSR machine room on the left, with the three main systems that deliver our production web service: a big web server, an even bigger database server, and a bigger still EMC storage appliance. We synchronize the content from these systems into corresponding systems in the AWS cloud.
We use EC2 instances in the US-East region to host our replica. On our physical hardware we sometimes host multiple IP addresses on a single machine, but in EC2 we maintain a one-to-one mapping between virtual machines and IP addresses. As a result, one physical web server in ICPSR's machine room ends up as a pair of virtual servers in Amazon's cloud.
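The one-to-one rule means the number of virtual servers is driven by the number of addresses, not the number of physical machines. The sketch below illustrates that planning step under stated assumptions: the host name `web-physical`, its two addresses (taken from the RFC 5737 documentation range), and the function `plan_ec2_replicas` are all hypothetical, not ICPSR's actual inventory or tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory: physical host name -> IP addresses it answers on.
# Addresses come from the RFC 5737 documentation range, not ICPSR's real network.
PHYSICAL_HOSTS: Dict[str, List[str]] = {
    "web-physical": ["192.0.2.10", "192.0.2.11"],  # one machine, two addresses
}


def plan_ec2_replicas(hosts: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Plan one virtual server per physical address (the one-to-one rule).

    Returns (virtual server name, physical host, physical address it stands in for).
    Each virtual server would get its own EC2 address; this sketch does not model that.
    """
    plan: List[Tuple[str, str, str]] = []
    for host, addresses in hosts.items():
        for n, address in enumerate(addresses, start=1):
            plan.append((f"{host}-replica-{n}", host, address))
    return plan


if __name__ == "__main__":
    for vm, host, address in plan_ec2_replicas(PHYSICAL_HOSTS):
        print(f"{vm}: replaces {host} at {address}")
```

With the hypothetical inventory above, the single physical web server expands into two planned virtual servers, matching the "pair of virtual servers" described in the post.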
We initiate a failover by changing the DNS A (address) record for www.icpsr.umich.edu and www.cceerc.org. We can make this change on either a physical DNS server located at ICPSR or a virtual DNS server located in AWS. The record's time-to-live (TTL) is very low, only 300 seconds (five minutes), so once we initiate the failover, web browsers start using the replica soon afterward. (However, we have noticed that long-lived processes that do not regularly re-resolve hostnames, such as crawlers, take much longer to fail over.)
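The difference between browsers and crawlers comes down to whether a client honors the TTL. The following is a minimal simulation of that caching logic, not a model of real resolver software: the addresses (RFC 5737 documentation ranges), the failover time `FAILOVER_AT`, and both client classes are hypothetical. Only the 300-second TTL comes from the post.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TTL_SECONDS = 300  # TTL on the A record, as described in the post

# Hypothetical addresses from the RFC 5737 documentation ranges, not ICPSR's real ones.
PRODUCTION_IP = "192.0.2.10"
REPLICA_IP = "198.51.100.20"
FAILOVER_AT = 1000.0  # hypothetical moment (in seconds) when the A record is changed


def authoritative_answer(now: float) -> str:
    """The address the DNS server hands out at time `now`."""
    return REPLICA_IP if now >= FAILOVER_AT else PRODUCTION_IP


@dataclass
class TtlHonoringClient:
    """A client (e.g., a browser) that re-resolves once its cached answer expires."""
    resolve: Callable[[float], str]
    address: Optional[str] = None
    expires_at: float = float("-inf")

    def lookup(self, now: float) -> str:
        if now >= self.expires_at:
            self.address = self.resolve(now)
            self.expires_at = now + TTL_SECONDS
        return self.address


@dataclass
class ResolveOnceClient:
    """A long-lived process (e.g., a crawler) that resolves the name once and keeps it."""
    resolve: Callable[[float], str]
    address: Optional[str] = None

    def lookup(self, now: float) -> str:
        if self.address is None:
            self.address = self.resolve(now)
        return self.address


if __name__ == "__main__":
    browser = TtlHonoringClient(authoritative_answer)
    crawler = ResolveOnceClient(authoritative_answer)
    for t in (900.0, 1100.0, 1200.0, 5000.0):
        print(t, browser.lookup(t), crawler.lookup(t))
```

In this run both clients first resolve at t=900, just before the record changes. The TTL-honoring client keeps the production address until its cached answer expires at t=1200 and then picks up the replica, so its worst-case lag is about one TTL after the change. The resolve-once client never sees the new address. Real deployments add further cache layers (recursive resolvers, operating-system and browser caches) that this sketch does not model.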
The replica supports most of the common services on the production ICPSR web site, such as search, download, and online analysis, but it does not support services through which someone submits content to us, such as the Deposit System.
It is important to note that our replica is intended as a disaster recovery (DR) solution, not a high-availability solution. That is, the purpose of the replica is to allow ICPSR to recover quickly from a failure and to avoid a long (e.g., multi-day) period of unavailability. The replica design is not a solution for a high-availability web site, one that is never down, even for a second. Meeting such a requirement would take a significant investment to change the architecture of ICPSR's delivery platform.