Wednesday, September 2, 2009
ICPSR: Then and Now: Servers
In 2002 ICPSR had two main systems: a pair of Sun E3500s with 4GB of memory. One machine served as our production web server, and the second did everything else: general-purpose computing for data processing, Oracle database service, file service (NFS and CIFS via Samba), DNS service, etc. We also had a very small number of additional machines, such as a system for testing new web applications. All of the machines were built by Sun Microsystems, used Sun's SPARC processors, and ran Sun's operating system, Solaris. We entered into a maintenance contract with Sun in case either of the machines had a problem, and my recollection is that it ran around $15k/year to cover the two big machines plus a handful of external storage arrays. To Sun's credit, they were very solid machines.
In 2009 ICPSR has more servers than I can describe easily in a blog post. We still have a pair of machines for delivering web content and general-purpose computing, but they were built by Dell, use Intel processors, and run Red Hat Linux. Today's machines have much more memory and many more processors, and they too have been solid. But we also have many smaller machines with very specific roles: delivering network services (DNS, DHCP, etc.); operating our LOCKSS network; staging new web content; replicating services for the Minnesota Population Center; hosting MySQL and Oracle databases; and so on. And, of course, in 2009 Sun Microsystems is about to be swallowed by Oracle.
However, this proliferation of server computing systems has likely reached its apogee at ICPSR. With the rise of virtualization and particularly the rise of the cloud, we're much more likely to build future systems in Amazon's Elastic Compute Cloud (EC2) than on real (or virtual) machines at ICPSR. For every rack-mount server we have at ICPSR, we probably have one much smaller blade server, and for every blade server, we probably have one EC2 instance running in the cloud.
My sense is that we'll continue this trend, and that where practicable, we'll deploy new systems in a cloud environment rather than purchasing new hardware. In addition to Amazon's cloud offering, the University of Michigan is deploying its own virtualization service, and that will be an attractive choice for systems that consume a lot of network I/O. Amazon charges for network I/O, but U-M does not.
We may also replace several virtual machines in the cloud with an outsourced service: we already use SalesForce.com as our platform for managing "data leads." For example, it's easy to imagine us adopting OpenID via a service provider such as RPX rather than hosting our own service locally or in a cloud.