Wednesday, July 17, 2013

ICPSR Web Availability - 2012-2013

Here are the final numbers for ICPSR's web site availability over our last fiscal year:

The year did not start off so well, and we reached the nadir quickly.  August 2012 was our worst period of availability in a very bad year for us overall.  January, March, and June 2012 also had very poor numbers.

The main antagonist we faced was a new and unusual problem with our Oracle database server.  For a decade we had exported the content each evening for backup purposes, and it worked well.  However, suddenly in 2012 we began to experience an outage just AFTER each export.  Despite intensive analysis by ourselves and local Oracle experts, we never could isolate the root cause of the failure.

We eventually "solved" the problem by exporting our database only once per week instead of once per day.  That left us more exposed to data loss, of course, but it seemed to limit the outages to once per week rather than once per day.

We then replaced the hardware with a new machine with a bit more processor and memory, but with blindingly fast solid-state drives. With the new machine deployed we returned to our daily export schedule, and the machine -- and our web availability -- have been in pretty good shape ever since. The machine went into service in April 2012, and the chart above makes it clear that life has been a little less hectic for our on-call engineer since then.
