The main antagonist we faced was a new and unusual problem with our Oracle database server. For many years we exported the content each evening for backup purposes, and it worked well for a decade. Suddenly, in 2012, we began to experience an outage just AFTER each export. Despite intensive analysis by ourselves and local Oracle experts, we never could isolate the root cause of the failure.
We eventually "solved" the problem by exporting our database only once per week instead of once per day. That left us more exposed to data loss, of course, but it seemed to limit the outages to once per week rather than once per day.
We then replaced the hardware with a new machine that has only a bit more processor and memory but blindingly fast solid-state drives. With the new machine deployed we returned to our daily export schedule, and the machine -- and our web availability -- have been in pretty good shape ever since. The machine went into service in April 2012, and the chart above makes it clear that life has been a little less hectic for our on-call engineer since then.