Monday, January 9, 2012
ICPSR web availability through December 2011
Of course, December is always a tricky month here at ICPSR. Snow storms. Ice storms. Power outages. I can't remember the last time that my entire team was able to take off the entire week between Christmas and New Year's (like the rest of the U-M) without having to come into the office to troubleshoot a problem.
And this year was no different.
We started to see sporadic up/down alerts from the U-M network monitoring system on the morning of December 30. It looked like our production web server was working OK overall, but having some problems. When we tried to load the home page from home, the page wouldn't load. And when we tried to log in (via ssh) from home, the connection timed out. It looked as if everything was down even though the monitoring system said it was OK.
We found we could log into other systems on campus, and then use those as a launch pad to get to ICPSR. All of our systems were up, but none seemed reachable from systems off campus. This explained why the U-M monitoring system, which sits on campus, didn't throw more alarms earlier.
Then we noticed this:
We then worked with the campus network engineers to draw their attention to the problem that was affecting us. Oddly enough, it was kind of helpful to have the ICPSR web site be unavailable from off-campus as a test case; we would know the network was fixed when the web site was available again.
All in all not a horrible month for availability, but we moved from 99.9% on Dec 29 to 99.5% by the end of Dec 30.
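To put those two percentages in terms of minutes, here is a rough back-of-the-envelope sketch. It assumes the figures are month-to-date availability measured from December 1, which the post does not state explicitly, and the function name is just for illustration. Since the published numbers are rounded to one decimal place, the results are only estimates.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_pct, days_elapsed):
    """Minutes of downtime implied by a month-to-date availability figure.

    Assumes availability is measured from the first of the month
    (an assumption, not something the monitoring report confirms).
    """
    total_minutes = days_elapsed * MINUTES_PER_DAY
    return total_minutes * (1 - availability_pct / 100)


# Implied cumulative downtime at the end of Dec 29 and Dec 30
before = downtime_minutes(99.9, 29)
after = downtime_minutes(99.5, 30)

print(f"Through Dec 29: about {before:.0f} minutes down")
print(f"Through Dec 30: about {after:.0f} minutes down")
print(f"Added on Dec 30: about {after - before:.0f} minutes")
```

Under that assumption, the drop from 99.9% to 99.5% works out to somewhere near three hours of additional unavailability on December 30 alone.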