Wednesday, October 5, 2011

Web availability through September 2011

ICPSR web availability through 9/2011
Web site availability was good, but not great, in September.  We've found that our Solr search query process is the most fragile piece of the infrastructure, and it got "stuck" on Sunday evening, 9/25.  These faults are usually easy to correct: we just restart the Tomcat instance hosting the Solr search query service.  On this particular night, however, the on-call missed the page, and the U-M Network Operations Center (NOC) did not open a ticket and phone the on-call, so the outage lasted closer to 90 minutes.
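For readers curious what an automated version of "notice the stuck search service and restart Tomcat" might look like, here is a minimal sketch in Python. It is not what we run in production. The ping URL, the restart command, and the function names are all assumptions made up for illustration.

```python
import http.client
import subprocess
import urllib.request

# Hypothetical values: the real Solr URL and Tomcat service name will differ.
SOLR_PING_URL = "http://localhost:8080/solr/admin/ping"
RESTART_CMD = ["service", "tomcat", "restart"]


def solr_is_responsive(url=SOLR_PING_URL, timeout=10):
    """Return True if the URL answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, HTTP error codes, and bad replies.
        return False


def restart_tomcat():
    """Run the (assumed) restart command; the exit status is not checked here."""
    subprocess.run(RESTART_CMD, check=False)


def watchdog_step(probe, restart, failures, threshold=3):
    """Run one health check and return the updated consecutive-failure count.

    `restart` is called once `threshold` checks in a row have failed,
    and the count is then reset to zero.
    """
    if probe():
        return 0
    failures += 1
    if failures >= threshold:
        restart()
        return 0
    return failures
```

A scheduler (cron, or a simple loop with a sleep) would call `watchdog_step(solr_is_responsive, restart_tomcat, failures)` every minute or so and carry the returned count forward. Requiring several consecutive failures avoids restarting Tomcat over a single slow query.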

During that time the web site was still usable, of course, and lots of functions would have worked normally (viewing pages, downloading studies, using SDA, etc.).  We start our "unavailability counter" whenever any part of the infrastructure is unavailable, however, so the search outage counts as downtime.  My apologies if you were trying to search our catalog at that time.
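To put a number on what the "unavailability counter" means for a monthly figure, here is a small worked example in Python. It assumes, purely for illustration, that the roughly 90-minute search outage was the only downtime in a 30-day September; the function name is made up for this sketch.

```python
def availability_percent(outage_minutes, days_in_month):
    """Percentage of the month during which nothing was counted as unavailable."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - outage_minutes) / total_minutes


# 90 minutes out of 43,200 minutes in a 30-day month (illustrative assumption).
print(round(availability_percent(90, 30), 2))  # prints 99.79
```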

Our analysis is that the virtual machine is running out of memory on our current (but old) web server.  We have a new 64-bit machine with significantly more memory available, and we've been prepping it to take over for the old machine.  In the process of building the new machine we've been upgrading versions of Red Hat, Tomcat, Java, and many other key elements.  This has made the going a bit slower than usual, but it should give us a machine with better software, and software that doesn't need to be upgraded right away (I hope!).
