Wednesday, November 30, 2011

ICPSR and the cloud

How is ICPSR using "the cloud?"

I've been getting this question a lot lately, so it feels like time to put together a blog post about it.

From a functional standpoint ICPSR is using the cloud for identity and authentication, content delivery, archival storage, and data producer relationship management.  And if I include services based at the University of Michigan, I might also include data curation, and customer relationship management.

From a vendor standpoint, here's a roster of some of the organizations we're working with, and how their piece of the cloud helps us run our business.

A typical transaction on the ICPSR web site looks like this:  Search.  Select content for download.  Create an ICPSR-specific identity.  Authenticate using that identity.  Download the content.  Do not return to ICPSR for at least a year.

Given that the ICPSR-specific identities are weak (i.e., web site visitors create them by entering an arbitrary email address and password), and given that the identity is often used only once, it seemed like a good idea to eliminate the need to create such an identity.  We don't need strong identities, but we do need identities that would be available to anyone.  Technologies like OpenID, Facebook Connect, and the like seemed promising, but who wants to build infrastructure which talks to all of them?

Janrain does.

We use Janrain Engage as one part of our identity and authentication strategy.  Janrain acts as a third party between the content provider (ICPSR) and the identity providers.  And so when someone needs to log in to ICPSR's portal, they see a screen that looks something like this:


So there's no need to create an account and password at ICPSR.  And if someone does return later, they don't have to log in to our site if they've already logged in to their identity provider's site.  (This is Single Sign-On or SSO.)
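
For those curious about the mechanics, the pattern is roughly this: the Janrain sign-in widget hands our portal a one-time token, and our server exchanges that token for the visitor's profile.  The sketch below uses Python with the requests library; the endpoint and parameter names reflect my reading of Janrain's auth_info documentation, so treat the details as illustrative rather than authoritative.

    import requests

    # Janrain's token-exchange endpoint, as I recall it from their documentation
    AUTH_INFO_URL = 'https://rpxnow.com/api/v2/auth_info'

    def lookup_profile(token, api_key):
        # Exchange the one-time token posted by the sign-in widget for a profile.
        resp = requests.post(AUTH_INFO_URL,
                             data={'apiKey': api_key, 'token': token, 'format': 'json'})
        result = resp.json()
        if result.get('stat') != 'ok':
            raise RuntimeError('auth_info call failed: %r' % result)
        profile = result['profile']
        # 'identifier' is the stable ID to key a local account record on; the
        # email address may or may not be present, depending on the provider.
        return profile['identifier'], profile.get('email')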

We're hosting several web portals in Amazon's cloud.  We're using Amazon's Infrastructure as a Service (IaaS) offering to stand up Linux systems in the Amazon Elastic Compute Cloud (EC2) that are identical to the images we host locally.  We back the instances with Elastic Block Store (EBS) volumes so that the content persists when we need to terminate and restart a computing instance.
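
As a concrete (if simplified) illustration, here is a rough sketch, using the boto Python library, of what launching an EBS-backed instance looks like; the AMI, instance size, key pair, and security group names are placeholders rather than our actual configuration.

    import boto.ec2
    from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType

    # Connect using AWS credentials from the environment or boto config
    conn = boto.ec2.connect_to_region('us-east-1')

    # Ask for a persistent EBS root volume so content survives instance restarts
    root = BlockDeviceType(size=50, delete_on_termination=False)
    mapping = BlockDeviceMapping()
    mapping['/dev/sda1'] = root

    reservation = conn.run_instances(
        'ami-00000000',              # placeholder AMI built from our local image
        instance_type='m1.large',    # placeholder instance size
        key_name='portal-keypair',   # placeholder key pair
        security_groups=['web'],     # placeholder security group
        block_device_map=mapping)

    print(reservation.instances[0].id)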

We also host a replica of our on-site delivery system in Amazon's cloud for disaster recovery (DR) purposes.  We find that we have the opportunity to "test" this replica at least once per year when ICPSR's headquarters loses power for several hours due to high winds, ice storms, or other acts of nature.

The Amazon service has been very reliable overall (despite a few highly publicized events), and certainly more reliable than our own on-site facilities.  We also like that we can scale resources up and down very quickly, and that we have clear costs associated with the infrastructure.  (Anyone at an institution of higher learning who has tried to calculate the actual cost of electricity used knows what I mean.)

I've posted many times about our relationship with DuraCloud, and how we're using it as a mechanism for storing archival copies in the cloud.  In many ways DuraCloud fulfills a role similar to that of Janrain Engage by providing a layer of abstraction between ICPSR's technical infrastructure and that of multiple service providers.  In this case we manage one vendor and one set of bills, but have the ability to store content in the cloud storage service of multiple providers (Amazon, Rackspace, Microsoft).
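
To show how thin that abstraction layer looks from a client's point of view, here is a sketch of depositing one file into a DuraCloud "space" over its REST interface, again in Python with the requests library.  The /durastore URL pattern is my recollection of the DuraCloud API; treat it as an assumption rather than a reference.

    import requests

    def store_in_duracloud(host, space_id, content_id, path, username, password):
        # PUT the file into a DuraCloud "space"; DuraCloud relays the content to
        # whichever underlying storage provider(s) the account is configured for.
        url = 'https://%s/durastore/%s/%s' % (host, space_id, content_id)
        with open(path, 'rb') as f:
            resp = requests.put(url, data=f, auth=(username, password))
        resp.raise_for_status()
        return resp.status_code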

The acquisitions team at ICPSR keeps an eye on grants funded by places like the National Science Foundation and the National Institutes of Health.  If a grant looks like it may be producing data the team makes a note to contact the primary investigator (PI).  The goal is to have a conversation with the PI to see if there will indeed be data produced, and to see if it might be a good fit for ICPSR's holdings.  If so, we then try to convince the PI that depositing the content with ICPSR would be good for everyone (more data citations for the data producer; more re-use of the data by other researchers; etc.).

We had been using a home-built application to manage this content, but we found it to be a losing battle.  There was never enough money or time to build the types of relationship management reporting systems that the acquisition team wanted.  And so rather than trying to build a better mousetrap, we decided to rent a better mousetrap by moving the content into a professional contact/customer relationship management (CRM) system.  Like Salesforce.

The University of Michigan central IT organization (ITS) also delivers a handful of services that I would consider "the cloud" even though they do not package and market them that way.  File storage, trouble ticketing, and Drupal-hosting are all available from ITS, and they all look like cloud services to us because we pay for only what we use, we can scale them up and down reasonably quickly, and we do not have to deploy any local hardware or software to use them.

Monday, November 28, 2011

Hi Ho, a Googling we will go!

The University of Michigan announced (on Halloween! - I hope this is not a trick!) that it will be adopting Google as its collaboration platform.  The roll-out will happen over the course of the next year, and includes tools such as Gmail, Sites, Docs, Calendar, Blogger, and more.

I am delighted.

I've been using Google's Blogger technology (obviously) for some time to publish the Tech@ICPSR blog, and I use Google Docs for almost any project where I would have used Microsoft Office in the past.  (I do still use PowerPoint from time to time if I need something fancy-schmancy and don't have the time to conceptualize it as a Prezi instead.)

The biggest win for ICPSR, however, is with Gmail and Calendar.

When I arrived at ICPSR in 2002 we were running our own IMAP-based service with Eudora as the supported client.  And by supported I mean that we installed the free "hey look at these ads" version on each person's machine.  Off-site access was the responsibility of the individual, although we did hook it up to a campus webmail front-end eventually.  We were running MeetingMaker as our supported calendar client.  And by supported I mean that we installed the client on everyone's machine, but no one used it.

Sometime in 2005 or so we realized that it wasn't much fun running email and calendar services, and we also noted that we were already paying for an enterprise mail/calendar system that our parent organization, the Institute for Social Research (ISR), operated on the Exchange platform.  And so we dumped Eudora and MeetingMaker and started using the Microsoft stack instead.

I was delighted.

However......

I soon experienced the harsh realities of life in the Microsoft stack.  Small mailbox quotas.  Feature-poor webmail experience.  Mailbox "archives" living in one-off files on my PC or file server.  And have you ever tried to find the full email headers in a piece of email stored on an Exchange server?  And like our days of running Eudora and MeetingMaker we continued to be isolated from the rest of campus since our Exchange system was local to ISR and not part of a campus-wide solution.

The honeymoon had ended.

I solved the problem for myself (sort of) by maintaining my "internal to ISR" meetings and email on the ISR Exchange server, but moving my "external" meetings and email to Google.  That is, I changed the U-M address book so that my bryan (at) umich.edu email address was routed to Gmail rather than the Exchange server.  And so when it comes to communicating with the world outside of the ISR, I have a rich email experience that works well in any web browser, superb mail searching, and despite not deleting a single piece of non-spam email in nearly three years, I have used less than 20% of my mail quota.  At this rate, I will need to delete my first email in 2024.  Nice.  Of course, the problem is that I now check email and calendars in two places:  MS Exchange (for my ISR world) and Google (for everything else).

And so I am looking forward to the day in 2012 when it all dovetails back together and there is just one place to check my mailbox and calendar again.

Friday, November 25, 2011

TRAC: A3.2: Written policies and procedures

A3.2 Repository has procedures and policies in place, and mechanisms for their review, update, and development as the repository grows and as technology and community practice evolve.

The policies and procedures of the repository must be complete, written or available in a tangible form, remain current, and must evolve to reflect changes in requirements and practice. The repository must demonstrate that a policy and procedure audit and maintenance is in place and regularly applied. Policies and procedures should address core areas, including, for example, transfer requirements, submission, quality control, storage management, disaster planning, metadata management, access, rights management, preservation strategies, staffing, and security. High-level documents should make organizational commitments and intents clear. Lower-level documents should make day-to-day practice and procedure clear. Versions of these documents must be well managed by the repository (e.g., outdated versions are clearly identified or maintained offline) and qualified staff and peers must be involved in reviewing, updating, and extending these documents. The repository should be able to document the results of monitoring for relevant developments; responsiveness to prevailing standards and practice, emerging requirements, and standards that are specific to the domain, if appropriate; and similar developments. The repository should be able to demonstrate that it has defined "comprehensive documentation" for the repository. See Appendix 3: Minimum Required Documents for more information.


Evidence: Written documentation in the form of policies, procedures, protocols, rules, manuals, handbooks, and workflows; specification of review cycle for documentation; documentation detailing review, update, and development mechanisms. If documentation is embedded in system logic, functionality should demonstrate the implementation of policies and procedures. 



A deliverable for a recent contract was something the client called an information system security plan.  In past contracts this had always been understood to be a short document (2-3 pages) that summarized ICPSR's IT systems and described the measures we take to protect them from unauthorized use.  No big deal, right?

However, .....

In this most recent contract the requirements for the security plan changed; rather than a brief summary document, the deliverable was now twofold.

The first deliverable consisted of a document showing the Federal Information Processing Standards (FIPS) categorization of the risks associated with our IT systems.  This document was based on a standard known as FIPS Publication 199.  It turns out that this methodology and its level of documentation are relatively lightweight.

In brief, one asserts one of three levels (Low, Moderate, High) of risk.  There was never any question of asserting High risk, so the choice was between Low and Moderate.  We worked with the University of Michigan's central IT security office, and based on the type of data preserved at ICPSR, they recommended that we select Low.
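
For the curious, FIPS 199 expresses the result as a security categorization triple, with one impact level each for confidentiality, integrity, and availability.  The line below only illustrates the notation; the system label is mine, not the wording from our actual plan:

    SC(ICPSR delivery systems) = {(confidentiality, LOW), (integrity, LOW), (availability, LOW)}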



The second part required us to document the security controls defined by the National Institute of Standards and Technology for a FIPS 199 categorization of Low risk.  This standard is described in NIST Special Publication 800-53, and it requires a very high level of documentation.  (The standard is very heavy on policy and documentation but very light on measurement and audit, which some critics consider a major flaw in the approach.)

Our NIST 800-53 security control documentation ran nearly 200 pages(!), and that page count does not include documents that are required by, but external to, 800-53.  For example, if 800-53 requires one to assert that there is a policy on topic X, one does not need to include the policy within the 800-53 security controls documentation, but one does need to write the policy on topic X (if it does not already exist).  And so between the 800-53 controls and the external documents, our guess is that the total ran well over 250 pages.

And so we are very well supplied with policies and procedures, and we even have the documentation to prove it now.

Wednesday, November 23, 2011

InfoWorld Geek IQ Test - 2011

I took the 2011 InfoWorld geek IQ test.  I knew the answers to some of the more techie questions, especially when they were related to networking (CIDR, DNS), but didn't do so well on the pop culture items.  Got a 65 which places me between Geek dilettante and Marketing Executive.

I haven't decided yet whether I'm happy or ashamed.

Monday, November 21, 2011

Firing clients

Seth Godin has published another gem:  The Unreasonable Customer.

In this post he argues that while there are certain circumstances where maintaining a relationship with an unreasonable customer is justified, in many cases it makes no sense.  This is spot-on advice.

Some clients are demanding, of course, but some are demanding in very constructive, very actionable ways.  The client who pushes ICPSR, say, to deliver content in more interesting, more innovative ways may be difficult, but ultimately makes ICPSR a stronger organization with better services.

But the client who makes unreasonable demands, demands that pull resources away from serving the other clients well, only weakens the organization.  Instead of making services better, the organization struggles hopelessly to appease the unreasonable client.  Resources and time are lost.  Staff become exhausted and disillusioned.  Morale sinks.

In the olden days of working in the telecom industry in the mid-1990s, I remember a case where a train derailed and tore up a bunch of fiber near Washington, DC.  A handful of our clients consequently lost their network connections.  Our company was doing the right things:  We informed the clients about the problem, and we kept close watch on the fiber restoration project, pushing the supplier (I think it might have been MCI) to give our circuits top priority.  While no one was delighted to be without their Internet connection, they understood that the cause was beyond our control, and that they had made the decision to purchase only a single Internet connection from a single company.  (Clients who needed very, very high availability would routinely purchase multiple Internet connections from multiple providers.)

One client, however, refused to let the team work through the problem.  This client wasn't interested in service restoration; this client wanted to take out all of his frustration on the team.  "You're incompetent!"  "You should all be fired!"  "This is unacceptable!"

I tried to calm the client.  Maybe we could set up something short-term over a dial-up line?  And maybe long-term the right solution is to have more than one Internet connection so that if another train derails (this seemed to happen way more than one would expect) or there is a natural disaster, you'll still have your Internet connectivity?

Nothing worked.  It was clear that this one client didn't want help; he wanted a punching bag.

So we fired him.

"You're right.  It sounds like we're just not the right provider for you.  We can't meet your expectations.  We won't waste any more of your time trying to restore your service.  We'll need you to send back the router, or we will have to bill you for it.  Best wishes, and good luck with your next provider."

That did more for morale than the last ten company picnics and holiday parties combined.

I honestly don't remember if we did end up firing the client, or if just the threat ended his hysterics.  But it definitely changed the relationship, and it proved to the team that we wouldn't let unreasonable people stop them from doing good work.

Friday, November 18, 2011

TRAC: A3.1: Designated community

A3.1 Repository has defined its designated community(ies) and associated knowledge base(s) and has publicly accessible definitions and policies in place to dictate how its preservation service requirements will be met. 

The definition of the designated community(ies) (producer and user community) is arrived at through the planning processes used to create the repository and define its services. The definition will be drawn from various sources ranging from market research to service-level agreements for producers to the mission or scope of the institution within which the repository is embedded.

Meeting the needs of the designated community—the expected understandability of the information, not just access to it—will affect the digital object management, as well as the technical infrastructure of the overall repository. For appropriate long-term planning, the repository or organization must understand and institute policies to support these needs.

For a given submission of information, the repository must make clear the operational definition of understandability that is associated with the corresponding designated community(ies). The designated community(ies) may vary from one submission to another, as may the definition of understandability that establishes the repository’s responsibility in this area. This may range from no responsibility, if bits are only to be preserved, to the maintenance of a particular level of use, if understanding by the members of the designated community(ies) is determined outside the repository, to a responsibility for ensuring a given level of designated community(ies) human understanding, requiring appropriate Representation Information.

The documentation of understandability will typically include a definition of the applications the designated community(ies) will use with the information, possibly after transformation by repository services. For example, if a designated community is defined as readers of English with access to widely available document rendering tools, and if this definition is clearly associated with a given set of Content Information and Preservation Description Information, then the requirement is met.

Examples of designated community definitions include:
  • General English-reading public educated to high school and above, with access to a Web Browser (HTML 4.0 capable).
  • For GIS data: GIS researchers—undergraduates and above—having an understanding of the concepts of Geographic data and having access to current (2005, USA) GIS tools/computer software, e.g., ArcInfo (2005).
  • Astronomer (undergraduate and above) with access to FITS software such as FITSIO, familiar with astronomical spectrographic instruments.
  • Student of Middle English with an understanding of TEI encoding and access to an XML rendering environment.
  • Variant 1: Cannot understand TEI
  • Variant 2: Cannot understand TEI and no access to XML rendering environment
  • Variant 3: No understanding of Middle English but does understand TEI and XML
  • Two groups: the publishers of scholarly journals and their readers, each of whom have different rights to access material and different services offered to them.
Evidence: Mission statement; written definitions of the designated community(ies); documented policies; service-level agreements. 



Documentation for this TRAC requirement can be found in ICPSR's mission statement (published on our web portal) and in our deposit agreements.

Wednesday, November 16, 2011

DuraCloud Archiving and Preservation Webinar

Shameless self-promotion alert...

The nice folks at DuraSpace have published the audio and video from the recent webinar that Michele Kimpton (CEO, DuraSpace) and I gave on DuraCloud.


Michele spends the first 5-10 minutes talking about the business case behind DuraCloud, and then I spend about 30 minutes talking about ICPSR and how we came to use DuraCloud to store a copy of our archival holdings.

Monday, November 14, 2011

A dangerous combination

Do you like irony?

It turns out that the University of Michigan, like many other organizations, has decided to move its "Travel and Expense" reporting software to the cloud, and has therefore adopted Concur.

I think the University has made a good decision to put this in the cloud and to use a hosted, Software as a Service (SaaS) solution.  Using an existing service makes much more sense than building our own software.  How could the U-M build a better application than a company that makes its living doing exactly this sort of thing?

Now, this isn't to say that I am a huge fan of Concur (or at least how it has been implemented at U-M).  I don't find the workflow or interface to be all that intuitive, and there are a couple of things that really trip me up all the time.  For example, when entering the name of someone, sometimes I am supposed to enter their LAST name and sometimes I am supposed to enter their FIRST name, and I can never remember which to enter.  (Cue sad music.)

But the really challenging part about using this cloud service is when I use it to pay for cloud services.  (Cue ironic music.)

Each month I get a bill from Amazon.  And DuraCloud.  And another one from DuraCloud (because we use more space than our membership allows.)  And another one from Amazon.  (Two different projects with different credit cards and different pools of machines.)  And Salesforce.  And....

So each month I print the invoice to PDF.  And I fetch the receipt from my university credit card, and PDF that too.  And then I bundle them together in an expense report in Concur.  And that's when the trouble starts:  How do I classify the expense?

This is almost certainly not the fault of the Concur software, of course.  The problem is in the controlled vocabulary of "expense types" that the U-M has plugged into the system.  Not one is a good fit for paying cloud providers.  And so I pick one from the choices I do have.

Computer maintenance?

Computer rental?

Memberships (especially for the DuraSpace one, which is indeed a membership)?

Other?

My expense report is reviewed by at least four different people (two within ICPSR, at least one within our parent organization, the Institute for Social Research, and at least one at the U-M central Business and Finance unit).  If any one of them believes that I have selected the wrong expense type, the report returns to me, and I then must resubmit it.  The good news is that I don't have to reload the invoice or receipt, and so the process is relatively simple.

But for those of you about to implement Concur or another expense and travel reporting system, please add a new expense category for your IT managers:  Cloud computing services.

Friday, November 11, 2011

TRAC: A2.3: Keeping ahead of the curve

A2.3 Repository has an active professional development program in place that provides staff with skills and expertise development opportunities.

Technology will continue to change, so the repository must ensure that its staff’s skill sets evolve, ideally through a lifelong learning approach to developing and retaining staff. As the requirements and expectations pertaining to each functional area evolve, the repository must demonstrate that staff are prepared to face new challenges.

Evidence: Professional development plans and reports; training requirements and training budgets, documentation of training expenditures (amount per staff); performance goals and documentation of staff assignments and achievements, copies of certificates awarded. 



ICPSR does a very good job organization-wide at encouraging staff to develop professionally.  This manifests itself in several different ways:  attending workshops and seminars; taking courses to learn new skills and abilities; and, most impressively, funding continuing education for staff who want to pursue a degree (often a graduate degree).  In the technology shop we take advantage of the generous professional development budget in many ways. 

The operations team (managed by Asmat Noori) is always bringing new technology into ICPSR:  new storage systems, new backup systems, new versions of software and operating systems, and more.  And so there is a recurring need for people to attend training workshops so that they can manage new technologies effectively and efficiently.  Further, some types of training - such as SANS security training - cut across many different technologies and need to be renewed every few years; the training budget supports this type of activity as well.

The software development team (managed by Nathan Adams) also makes regular use of professional development.  In this case the new technology is usually a software system or a development-system component, not a new type of hardware.  And we will sometimes bring a trainer on-site to deliver a course to our entire team rather than sending people away to a class.  Nathan also has one staff member who is taking advantage of the tuition package offered by ICPSR and is enrolled in the University of Michigan's School of Information master's degree program.

Wednesday, November 9, 2011

ICPSR's Secure Data Environment overview

Jenna Tyson is a graphic artist on staff at ICPSR.  Over the past few years Jenna has helped me out with displays for poster sessions, transforming the mediocre layouts I produce into true works of art.  I've posted some of her work here in the past.

I asked Jenna if she could create a logo for our Secure Data Environment (SDE), and above you can see the one that I liked best.  I leave it as an exercise to the reader to decide if the terrified individual in the picture is a defeated intruder or a frustrated ICPSR data curator.

The blog contains several posts that go into some detail about the software and security components behind the SDE, but I'm not sure that I ever posted a high-level description to set context, scope, and purpose.  And so along with Jenna's logo, I present the story behind the SDE.




The ICPSR Secure Data Environment (SDE) is a protected work area that uses technology and process to protect sensitive social science research data from accidental or deliberate disclosure.  The SDE uses commonly deployed security technologies such as firewalls, Active Directory group policies, and network segmentation to minimize unwanted access between the SDE and the outside world.  Further, it takes advantage of work processes that require strict control over when data may be moved between the SDE and external locations.

Data enter the SDE through ICPSR's deposit system.  Depositors upload their content to a web application on our public web portal, where it is encrypted.  An automated process "sweeps" content from the portal several times per hour, moving it to the SDE, where it is then decrypted.  The content resides on a special-purpose EMC Network Attached Storage (NAS) appliance that serves ICPSR's SDE.  The appliance uses private IP address space which is routed only within the University of Michigan enterprise network, and is also protected by a firewall.  Further, NAS shares are exported only to specific machines and only to specific Active Directory groups.
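
The sweep itself is conceptually simple.  The sketch below is only an illustration of the idea, not our actual implementation; the paths, file naming, and use of GnuPG are assumptions, and it presumes both the portal drop-box and the SDE inbox are visible as mounted file systems from the machine running the job.

    import os
    import subprocess

    PORTAL_INBOX = '/portal/deposits/incoming'   # hypothetical path on the public portal
    SDE_INBOX = '/sde/deposits/incoming'         # hypothetical path on the SDE NAS share

    def sweep_once():
        for name in os.listdir(PORTAL_INBOX):
            if not name.endswith('.gpg'):
                continue                          # only pick up encrypted bundles
            src = os.path.join(PORTAL_INBOX, name)
            dst = os.path.join(SDE_INBOX, name[:-len('.gpg')])
            # decrypt into the SDE; the private key material lives only inside the SDE
            subprocess.check_call(['gpg', '--batch', '--output', dst, '--decrypt', src])
            os.remove(src)                        # remove the ciphertext from the portal

    if __name__ == '__main__':
        sweep_once()    # in practice, run from cron several times per hour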

ICPSR data managers must be located on the University of Michigan enterprise network to access the SDE.  (They may use the University of Michigan VPN client to reach the network from remote locations; this requires strong authentication and uses strong encryption.)  Data managers run a simple utility to "log in" to the SDE.  Once logged in, they are assigned a disposable virtual Windows 7 desktop system which is configured to persist content only on the ICPSR SDE NAS.  Any content stored on the virtual desktop itself is destroyed once the image is terminated.

Data curators are not allowed to access the Internet or email within the SDE, and they do not have access to local system ports (e.g., USB).  Clipboards are NOT shared between the SDE and the local machine, and so there is no ability to "cut and paste" between the two environments.  It is possible, of course, for data curators to take notes from what they see on the screen, and to take screen snapshots, but ICPSR management considers these to be acceptable risks.

Data curators may release data from the SDE via two mechanisms.

One, they may submit completed work for release via an internal work system called turnover.  This queues material for placement in archival storage, and also queues related material for release on the web site.  A release manager reviews all content before allowing it on the web site.

Two, they may submit unfinished work for transfer outside of the SDE.  In this case a request appears in the inbox of the data curator's supervisor who may then review the request, and then accept or reject it.  If accepted the content is available to the data curator through a simple file retrieval mechanism, and the transfer is logged.

ICPSR has contracted the services of a "white hat" ethical hacker to assess the security vulnerabilities on the SDE.  ICPSR has already implemented small changes within the SDE based on preliminary reports from the contractor.

Monday, November 7, 2011

October 2011 deposits at ICPSR

Time again for the monthly report of new deposits at ICPSR.  Here is the snapshot from October 2011:


# of files | # of deposits | File format
43application/msaccess
463application/msoffice
249631application/msword
24157application/octet-stream
48945application/pdf
9316application/vnd.ms-excel
61application/vnd.wordperfect
152application/x-dbase
21application/x-empty
113014application/x-sas
186731application/x-spss
119314application/x-stata
11image/gif
11image/jpeg
22image/png
41image/tiff
11image/x-photoshop
105message/rfc8220117bit
1512text/html
22text/html; charset=us-ascii
1148text/plain; charset=iso-8859-1
508text/plain; charset=unknown
504733text/plain; charset=us-ascii
671text/plain; charset=utf-8
43text/rtf
667text/x-c++; charset=us-ascii
11text/x-c++; charset=utf-8
2117text/x-c; charset=us-ascii
986text/xml

The volumes are a bit higher this month, especially the number of files.  At least some of the deposits must have been large, containing an unusually large number of files.

In addition to the usual suspects - plain text, stat package formats, MS Word, PDF - we have a very large number of unidentified files this month (2400+ application/octet-stream), and we also have a very small number of interesting formats (images, photoshop).

Friday, November 4, 2011

TRAC: A2.2: The right quantity of staff and skills

A2.2 Repository has the appropriate number of staff to support all functions and services.

Staffing for the repository must be adequate for the scope and mission of the archiving program. The repository should be able to demonstrate an effort to determine the appropriate number and level of staff that corresponds to requirements and commitments. (These requirements are related to the core functionality covered by a certification process. Of particular interest to repository certification is whether the organization has appropriate staff to support activities related to the long-term preservation of the data.) The accumulated commitments of the repository can be identified in deposit agreements, service contracts, licenses, mission statements, work plans, priorities, goals, and objectives. Understaffing or a mismatch between commitments and staffing indicates that the repository cannot fulfill its agreements and requirements.

Evidence: Organizational charts; definitions of roles and responsibilities; comparison of staffing levels to commitments and estimates of required effort.



This is an interesting question for an organization like ICPSR.  My colleague Nancy McGovern made a useful distinction the other day: the difference between digital preservation (where ICPSR spends some of its resources, but not the lion's share) and data curation (where we spend a significant quantity of resources).

My sense is that the effort required to perform a base level of digital preservation on our content - plain text survey data and PDF- and XML-format documentation - is relatively small, and even if ICPSR found itself operating on a minimal budget without any of its topical archives or special projects, there would be an adequate number of staff to manage the archival holdings, review fixity reports, and execute migrations of content from format to format, or from location to location.
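
To give a sense of how lightweight the fixity-review piece can be, here is a minimal sketch of the underlying check: recompute a digest for each file and compare it against the value recorded when the file entered archival storage.  The manifest format here is invented for the example; it is not what ICPSR actually uses.

    import hashlib

    def sha256(path):
        digest = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                digest.update(chunk)
        return digest.hexdigest()

    def check_fixity(manifest_path):
        # Each manifest line: "<expected digest> <path to file in archival storage>"
        failures = []
        with open(manifest_path) as manifest:
            for line in manifest:
                if not line.strip():
                    continue
                expected, path = line.split(None, 1)
                if sha256(path.strip()) != expected:
                    failures.append(path.strip())
        return failures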

In our present configuration, we can see a close correlation between required effort and resources.  This most often manifests itself as line-items in individual project budgets.  But it also shows up on the organizational chart when one sees specific organizational units at ICPSR which have a clear digital preservation mission or component.

Wednesday, November 2, 2011

ICPSR Web availability through October 2011

Ick.

This is not a good trend.

Our overall availability (i.e., all components working properly) sank below 99.5% again in October.  The main culprit was a nearly two-hour period on October 12, 2011 when a series of common alerts turned out to have an uncommon cause.  The on-call systems engineer went through our usual series of steps to bring the service back online, and while the steps seemed to help at first, it was clear that the fix was only temporary and that more diagnostic work was necessary.  This series of events also happened at an inopportune time, just as many of us were in transit between the office and home (and then back to the office again).

We also had a problem with our search engine technology (Solr) late in the month, and that contributed another 46 minutes to our unavailability.  (Other components were working fine, but search was not.)
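
To put these numbers in perspective: a 31-day month contains 44,640 minutes, so staying above 99.5% overall availability leaves room for only about 223 minutes of downtime, and these two incidents alone consumed roughly three-quarters of that budget.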

My apologies to those of you who were trying to get some work done on our site last month, and got bit by either of these problems.