Friday, December 25, 2009

TRAC: C1.9: Change testing

C1.9 Repository has a process for testing the effect of critical changes to the system.

Changes to critical systems should be, where possible, pre-tested separately, the expected behaviors documented, and roll-back procedures prepared. After changes, the systems should be monitored for unexpected and unacceptable behavior. If such behavior is discovered the changes and their consequences should be reversed.

Whole-system testing or unit testing can address this requirement; complex safety-type tests are not required. Testing can be very expensive, but there should be some recognition of the fact that a completely open regime where no changes are ever evaluated or tested will have problems.

Evidence: Documented testing procedures; documentation of results from prior tests and proof of changes made as a result of tests.

When I arrived at ICPSR in 2002 the same systems were used for both production services and for new development. In fact, the total number of systems at ICPSR was very small; for example, a single machine was our production database machine, our shared "time sharing" machine for UNIX-based data processing, our anonymous ftp server, and pretty much everything else.

Our model is quite different these days. Most of our software development occurs on the desktop of the programmer, and new code is rolled out to a staging server for testing and evaluation. The level and intensity of the testing varies widely at ICPSR; my sense is that public-facing systems get the most evaluation, and internal-only systems receive very little. Nonetheless, unless the evaluation reveals serious flaws, the software is then rolled into production on a fixed deployment schedule. Because all software resides in a repository, we can back out changes easily if needed.

The last six software developers we've hired have all worked in the Java environment, and we're in the process of moving our Perl/CGI team to Java as well. My sense is that getting all of the major systems into Java will make it easier to use unit-testing tools such as JUnit.
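The pattern the requirement describes (pre-test a change, document the expected behavior, keep a roll-back procedure ready) can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not a description of ICPSR's actual tooling: the class `StagedChange`, its methods, and the "schema version" state are all hypothetical names invented for the example. In practice the checks would more likely live in a unit-testing framework such as JUnit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: apply a change, check documented expectations, roll back on failure.
class StagedChange {

    // Runs each named check and returns the descriptions of the ones that failed.
    static List<String> failedChecks(Map<String, BooleanSupplier> checks) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> check : checks.entrySet()) {
            boolean ok;
            try {
                ok = check.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                ok = false; // a check that throws counts as unexpected behavior
            }
            if (!ok) {
                failed.add(check.getKey());
            }
        }
        return failed;
    }

    // Applies the change, then reverses it if any expected behavior is missing.
    // Returns true when the change was kept.
    static boolean applyWithRollback(Runnable change, Runnable rollback,
                                     Map<String, BooleanSupplier> checks) {
        change.run();
        List<String> failed = failedChecks(checks);
        if (failed.isEmpty()) {
            return true;
        }
        System.out.println("Rolling back; failed checks: " + failed);
        rollback.run();
        return false;
    }

    public static void main(String[] args) {
        int[] schemaVersion = {1}; // stand-in for real system state (assumption)

        // The map keys double as the documentation of expected behavior.
        Map<String, BooleanSupplier> checks = new LinkedHashMap<>();
        checks.put("schema version is 2 after the change", () -> schemaVersion[0] == 2);

        boolean kept = applyWithRollback(
                () -> schemaVersion[0] = 2,   // the change
                () -> schemaVersion[0] = 1,   // the prepared roll-back procedure
                checks);
        System.out.println("Change kept: " + kept);
    }
}
```

Keeping each check's description next to its test means the list of expected behaviors and the code that verifies them cannot drift apart, which is roughly the kind of evidence the requirement asks for.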

Monday, December 21, 2009

CNI Fall 2009 Membership Meeting

I gave a project briefing at the Coalition for Networked Information (CNI) Fall 2009 Membership Meeting on ICPSR's work on cloud computing and confidential research data. I have placed a copy of the presentation deck at SlideShare.

Most of the talks I attended were quite good, and the brief notes I've entered here are by no means complete summaries. But they will give people some flavor of the meeting, and the types of topics that one will find at a CNI meeting. I should note that I really find the meetings useful; it is a great way to keep up with what's going on at the intersection of IT, libraries, and data, and I usually meet several interesting people.

Opening Plenary - Overview of the 2009-2010 Program Plan (Cliff Lynch) - As usual Cliff opened the meeting and went through the CNI Program Plan for 2009-2010, hitting a wide array of topics including open data, open access, the financial crisis in higher education (particularly in the UC system), sustainability, linked data, the contrast between the centralized databases of the 70s, 80s, and 90s v. more diffuse collections today, and reaching deeper into membership organizations.

He drew a distinction between data curation (focus on re-use and the lifecycle) and data preservation (focus on long-term retention). My recollection is that he thought the former was more likely to attract community engagement, and the latter was a tough sell to funders, membership organizations, and business. I've heard others make similar comments, most recently Kevin Schurer from the UK Data Archive, who distinguished between research data management and data preservation.

Cliff then spoke about the usefulness of attaching annotations to networked information, perhaps in reference to a talk (which I wasn't able to attend) from the Open Annotation Community project later in the day.

Thorny Staples and Valerie Hollister gave a brief talk about DuraSpace's work to facilitate "solution communities" to help people solve problems using Fedora Commons and/or DSpace.

Randy Frank gave Internet2 kudos for creating good tech for demos and labs, but told the audience in his project briefing that he wanted to bring the tech closer to the production desktop at member institutions.

Simeon Warner described how arXiv would be soliciting its top downloaders for donations to help keep the service running. Its current host, Cornell University Library, spends about $400k per year (largely on people) for the service, and naturally they would like to find others to help pay for this community service.

Friday, December 11, 2009

TRAC: C1.8: Change management

C1.8 Repository has a documented change management process that identifies changes to critical processes that potentially affect the repository’s ability to comply with its mandatory responsibilities.

Examples of this would include changes in processes in data management, access, archival storage, ingest, and security. The really important thing is to be able to know what changes were made and when they were made. Traceability makes it possible to understand what was affected by particular changes to the systems.

Evidence: Documentation of change management process; comparison of logs of actual system changes to processes versus associated analyses of their impact and criticality.

Establishing and following a change management process is a lot like stretching before working out. We know we should do it, we feel better when we do, but, wow, is it ever hard to make a point of doing it.

I've used a change management process in the past. At ANS Communications we would only make major configuration changes to the network on certain days and at certain times, and we would drive the changes from a central database. This was particularly important when we made a long and very painful transition away from the NSFnet-era equipment that had formed our backbone network to new equipment from a company known as Bay Networks.

At ICPSR I think we could use something fairly straightforward. There are only a handful of critical software systems, and they don't change that often. We already track software-level changes in CVS, and we already announce feature-level changes to the designated community (e.g., ICPSR staff for internal systems), and so we might pull it all together by linking the announcements with the code changes in JIRA. I could also imagine a thread on our Intranet (which is Drupal-based) which could form a central summary of changes: what, when, how, and links to more details.
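To make the "central summary of changes" idea concrete, here is a minimal sketch in Java of one possible record format: what changed, when, and pointers to the details. Everything in it is an assumption made for illustration. The class `ChangeLog`, its fields, and the sample issue key and revision number are hypothetical, and nothing here calls JIRA, CVS, or Drupal.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a central change summary: what, when, and links to details.
class ChangeLog {

    // One change: date, affected system, announcement text, and pointers to the details.
    static final class Entry {
        final LocalDate date;
        final String system;
        final String announcement;
        final String issueKey;  // an issue-tracker key, e.g. from JIRA (value made up below)
        final String revision;  // a repository revision, e.g. from CVS (value made up below)

        Entry(LocalDate date, String system, String announcement,
              String issueKey, String revision) {
            this.date = date;
            this.system = system;
            this.announcement = announcement;
            this.issueKey = issueKey;
            this.revision = revision;
        }

        // One-line summary suitable for a running thread of changes.
        String toLine() {
            return date + " | " + system + " | " + announcement
                    + " | issue " + issueKey + " | rev " + revision;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void record(Entry entry) {
        entries.add(entry);
    }

    // Traceability query: every recorded change to one system, in the order recorded.
    List<Entry> changesTo(String system) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.system.equals(system)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        log.record(new Entry(LocalDate.of(2009, 12, 11), "ingest",
                "New checksum step", "OPS-12", "1.14"));
        for (Entry e : log.changesTo("ingest")) {
            System.out.println(e.toLine());
        }
    }
}
```

The point of the `changesTo` query is the traceability the requirement emphasizes: given a system, list what was changed and when, with enough identifiers to find the announcement and the code.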

Monday, December 7, 2009

A Newbie's Guide to Serving on an NSF Panel

I had the opportunity to serve on a National Science Foundation review panel a while ago. It was quite an undertaking to read through many different kinds of proposals, re-read them to the point of really understanding what the proposal author was saying, and then to distill it down to a brief summary, the strengths and weaknesses, and how the proposal does (or doesn't) address the goals of the program well. Pretty exhausting!

But... It was great; very, very interesting to go through the process, see how the system works, and participate in the system. Having done it once, I'd love to do it again. And it certainly gives one a very fine appreciation for what to do (and what to avoid) in a proposal so that the job of the reviewer is made easier. Even something seemingly simple, like page numbers, ends up being pretty important.

The details of the actual panel (the specific panel, program, and set of reviewers), and the content of the proposals need to be kept confidential. However, I kept thinking "Hey, I wish I knew about that before I started on this." And so I offer my list of eleven things you should know, do, or not do if you are about to serve on your first review panel.
  1. Do not print the proposals. You won't need them at the NSF, and you'll just need to shred them later. Everything you'll need is on-line in the Interactive Panel system. And there may not be much space at the panel to keep the hard copies handy.
  2. Do not bring a laptop. The nice folks at the NSF supply a laptop with a wired network connection, and convenient desktop icons to all of the stuff you'll need. If you bring your own laptop, you'll need to have the NSF IT guys scan it before they'll allow it to connect to the network. And it won't have all of the convenient desktop icons. (All this said, I still brought my little HP Mini to use at the hotel.)
  3. Bring water. The government doesn't believe in bottled water, and if you like to drink plenty of water during the workday, you'll wish you had ready access to some bottles.
  4. Being a Scribe is a lot more work than being a Lead. Reading and reviewing proposals is a lot of work. For some, you'll also be the Lead. That's actually not a big deal at all; it just means that you have to give a brief summary of the proposal to the other reviewers, and hit its strengths and weaknesses. Ideally you'll also drive the conversation around the proposal, but the NSF program officers are there to help out, and they are really good at this sort of thing. Now, for some proposals, you may be a Scribe, and that is a lot of work since it's up to you to take the minutes of the conversation, generate a nice summary that reflects all the key evaluation points, and then coax the rest of the reviewers to read and approve your work.
  5. Invest plenty of time in your reviews. Take your time. Write good, clear prose. But be succinct. Make it easy to read so that your other reviewers can read through it quickly. You'll be glad you took the time to do this when you're the Lead. And the other reviewers will also thank you for it.
  6. Keep your receipts. The government will be paying you via a 1099, and you'll want to deduct the expenses from your income.
  7. Be sure to select the box on FastLane to disable the automatic screen refresh. If you don't do this, the system will refresh the screen on you at the most inopportune times.
  8. Submit your reviews early. The program officers and other reviewers will be able to factor in your comments if they have them before the panel. Definitely do not wait until you get to the panel. And be sure that you use the Submit Review button, not the Save Review button, once your review is ready.
  9. Be very clear about the review criteria. In addition to any standard criteria, there may also be program-specific criteria, and perhaps additional specific areas which will require extra attention.
  10. The Hilton Arlington is very, very close to the NSF. There is even a covered, overhead pedestrian walkway from the Hilton to the Stafford Building. This can be very nice if it is raining.
  11. The NSF is at 4201 Wilson Blvd. It is in a large, shared office building called Stafford Place. I'm not sure that I ever found the street address anywhere convenient on the NSF Visitors page.