Technology at ICPSR: June 2009

Tuesday, June 30, 2009

Fedora Land Speed Record?

I don't know if there is an existing "land speed record" for the number of Datastreams put into a single Fedora object, but we have ingested a data object with nearly 17k Datastreams into our Fedora repository. To be honest, I was not sure that Fedora would be able to handle it, but it did.

This particular object is related to ICPSR Study 13517, and the Datastreams are largely Census 2000 data files that we pulled from the web site of the Census Bureau in 2003. Overall the Datastreams consume a bit over 4GB of disk space, and their corresponding objects (in FOXML format) use only about 10MB of space. They have very little metadata in the in-line DC and RELS-EXT Datastreams, for example.
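For a rough sense of scale, here is a back-of-the-envelope calculation. The inputs are just the rounded figures quoted above ("nearly 17k", "a bit over 4GB", "about 10MB"), so treat the results as ballpark numbers and not measurements.

```python
# Rounded inputs taken from the post; none of these are exact counts.
datastream_count = 17_000        # "nearly 17k" Datastreams (rounded up)
content_bytes = 4 * 1024**3      # "a bit over 4GB" of content (rounded down)
foxml_bytes = 10 * 1024**2       # "about 10MB" of FOXML

# Average payload per Datastream, in KiB.
avg_kib = content_bytes / datastream_count / 1024

# FOXML bytes attributable to each Datastream, on average.
foxml_per_ds = foxml_bytes / datastream_count

# FOXML size as a percentage of the content it describes.
overhead_pct = 100 * foxml_bytes / content_bytes

print(f"average Datastream size: about {avg_kib:.0f} KiB")
print(f"FOXML per Datastream:    about {foxml_per_ds:.0f} bytes")
print(f"FOXML overhead:          about {overhead_pct:.2f}% of content size")
```

Under those assumptions the average Datastream is roughly a quarter of a megabyte, and the FOXML wrapper is well under one percent of the content it describes.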

We're still working on creating the "keepsake" objects I described in an earlier post, but if Fedora can handle this number of Datastreams, there shouldn't be any problems with other less massive studies.

One small complaint: The Fedora admin tool (fedora-admin.bat on the Windows platform) does not do a very good job with objects that have a large number of Datastreams. Because of the way it tiles the Datastreams and lacks scroll bars, it is pretty much useless for something of this scale. The "out of the box" web services, however, do a fine job displaying the object.

Monday, June 29, 2009

ICPSR Web Services outage

The ICPSR web site and the CCEERC web site were unavailable from 4:27am EDT until 5:04am EDT on Sunday, June 28, 2009. This was an unscheduled outage.

The proximate reason for the outage was that the DNS software on the ICPSR DNS server faulted at 4:24am EDT and stopped answering queries. This had the unexpected side effect of slowing the response time on the ICPSR and CCEERC web sites to the point where the sites stopped serving requests.

ICPSR IT staff were paged at 4:25am EDT, and began diagnosing the situation shortly thereafter. By 5:00am EDT it was clear that the root problem was not with the web server itself, and staff isolated the problem to the DNS server at that time.

Our apologies for any inconvenience this may have caused.

Tuesday, June 23, 2009

Storing Keepsakes

ICPSR has been around for a long time. A really long time. Fifty years. And archive years are like dog years, so this is really old.

During my nearly seven years at ICPSR I've come to the conclusion that we've collected and stored pretty good metadata about most of our electronic wares. For the past few decades we've used consistent buckets to hold information: this came from a depositor; this we made ourselves; this we derived programmatically from an original source elsewhere; etc.

We have rows and rows and columns and columns of metadata stored away in relational databases; and more recently, we also have a formal tracking system with all of the important events and dates in the history of a study. And in the near future I expect that we'll transition all of this well documented content and metadata into a Fedora repository. This is the good stuff.

But then there is the older stuff.

The stuff of questionable value. The stuff that no one else wants. The stuff that we'll keep anyway.

The keepsakes.

The digital keepsakes.

Before the end of the summer we'll move our digital keepsakes from their current home, and put them into Fedora. We won't need anything terribly sophisticated in terms of Fedora's CMA. We'll create a tool that builds a nice Fedora object (in FOXML) for each of ICPSR's current and former studies, and creates a keepsake object for each.
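A minimal sketch of what such a tool might emit, assuming FOXML 1.1 and Python's standard library. The element and attribute names reflect my reading of the FOXML 1.1 format and have not been validated against the schema or a live repository; the PID, label, Datastream IDs, and URLs are all hypothetical, and this is not the tool we will actually build.

```python
import xml.etree.ElementTree as ET

# FOXML namespace (assumed to be the FOXML 1.1 namespace URI).
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag: str) -> str:
    """Return a namespace-qualified FOXML tag name."""
    return f"{{{FOXML_NS}}}{tag}"


def build_keepsake_foxml(pid: str, label: str, files: dict[str, str]) -> str:
    """Build a bare-bones FOXML document with one Datastream per file.

    `files` maps a Datastream ID to the URL Fedora should fetch at ingest.
    """
    root = ET.Element(q("digitalObject"), {"VERSION": "1.1", "PID": pid})

    # Object-level properties: state and a human-readable label.
    props = ET.SubElement(root, q("objectProperties"))
    ET.SubElement(props, q("property"), {
        "NAME": "info:fedora/fedora-system:def/model#state",
        "VALUE": "Active",
    })
    ET.SubElement(props, q("property"), {
        "NAME": "info:fedora/fedora-system:def/model#label",
        "VALUE": label,
    })

    # One managed-content Datastream per keepsake file.
    for ds_id, url in sorted(files.items()):
        ds = ET.SubElement(root, q("datastream"), {
            "ID": ds_id, "STATE": "A", "CONTROL_GROUP": "M",
        })
        version = ET.SubElement(ds, q("datastreamVersion"), {
            "ID": f"{ds_id}.0",
            "LABEL": ds_id,
            "MIMETYPE": "application/octet-stream",
        })
        ET.SubElement(version, q("contentLocation"), {"TYPE": "URL", "REF": url})

    return ET.tostring(root, encoding="unicode")


# Hypothetical study with two keepsake files.
foxml = build_keepsake_foxml(
    pid="example:keepsake-0001",
    label="Keepsakes for a hypothetical study",
    files={
        "FILE1": "http://localhost/keepsakes/0001/file1.dat",
        "FILE2": "http://localhost/keepsakes/0001/file2.dat",
    },
)
print(foxml)
```

A real tool would loop over every current and former study, fill in the DC and RELS-EXT Datastreams, and hand each document to Fedora's ingest service.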

Monday, June 15, 2009

It's that time of year again....

It's June and at ICPSR that means it's nearing the end of another quarter and also the end of the 2008-2009 fiscal year. And that means it's time to start writing formal performance reviews, and to create the formal work plans for next year.

I've worked at a lot of different places, managed many different types of technology groups, and used different mechanisms for this activity. At our start-up software company, almost nothing was written down; at the UUNET portion of MCI Worldcom the process was surprisingly lightweight - everyone was just a number in a spreadsheet; and at ICPSR the process is more heavyweight.

Each had its virtues and vices, but a heavyweight process requires more writing, and more careful thinking. If one is distilling performance over the past year into a digit between 1 and 4, then the "writing" aspect is pretty easy; if one is writing a handful of paragraphs about the person's performance over that same time period, it takes a lot more time and thought.

There are a lot of different ways to measure performance, but for my money the best performers demonstrate their value to the organization in the following seven ways. Sometimes you'll have that rare person who shows four, five, maybe all seven of these characteristics, and my bet is that they're your top performer.

Quantity. Whether it's troubleshooting network problems, solving desktop or server issues, or writing code, the top performers are the people who get a lot of stuff done. They don't sacrifice quality for quantity, and they do the little things right like commenting their code or tickets, and making sure they've left behind a good trail that others can follow. When they take a vacation the rest of the team groans because they know they'll have to pick up a ton of slack.

Quality. These are the people who write the applications that look great and work just like you'd expect them to work. Their stuff is good, and intuitive to use. They are the people who really come to understand the appropriate technology and standards, and they solve the problem once and for all. They don't use a band-aid when major surgery is the necessary solution. (Or vice-versa.) Once they fix it or write it, it just works.

Flexibility. Knowing one thing well is useful. Knowing one thing really, really well is really useful. Knowing a lot of different things reasonably well is gold. So is having the attitude to say, "Well, I don't know how to do that using X, but I'll go find out." If I can only give a certain kind of project to someone on the team, then their usefulness and ability to contribute is really constrained. If I can throw almost anything at them, then their value to me is huge.

Knowing the business. Someone who really understands what the business is, how it makes its money, how its key processes operate, and how the major pieces fit together is essential for certain types of problem-solving. These are the people who can tell that something isn't quite right, and they dig into the business and people processes to find out what's really wrong. And then they fix that thing, not the symptom. It takes a while for new staff to really get good at this, but it is a critical area for their growth and development. It keeps them in the core of the business, and makes them a partner in its success.

Teaching. Key staff are good teachers. They teach themselves new technologies. They teach their peers new ways of solving old problems. They teach their managers about new developments in technology and problem-solving. They email you links to articles or blog posts or book reviews that are interesting and timely. They teach their colleagues and customers why something broke, and how they fixed it. They learn stuff well enough that they can teach others about it, and they have the patience and skills to explain things in a lucid, disarming manner. They are like MVP point guards in basketball: they make the whole team better.

Writing. Good writers help the whole organization. They write clear comments in code and tickets. They post illuminating and interesting content in blogs, wikis, and support forums. They put together good information to share with the boss (theirs and mine). They can put together those "all organization" emails that explain clearly and concisely what sort of maintenance is going to happen next week at 6am, or why the server blew up yesterday at noon. They don't just forward that long email; they annotate it with two sentences that deliver the punchline, inviting you to read the whole thing if you want to see the details. You make time to read their long emails because you know it's long for a good reason.

Asking for help. No one knows everything about everything. There is always a time when someone gets stuck with a problem, whether it's interpersonal, technical, political, or other. The top people know when they've stopped making progress and are spinning their wheels. Instead of waiting for the boss to notice, poke into the situation, and then call for relief, they contact the boss first. Or one of their colleagues. Or the customer. The point is that they know when to reach out to others in order to finish the job. Their ego is strong enough that they aren't afraid to say "I don't know."

Friday, June 12, 2009

Virtually a meeting

Just finished up the first ever virtual ICPSR Council meeting.

We used Adobe Connect as the technology, and scheduled the committee meetings last week, and the overall general session this afternoon. Asmat Noori (the assistant CNS director for technology operations) led the effort, and sat in on most of the meetings in case anyone needed technical support. So far the feedback has been pretty positive, and I expect we'll use this technology again in the future.

My own experience is that it works pretty well when the people already know each other, as we do in this case. And it also works well when a lot of the meeting is devoted to reporting on issues and projects v. intense, active collaboration to solve a pressing problem.

It certainly saves our Council members time; a couple of one- or two-hour phone calls over a two-week period v. air travel to Ann Arbor for a two-day meeting. And, of course, there is a significant cost savings to ICPSR's membership since the cost of using Adobe Connect is about the same as the travel expenses for a single attendee.

I don't think we've seen the end of in-person Council meetings, but my prediction is that we'll see a blending of virtual and physical meetings in the future.

Thursday, June 11, 2009

Cloudy day for the cloud?

I use Google Alerts to keep tabs on a variety of people, places, and things of all sorts, and something interesting hit my Gmail inbox today about "Amazon Web Services":

Looks like Amazon Web Services’ Elastic Compute service went down for an extended period this evening.

Now this was news to me, especially since we host our production study search service in AWS EC2, and our replica delivery infrastructure too. Both the University of Michigan Network Operations Center (NOC) and Merit (NOC) monitor our systems, testing availability every minute of every day. And whenever there is an outage, the on-call engineer gets a page (or many pages!). And I'm the on-call this week. :-)

So what really happened?

Here's a piece of the story from the AWS Service Health Dashboard:

7:33 PM PDT We wanted to give you a quick update. A lightning storm caused damage to a single Power Distribution Unit (PDU) in a single Availability Zone. While most instances were unaffected, a set of racks does not currently have power, so the instances on those racks are down. We have technicians on site, and we are working to replace the affected PDU. We do not yet have an ETA, but we expect to be able to recover the instances when we restore power. Besides these affected instances, all other instances, and all other Availability Zones, are operating normally. Users with affected instances can launch replacement instances in any of the US Region Availability Zones or wait until their instance(s) are restored.

Some instances in one of AWS's availability zones (they have three for the US alone) failed. That's not wonderful news, especially if one of the failed instances belongs to you, but it is hardly a failure of the entire EC2 service.

To me this is somewhat like ICPSR messing up an entry in a database, making one of our studies unavailable by mistake, and someone blogging that ICPSR's on-line delivery service went down.

Net net for me: Like with any story with a sensational headline, one always has to read the body of the text to get the real story. And preferably, read the story from various sources to triangulate on the reality of the situation.

Saturday, June 6, 2009

The future of MyData

When I arrived at ICPSR back in late 2002, we didn't have any sort of system to issue identities or authenticate visitors to the web site. At that time we asked users to type in an email address, performed some basic sanity-checking (like does it contain an '@'?), and then showed them our terms of use if no one had ever entered that email address before. Simple. But not terribly useful.
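For illustration only, here is a minimal sketch of that kind of check. It is not the code we actually ran; the function names and the in-memory set of previously seen addresses are assumptions made for the example.

```python
def looks_like_email(address: str) -> bool:
    """Very loose sanity check; it cannot tell whether the mailbox exists."""
    address = address.strip()
    # Require exactly one '@' with something on both sides of it.
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain or "@" in domain:
        return False
    # Require at least one dot in the domain and no empty labels.
    return "." in domain and all(domain.split("."))


def needs_terms_of_use(address: str, seen: set[str]) -> bool:
    """Return True the first time an address is seen, and remember it."""
    key = address.strip().lower()
    if key in seen:
        return False
    seen.add(key)
    return True


# Hypothetical visitors.
seen_addresses: set[str] = set()
for visitor in ["student@example.edu", "not-an-email", "Student@example.edu"]:
    if not looks_like_email(visitor):
        print(f"{visitor}: rejected")
    elif needs_terms_of_use(visitor, seen_addresses):
        print(f"{visitor}: show terms of use")
    else:
        print(f"{visitor}: welcome back")
```

Note what this does not do: it never proves that the visitor owns the address, which is a large part of why the approach was simple but not terribly useful.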

We launched our identity and authentication service, MyData, in 2004, and it has been busy collecting identities since then. I just ran a query, and at this moment, we have 168,524 identities. We ask new account holders to give us just a little personal information, such as how they fit into the world of social science (student, faculty, policy-maker, ...? economics, political science, sociology, ...? favorite stat package?). It's great.

But I always knew that having our own local identities with our own proprietary software was not going to be the best solution in the long run. For a very long time I've been waiting for Internet2's Shibboleth project to really hit the tipping point. And while it does seem to be gaining acceptance in higher ed, it still feels like it is moving pretty slow, and it isn't at all clear that it will hit the tipping point this year. Or next year. It's hard to tell.

Then, a year or two ago, I came across OpenID. Unlike Shib where the institution is the identity provider, OpenID is an open, decentralized way for individuals to create and manage identities, and to use that identity across multiple web sites (i.e., Single Sign-On). I created an OpenID back then, but it didn't seem like many sites were supporting OpenID for authentication.

Now it's 2009. Google, Yahoo, AOL, Facebook, etc., etc. are all supporting OpenID authentication. Lots of ICPSR's newest users are graduate students and recent graduates, and I'll bet many of them already have a Facebook account. Or Google account. Or Yahoo account. And I'll bet many of them would be just as happy to use that account to identify themselves to the ICPSR web site rather than creating yet another identity on our site.

We're refreshing the web site in August, and a change in authentication and identity services isn't on the short list. And so for better or worse everyone will be able to use a MyData account to log in to ICPSR in September 2009. But I'll be really surprised if people are still using a MyData account to log in when we roll out our next web site refresh in 2011.