Technology at ICPSR: January 2010

Friday, January 8, 2010

TRAC: C2.1: Appropriate hardware technologies

C2.1 Repository has hardware technologies appropriate to the services it provides to its designated community(ies) and has procedures in place to receive and monitor notifications, and evaluate when hardware technology changes are needed.

The repository needs to be aware of the types of access services expected by its designated community(ies), including, where applicable, the types of media to be delivered, and needs to make sure its hardware capabilities can support these services. For example, it may need to improve its networking bandwidth over time to meet growing access data volumes and expectations.

Evidence: Technology watch; documentation of procedures; designated community profiles; user needs evaluation; hardware inventory.

For the types of material that ICPSR preserves and delivers to its clients, hardware selection has been relatively unimportant since the birth and spread of the Web.

In ICPSR's early days, when it would send magnetic tapes to its member schools for use on their own campuses, hardware selection, particularly in terms of media types, was a very important consideration. If most campuses were routinely using, say, IBM labeled cartridge tapes but ICPSR was sending ANSI labeled 9-track tapes, this would have imposed an additional burden on our designated community. However, in a world where the typical access is via a Web download, the underlying hardware selection - disk, server, router, switch - just doesn't matter all that much.

Also, because ICPSR's unit of delivery is relatively small - most of our content is on the order of megabytes - issues such as network bandwidth are not critical. In a typical day ICPSR might deliver 20 GB of content across hundreds or even thousands of separate sessions, and so with even modest networking resources (1 Gb/s NIC, 1 Gb/s switch, 10 Gb/s campus backbone, etc.), the "pipes" are more than sufficient to deliver good service.
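To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. It assumes decimal gigabytes and traffic spread evenly over 24 hours, which real download traffic is not, so treat the result as an average rather than a peak. The function names are made up for illustration.

```python
# Back-of-the-envelope check: how much of a 1 Gb/s link does 20 GB/day use?
# Assumptions (not from the post): decimal units (1 GB = 1e9 bytes) and
# traffic spread evenly across the day. Real traffic is burstier.

DAILY_VOLUME_GB = 20.0        # from the post: typical daily delivery
LINK_SPEED_GBPS = 1.0         # from the post: 1 Gb/s NIC and switch
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mbps(volume_gb: float) -> float:
    """Average rate in megabits per second for a given daily volume in GB."""
    bits_per_day = volume_gb * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


def link_utilization(volume_gb: float, link_gbps: float) -> float:
    """Fraction of the link consumed by the average rate (0.0 to 1.0)."""
    return average_rate_mbps(volume_gb) / (link_gbps * 1000.0)


if __name__ == "__main__":
    rate = average_rate_mbps(DAILY_VOLUME_GB)
    share = link_utilization(DAILY_VOLUME_GB, LINK_SPEED_GBPS)
    print(f"average rate: {rate:.2f} Mb/s")
    print(f"share of a {LINK_SPEED_GBPS:g} Gb/s link: {share:.2%}")
```

Under those assumptions the average load is a small fraction of one percent of the slowest link in the chain, so even peaks many times higher than the average would leave plenty of headroom.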

One interesting area to consider with regard to hardware selection is virtualization. For example, in some cases it might make more sense for a depositor to leave ICPSR with an image of a complete computing system in addition to a dataset. In most cases the deposit contains simple rectangular data that can be normalized relatively easily into plain character data and accompanying "setups" to make it easy to use with the most common statistical packages. However, if the dataset were more complex - say a relational database - then normalizing the data could be prohibitively expensive, or damaging to the data, or both. In this case, a machine image containing an operating system, the database application, and the user database might be a useful addition both for preservation and delivery.

Tuesday, January 5, 2010

TRAC: C1.10: Patching and risk-benefit assessment

C1.10 Repository has a process to react to the availability of new software security updates based on a risk-benefit assessment.

Decisions to apply security updates are likely to be the outcome of a risk-benefit assessment; security patches are frequently responsible for upsetting alternative aspects of system functionality or performance. It may not be necessary for a repository to implement all software patches, and the application of any must be carefully considered. Each security update implemented by the repository must be documented with details about how it is completed; both automated and manual updates are acceptable. Significant security updates might pertain to software other than core operating systems, such as database applications and Web servers, and these should also be documented.

Evidence: Risk register (list of all patches available and risk documentation analysis); evidence of update processes (e.g., server update manager daemon); documentation related to the update installations.

Like many organizations, ICPSR uses an automated system for basic security patching, both on the Windows platform and on the Linux platform. In this context there is absolutely no risk assessment for individual patches; they flow from a trusted source, and are installed automatically.

For our most sensitive components, such as the Apache web server, Oracle, and the kernel we run on our Linux systems, we install patches and/or upgrade versions manually. These changes take place less often, and at a minimum are announced well ahead of time to the entire group. For example, a systems administrator would announce an intention to upgrade to the newest release of the Apache web server on our staging server, see how things work for a week, and then announce the same upgrade on our production web server. This isn't captured in anything as formal as a risk register, but there is clear and frequent communication about the change.
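The two-track approach described above can be sketched as a small decision rule. This is an illustration only, not a tool ICPSR runs: the component names, the `PatchPlan` record, and the `plan_patch` function are all hypothetical, and the seven-day staging period is simply the "week" mentioned in the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical labels for the components the post says are patched by hand.
MANUAL_COMPONENTS = {"apache", "oracle", "linux-kernel"}

# "See how things work for a week" on staging before touching production.
STAGING_SOAK = timedelta(days=7)


@dataclass(frozen=True)
class PatchPlan:
    component: str
    automatic: bool                     # True: installed by the automated system
    staging_date: Optional[date]        # when the change lands on staging
    earliest_production: Optional[date] # earliest date for the production change


def plan_patch(component: str, staging_date: date) -> PatchPlan:
    """Route a patch to the automatic track or the manual, staged track."""
    if component.lower() in MANUAL_COMPONENTS:
        return PatchPlan(component, False, staging_date,
                         staging_date + STAGING_SOAK)
    # Everything else flows from the trusted source with no per-patch review.
    return PatchPlan(component, True, None, None)


if __name__ == "__main__":
    print(plan_patch("apache", date(2010, 1, 5)))
    print(plan_patch("openssl", date(2010, 1, 5)))
```

Keeping even a simple record like this for each manual change would be one lightweight way to produce the documentation TRAC asks for, without going as far as a formal risk register.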