Friday, January 8, 2010

TRAC: C2.1: Appropriate hardware technologies

C2.1 Repository has hardware technologies appropriate to the services it provides to its designated community(ies) and has procedures in place to receive and monitor notifications, and evaluate when hardware technology changes are needed.

The repository needs to be aware of the types of access services expected by its designated community(ies), including, where applicable, the types of media to be delivered, and needs to make sure its hardware capabilities can support these services. For example, it may need to improve its networking bandwidth over time to meet growing access data volumes and expectations.

Evidence: Technology watch; documentation of procedures; designated community profiles; user needs evaluation; hardware inventory.

For the types of material that ICPSR preserves and delivers to its clients, hardware selection has been relatively unimportant since the Web emerged and came into widespread use.

In ICPSR's early days, when it would send magnetic tapes to its member schools for use on their own campuses, hardware selection, particularly in terms of media types, was a very important consideration. If most campuses were routinely using, say, IBM labeled cartridge tapes but ICPSR was sending ANSI labeled 9-track tapes, this would have imposed an additional burden on our designated community. However, in a world where the typical access is via a Web download, the underlying hardware selection - disk, server, router, switch - just doesn't matter all that much.

Also, because ICPSR's unit of delivery is relatively small - most of our content is on the order of megabytes - issues such as network bandwidth are not critical. On a typical day ICPSR might deliver 20 GB of content across hundreds or even thousands of separate sessions, and so with even modest networking resources (1 Gb/s NIC, 1 Gb/s switch, 10 Gb/s campus backbone, etc.), the "pipes" are more than sufficient to deliver good service.
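To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 GB = 10^9 bytes) and that the 20 GB is spread evenly over 24 hours; real traffic is burstier, so treat the result as an average rather than a peak. The function and variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_bits_per_second(bytes_per_day: float) -> float:
    """Average throughput, in bits per second, for a given daily byte volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY


# Assumptions: 20 GB/day in decimal units, and a 1 Gb/s link as the
# narrowest point in the path (the NIC or switch mentioned above).
daily_bytes = 20e9
link_bits_per_second = 1e9

avg_bps = average_bits_per_second(daily_bytes)
utilization = avg_bps / link_bits_per_second

print(f"Average throughput: {avg_bps / 1e6:.2f} Mb/s")
print(f"Share of a 1 Gb/s link: {utilization:.2%}")
# Average throughput: 1.85 Mb/s
# Share of a 1 Gb/s link: 0.19%
```

Even if the whole day's traffic were squeezed into a single busy hour, the average over that hour would be about 44 Mb/s, still a small fraction of a 1 Gb/s link.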

One interesting area to consider with regard to hardware selection is virtualization. For example, in some cases it might make more sense for a depositor to leave ICPSR with an image of a complete computing system in addition to a dataset. In most cases the deposit contains simple rectangular data that can be normalized relatively easily into plain character data and accompanying "setups" to make it easy to use with the most common statistical packages. However, if the dataset were more complex - say a relational database - then normalizing the data could be prohibitively expensive, or damaging to the data, or both. In this case, having a machine image that would contain a host operating system, the database application, and the user database might be a useful addition both for preservation and delivery.
