Friday, February 26, 2010

The Big Switch

Quite some time ago now I read Nick Carr's The Big Switch.

The basic premise is that just as we saw organizations move from producing their own electricity to buying electricity from a grid, we'll see this same dynamic play out in technology. Carr predicts that most organizations will stop buying and deploying large farms of computational power and storage, and will instead rely upon the "cloud" for their technology infrastructure. Carr also predicts that this shift will unleash new opportunities, markets, and challenges for business and the public.

Here's an even better summary of the book from Carr's web site:

A hundred years ago, companies stopped generating their own power with steam engines and dynamos and plugged into the newly built electric grid. The cheap power pumped out by electric utilities didn’t just change how businesses operate. It set off a chain reaction of economic and social transformations that brought the modern world into existence. Today, a similar revolution is under way. Hooked up to the Internet’s global computing grid, massive information-processing plants have begun pumping data and software code into our homes and businesses. This time, it’s computing that’s turning into a utility.

The shift is already remaking the computer industry, bringing new competitors like Google and to the fore and threatening stalwarts like Microsoft and Dell. But the effects will reach much further. Cheap, utility-supplied computing will ultimately change society as profoundly as cheap electricity did. We can already see the early effects — in the shift of control over media from institutions to individuals, in debates over the value of privacy, in the export of the jobs of knowledge workers, even in the growing concentration of wealth. As information utilities expand, the changes will only broaden, and their pace will only accelerate.

My sense is that Carr's right, and the shift is indeed already underway. For example, rather than buying a storage device - such as a NAS - and operating it locally at ICPSR, I'm much more inclined to rent storage from a provider. I'll only buy storage in those cases where there is some unique combination of requirements such that I have no choice but to purchase it, and (this is the expensive part) operate it myself.

As another example I see us continuing a trend of sourcing public-facing computational and storage resources as cloud instances rather than locally provisioned machines (or even virtual machines). I don't view virtualization as a great new technology that I can use to save money in running my machine room; I view it as a great new business that I can use to save money by getting rid of my machine room.

A similar switch or shift is also taking place with software systems. Our general strategy is to select existing technologies for delivering or managing content wherever feasible, and to spend our resources on software development only when there are no reasonable alternatives available.

Authoring and archiving announcements on the web site? Blogger.

Data leads tracking and management?

Delivering basic static web content with simple search for our sites that have no data? Drupal.

The real promise of this switch is in the potential it offers. Rather than treating the IT shop as a simple service provider that automates business processes and thereby reduces the transaction costs of operating ICPSR, the switch gives the IT shop the opportunity to be a partner in business process innovation. And when organizations actually reinvent themselves and change their fundamental business processes, that's when really interesting things can happen.

Tuesday, February 23, 2010

TRAC: C3.3: Roles, responsibilities, and authorization

C3.3 Repository staff have delineated roles, responsibilities, and authorizations related to implementing changes within the system.

Authorizations are about who can do what—who can add users, who has access to change metadata, who can get at audit logs. It is important that authorizations are justified, that staff understand what they are authorized to do, and that there is a consistent view of this across the organization.

Evidence: ISO 17799 certification; organizational chart; system authorization documentation.

ICPSR is a very mature organization, and experiences very low staff turnover and very few of the reorganizations, mergers, and other organizational changes one finds in the for-profit world. Roles and responsibilities are well known, and the knowledge is common across work groups. We don't often hire completely new types of positions, and so when we do hire a new employee, they tend to slot easily into an existing category of employee (e.g., "data processor").

The combination of a category, or role, and the specific work group (for example, the new hire, Jane, is a Data Processor in SAMHDA) translates easily into a set of related technology containers such as Active Directory groups, LDAP groups, old school UNIX /etc/group entries, and even our own custom database that maps people to groups.
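As a rough illustration of how a role plus a work group might translate into those containers, here is a minimal Python sketch. The naming scheme, the `Assignment` class, and the `group_names` helper are all hypothetical; they are not ICPSR's actual conventions, only one way such a mapping could look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """A staff member's role within a specific work group."""
    person: str
    role: str        # e.g. "Data Processor"
    work_group: str  # e.g. "SAMHDA"


def slug(text):
    """Lowercase a label and join its words with hyphens."""
    return "-".join(text.lower().split())


def group_names(assignment):
    """Derive one (hypothetical) group name per technology container."""
    base = f"{slug(assignment.work_group)}-{slug(assignment.role)}"
    return {
        # Illustrative name fragments only; real directory layouts will differ.
        "active_directory": f"CN={base},OU=Groups",
        "ldap": f"cn={base},ou=groups",
        # /etc/group style name, using underscores instead of hyphens.
        "unix": base.replace("-", "_"),
    }


jane = Assignment(person="Jane", role="Data Processor", work_group="SAMHDA")
for container, name in group_names(jane).items():
    print(container, "->", name)
```

Deriving every container name from a single (role, work group) pair is one way to keep the containers from drifting apart, although it does not by itself make the role definitions any crisper.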

My sense is that the net effect is that we have roles that are well understood, but not crisply defined; and we have containers to hold the roles, but those containers are as idiosyncratic as the collection of technology we use to drive ICPSR.

Sunday, February 21, 2010

ICPSR Web Server Outage

Some of our very hard working colleagues and clients noticed that our web server was unavailable from about 10:30pm on February 20 until just about 12:15am on February 21. The root cause of the problem was that the network switch serving our database server failed, which caused significant problems for the web server.

After fielding the page and driving into the office, the on-call member of my team discovered that the web server was indeed up and running, but not answering any connections from the network. She then diagnosed the problem, discovering that the network interface on the server was OK, but the network switch was not. Some quick work finding a long cable and a free network port elsewhere in the machine room solved the problem.

Our apologies for the inconvenience this caused.

Friday, February 19, 2010

TRAC: C3.2: Security controls

C3.2 Repository has implemented controls to adequately address each of the defined security needs.

The repository must show how it has dealt with its security requirements. If some types of material are more likely to be attacked, the repository will need to provide more protection, for instance.

Evidence: ISO 17799 certification; system control list; risk, threat, or control analyses; addition of controls based on ongoing risk detection and assessment.

Broadly speaking ICPSR has only two types of objects to manage: (1) those that have been reviewed and vetted, and are available for wide dissemination, and (2) everything else.

The first category of objects requires relatively modest security controls. Because this content is freely available via our web site (albeit some only to members) there are relatively few concerns about its security. It's important that we know that the objects haven't been corrupted (unintentionally or otherwise), and it's also important to ensure that member-only content is accessible only to members.
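Knowing that an object hasn't been corrupted usually comes down to a fixity check: record a digest when the object is vetted, and recompute it later. The sketch below is a generic illustration using Python's standard `hashlib`; the choice of SHA-256, the function names, and the sample file are assumptions, not a description of ICPSR's actual tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, recorded_digest):
    """True if the file's current digest matches the recorded one."""
    return sha256_of(path) == recorded_digest


# Demonstration with a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "study.tsv"
    sample.write_bytes(b"id\tvalue\n1\t42\n")
    recorded = sha256_of(sample)               # digest stored at vetting time
    unchanged = is_intact(sample, recorded)    # file untouched so far
    sample.write_bytes(b"id\tvalue\n1\t43\n")  # simulate corruption
    change_detected = not is_intact(sample, recorded)

print(unchanged, change_detected)
```

A digest only detects change; it says nothing about who may read the object, so the member-only restriction needs a separate access control.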

The second category has grown considerably in the past few years, particularly as ICPSR receives more confidential data from government agencies and partners, and as unsolicited deposits via its deposit system.

To meet the rising challenge of confidential data ICPSR will be changing the architecture of its network, and will also be deploying a Secure Data processing Environment. (More on the SDE in a subsequent post.) Both of these steps supplement existing security controls that we discussed in the post The ICPSR Web Site and Security from September 2009.

Our current network architecture is very flat. All network-connected equipment is part of a single virtual local area network (VLAN), and items are protected at the border between ICPSR and the outside world via Cisco access-list policies. Individual systems may also be protected via local firewalls, such as iptables on a Linux server. And equipment located in Amazon's cloud is protected via similar mechanisms using AWS EC2 Security Groups.

Our new network architecture will look very different, and will consist of four separate VLANs. Here is the overall picture:

This new architecture makes use of a new security service provided by the University of Michigan's central IT organization, ITS. The service is called Virtual Firewall, and is based on the Check Point firewall product.

This architecture divides ICPSR's network-connected resources into four VLANs:
  1. Public
  2. Semi-private
  3. Private (local to ICPSR)
  4. Private (local to ITS)

The Public VLAN hosts ICPSR's public-facing resources, such as the web server. This is a relatively small VLAN, and one can imagine resources moving from this VLAN into Amazon's cloud over time. (In fact, a multi-region delivery platform stretched across Amazon's international cloud, perhaps using AWS CloudFront to deliver especially high-volume objects, is very compelling.)

The Semi-private VLAN hosts the bulk of ICPSR's current systems: desktop computers, printers, storage systems, and so on. This equipment needs robust outbound access to the Internet, but inbound access should be restricted closely. For example, there is no good reason why ICPSR's printers need to be accessible from any arbitrary host on the Internet.

The Private VLANs will host our most secure systems. This will include the Secure Data processing Environment (SDE) we're building, and also a future Virtual Data Enclave (VDE). Access between these VLANs and the Internet will be highly, highly restricted, and more general access will be permitted only from our Semi-private VLAN.
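The access rules described above can be summarized as a small default-deny table. This Python sketch is only a model of the prose, not the Virtual Firewall configuration: the `Zone` names and the `may_connect` function are made up for illustration, "highly restricted" is simplified to "denied", and the Semi-private to Public flow is an assumption the post does not state.

```python
from enum import Enum


class Zone(Enum):
    INTERNET = "internet"
    PUBLIC = "public"
    SEMI_PRIVATE = "semi-private"
    PRIVATE_ICPSR = "private (local to ICPSR)"
    PRIVATE_ITS = "private (local to ITS)"


# (source, destination) pairs allowed to initiate connections.
ALLOWED = {
    (Zone.INTERNET, Zone.PUBLIC),             # visitors reach the web server
    (Zone.SEMI_PRIVATE, Zone.INTERNET),       # desktops need outbound access
    (Zone.SEMI_PRIVATE, Zone.PUBLIC),         # assumption: staff reach public services
    (Zone.SEMI_PRIVATE, Zone.PRIVATE_ICPSR),  # general access only from Semi-private
    (Zone.SEMI_PRIVATE, Zone.PRIVATE_ITS),
}


def may_connect(src, dst):
    """Default-deny: traffic within a zone, or a listed flow, is allowed."""
    return src == dst or (src, dst) in ALLOWED


print(may_connect(Zone.INTERNET, Zone.PUBLIC))         # web traffic
print(may_connect(Zone.INTERNET, Zone.SEMI_PRIVATE))   # e.g. reaching a printer
print(may_connect(Zone.SEMI_PRIVATE, Zone.PRIVATE_ICPSR))
```

Listing the permitted flows explicitly, and denying everything else, keeps the policy short enough to review by eye, which is much harder with a flat network and a long border access list.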

We've just started the process of building out this future network architecture with ITS, and expect that we'll finish the deployment and transition before the end of the summer.

ICPSR Web Server reboot today - 2pm EST - Feb 19, 2010

We've noticed an elevated level of transient errors with our production web server. This is likely causing a small performance problem for some web site visitors, and is also generating sporadic false-positive notifications from the University of Michigan's Network Operations Center.

We're rebooting the web server at 2pm EST today (Feb 19, 2010). We believe this will clear the transient errors, and improve performance. We've elected to perform this maintenance in the middle of the work day (USA time) so that the entire team is available to diagnose and treat any unexpected problems once the web server restarts. This includes systems such as download, search, the Summer Program portal, and so on.

We expect a brief outage lasting only a few minutes.

Our apologies to our web site visitors for the inconvenience this will cause.

Tuesday, February 16, 2010

IT Recharge Projects - 2010 Q1

The steering committee selected seven projects for attention in Q1 of 2010. Here's the list with a brief description of each one.

One, Restricted Use Contracting System. This is an on-going project to build a suite of tools to support the contracting process for access to restricted-use research data. There's a portal where researchers complete tasks such as uploading documents (research plan, IRB approval, CVs, etc), submitting security plans to protect the data, and enumerating who will have access to the data. There's also an internal administrative tool for managing contracts, and an automated workflow system that generates alerts and reminders.

Two, Turnover Notification to Depositors. This is a small project that closes the loop with a depositor when his/her data results in the release of an ICPSR study.

Three, Citation Data Feed. This is another small project that will feed our bibliography to partner organizations via an automated system.

Four, Expose Deposited Files via Deposit Viewer. This is a slightly larger project which updates our internal Deposit Viewer/Manager tool so that one may use it to access the payload of a deposit. Currently staff must use a separate mechanism to accomplish this.

Five, SAMHDA Site Redesign. Another medium-size project to retool the SAMHDA site to our new technology platform.

Six, Variable Metadata Management System Design. A preliminary project to scope how we might design and build a variable-level metadata management system to complement our study-level metadata management system.

Seven, Invoice Management. Our parent organization, the Institute for Social Research, is our de facto accounts receivable office, and this is a project to analyze the process by which information flows between Organization Representatives, ICPSR, and ISR.

Monday, February 15, 2010

ICPSR Web Services outage

The ICPSR web site and several of its special topic web sites were unavailable from 1:01pm EST until 1:20pm EST on Tuesday, February 9, 2010. This was an unscheduled outage.

The proximate reason for the outage was that the tomcat web application server faulted during a routine deployment, and stopped answering queries. This caused the ICPSR web site and other "new platform" web sites written in Java/JSP to become unresponsive.

ICPSR IT staff detected the problem immediately after the deployment, and began troubleshooting. After a brief series of diagnostics, the problem was isolated to tomcat. After the team restarted tomcat, the problem cleared.

Our apologies for any inconvenience this may have caused.

Friday, February 12, 2010

TRAC: C3.1: Risk assessment

C3.1 Repository maintains a systematic analysis of such factors as data, systems, personnel, physical plant, and security needs.

Regular risk assessment should address external threats and denial of service attacks. These analyses are likely to be documented in several different places, and need not be comprehensively contained in a single document.

Evidence: ISO 17799 certification; documentation describing analysis and risk assessments undertaken and their outputs; logs from environmental recorders; confirmation of successful staff vetting.

ICPSR performs regular risk assessment using a variety of techniques.

One, on a purely technological level, ICPSR analyzes its security exposure through regular vulnerability scans that are conducted by the University of Michigan's Information Technology Security Services (ITSS) team. This includes participation in a mandatory, quarterly, campus-wide scan, and also an optional, monthly, detailed scan of our own network.

Two, ICPSR and its parent organization, the Institute for Social Research (ISR), undergo periodic Information Technology audits that are conducted by the University of Michigan's Office of University Audits. These are comprehensive assessments of physical and electronic security.

Three, ICPSR and its parent organization, ISR, conduct two risk evaluation projects each year to assess risks and vulnerabilities in IT infrastructure. The most recent one undertaken by ICPSR was in 2009, and the focus was on research data security.

Four, ICPSR, again along with ISR, participates in an annual review and refresh of an organization-wide IT security plan. This is a comprehensive evaluation of vulnerabilities, strengths, and resources related to security. The plan flows into a University of Michigan central repository of security plans, and is also reviewed by the leadership of ISR.

Five, ICPSR also participated in the Center for Research Libraries Auditing and Certification of Digital Archives Project "test audit" in 2005-2006. This audit reviewed a broad spectrum of functions and services at ICPSR (including IT), and produced a publicly available report. The report was very favorable to ICPSR:

Auditors found an organization with a mature, fully operational archive. ICPSR as an organization has a 44-year history of growing and managing a large (at the time of the audit 2.3 Terabytes) data archive of valuable content with a virtually unblemished record in data management and access. The future prospects of both the organization and the data appear favorable, due to a combination of sound financial management and planning, multiple sources of revenue, robust reporting and accountability mechanisms, and sound technical decisions related to processes, procedures, and formats.

In summary, ICPSR conducts risk assessment - particularly technological risk assessment - in a variety of ways on a routine basis.

Thursday, February 11, 2010

ICPSR's IT Recharge

ICPSR supports its IT infrastructure through the use of a recharge mechanism. Here's a blurb from an overview document I wrote:

ICPSR hosts an array of both government- and member-funded activities and projects. While each has its own clients, goals, and areas of specialization, they all share the same essential technology needs. To deliver efficient and effective data products and services to its constituency each project requires access to tools, systems, and technical resources that ingest, curate, preserve, and deliver research data. ICPSR has long recognized this shared need, and operates a Recharge Center (RC) to deliver these services.

The RC is operated by ICPSR's Computer and Network Services team, which is composed of systems analysts, software developers, systems and network administrators, certified IT security professionals, and a management team. The RC is funded through an annually reviewed University of Michigan Recharge account. The RC operates via this recharge mechanism, allocating expenses equally regardless of whether the client is a federal grant or contract, a membership-funded project, or an activity funded through internal University of Michigan funds. The rate does not discriminate between federally and non-federally funded activities.

Recently ICPSR began a new process to allocate RC-funded resources that conduct business analysis and software development. Here's how it works:

During the course of normal business each standing committee at ICPSR (e.g., the Dissemination Committee, the [Research Data] Processing Workflow Committee, etc) records a list of technology projects it would like to see undertaken. The head of each committee works with someone from my team to scope and size each project. Those lists are prioritized, and then all of the committee heads meet together once per quarter to create a single, consolidated list of projects. We manage the size of the list such that the amount of work for the quarter is equal to the resources available in the quarter.

My team then commits to working each project in the upcoming quarter. Some of the projects are short-lived, and can be completed within the quarter; others are long-lived and may span quarters. In this latter case, only one part of a bigger project would be completed in a single quarter.
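The sizing step in the process above, keeping the quarter's list equal to the resources available, can be sketched as a simple priority-ordered selection. This is an illustration only: the real prioritization is done by people in a meeting, and the `Project` fields, the effort unit (weeks), the project names, and the rule of skipping a project that no longer fits are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    priority: int      # 1 = highest, as ranked on the consolidated list
    effort_weeks: int  # hypothetical sizing unit for this quarter's portion


def plan_quarter(projects, capacity_weeks):
    """Take projects in priority order, skipping any that no longer fit."""
    selected = []
    remaining = capacity_weeks
    for project in sorted(projects, key=lambda p: p.priority):
        if project.effort_weeks <= remaining:
            selected.append(project)
            remaining -= project.effort_weeks
    return selected, remaining


# Hypothetical candidate list and capacity.
candidates = [
    Project("A", priority=1, effort_weeks=6),
    Project("B", priority=2, effort_weeks=4),
    Project("C", priority=3, effort_weeks=5),
    Project("D", priority=4, effort_weeks=2),
]
selected, remaining = plan_quarter(candidates, capacity_weeks=12)
print([p.name for p in selected], remaining)
```

For a long-lived project, `effort_weeks` would cover only the part scheduled for the quarter, which matches the idea that a bigger project may span several quarters.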

I thought I would dedicate one post per quarter (starting with this one) where I would list the set of projects, and describe each one briefly. In some cases this will provide a "sneak preview" of new products and services that will roll out in the months to come.

Monday, February 8, 2010

TRAC: C2.2: Appropriate software technologies

C2.2 Repository has software technologies appropriate to the services it provides to its designated community(ies) and has procedures in place to receive and monitor notifications, and evaluate when software technology changes are needed.

The repository needs to be aware of the types of access services expected by its designated community(ies), and to make sure its software capabilities can support these services. For example, it may need to add format translations to meet the needs of currently widely used application tools, or it may need to add a data subsetting service for very large data objects.

Evidence: Technology watch; documentation of procedures; designated community profiles; user needs evaluation; software inventory.

My sense is that ICPSR is very strong in this area, and is, in general, quite strong in areas that touch the Access portion of the OAIS framework.

Regardless of the format in which it receives content, and regardless of the internal format used to preserve the content, ICPSR routinely produces the "full product suite" of formats for delivery on the web site. At this time this includes platform-independent versions of SPSS, SAS, and Stata files, as well as tab-separated values (for those who might want to use Excel or Access), and even plain ASCII data with setups.
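To make the simplest of these formats concrete, here is a small sketch of writing tab-separated values with Python's standard `csv` module. It is not ICPSR's production pipeline; the `to_tsv` helper and the variable names `CASEID` and `AGE` are invented for the example.

```python
import csv
import io


def to_tsv(header, rows):
    """Serialize a header and data rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical two-variable, two-case data set.
print(to_tsv(["CASEID", "AGE"], [[1, 34], [2, 51]]))
```

A file in this form opens directly in spreadsheet and desktop database tools, which is why it is a useful fallback for users without statistical software.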

ICPSR also prepares the content for use with our online analysis system, SDA from UC Berkeley. This opens up access to the data for those users who lack statistical analysis software, and also allows one to subset the data prior to download.

Also, ICPSR spends a considerable amount of resources on data discovery and data browsing systems, such as our new faceted search, and the extensive metadata we prepare to document the data. These too might fall into the "appropriate software" portion of TRAC.

Does my sense of how we're doing match that of the communities we serve? How well do you think we're doing at providing the right types of access and appropriate formats?