Wednesday, September 30, 2009

TRAC: C1.1: Well-supported core infrastructure

C1.1 Repository functions on well-supported operating systems and other core infrastructural software.

The requirement specifies “well-supported” as opposed to manufacturer-supported or other similar phrases. The level of support for these elements of the infrastructure must be appropriate to their uses; the repository must show that it understands where the risks lie. The degree of support required relates to the criticality of the subsystem involved. A repository may deliberately have an old system using out-of-date software to support some aspects of its ingest function. If this system fails, it may take some time to replace it, if it can be replaced at all. As long as its failure does not affect mission-critical functions, this is acceptable. Systems used for internal development may not be protected or supported to the same level as those for end-user service.

Evidence: Software inventory; system documentation; support contracts; use of strongly community-supported software (e.g., Apache).



At the foundation of ICPSR's core technology infrastructure is Red Hat Linux. Linux, of course, is a very widely deployed open source, UNIX-like operating system, and Red Hat is a world leader in supporting Linux. ICPSR previously used proprietary operating systems, but moved all of its systems to Red Hat Linux over the past decade.

Moving up the stack of our core technology infrastructure, we use several pieces of software from the Apache Software Foundation, which has a very large community of developers and users. In addition to the flagship HTTP Server, we use the Tomcat servlet container for all of our Java-based web applications, the Solr search engine, which is built atop the Lucene Java search library, and the Cocoon framework for rendering XML into other formats.

ICPSR also builds and maintains a suite of custom software for key business processes, such as our Data Deposit Form (ingest), our download system (access), and our data processing systems (data management, ingest, access). While this software is necessarily proprietary, it is written in common, modern programming languages such as Perl and Java, which have wide support in the community.

Like many enterprises, ICPSR uses Oracle as its database system. Given Oracle's large installed base around the world, ICPSR views it as a well-supported platform. We believe we can continue to use Oracle as long as the University of Michigan continues to make it freely available to us. Further, we use only the most basic features of Oracle, and if required, it would be straightforward, though not trivial, to migrate our content to another relational database such as PostgreSQL or MySQL. Our one highly customized use of Oracle, and therefore the one that carries the most risk, is the Oracle Text-based search engine for the Child Care site, which is scheduled to be replaced in early 2010.

Finally, we are actively migrating our own proprietary archival storage system to Fedora, which is supported by the newly created DuraSpace organization. Apart from our internally created systems, Fedora probably has the smallest support community of any of our major technology systems, but because it is open source and gaining traction, we believe its level of support will continue to grow over time. Moreover, because the underlying content resides in plain XML files, it would still be possible to migrate that content to another system even after a sudden and catastrophic loss of Fedora.
