Monday, March 19, 2012

A look back to 2000 (part 3)

Returning to the 2000 ICPSR Annual Report:
Operations Support

ICPSR continues to pursue a strategy of distributed and networked computing systems. ICPSR staff use increasingly powerful individual personal computers and workstations electronically networked to more powerful servers. This hierarchy of computing devices allows ICPSR to take advantage of the good price-performance ratios in desktop computers and still have the higher performance servers to provide the computing power and mass storage needed to handle the large volume of data processed and disseminated each year.
It is ironic, but even though ICPSR staff have much more powerful desktop machines today, they perform fewer work functions on those machines.  Instead, most of the data management and processing work takes place within the confines of the Secure Data Environment (SDE), and therefore most of the work takes place on a virtual desktop machine that we "rent" from the University of Michigan's central IT provider, ITS.
All staff members have cost-effective Pentium or Macintosh desktop workstations with connectivity to powerful, specialized servers. All staff members have access to a standard set of desktop applications (word processing, spreadsheets, local area network services, World Wide Web access, electronic mail) as well as to specialized software necessary to perform particular functions (statistical packages, desktop publication software, specialized editing packages, database management systems, etc.).
And today it is all managed centrally - machine images, software packages, operating system patches, etc.
ICPSR currently runs servers that provide high-capacity magnetic disk storage, magnetic tape access (1/2-inch reel-to-reel and 3480, 4mm, 8mm, 1/4-inch cartridge, and digital linear tape), database management facilities, high-capacity printers, image and Optical Character Recognition (OCR) scanning, CD-ROM mastering, and wide area network gateways. All of these services are available on ICPSR's internal local area network, and are provided by a set of five SPARCstations.
Well, not so much anymore.  The only thing we manage that ever sees a tape is a special-purpose tape library that we use to back up our EMC NS 120 storage appliances.  And that's LTO-4 tape.

And the five SPARCstations have become five dozen different pizza box servers (when we need lots of local disk), blade servers, and virtual servers running in Amazon's EC2.
ICPSR's current main computer servers are a pair of SPARCserver 1000s running Solaris and connected via a dedicated 100 megabit-per-second subnet. CNS plans to upgrade this equipment next year, replacing the SPARCservers with more powerful Sun 4500 Enterprise Servers.
The E4500 systems - one for the web server and one for databases and data processing - were still reasonably new when I arrived in 2002.  But we replaced them with Dell servers running RH Linux in 2007 (I think), and then replaced those with bigger, better, badder 64-bit machines this year.
Over the more than 30-year history of ICPSR, the Computing and Network Services group has undertaken several major in-house programming projects to provide ICPSR with more effective data processing tools. FAST and CDNet are specialized archival processing and management systems that CNS developed. CNS continues to maintain and upgrade ICPSR's core order/inventory and record-keeping systems.
I think the gang had killed off CDNet even before I arrived, and we eliminated FAST a year or two into my tenure.  Everything is much more automated now, but the overall workflow and business process haven't changed, and that is a problem.  The act of storing an object in the repository and the act of publishing an object on the web site are still very much intertwined, and changing the business process - and the software systems - is one of the key goals for FLAME.
ICPSR came through the Y2K "crisis" relatively unscathed. All of our systems were tested and upgraded where necessary during 1999, and were taken off-line as a precaution over the New Year's holiday. Coincidentally, a non-Y2K-related hardware failure occurred when the system was powered back up on New Year's Day, and the system stayed down until January 3rd.
And now we have a replica of ICPSR's web-based delivery system in Amazon's cloud.
In the coming year, CNS plans to add another terabyte of high-speed RAID disk storage arrays to accommodate the needs of the expanding archive and to provide space for migration from our library of 3480 tape cartridges. We have also continued an aggressive program of upgrading staff workstations.
I don't know how much disk storage ICPSR had in 2000, but we have 50+ TB on-site in our EMC NAS units, another 10TB with ITS, another 6TB at Michigan State University, and at least this much again at DuraCloud and again and again in Amazon's EBS and S3 systems.

If the number of computing systems has grown 10x, I would estimate that the amount of managed storage has grown by 100x.

And that doesn't count the additional 50-100TB we need for our Measures of Effective Teaching video collection.

But not so many tapes....

And this week's photo contest entry.  (Mouse over the image to see the name.)

I think this one is too easy.

