Wednesday, October 7, 2009

TRAC: C1.2: Backup infrastructure

C1.2 Repository ensures that it has adequate hardware and software support for backup functionality sufficient for the repository’s services and for the data held, e.g., metadata associated with access controls, repository main content.

The repository needs to be able to demonstrate the adequacy of the processes, hardware and software for its backup systems. Some will need much more elaborate backup plans than others.

Evidence: Documentation of what is being backed up and how often; audit log/inventory of backups; Trustworthy Repositories Audit & Certification: Criteria and Checklist validation of completed backups; disaster recovery plan—policy and documentation; “firedrills”—testing of backups; support contracts for hardware and software for backup mechanisms.

ICPSR has extensive documentation and infrastructure to support its core access functions even when a catastrophic failure disables its primary location in Ann Arbor, Michigan. The documentation - planning documents and instructions - reside in a Google Group, and all members of the IT team, and two of ICPSR's senior staff outside of IT are members of the group. The process has been used twice in 2009, once as a test, and once when the Ann Arbor site suffered a power failure.

ICPSR has a less well documented, but fairly prosaic, backup solution in place. All non-ephemeral content at ICPSR resides on a large Network Attached Storage (NAS) appliance. The IT team has configured the NAS to "checkpoint" each filesystem once per day, and each checkpoint is retained for 30 days. Checkpoints provide a read-only, self-serve backup system for those instances where a member of the staff has inadvertently damaged or destroyed non-archival content.

Further, we write all filesystems to a tape library, which is located in a different machine room than the NAS. Every two weeks tapes are removed from the tape library, and stored in yet a different building. We retain the last four weekly backups, and the last twelve monthly backups. The system is exercised on an infrequent, but regular basis when we restore files that were damaged or destroyed beyond the thirty day checkpoint window.

Finally, unlike "working" files where all copies reside locally, and where we retain only one year of content, our archival storage solution consists of copies in at least four locations. The master copy (1) is on the NAS; a copy (2) is written to tape each month; a copy (3) is synchronized daily with the San Diego Supercomputer Center's storage grid; and, a copy (4) is synchronized daily with the MATRIX Center at Michigan State University. Furthermore, archival content collected prior to 2009 has also been copied into the Chronopolis project storage grid, which adds two additional copies.

One area with room for improvement would be regular "fire drills" where we would attempt to retrieve a random number of random objects from an arbitrarily selected archival storage location.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.