Friday, November 12, 2010

TRAC: B2.5: AIP Naming Conventions

B2.5 Repository has and uses a naming convention that generates visible, persistent, unique identifiers for all archived objects (i.e., AIPs).

A repository needs to ensure that an accepted, standard naming convention is in place that identifies its materials uniquely and persistently for use both in and outside the repository. The “visibility” requirement here means “visible” to repository managers and auditors. It does not imply that these unique identifiers need to be visible to end users or that they serve as the primary means of access to digital objects.

Equally important is a system of reliable linking/resolution services in order to find the uniquely named object, no matter its physical location. This is so that actions relating to AIPs can be traced over time, over system changes, and over storage changes. Ideally, the unique ID lives as long as the AIP; if it does not, there must be traceability. The ID system must be seen to fit the repository’s current and foreseeable future requirements for things like numbers of objects. It must be possible to demonstrate that the identifiers are unique. Note that B2.1 requires that the components of an AIP be suitably bound and identified for long-term management, but places no restrictions on how AIPs are identified with files. Thus, in the general case, an AIP may be distributed over many files, or a single file may contain more than one AIP. Therefore identifiers and filenames may not necessarily correspond to each other.

Documentation must show how the persistent identifiers of the AIP and its components are assigned and maintained so as to be unique within the context of the repository. The documentation must also describe any processes used for changes to such identifiers. It must be possible to obtain a complete list of all such identifiers and do spot checks for duplications.

Evidence: Documentation describing naming convention and physical evidence of its application (e.g., logs).
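
The spot check for duplications that the requirement asks for can be illustrated with a few lines of Python. This is only a minimal sketch, assuming the complete list of identifiers has already been exported from the repository as a list of strings; the function name and the sample identifiers are made up for illustration and do not reflect any particular repository's system.

```python
from collections import Counter


def find_duplicate_ids(identifiers):
    """Return a dict mapping each repeated identifier to its count."""
    counts = Counter(identifiers)
    return {ident: n for ident, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical identifier export; real identifiers would come from
    # the repository's own database or logs.
    sample_ids = ["aip-0001", "aip-0002", "aip-0003", "aip-0002"]
    duplicates = find_duplicate_ids(sample_ids)
    if duplicates:
        print("Duplicate identifiers found:", duplicates)
    else:
        print("All identifiers are unique.")
```

An empty result means every identifier in the exported list is unique; any entry in the result is an identifier an auditor would want to investigate.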



ICPSR generates a unique ID for each file that we receive via a deposit, and a unique ID for each post-processed file that we create. Files are stored in well-defined locations in archival storage. Together, the location and the filename (which also follows a set of standard conventions within ICPSR) convey considerable provenance information, which is also recorded in a database.

Specifically, each deposited file that ICPSR retains for preservation has a unique ID stored in a database and a unique location in archival storage: deposits/depositID/originalFilenameSanitized. The root of the location varies with the physical location of the copy in archival storage. For example, a copy stored locally at ICPSR may have a URI root of file://nas.icpsr.umich.edu/archival-storage while a copy stored in an off-site archival location will have a different URI root.
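
To make the location scheme above concrete, here is a rough Python sketch of how a full location could be assembled from a root, a deposit ID, and an original filename. The sanitization rule, the function names, the deposit ID, and the off-site root are all assumptions made for illustration; the post does not spell out ICPSR's actual sanitization rules or its off-site URI roots.

```python
import re

# Local root as given in the text above.
LOCAL_ROOT = "file://nas.icpsr.umich.edu/archival-storage"
# Hypothetical off-site root, invented for this example.
OFFSITE_ROOT = "file://offsite.example.org/archival-storage"


def sanitize_filename(name):
    """Hypothetical rule: keep letters, digits, '.', '_' and '-';
    replace every other character with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def archival_location(root, deposit_id, original_filename):
    """Build root/deposits/depositID/originalFilenameSanitized."""
    safe_name = sanitize_filename(original_filename)
    return f"{root.rstrip('/')}/deposits/{deposit_id}/{safe_name}"


if __name__ == "__main__":
    # The same deposited file, located under two different roots.
    for root in (LOCAL_ROOT, OFFSITE_ROOT):
        print(archival_location(root, 12345, "My Survey (final).sav"))
```

The point of the sketch is that only the root changes between copies; the deposits/depositID/filename portion stays the same, so a file can be found wherever its copy physically lives.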

Likewise, content produced by ICPSR staff, if retained for preservation, is assigned a unique ID and a unique location in a similar way.
