Technology at ICPSR: TRAC: B6.3: Ensuring proper access

B6.3 Repository ensures that agreements applicable to access conditions are adhered to.

The repository must be able to show what producer/depositor agreements apply to which AIPs and must validate user identities in order to ensure that the agreements are satisfied. Although it is easy to focus on denying access when considering conditions of this kind (that is, preventing unauthorized people from seeing material), it is just as important to show that access is granted when the conditions say it should be.

Access conditions are often just about who is allowed to see things, but they can be more complex. They may involve limits on quantities—all members of a certain community are permitted to access 10 items a year without charge, for instance. Or they may involve limits on usage or type of access—some items may be viewed but not saved for later reuse, or items may only be used for private research but not commercial gain, for instance.

Various scenarios may help illustrate what is required:

If a repository’s material is all open access, the repository can simply demonstrate that access is truly available to everyone.

If all material in the repository is available to a single, closed community, the repository must demonstrate that it validates that users are members of this community, perhaps by requesting some proof of identity before registering them, or just by restricting access by network addresses if the community can be identified in that manner. It should also demonstrate that all members of the community can indeed gain access if they wish.

If different access conditions apply to different AIPs, the repository must demonstrate how these are realized.

If access conditions require users to make some declaration before receiving DIPs, the repository must show that the declarations have been made. These might be signed forms, or evidence that a statement has been viewed online and a button clicked to signify agreement. The declarations might involve nondisclosure or agreement to no commercial use, for instance.

Evidence: Access policies; logs of user access and user denials; access system mechanisms that prevent unauthorized actions (such as save, print, etc.); user compliance agreements.

Demonstrating that group X has been granted access (correctly) to resource set Y, and that group Z has been denied access (correctly) to the same resource set is either very easy or nearly impossible at ICPSR. Here's what I mean....

Most of our content is public-use and available to the entire world. Access requires only a very weak identity (MyData, or, these days, Facebook or Google IDs) and that the user click through a type of license (our terms of use). Software ensures that the person has authenticated and clicked through our license, and as long as the person performs these two steps, access is granted.

The next biggest collection of content is also public-use, but has one or two simple strings attached. In some cases, the data provider requires that access be anonymous, and so we skip the authentication step. In other cases, the content should be available only to users connected (somehow) to a member institution, and our rules for deciding whether someone has such a connection are intentionally liberal. Are you using a computer with an IP address we think belongs to the member? Have you used such a computer within the past six months? Are you the Organizational Representative of a member institution, regardless of your IP address or use within the past six months? Any of these will get one to member-only content.
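The three member-connection tests above can be sketched in Python roughly as follows. This is only an illustration, not our delivery code: the `User` fields, the `MEMBER_NETWORKS` table (filled with documentation-only address ranges), and the reading of "six months" as 183 days are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical address space attributed to member institutions.
# These are reserved documentation ranges, not real member networks.
MEMBER_NETWORKS = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]

# Assumption: "within the past six months" is approximated as 183 days.
RECENT_USE_WINDOW = timedelta(days=183)


@dataclass
class User:
    # True if the user is the Organizational Representative of a member.
    is_org_rep: bool = False
    # Last time the user was seen on a member address, if ever.
    last_seen_on_member_ip: Optional[datetime] = None


def is_member_address(address: str) -> bool:
    """Return True if the address falls inside any member network."""
    addr = ip_address(address)
    return any(addr in net for net in MEMBER_NETWORKS)


def has_member_connection(user: User, address: str, now: datetime) -> bool:
    """Apply the three liberal tests; passing any one of them is enough."""
    if is_member_address(address):
        return True
    last_seen = user.last_seen_on_member_ip
    if last_seen is not None and now - last_seen <= RECENT_USE_WINDOW:
        return True
    return user.is_org_rep
```

Because any one test is sufficient, the function grants access as soon as one passes, and denies it only when all three fail.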

A small batch of content is restricted-use, and this too is easy. We send the content on removable media once a data use agreement (or contract) has been signed, and so ensuring that the content is going to only the right people is very straightforward because the number of recipients is very small (i.e., one).

So that's the "very easy" part of the story.

However, there is almost always a very small collection that has very "interesting" access rules. These rules are usually short-lived, complex, and difficult to prove correct. They sometimes depend on point solutions that need to be made "right now."

As one example, I remember a case where we needed to make a certain ICPSR study available to our membership (easy), and to anyone running a browser on a machine with an IP address which was on a special list. This list contained dozens, maybe hundreds, of IP network numbers. Now, an easy mechanism would have been to treat those IP networks as address space belonging to a member institution, but then that would have granted ALL of our member-only content rather than just this one study. So we very quickly built new capabilities into the delivery system so that content could not only be "public" or "member-only" but also "member-only + these guys too". I don't think we needed to use the capability for more than a few months, and, of course, it is very hard to know if it did exactly what it was supposed to. (The cost of error was pretty low, unlike, say, errors made by a bank. Or a nuclear reaction.)
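A minimal sketch of that third access level, in Python, might look like the following. The names (`Access`, `Study`, `may_access`) and the sample network are hypothetical and chosen only for illustration; the real delivery system was not necessarily structured this way. The key idea is that the special list is attached to the individual study, so it cannot unlock any other member-only content.

```python
from dataclasses import dataclass, field
from enum import Enum
from ipaddress import ip_address, ip_network


class Access(Enum):
    PUBLIC = "public"
    MEMBER_ONLY = "member-only"
    MEMBER_PLUS_LIST = "member-only + these guys too"


@dataclass
class Study:
    access: Access
    # Extra networks allowed for this study only (used by MEMBER_PLUS_LIST).
    extra_networks: list = field(default_factory=list)


def may_access(study: Study, is_member: bool, address: str) -> bool:
    """Decide whether a request for one study should be granted."""
    if study.access is Access.PUBLIC:
        return True
    if is_member:
        # Members can reach both member-only levels.
        return True
    if study.access is Access.MEMBER_PLUS_LIST:
        # Non-members qualify only if their address is on this study's list.
        addr = ip_address(address)
        return any(addr in net for net in study.extra_networks)
    return False
```

For example, with `special = Study(Access.MEMBER_PLUS_LIST, [ip_network("203.0.113.0/24")])` and `ordinary = Study(Access.MEMBER_ONLY)`, a non-member at `203.0.113.7` is granted `special` but denied `ordinary`. A handful of checks like these, covering both grants and denials, is one inexpensive way to gain some confidence that a short-lived rule does what it was supposed to.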

And there is almost always some similar need in production or on the radar screen, and so I consider this collection of ad hoc, short-lived access "solutions" to be the "nearly impossible" part of the story.


Friday, July 22, 2011
