A recent conversation with Nathan Adams, ICPSR's Assistant IT Director for Software Development, got me thinking about how many different flavors of online analysis we now offer:
One, Anonymous Analysis: This is where we make a dataset available via SDA and no authentication is required.
Two, Authenticated Analysis: One must authenticate using MyData, Google, or Facebook.
Three, Member Analysis: One must authenticate and also be using a computer located on the campus (even virtually) of a member institution.
Four, Private Analysis: One must authenticate, and the identity used must be a member of a previously created group of identities.
Five through Eight, Secure Analysis: Like any of the four options above, but where the raw, proprietary, binary data files reside on a separate server, and where the ICPSR web server accesses the content via HTTPS rather than through the filesystem.
Nine through Sixteen, Non-disclosed Analysis: Like any of the eight options above, but where SDA's disclosure.txt controls have been used to try to prevent unintentional disclosure.
So sixteen different combinations! And it is easy to imagine even more cropping up in the months ahead.
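To make the arithmetic concrete, here is a minimal Python sketch of how the sixteen arise: four access levels, times two storage options (secure or not), times two disclosure options (controlled or not). The names `Access`, `AnalysisProduct`, `all_products`, and `is_allowed` are hypothetical, invented for illustration only; nothing here reflects how SDA or the ICPSR web server is actually built.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Access(Enum):
    """The four access levels from the list above (hypothetical names)."""
    ANONYMOUS = "Anonymous"
    AUTHENTICATED = "Authenticated"
    MEMBER = "Member"
    PRIVATE = "Private"


@dataclass(frozen=True)
class AnalysisProduct:
    access: Access
    secure: bool         # data files live on a separate server
    non_disclosed: bool  # disclosure controls are applied

    def label(self) -> str:
        parts = [self.access.value]
        if self.secure:
            parts.append("Secure")
        if self.non_disclosed:
            parts.append("Non-disclosed")
        return " + ".join(parts)


def all_products() -> list:
    """Enumerate every combination in the same order as the list above."""
    flags = (False, True)
    return [
        AnalysisProduct(access, secure, non_disclosed)
        for non_disclosed, secure, access in product(flags, flags, Access)
    ]


def is_allowed(access: Access, authenticated: bool,
               on_member_network: bool, in_group: bool) -> bool:
    """Assumed reading of the access rules described in the list above."""
    if access is Access.ANONYMOUS:
        return True
    if not authenticated:
        return False
    if access is Access.MEMBER:
        return on_member_network
    if access is Access.PRIVATE:
        return in_group
    return True  # Access.AUTHENTICATED: logging in is enough
```

Looping over `all_products()` and printing `p.label()` for each yields sixteen distinct labels, from "Anonymous" through "Private + Secure + Non-disclosed". Adding one more yes/no option would double the count again, which is why the number of "products" climbs so quickly.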
My experience is that one ends up with sixteen different online analysis "products" when things grow organically over time, evolving through small tweaks in response to requests like, "Hey, could we use SDA for this, but with just one small change ...?"
It is easy to see how it happens. But when things grow over time like this, they end up suffering from a profound lack of design, and end up costing more to maintain. They are fragile. They break when you change things, like the hardware. Or the OS. Or the NAS. Or the authentication scheme. Or the oil in your car.
So probably time to pull back a bit, pull together a team of content owners, and start asking some questions.
If we were going to start fresh today with an online analysis system, what should we build?
What sort of access controls are needed to prevent bad guys from using it?
What sort of disclosure mitigation capabilities are required to prevent accidents from happening?
To which populations might we need to restrict access?
What does the user experience look like? Is this geared for the novice or for the expert-in-a-hurry? Or do we have multiple audiences, and so need to build more than one experience?
Time to design.