Tuesday, June 30, 2009

Fedora Land Speed Record?

I don't know whether there is an existing "land speed record" for the number of Datastreams in a single Fedora object, but we have ingested a data object with nearly 17,000 Datastreams into our Fedora repository. To be honest, I was not sure Fedora would be able to handle it, but it did.

This particular object is related to ICPSR Study 13517, and its Datastreams are largely Census 2000 data files that we pulled from the Census Bureau's web site in 2003. Together the Datastreams consume a bit over 4GB of disk space, while the corresponding object (in FOXML format) uses only about 10MB. The FOXML stays small partly because there is very little metadata in the inline DC and RELS-EXT Datastreams, for example.

We're still working on creating the "keepsake" objects I described in an earlier post, but if Fedora can handle this many Datastreams, there shouldn't be any problems with other, less massive studies.

One small complaint: the Fedora admin tool (fedora-admin.bat on the Windows platform) does not handle objects with a large number of Datastreams very well. Because it tiles the Datastreams and provides no scroll bars, it is pretty much useless for an object of this scale. The "out of the box" web services, however, do a fine job of displaying the object.

