Monday, April 9, 2012

March deposits at ICPSR

Chart?  Chart.
# of files# of depositsFile format
11230application/msword
93application/octet-stream
8127application/pdf
32application/vnd.ms-excel
42application/vnd.wordperfect
2214application/x-sas
8321application/x-spss
33application/x-stata
22application/x-zip
22image/jpeg
11image/x-3ds
11message/rfc8220117bit
94text/html
52text/plain; charset=unknown
8716text/plain; charset=us-ascii
32text/rtf
11text/x-c; charset=unknown
71text/x-c; charset=us-ascii
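The counts above come from ICPSR's automated MIME type detector, and the post doesn't describe how it works. The sketch below shows one way such a chart could be tallied. It assumes the Unix `file` utility stands in for the detector, and it assumes a hypothetical directory layout with one subdirectory per deposit. The names `detect_mime`, `scan`, `tally`, and `format_table` are illustrative only.

```python
import subprocess
from collections import defaultdict
from pathlib import Path


def detect_mime(path: Path) -> str:
    """Return the MIME string from `file --brief --mime`.

    Assumption: the Unix `file` utility is a stand-in for ICPSR's detector.
    """
    result = subprocess.run(
        ["file", "--brief", "--mime", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def scan(root: Path):
    """Yield (deposit_id, mime_type) for every file.

    Hypothetical layout: root/<deposit_id>/... holds each deposit's files.
    """
    for deposit_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in deposit_dir.rglob("*"):
            if path.is_file():
                yield deposit_dir.name, detect_mime(path)


def tally(records):
    """Map each MIME type to (number of files, number of distinct deposits)."""
    files = defaultdict(int)
    deposits = defaultdict(set)
    for deposit_id, mime in records:
        files[mime] += 1
        deposits[mime].add(deposit_id)
    return {mime: (files[mime], len(deposits[mime])) for mime in files}


def format_table(counts):
    """Render the tally as tab-separated rows, sorted by MIME type."""
    lines = ["# of files\t# of deposits\tFile format"]
    for mime, (n_files, n_deposits) in sorted(counts.items()):
        lines.append(f"{n_files}\t{n_deposits}\t{mime}")
    return "\n".join(lines)
```

A run over a (hypothetical) deposit tree would be `print(format_table(tally(scan(Path("/path/to/deposits")))))`. For text files, `file --mime` reports the charset along with the type, as in `text/plain; charset=us-ascii`. That is why the same base type can show up on more than one row.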

A blissfully normal month of deposits.  Usual types.  Usual volumes.

Still need to tweak the automated MIME type detector so that it stops reporting that it is finding C source code. The eight files reported as text/x-c above are most likely plain text files that just happen to have something like a pound sign or a "slash-star" sequence starting in the first column.
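The post doesn't say exactly how the detector decides a file is C. Suppose, as described above, that a `#` or `/*` in the first column is enough to trigger it. A stricter rule would then require several independent kinds of C evidence before reporting `text/x-c`. This is a hypothetical heuristic, not how `file` or ICPSR's detector actually works. The regexes, the `min_signals` threshold, and the function names are all assumptions. No single signal is trusted on its own because SAS setup files, for example, legitimately contain `/* ... */` comments and semicolon-terminated statements.

```python
import re

# Individually weak signals of C source (hypothetical choices).
_PREPROCESSOR = re.compile(r"^#\s*(include|define|ifdef|ifndef|endif|pragma)\b", re.MULTILINE)
_BLOCK_COMMENT = re.compile(r"^/\*", re.MULTILINE)
_C_KEYWORDS = re.compile(r"\b(int|char|void|return|struct|typedef)\b")
_STATEMENT_END = re.compile(r"[;{}][ \t]*$", re.MULTILINE)


def naive_looks_like_c(text: str) -> bool:
    """Mimic the problem: any line starting with '#' or '/*' counts as C."""
    return any(line.startswith(("#", "/*")) for line in text.splitlines())


def stricter_looks_like_c(text: str, min_signals: int = 3) -> bool:
    """Count distinct kinds of C evidence and require at least min_signals."""
    signals = 0
    if _PREPROCESSOR.search(text):
        signals += 1
    if _BLOCK_COMMENT.search(text):
        signals += 1
    if len(_C_KEYWORDS.findall(text)) >= 3:
        signals += 1
    if len(_STATEMENT_END.findall(text)) >= 3:
        signals += 1
    return signals >= min_signals


# A codebook-like text file that trips the naive rule.
codebook = "# Variable list\n/* recoded 2011 */\nAGE  respondent age\nSEX  respondent sex\n"
# Genuine C source.
c_source = "#include <stdio.h>\n\nint main(void) {\n    int x = 0;\n    return x;\n}\n"
```

With these samples, `naive_looks_like_c(codebook)` is `True` while `stricter_looks_like_c(codebook)` is `False`. Only the block-comment signal fires for the codebook. `stricter_looks_like_c(c_source)` is `True` because the preprocessor, keyword, and statement-end signals all fire.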

Not shown here, because it isn't passing through the deposit system, is a considerable volume of video content from the Gates Foundation. We have a bit over 6TB that we received in early 2012, and about 1TB of a 20TB collection that will arrive in a steady stream over the next 12-16 months.

If our policy is that the ICPSR deposit system is just one of many mechanisms for ICPSR to accept content, then this seems OK.

But if we expect the deposit system to be the complete and correct record of ALL incoming content, then we do have a problem. It is a 7TB problem that will grow up to be a big and strong 26TB problem at some point.
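Here is the arithmetic behind the 7TB and 26TB figures as a small worked sketch. The current total is the roughly 6TB already received plus the 1TB received so far from the 20TB collection. The eventual total counts the 20TB collection once and does not add the 1TB again. The post only says the rest arrives in "a steady stream," so the sketch assumes the remaining 19TB arrives evenly over 12 to 16 months. On that assumption, content outside the deposit system would grow by roughly 1.2 to 1.6TB per month. The constant names are illustrative, and "a bit over 6TB" is rounded down to 6.

```python
from typing import Tuple

# Figures from the post, in terabytes (TB).
ALREADY_RECEIVED_TB = 6     # "a bit over 6TB" received in early 2012 (rounded down)
STREAMING_RECEIVED_TB = 1   # portion of the 20TB collection received so far
STREAMING_TOTAL_TB = 20     # full size of the streaming collection


def untracked_now() -> int:
    """Video content outside the deposit system today."""
    return ALREADY_RECEIVED_TB + STREAMING_RECEIVED_TB


def untracked_eventually() -> int:
    """Total once the stream finishes; the 1TB already received is part of the 20TB."""
    return ALREADY_RECEIVED_TB + STREAMING_TOTAL_TB


def monthly_rate_range(months_min: int = 12, months_max: int = 16) -> Tuple[float, float]:
    """(low, high) TB per month, assuming the remainder arrives evenly."""
    remaining = STREAMING_TOTAL_TB - STREAMING_RECEIVED_TB
    return remaining / months_max, remaining / months_min
```

`untracked_now()` returns 7 and `untracked_eventually()` returns 26, matching the figures above. `monthly_rate_range()` returns 19/16 and 19/12 TB per month.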
