Wednesday, February 8, 2012

January 2012 deposits at ICPSR

January looks like it was a very busy month at ICPSR:

# of files# of depositsFile format
202F 0x07 video/h264
17210application/msword
74application/msword application/msword
17044application/octet-stream
14331application/pdf
43application/vnd.ms-excel
43application/vnd.wordperfect
331application/x-dosexec
82application/x-empty
356application/x-sas
3221application/x-spss
283application/x-stata
11application/x-zip
21image/bmp
16961image/jpeg
41image/png
77message/rfc8220117bit
33text/html
92text/plain; charset=iso-8859-1
14217text/plain; charset=us-ascii
21text/plain; charset=utf-8
65text/rtf
11text/x-c; charset=us-ascii
32text/x-mail; charset=unknown
173text/xml

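A tally like the one above counts two things per format: how many files were detected as that type, and how many distinct deposits contained at least one such file. Here is a minimal sketch of that counting. It assumes each file record has already been reduced to a (deposit id, MIME type) pair; the function name and sample data are hypothetical, not ICPSR's actual tooling.

```python
from collections import defaultdict


def summarize(records):
    """Tally files and distinct deposits per MIME type.

    `records` is an iterable of (deposit_id, mime_type) pairs, one per file
    (a hypothetical input shape). Returns {mime_type: (file_count, deposit_count)}.
    """
    files = defaultdict(int)      # mime -> number of files seen
    deposits = defaultdict(set)   # mime -> set of deposit ids containing it
    for deposit_id, mime in records:
        files[mime] += 1
        deposits[mime].add(deposit_id)
    return {mime: (files[mime], len(deposits[mime])) for mime in files}


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        ("dep-1", "application/pdf"),
        ("dep-1", "application/pdf"),
        ("dep-2", "application/pdf"),
        ("dep-2", "image/jpeg"),
    ]
    # Print rows in the same order as the table: files, deposits, format.
    for mime, (n_files, n_deps) in sorted(summarize(sample).items()):
        print(f"{n_files}\t{n_deps}\t{mime}")
```

Keeping deposits in a set means a single large deposit with many files of one type counts once in the deposits column, which is the kind of pattern a row with a large file count and a small deposit count would reflect.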
Two items are noteworthy.

One is that we moved a few key systems from older 32-bit machines running older versions of RHEL to new 64-bit machines running RHEL 6.  As it turns out, the magic database that the file utility uses on RHEL 6 is in a new format, and it did not work well with our local additions (localmagic and localmagic.mime, for the Linux folks).  So I believe our file-based format detector threw up its hands more often than usual, which would account for the more than 1,700 files reported as unknown (application/octet-stream) last month.  These look like good candidates for a follow-up scan to correct the results.
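A follow-up scan could be as simple as re-running detection only on the files that came back as application/octet-stream, then keeping any result that is more specific. The sketch below is an assumption about how that might look, not our actual detector: it shells out to file(1) with its real --brief, --mime, and --magic-file options, and the magic file paths you would pass are hypothetical. Note that when --magic-file is given, file uses only the listed databases, so the system database would normally be included in the list alongside any local one.

```python
import subprocess

UNKNOWN = "application/octet-stream"


def is_unknown(mime):
    """True if the type part (before any '; charset=...') is the generic type."""
    return mime.split(";", 1)[0].strip() == UNKNOWN


def rescan_candidates(results):
    """Return the paths whose earlier detection came back as the generic type."""
    return [path for path, mime in results.items() if is_unknown(mime)]


def detect_mime(path, magic_files=None):
    """Run file(1) on one path and return its MIME string.

    magic_files: optional list of magic database paths (hypothetical locations);
    file accepts them as a single colon-separated --magic-file argument.
    """
    cmd = ["file", "--brief", "--mime"]
    if magic_files:
        cmd += ["--magic-file", ":".join(magic_files)]
    cmd.append(path)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()


def rescan(results, magic_files=None, detect=detect_mime):
    """Re-run detection on unknown files; return only entries that improved."""
    updated = {}
    for path in rescan_candidates(results):
        mime = detect(path, magic_files)
        if not is_unknown(mime):
            updated[path] = mime
    return updated
```

The detect parameter is there so the selection logic can be exercised without file(1) installed; in practice you would call rescan(previous_results, magic_files=[...]) with the real default detector.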

Two: lots of images.  I know that we are receiving a lot of video and images as part of our Bill and Melinda Gates Foundation MET and MET Extension projects, but I also know that none of the files above comes from those projects.  So where is all of this coming from?  One big deposit...
