Monday, July 11, 2011

June 2011 deposits at ICPSR

June 2011 was a very busy time for our deposit system.  The number of deposits was pretty typical, but the number of files was enormous.

# of files# of depositsFile format
21application/msaccess
231application/msoffice
16523application/msword
6984application/octet-stream
26628application/pdf
14411application/vnd.ms-excel
11application/vnd.ms-powerpoint
142application/vnd.wordperfect
1411application/x-123
41application/x-arc011lzw
251application/x-dbase
231application/x-dosexec
11application/x-empty
11application/x-rar
195application/x-sas
130721application/x-spss
104application/x-stata
33application/x-zip
209message/rfc8220117bit
87text/html
126text/plain; charset=iso-8859-1
106text/plain; charset=unknown
438646text/plain; charset=us-ascii
21text/plain; charset=utf-8
114text/rtf
72text/x-c++; charset=us-ascii
11text/x-c; charset=us-ascii
11text/x-mail; charset=us-ascii
11text/xml
1532video/unknown

In addition to the usual suspects (plain ASCII, SAS, SPSS, MS Word, and PDF), we also have some of the usual problems, such as files that the automated checker reports as C or C++ source code when they are most likely text/plain.
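
A second pass over the checker's output could catch many of these false positives. Here is a minimal sketch in Python, assuming the checker hands back a MIME string like the ones in the table and that the deposited file name is available. The function name `relabel_suspect_source` and the extension list are hypothetical, not part of our actual deposit system:

```python
import os

# Extensions that plausibly indicate real C or C++ source (an assumed list).
SOURCE_EXTENSIONS = {".c", ".h", ".cc", ".cpp", ".cxx", ".hpp"}

# MIME types the checker tends to over-report for plain text files.
SUSPECT_TYPES = {"text/x-c", "text/x-c++"}


def relabel_suspect_source(filename, reported_type):
    """Return text/plain for a C/C++ report whose file name does not look like source."""
    # Split "text/x-c; charset=us-ascii" into the bare type and its parameters.
    base, sep, params = reported_type.partition(";")
    if base.strip().lower() not in SUSPECT_TYPES:
        return reported_type
    extension = os.path.splitext(filename)[1].lower()
    if extension in SOURCE_EXTENSIONS:
        return reported_type
    # Keep the charset parameter and swap only the media type.
    return "text/plain" + sep + params
```

A file name is only a hint, of course, so anything relabeled this way would still deserve a quick look by a human.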

One interesting data point is the pair of deposits reported as containing video files, and lots of them.  Upon further review, these appear to be vintage SPSS files for the IBM PC.  Here's a string that appears in all of the files:

SPSS/PC+ System File Written by Data Entry II

and here is another one:

PCSPSS SYSTEM FILE.  IBM PC DOS, SPSS/PC+ V3.0

From a timestamp located nearby, it looks like these files were from 1994.  Or maybe they were moved from a mainframe to a PC in 1994?
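
A check for the two strings quoted above could flag files like these automatically. The sketch below is in Python and assumes the marker text sits within the first few kilobytes of each file, which I have not verified. The function name `looks_like_spss_pc` and the 4096-byte window are both hypothetical choices:

```python
# Marker strings quoted above, as bytes so binary files can be searched directly.
SPSS_PC_MARKERS = (
    b"SPSS/PC+ System File",
    b"PCSPSS SYSTEM FILE",
)


def looks_like_spss_pc(path, window=4096):
    """Return True if either marker occurs in the first `window` bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(window)
    return any(marker in head for marker in SPSS_PC_MARKERS)
```

Running this over the files reported as video/unknown would separate likely SPSS/PC+ system files from anything that really needs a closer look.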


And there are a few others on the list above that would benefit from some human scrutiny as well.
