| # of files | # of deposits | File format |
|---|---|---|
| 112 | 30 | application/msword |
| 9 | 3 | application/octet-stream |
| 81 | 27 | application/pdf |
| 3 | 2 | application/vnd.ms-excel |
| 4 | 2 | application/vnd.wordperfect |
| 22 | 14 | application/x-sas |
| 83 | 21 | application/x-spss |
| 3 | 3 | application/x-stata |
| 2 | 2 | application/x-zip |
| 2 | 2 | image/jpeg |
| 1 | 1 | image/x-3ds |
| 1 | 1 | message/rfc8220117bit |
| 9 | 4 | text/html |
| 5 | 2 | text/plain; charset=unknown |
| 87 | 16 | text/plain; charset=us-ascii |
| 3 | 2 | text/rtf |
| 1 | 1 | text/x-c; charset=unknown |
| 7 | 1 | text/x-c; charset=us-ascii |
We still need to tweak the automated MIME type detector so that it stops reporting C source code. The eight files above flagged as text/x-c are most likely plain text files that just happen to have something like a pound sign or a "slash-star" sequence starting in the first column.
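One possible shape for that tweak is a small post-processing step applied to whatever the detector reports. This is only a sketch, not what the deposit system actually does: the function name `normalize_mime`, the set of extensions treated as genuine C source, and the idea of using the filename as a tiebreaker are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: only these extensions are trusted as real C source or headers.
C_SOURCE_EXTENSIONS = {".c", ".h"}


def normalize_mime(detected: str, filename: str) -> str:
    """Downgrade a text/x-c guess to text/plain unless the filename looks like C.

    `detected` is a string such as "text/x-c; charset=us-ascii".
    Any parameters after the media type (e.g. the charset) are preserved.
    """
    media_type, _, params = detected.partition(";")
    if media_type.strip() != "text/x-c":
        return detected  # not a C guess, leave it alone
    if Path(filename).suffix.lower() in C_SOURCE_EXTENSIONS:
        return detected  # plausibly real C source, keep the guess
    params = params.strip()
    return f"text/plain; {params}" if params else "text/plain"


# Hypothetical filenames, for illustration only.
print(normalize_mime("text/x-c; charset=us-ascii", "codebook.txt"))
print(normalize_mime("text/x-c; charset=us-ascii", "main.c"))
```

A filename check is a blunt instrument, but a social science archive rarely receives real C source, so erring toward text/plain seems like the safer default.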
Not shown here - because it isn't passing through the deposit system - is a considerable volume of video content from the Gates Foundation. We have a bit over 6TB that we received in early 2012, and about 1TB of a 20TB collection that will arrive in a steady stream over the next 12-16 months.
If our policy is that the ICPSR deposit system is just one of many mechanisms for ICPSR to accept content, then this seems OK.
But if we expect the deposit system to be the complete and correct record of ALL incoming content, then we do have a problem. A 7TB problem that will grow up to be a big and strong 26TB problem at some point.