# of files | # of deposits | File format |
20 | 2 | F 0x07 video/h264 |
172 | 10 | application/msword |
7 | 4 | application/msword application/msword |
1704 | 4 | application/octet-stream |
143 | 31 | application/pdf |
4 | 3 | application/vnd.ms-excel |
4 | 3 | application/vnd.wordperfect |
33 | 1 | application/x-dosexec |
8 | 2 | application/x-empty |
35 | 6 | application/x-sas |
32 | 21 | application/x-spss |
28 | 3 | application/x-stata |
1 | 1 | application/x-zip |
2 | 1 | image/bmp |
1696 | 1 | image/jpeg |
4 | 1 | image/png |
7 | 7 | message/rfc8220117bit |
3 | 3 | text/html |
9 | 2 | text/plain; charset=iso-8859-1 |
142 | 17 | text/plain; charset=us-ascii |
2 | 1 | text/plain; charset=utf-8 |
6 | 5 | text/rtf |
1 | 1 | text/x-c; charset=us-ascii |
3 | 2 | text/x-mail; charset=unknown |
17 | 3 | text/xml |
Two items are noteworthy.
One is that we moved a few key systems from older 32-bit machines running older versions of RHEL to new 64-bit machines running RHEL 6. As it turns out, the magic database that the file utility uses on RHEL 6 is in a new format, and it did not work well with our local additions (aka localmagic and localmagic.mime for Linux folks). So my belief is that our file-based format detector threw up its hands more often than usual, and that this accounts for the more than 1700 files of unknown (application/octet-stream) format last month. I think these are good candidates for a follow-up scan to correct the results.
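A follow-up scan along those lines might look something like the sketch below. This is not our actual detector: the Record layout, the rescan function, and the detect_with_file helper are all made up for illustration, and the only assumption about file itself is that it accepts the --brief and --mime-type options. The idea is simply to revisit the records that came back as application/octet-stream and leave everything else alone.

```python
import subprocess
from typing import Callable, Iterable, List, NamedTuple

UNKNOWN = "application/octet-stream"


class Record(NamedTuple):
    """Hypothetical record layout: one row per deposited file."""
    deposit: str
    path: str
    mime: str


def detect_with_file(path: str) -> str:
    """Ask file(1) for a bare MIME type (assumes `file` is on PATH)."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def rescan(records: Iterable[Record],
           detect: Callable[[str], str] = detect_with_file) -> List[Record]:
    """Re-detect only the records whose format is still unknown."""
    updated = []
    for rec in records:
        if rec.mime == UNKNOWN:
            # Replace the unknown type with whatever the detector says now.
            rec = rec._replace(mime=detect(rec.path))
        updated.append(rec)
    return updated
```

The detector is passed in as a parameter so the same loop can be pointed at a repaired magic database, or at some other tool entirely, without changing the rest of the scan.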
Two, lots of images. I know that we are getting a lot of video and images as part of our Bill and Melinda Gates Foundation MET and MET Extension projects, but I also know that none of the files above is from that project. So where is all of this coming from? One big deposit: all 1696 of the image/jpeg files in the table arrived in a single deposit.
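For anyone curious how a table like the one above can be tallied, here is a minimal sketch. It assumes the input is a list of (deposit, MIME type) pairs, one per file, which is a made-up layout rather than our real schema, and the tally function is likewise hypothetical. Counting files and distinct deposits side by side is what makes a single oversized deposit stand out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tally(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map each MIME type to (number of files, number of distinct deposits)."""
    files = defaultdict(int)
    deposits = defaultdict(set)
    for deposit, mime in records:
        files[mime] += 1            # one record per file
        deposits[mime].add(deposit)  # a set keeps each deposit only once
    return {mime: (files[mime], len(deposits[mime])) for mime in files}
```

A format with many files but a deposit count of 1, like image/jpeg this month, is the signature of one big deposit.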