| # of files | # of deposits | File format |
|---:|---:|---|
| 20 | 2 | F 0x07 video/h264 |
| 172 | 10 | application/msword |
| 7 | 4 | application/msword application/msword |
| 1704 | 4 | application/octet-stream |
| 143 | 31 | application/pdf |
| 4 | 3 | application/vnd.ms-excel |
| 4 | 3 | application/vnd.wordperfect |
| 33 | 1 | application/x-dosexec |
| 8 | 2 | application/x-empty |
| 35 | 6 | application/x-sas |
| 32 | 21 | application/x-spss |
| 28 | 3 | application/x-stata |
| 1 | 1 | application/x-zip |
| 2 | 1 | image/bmp |
| 1696 | 1 | image/jpeg |
| 4 | 1 | image/png |
| 7 | 7 | message/rfc8220117bit |
| 3 | 3 | text/html |
| 9 | 2 | text/plain; charset=iso-8859-1 |
| 142 | 17 | text/plain; charset=us-ascii |
| 2 | 1 | text/plain; charset=utf-8 |
| 6 | 5 | text/rtf |
| 1 | 1 | text/x-c; charset=us-ascii |
| 3 | 2 | text/x-mail; charset=unknown |
| 17 | 3 | text/xml |
Two items are noteworthy.
First, we moved a few key systems from older 32-bit machines running older versions of RHEL to new 64-bit machines running RHEL 6. As it turns out, the magic database that `file` uses on RHEL 6 is in a new format, and it did not work well with our local additions (the localmagic and localmagic.mime files, for Linux folks). So my belief is that our `file`-based format detector threw up its hands more often than usual, and that this accounts for the 1,704 files of unknown format (application/octet-stream) last month. These are good candidates for a follow-up scan to correct the results.
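A follow-up scan could start by pulling out just the files that came back as application/octet-stream and re-running `file` on them with an explicit magic list. The sketch below is one way to do that, not our actual pipeline. It assumes each detection result is available as a hypothetical `(deposit_id, path, format)` tuple, and the magic file paths are placeholders. It relies on `file -m` / `--magic-file` taking a colon-separated list of magic files. Because that list *replaces* the default, the system database is listed alongside the local additions.

```python
import subprocess
from collections import defaultdict

UNKNOWN = "application/octet-stream"

# Placeholder paths (assumptions): the system magic database first, then a
# local addition. Adjust to wherever these files actually live.
MAGIC_FILES = ("/usr/share/misc/magic", "/usr/local/etc/localmagic")


def rescan_candidates(records):
    """Group the paths labeled as generic binary by deposit.

    records: iterable of hypothetical (deposit_id, path, format) tuples.
    Returns {deposit_id: [path, ...]} for files whose format was UNKNOWN.
    """
    by_deposit = defaultdict(list)
    for deposit_id, path, fmt in records:
        if fmt == UNKNOWN:
            by_deposit[deposit_id].append(path)
    return dict(by_deposit)


def file_command(path, magic_files=MAGIC_FILES):
    """Build the argv for re-running file(1) with an explicit magic list.

    --magic-file accepts a colon-separated list and is used instead of the
    default list, so every database we want consulted must be named here.
    """
    return ["file", "--brief", "--mime-type",
            "--magic-file", ":".join(magic_files), path]


def redetect(path, magic_files=MAGIC_FILES):
    """Run file(1) on one path and return the MIME type it reports."""
    result = subprocess.run(file_command(path, magic_files),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Keeping the argv construction separate from the `subprocess.run` call makes it easy to print and eyeball the exact command before re-scanning 1,700 files.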
Second, lots of images. I know that we are getting a lot of video and images as part of our Bill and Melinda Gates Foundation MET and MET Extension projects, but I also know that none of the files above is from those projects. So where is all of this coming from? One big deposit: all 1,696 JPEG files arrived in a single deposit.
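The two counts in the table answer different questions. "# of files" shows volume, and "# of deposits" shows how widespread a format is, which is how a spike like the JPEGs gets traced to one deposit. A minimal sketch of that tally follows. It assumes a list of hypothetical `(deposit_id, format)` pairs, one per file, and is not necessarily how this report is generated.

```python
from collections import Counter, defaultdict


def summarize_formats(records):
    """Return (file_count, deposit_count, format) rows sorted by format.

    records: list of hypothetical (deposit_id, format) pairs, one per file.
    """
    files = defaultdict(int)
    deposits = defaultdict(set)
    for deposit_id, fmt in records:
        files[fmt] += 1
        deposits[fmt].add(deposit_id)  # a set counts each deposit once
    return [(files[f], len(deposits[f]), f) for f in sorted(files)]


def top_deposit(records, fmt):
    """Return (deposit_id, file_count) for the deposit with the most files
    of the given format, or None if no file has that format."""
    counts = Counter(dep for dep, f in records if f == fmt)
    return counts.most_common(1)[0] if counts else None


def to_markdown(rows):
    """Render summary rows in the same layout as the table above."""
    lines = ["| # of files | # of deposits | File format |", "|---:|---:|---|"]
    for n_files, n_deposits, fmt in rows:
        lines.append(f"| {n_files} | {n_deposits} | {fmt} |")
    return "\n".join(lines)
```

A format whose file count is large while its deposit count is 1, like image/jpeg last month, is the signal to call `top_deposit` and find the one deposit responsible.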