Do not try this street magic at home.
Managing the magic database is not for the faint of heart. (Try updating the Vorbis section.) And this management has gotten both harder (RHEL 6 uses a new format for its magic database that is incompatible with RHEL 5) and easier (the new format eliminates the pesky magic.mime database). However, we've gotten reasonably competent at managing magic and have come to rely on it for file format identification.
In support of the FLAME project we even created a little web service that takes a file's content and its name as input, and delivers a little snippet of XML as the output. The XML contains the "human readable" answer from our magic database and the "MIME type" too. This is our first FLAME-inspired web service.
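For readers who want a feel for what such a service does, here is a minimal Python sketch. It is not the code behind our service. It assumes the stock file(1) command is installed, and the function names identify and build_wsifile_xml are made up for illustration. The element names are copied from the service's output, and the uploadInfo default simply mirrors the value that appears in our sample responses.

```python
import subprocess
import xml.etree.ElementTree as ET


def identify(path):
    """Ask the local file(1) command for its two answers about `path`.

    Assumption: file(1) is on the PATH. This stands in for whatever
    lookup the real service performs against its magic database.
    """
    # -b drops the leading "filename:" prefix from the output.
    description = subprocess.run(
        ["file", "-b", path], capture_output=True, text=True, check=True
    ).stdout.strip()
    # --mime asks for the MIME type and charset instead of the description.
    mime = subprocess.run(
        ["file", "-b", "--mime", path], capture_output=True, text=True, check=True
    ).stdout.strip()
    return description, mime


def build_wsifile_xml(description, mime, upload_info="application/octet-stream"):
    """Wrap the two answers in the same element names the service returns.

    The upload_info default is an assumption taken from the sample output.
    """
    root = ET.Element("wsifile")
    ET.SubElement(root, "ifile").text = description
    ET.SubElement(root, "ifilemime").text = mime
    ET.SubElement(root, "uploadInfo").text = upload_info
    return ET.tostring(root, encoding="unicode")
```

Calling build_wsifile_xml(*identify("/etc/resolv.conf")) on a machine with file(1) installed should produce a snippet shaped like the service's responses, though the exact description depends on the local magic database.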
If you'd like to try it, you can use your favorite form-capable URL transfer utility to do so. Here's an example where I have run curl on one of our RHEL machines:
dhcp-bryan:; curl -F "file=@uuid-comparison.xlsx;filename=uuid-comparison.xlsx" www.icpsr.umich.edu/cgi-bin/wsifile
<?xml version="1.0" encoding="utf-8"?><wsifile><ifile>Microsoft Excel</ifile><ifilemime>application/zip; charset=binary</ifilemime><uploadInfo>application/octet-stream</uploadInfo></wsifile>
feeding in an Excel file as the input, and another with a plain text file:
dhcp-bryan:; curl -F "file=@/etc/resolv.conf;filename=resolv.conf" www.icpsr.umich.edu/cgi-bin/wsifile
<?xml version="1.0" encoding="utf-8"?><wsifile><ifile>ASCII text</ifile><ifilemime>text/plain; charset=us-ascii</ifilemime><uploadInfo>application/octet-stream</uploadInfo></wsifile>
and an interesting MS Word file:
dhcp-bryan:; curl -F "file=@2011-03CouncilPandAminutes.doc;filename=2011-03CouncilPandAminutes.doc" www.icpsr.umich.edu/cgi-bin/wsifile
<?xml version="1.0" encoding="utf-8"?><wsifile><ifile>CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1200, Number of Characters: 0, Name of Creating Application: Aspose.Words for Java 4.0.3.0, Number of Pages: 1, Revision Number: 1, Security: 0, Template: Normal.dot, Number of Words: 0</ifile><ifilemime>application/msword; charset=binary</ifilemime><uploadInfo>application/octet-stream</uploadInfo></wsifile>
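If you would rather consume the XML from a program than read it by eye, something like the following Python sketch should do. The helper name parse_wsifile is hypothetical, and the sample bytes are the Excel response from the first curl example, pasted in so the snippet runs without touching the network.

```python
import xml.etree.ElementTree as ET

# Response copied from the Excel example; no network call is made here.
SAMPLE = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<wsifile><ifile>Microsoft Excel</ifile>"
    b"<ifilemime>application/zip; charset=binary</ifilemime>"
    b"<uploadInfo>application/octet-stream</uploadInfo></wsifile>"
)


def parse_wsifile(xml_bytes):
    """Return the child elements of <wsifile> as a tag -> text dictionary."""
    root = ET.fromstring(xml_bytes)
    # Empty elements have text None; normalize those to an empty string.
    return {child.tag: (child.text or "") for child in root}


result = parse_wsifile(SAMPLE)
print(result["ifile"])      # the "human readable" answer
print(result["ifilemime"])  # the MIME type and charset
```

A dictionary keeps the sketch short; it assumes each element name appears only once in a response, which holds for the examples shown here.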
Feel free to try it out, and to post reactions and suggestions here.