This package provides a utility for guessing the MIME type of a file from its name and/or
actual contents. It’s based on freedesktop.org’s shared-mime-info database.
shared-mime-info
is an extensible database of common MIME types. It provides a powerful MIME type
detection mechanism as well as multilingual type descriptions.
This package requires shared-mime-info to be installed and accessible. The
easiest way to do that is to install it system-wide, for example by installing
the shared-mime-info package on Ubuntu. The specification also describes
other ways to install and extend the database.
Note that this package is currently not thread-safe, because the data is meant to
be loaded only once, on module import. If this causes problems, it could be
changed in the future.
The easiest way to use this package is to import the getType function from
the root module:
>>> from z3c.sharedmimeinfo import getType
This function tries to guess the MIME type as specified in the shared-mime-info
specification document and always returns some usable MIME type, using
application/octet-stream or text/plain as a fallback. It can detect the MIME type by
file name, by file contents or by both, so it accepts two arguments: filename (string)
and/or file (file-like object). At least one of them must be given.
As said above, it needs at least one argument, so you can’t call it with no
arguments:
>>> getType()
Traceback (most recent call last):
...
TypeError: Either filename or file should be provided or both of them
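To make the calling convention concrete, here is a minimal standalone sketch. It is not the package's implementation: the name guess_type, the GLOBS and MAGIC tables and the "name first, then contents" order are all assumptions made for illustration, while the real package reads its rules from the shared-mime-info database. The sketch is written for Python 3.

```python
import fnmatch
import io

# Hypothetical, hand-written tables. The real package loads its rules
# from the shared-mime-info database instead.
GLOBS = {'*.doc': 'application/msword', '*.png': 'image/png'}
MAGIC = {b'\x89PNG\r\n\x1a\n': 'image/png', b'%PDF-': 'application/pdf'}


def guess_type(filename=None, file=None):
    """Guess a MIME type from a file name, a file-like object, or both."""
    if filename is None and file is None:
        raise TypeError(
            'Either filename or file should be provided or both of them')

    # Simplifying assumption: try the file name first, then the contents.
    if filename is not None:
        for pattern, mime in GLOBS.items():
            if fnmatch.fnmatch(filename.lower(), pattern):
                return mime

    if file is not None:
        head = file.read(32)
        # Rewind so the caller can read the data again
        # (assumes a seekable stream positioned at its start).
        file.seek(0)
        for signature, mime in MAGIC.items():
            if head.startswith(signature):
                return mime

    # Nothing matched: fall back to a generic binary type.
    return 'application/octet-stream'
```

Calling guess_type(filename='report.doc') goes through the glob table only, while guess_type(file=io.BytesIO(b'%PDF-1.4')) goes through the signature table only.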
The file name is passed via the filename argument:
The file contents are passed via the file argument, which accepts a file-like
object. Let’s use our testing helper function to open a sample file and try
to guess a type for it:
If the MIME type cannot be detected, either text/plain or
application/octet-stream will be returned. The function tries to guess
whether the data is text or binary by checking the first 32 bytes:
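One way such a check could work is sketched below. This is an assumption about the heuristic, not the package's actual code: here "text" means that none of the first 32 bytes is an ASCII control character other than common whitespace. The names looks_like_text and fallback_type are hypothetical, and the sketch is written for Python 3.

```python
def looks_like_text(data, sample_size=32):
    """Return True if the first sample_size bytes hold no control bytes."""
    sample = bytearray(data[:sample_size])
    # Assumed whitelist: tab, line feed, form feed, carriage return.
    allowed = (9, 10, 12, 13)
    return all(byte >= 32 or byte in allowed for byte in sample)


def fallback_type(data):
    """Pick a fallback MIME type for data that matched no other rule."""
    if looks_like_text(data):
        return 'text/plain'
    return 'application/octet-stream'
```

Because only the first 32 bytes are sampled, a control byte that appears later in the data does not change the result in this sketch.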
Objects returned by getType and other functions (see below) are actually
extended unicode string objects that provide additional info about the MIME
type. They provide the IMIMEType interface:
>>> from zope.interface.verify import verifyObject
>>> from z3c.sharedmimeinfo.interfaces import IMIMEType
>>> mt = getType(filename='document.doc')
>>> verifyObject(IMIMEType, mt)
True
As they are actually unicode objects, they can be compared like strings:
>>> mt == 'application/msword'
True
They also provide the media and subtype attributes:
Let’s check the i18n features that come with shared-mime-info and are
supported by this package. As seen above, the MIME type title message ID is
actually its <media>/<subtype>, but if we translate it, we’ll get a
human-friendly string:
>>> from zope.i18n import translate
>>> translate(mt.title)
u'Word document'
>>> translate(mt.title, target_language='ru')
u'\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442 Word'
We can also create IMIMEType objects by hand, using the MIMEType class:
>>> from z3c.sharedmimeinfo.mimetype import MIMEType
We can create them specifying media and subtype as two arguments or as a single
argument in the “media/subtype” form:
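The idea of a string that also carries its media and subtype can be sketched with a small str subclass. This is only an illustration of the two constructor forms, not the package's MIMEType class: the name SimpleMIMEType and its validation rules are assumptions, and it uses Python 3's str where the package uses unicode.

```python
class SimpleMIMEType(str):
    """A "media/subtype" string exposing both parts as attributes."""

    def __new__(cls, media, subtype=None):
        if subtype is None:
            # Single-argument form: split "media/subtype" at the first slash.
            media, _, subtype = media.partition('/')
        if not media or not subtype:
            raise ValueError('MIME type needs both a media and a subtype')
        self = super().__new__(cls, '%s/%s' % (media, subtype))
        self.media = media
        self.subtype = subtype
        return self
```

Both SimpleMIMEType('text', 'plain') and SimpleMIMEType('text/plain') produce an object that compares equal to the plain string 'text/plain'.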
The getType function described above is actually a method of the
IMIMETypesUtility object. The IMIMETypesUtility is a core component for
guessing MIME types.
Let’s import the utility directly and play with it:
>>> from z3c.sharedmimeinfo.utility import mimeTypesUtility
>>> from z3c.sharedmimeinfo.interfaces import IMIMETypesUtility
>>> verifyObject(IMIMETypesUtility, mimeTypesUtility)
True
It has three methods for getting a MIME type: getType (described above),
getTypeByFileName and getTypeByContents.
The getTypeByContents method accepts a file-like object and two optional
arguments: min_priority and max_priority that can be used to specify the range
of “magic” rules to be used. By default, min_priority is 0 and max_priority is
100, so all rules will be in use. See the shared-mime-info specification for
details.
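The effect of a priority range can be illustrated with a toy rule table. Everything in this sketch is hypothetical: the MagicRule tuple, the priorities assigned to each signature, the inclusive bounds and the None result for "no match" are assumptions chosen for the example, not values or behaviour taken from the database or the package. It is written for Python 3.

```python
from collections import namedtuple
import io

MagicRule = namedtuple('MagicRule', 'priority signature mime')

# Made-up rules and priorities, for illustration only.
RULES = [
    MagicRule(80, b'%PDF-', 'application/pdf'),
    MagicRule(50, b'\x89PNG\r\n\x1a\n', 'image/png'),
    MagicRule(10, b'<?xml', 'application/xml'),
]


def type_by_contents(file, min_priority=0, max_priority=100):
    """Match the start of file against rules within the priority range."""
    head = file.read(32)
    # Keep only rules whose priority lies in the (assumed inclusive) range.
    candidates = [rule for rule in RULES
                  if min_priority <= rule.priority <= max_priority]
    # Try higher-priority rules first.
    for rule in sorted(candidates, key=lambda r: r.priority, reverse=True):
        if head.startswith(rule.signature):
            return rule.mime
    return None
```

With the default range every rule is considered. Narrowing the range, for example with max_priority=50, skips the hypothetical PDF rule, so a PDF header no longer matches in this sketch.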
We have some sample files that should be detected by contents: