Provides resources to handle OpenXML documents from Python.
openxmllib
openxmllib is a set of tools that deal with the new ECMA 376 office file formats known as OpenXML.
http://www.ecma-international.org/publications/standards/Ecma-376.htm
The OpenXML format is currently used by Microsoft Office 2007. Apple iWork ’08 and OpenOffice 2.2 also have filters for this format.
Features
Tested features
Extract words from a document for indexing purposes.
Get metadata from a document.
Planned features
Transform a document to HTML
Public API
>>> import openxmllib
>>> doc = openxmllib.openXmlDocument('office.docx')
>>> # Raises a ValueError on unsupported office files.
>>> doc.mimeType
'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
>>> doc.coreProperties  # Keys may depend on application
{'title': u'blah...', u'creator': u'John Doe', ...}
>>> doc.extendedProperties  # Keys may depend on application
{'Words': u'312', 'Application': u'Your favorite word processor', ...}
>>> doc.customProperties  # May return an empty mapping
{'My property': u'My value', ...}
>>> doc.allProperties  # Merges core+extended+custom properties (see above)
{...}
>>> doc.indexableText(include_properties=False)
u'all the words of that document body'
>>> doc.indexableText(include_properties=True)
u'all the words of that document body and all properties values'
Copying and License
Copyright (c) 2008 Gilles Lenfant
This software is subject to the provisions of the GNU General Public License, Version 2.0 (GPL). A copy of the GPL should accompany this distribution. THIS SOFTWARE IS PROVIDED “AS IS” AND ANY AND ALL EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE
More details in the COPYING file included in this package.
Status
This software is of alpha quality and has been tested only on Mac OS X with Python 2.4 and lxml 1.3.6.
It should work on other platforms and with Python 2.5, and perhaps with other versions of lxml.
Requirements
lxml 1.3.6: get lxml with easy_install, e.g.:
$ easy_install lxml==1.3.6
Warning: openxmllib is untested with the new lxml 2 (in alpha state at the time of writing). It may or may not work with lxml 2, but please don’t report bugs found in that situation until lxml 2 is officially required here.
Installation
$ python setup.py install
From now on you can “import openxmllib” in your Python apps and use the “openxmlinfo.py” command line utility.
Gotchas
Be aware that most text data returned by the various openxmllib services may be either us-ascii byte strings or Unicode strings. This is a side effect of lxml (bug or feature?). It is up to your application to convert this text to the appropriate charset.
TODO: File this to lxml tracker or ML
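One way for an application to cope with this is to normalize every value to a Unicode string before using it. The helper below is a minimal sketch and is not part of openxmllib: the name to_text and the default "ascii" encoding are assumptions made for illustration, and it assumes Python 2.6 or later, where bytes is an alias for str.

```python
def to_text(value, encoding="ascii"):
    """Return value as a Unicode string.

    Byte strings are decoded with `encoding`; Unicode strings pass through.
    `to_text` is a hypothetical helper, not an openxmllib function.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Normalize first, then encode once to the charset your application needs.
title = to_text(b"Quarterly report")
utf8_title = title.encode("utf-8")
```

Decoding at the boundary and encoding only on output keeps the rest of the application free of mixed string types.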
We do not currently handle exceptions caused by malformed XML or unexpected document structures. You should handle these potential problems in a try (…) except (…) block in your application.
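Such a block could look like the sketch below. The function extract_text and its opener parameter are hypothetical names used for illustration; opener is assumed to behave like openxmllib.openXmlDocument, and the broad except clause is a deliberate simplification since the exact exception types are not documented here.

```python
def extract_text(path, opener):
    """Return a (text, error) pair for the document at `path`.

    `opener` is assumed to behave like openxmllib.openXmlDocument.
    Exactly one element of the returned pair is None.
    """
    try:
        doc = opener(path)
        return doc.indexableText(include_properties=True), None
    except ValueError as exc:
        # Documented case: the file is not a supported office document.
        return None, "unsupported file: %s" % exc
    except Exception as exc:
        # Assumed case: malformed XML or an unexpected package structure.
        return None, "unreadable document: %s" % exc
```

In an application this would be called as extract_text('office.docx', openxmllib.openXmlDocument), and a document with a non-None error would simply be skipped or logged.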
Testing
Note that testing does not require the installation:
$ cd tests
$ python runalltests.py
Credits
Gilles Lenfant <gilles dot lenfant at gmail dot com>
Future features and bugfixes
Features
Support for standard mimetypes module
Add our mime types to standard Python module.
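As an illustration of what this planned feature might amount to, the sketch below registers the three main OpenXML extensions with the standard mimetypes module. The names OPENXML_TYPES and register_openxml_types are hypothetical and not part of openxmllib; only mimetypes.add_type and mimetypes.guess_type come from the standard library.

```python
import mimetypes

# MIME types of the three main OpenXML document kinds.
OPENXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument"
             ".spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument"
             ".presentationml.presentation",
}


def register_openxml_types():
    """Register the OpenXML extensions with the mimetypes module."""
    for extension, mime_type in OPENXML_TYPES.items():
        mimetypes.add_type(mime_type, extension)


register_openxml_types()
guessed_type, _encoding = mimetypes.guess_type("office.docx")
```

After registration, guessed_type holds the same string that doc.mimeType reports for a word processing document.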
Support for URLs
>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument('http://www.mydomain.com/mydoc.docx')
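Until URLs are supported, an application can download the document to a temporary file and open that file instead. The sketch below shows one possible approach; download_to_tempfile and its suffix parameter are hypothetical names, not part of openxmllib, and no error handling or timeout is included.

```python
import os
import shutil
import tempfile

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def download_to_tempfile(url, suffix=".docx"):
    """Copy the resource at `url` to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    response = urlopen(url)
    try:
        handle, path = tempfile.mkstemp(suffix=suffix)
        with os.fdopen(handle, "wb") as target:
            shutil.copyfileobj(response, target)
    finally:
        response.close()
    return path
```

The returned path can then be passed to openXmlDocument like any local file name, and removed with os.remove once the document has been processed.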
Human readable plain text conversion
>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument(...)
>>> doc.textDocument(target_directory)
(this may not be possible for spreadsheets)
HTML conversions
>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument(...)
>>> doc.htmlDocument(target_directory)
This requires finding open source XSLT stylesheets.
Document generation
FIXME: more to say here
Packaging
Installation
Turn this into an egg (“easy_install openxmllib”).
Documentation
Add epydoc generated API documentation in doc/api.
Utility
Install “openxmlinfo.py” on Windows.
Bugfixes
…Waiting for feedback ;o)
History
1.0.3
Conforming XPath constructor signature. [gilles.lenfant_AT_gmail_DOT_com]
New test files built with Mac Office 2008 [gilles.lenfant_AT_gmail_DOT_com]
1.0.2
Fix bad “egging”. [kev_AT_coolcavemen_DOT_com]
1.0.1
Egg-ification. [kev_AT_coolcavemen_DOT_com]
1.0.0
First public version. [gilles.lenfant_AT_gmail_DOT_com]