A Python interface to XSL-FO libraries (conversion of HTML to PDF, RTF, DOCX, WML and ODT)
Project description
The zopyx.convert package helps you to convert HTML to PDF, RTF, ODT, DOCX and WML using XSL-FO technology.
Requirements
Java 1.5.0 or higher (FOP 0.94 requires Java 1.6 or higher)
csstoxslfo (included)
XFC-4.0 (XMLMind) for ODT, RTF, DOCX and WML support
XINC 2.0 (Lunasil) for PDF support (commercial)
or FOP 0.94 (Apache project) for PDF support (free)
Installation
install zopyx.convert either using easy_install or by downloading the sources from the Python Cheeseshop
the environment variable $XFC_DIR must be set and point to the root of your XFC installation directory
the environment variable $XINC_HOME must be set and point to the root of your XINC installation directory
the environment variable $FOP_HOME must be set and point to the root of your FOP installation directory
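Before running a conversion it can help to confirm that these variables are set. The following is a minimal sketch, not part of zopyx.convert: the function name check_converter_homes and the report layout are assumptions made for illustration, and only the three variable names come from the notes above. Since XINC and FOP are alternatives for PDF output, a "not set" entry for one of them is not necessarily an error.

```python
import os

# Variable names are taken from the installation notes; the labels are for display only.
CONVERTER_HOMES = {
    "XFC_DIR": "XFC (ODT, RTF, DOCX, WML)",
    "XINC_HOME": "XINC (PDF)",
    "FOP_HOME": "FOP (PDF)",
}


def check_converter_homes(environ=None):
    """Return {variable: problem or None} for each converter variable.

    Hypothetical helper: zopyx.convert performs its own checks internally.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in CONVERTER_HOMES:
        value = environ.get(name)
        if not value:
            report[name] = "not set"
        elif not os.path.isdir(value):
            report[name] = "does not point to a directory: %s" % value
        else:
            report[name] = None
    return report


# Print one status line per variable for the current environment.
for name, problem in check_converter_homes().items():
    print("%-10s %-28s %s" % (name, CONVERTER_HOMES[name], problem or "ok"))
```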
Supported platforms
Windows, Unix
Subversion repository
Usage
Some examples from the Python command-line:
    from zopyx.convert import Converter

    C = Converter('/path/to/some/file.html')
    pdf_filename = C('pdf')    # using XINC
    pdf2_filename = C('pdf2')  # using FOP
    rtf_filename = C('rtf')
    odt_filename = C('odt')
    wml_filename = C('wml')
    docx_filename = C('docx')
A very simple command-line converter is also available:
xslfo-convert --format rtf --output foo.rtf sample.html
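The same command can be driven from a script. This is a rough sketch that uses only the --format and --output options shown above. The helper names build_command and convert are hypothetical, and the sketch assumes that xslfo-convert is on the PATH and reports failure through a non-zero exit status.

```python
import subprocess


def build_command(html_path, fmt, output_path):
    """Build the argument list for one xslfo-convert run."""
    return ["xslfo-convert", "--format", fmt, "--output", output_path, html_path]


def convert(html_path, fmt, output_path):
    """Run xslfo-convert; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(build_command(html_path, fmt, output_path))
    return output_path


# Show the command that convert("sample.html", "rtf", "foo.rtf") would execute.
print(" ".join(build_command("sample.html", "rtf", "foo.rtf")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.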
xslfo-convert has a --test option that converts some sample HTML. If everything is OK, you should see something like this:
    >xslfo-convert --test
    Entering testmode
    pdf: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.pdf
    rtf: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.rtf
    docx: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.docx
    odt: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.odt
    wml: /tmp/tmpuOb37m.html -> /tmp/tmpuOb37m.wml
    pdf: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.pdf
    rtf: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.rtf
    docx: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.docx
    odt: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.odt
    wml: /tmp/tmpZ6PGo9.html -> /tmp/tmpZ6PGo9.wml
How zopyx.convert works internally
The source HTML file is converted to XHTML using mxTidy
the XHTML file is converted to FO using the great “csstoxslfo” converter written by Werner Donne.
the FO file is passed to the external XINC, FOP or XFC converter to generate the desired output format
all converters are based on Java technology, which makes the conversion solution highly portable across operating systems (including Windows)
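The last step depends on the requested output format. The mapping below is an illustrative sketch, not the package's actual code. The format names and their backends follow the requirements list and the usage example, while the BACKENDS table and the usable_formats helper are hypothetical names introduced here.

```python
import os

# format name -> (external converter, environment variable locating it)
BACKENDS = {
    "pdf": ("XINC", "XINC_HOME"),
    "pdf2": ("FOP", "FOP_HOME"),
    "rtf": ("XFC", "XFC_DIR"),
    "odt": ("XFC", "XFC_DIR"),
    "docx": ("XFC", "XFC_DIR"),
    "wml": ("XFC", "XFC_DIR"),
}


def usable_formats(environ=None):
    """Return the sorted format names whose converter variable is set.

    Hypothetical helper; it only checks that the variable is non-empty.
    """
    environ = os.environ if environ is None else environ
    return sorted(fmt for fmt, (_, var) in BACKENDS.items() if environ.get(var))


print(usable_formats())
```

With only $FOP_HOME set, this sketch reports 'pdf2' as the single usable format. Setting $XFC_DIR adds 'docx', 'odt', 'rtf' and 'wml'.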
Known issues
If you are using zopyx.convert together with FOP, use the latest FOP 0.94 only. Don’t use a packaged FOP version such as the one from MacPorts, which is known to be broken.
License
zopyx.convert is published under the GNU Lesser General Public License V 2.1 (LGPL 2.1). See LICENSE.txt.
Contact
Changes:
1.1.3 (31.01.2008)
clarified Java requirements for FOP
1.1.2 (22.01.2008)
removed some nasty debugging code
1.1.1 (22.01.2008)
supporting FOP on Windows
1.1.0 (20.01.2008)
support for free FOP PDF converter
1.0.6 (14.10.2007)
html2fo: added workaround for generated FO code for PRE tags
1.0.5 (05.10.2007)
minor bugfixes
1.0.4 (05.10.2007)
Windows support added
1.0.3 (04.10.2007)
passing -Duser.language=en to java in order to prevent corrupted FO code caused by locales
1.0.2 (03.10.2007)
bugfix
1.0.1 (03.10.2007)
added --test option to command-line frontend
1.0.0 (30.09.2007)
update to css2xslfo V 1.5.0
official 1.0.0 release
0.5.0 (09.09.2007)
replaced mxTidy related code with the BeautifulSoup module (no longer requires any compiling)
html2fo checks the existence of images
0.4.9 (25.07.2007)
support for utidy lib (which is the preferred tidy library). Using mx.Tidy only as fallback
0.4.8 (unreleased)
unreleased
0.4.7 (08.07.2007)
reSTified documentation
0.4.6 (08.07.2007)
fixes in availableFormats()
0.4.5 (07.07.2007)
various FO fixes
0.4.4 (06.07.2007)
using logging module
0.4.3 (05.07.2007)
html2fo: using ElementTree for most FO modifications
0.4.2 (30.06.2007)
converting page-break-after: always back into break-after: page
0.4.1 (24.06.2007)
various fixes
0.4.0 (24.06.2007)
added zope interfaces
converters are now classes
added unittests
0.3.1 (18.06.2007)
html2fo() and the converter constructor got a new ‘encoding’ parameter in order to specify the input encoding of the HTML file. This parameter will be passed down to Tidy in order to perform a proper conversion of non-ascii characters.
0.3.0 (unreleased)
using subprocess module of Python
new Convert() class for high-level XSLFO access
logger added
better checks for XINC, XFC
updated documentation
0.2.0 (16.06.2007)
PDF support added
command line interface added
mxTidy integration
0.1.0 (16.06.2007)
initial release