Turn HTML into equivalent Markdown-structured text.
# [html2text](http://www.aaronsw.com/2002/html2text/)
[![Build Status](https://secure.travis-ci.org/Alir3z4/html2text.png)](http://travis-ci.org/Alir3z4/html2text)
[![Coverage Status](https://coveralls.io/repos/Alir3z4/html2text/badge.png)](https://coveralls.io/r/Alir3z4/html2text)
[![Downloads](https://pypip.in/d/html2text/badge.png)](https://pypi-hypernode.com/pypi/html2text/)
[![Version](https://pypip.in/v/html2text/badge.png)](https://pypi-hypernode.com/pypi/html2text/)
[![Egg?](https://pypip.in/egg/html2text/badge.png)](https://pypi-hypernode.com/pypi/html2text/)
[![Wheel?](https://pypip.in/wheel/html2text/badge.png)](https://pypi-hypernode.com/pypi/html2text/)
[![Format](https://pypip.in/format/html2text/badge.png)](https://pypi-hypernode.com/pypi/html2text/)
[![License](https://pypip.in/license/html2text/badge.png)](https://pypi-hypernode.com/pypi/html2text/)
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: `html2text.py [(filename|url) [encoding]]`
| Option | Description |
|--------|-------------|
| `--version` | Show program's version number and exit |
| `-h`, `--help` | Show this help message and exit |
| `--ignore-links` | Don't include any formatting for links |
| `--ignore-images` | Don't include any formatting for images |
| `-g`, `--google-doc` | Convert an HTML-exported Google Document |
| `-d`, `--dash-unordered-list` | Use a dash rather than a star for unordered list items |
| `-b` `BODY_WIDTH`, `--body-width`=`BODY_WIDTH` | Number of characters per output line, `0` for no wrap |
| `-i` `LIST_INDENT`, `--google-list-indent`=`LIST_INDENT` | Number of pixels Google indents nested lists |
| `-s`, `--hide-strikethrough` | Hide strike-through text. Only relevant when `-g` is specified as well |
| `--escape-all` | Escape all special characters. Output is less readable, but avoids corner-case formatting issues |
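
The options above can also be assembled programmatically when the script is driven from another program. The following is a minimal sketch, not part of html2text itself: `build_html2text_command` and its parameter names are hypothetical, and only the flag spellings and the `[(filename|url) [encoding]]` argument order are taken from the usage line and the table. It only builds the argument list; actually running it (for example with `subprocess`) assumes `html2text.py` is reachable from the working directory or `PATH`.

```python
def build_html2text_command(source, encoding=None, body_width=None,
                            ignore_links=False, ignore_images=False,
                            dash_unordered_list=False, escape_all=False,
                            script="html2text.py"):
    """Build an argument list for the html2text command line.

    Hypothetical helper for illustration: the flags are the ones listed
    in the option table, everything else is an assumption.
    """
    argv = [script]
    # Boolean switches are added only when requested.
    if ignore_links:
        argv.append("--ignore-links")
    if ignore_images:
        argv.append("--ignore-images")
    if dash_unordered_list:
        argv.append("--dash-unordered-list")
    if escape_all:
        argv.append("--escape-all")
    # A body width of 0 means "no wrapping", so compare against None, not falsiness.
    if body_width is not None:
        argv.append("--body-width=%d" % body_width)
    # Positional arguments: the file name or URL, then the optional encoding.
    argv.append(source)
    if encoding is not None:
        argv.append(encoding)
    return argv


cmd = build_html2text_command("page.html", encoding="utf-8",
                              body_width=0, ignore_links=True)
print(" ".join(cmd))
# prints: html2text.py --ignore-links --body-width=0 page.html utf-8
```

Returning a list rather than a single string keeps each argument separate, so the result can be handed to `subprocess` without shell quoting.
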
Or you can use it from within Python:

    import html2text

    # Convert an HTML string to Markdown with the default settings
    print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

    import html2text

    h = html2text.HTML2Text()
    h.ignore_links = True  # don't include any formatting for links
    print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))

_Originally written by Aaron Swartz. This code is distributed under the GPLv3._
## How to install
`html2text` is available on PyPI: https://pypi-hypernode.com/pypi/html2text
```
$ pip install html2text
```
## How to run unit tests

    PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v