Convert HTML to markdown.

Installation

pip install markdownify

Usage

Convert some HTML to Markdown:

from markdownify import markdownify as md
md('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'

Specify tags to exclude:

from markdownify import markdownify as md
md('<b>Yay</b> <a href="http://github.com">GitHub</a>', strip=['a'])  # > '**Yay** GitHub'

...or specify the tags you want to include:

from markdownify import markdownify as md
md('<b>Yay</b> <a href="http://github.com">GitHub</a>', convert=['b'])  # > '**Yay** GitHub'

Options

Markdownify supports the following options:

strip

A list of tags to strip. This option can’t be used with the convert option.

convert

A list of tags to convert. This option can’t be used with the strip option.

autolinks

A boolean indicating whether the “automatic link” style should be used when an <a> tag’s contents match its href. Defaults to True.

default_title

A boolean to enable setting the title of a link to its href, if no title is given. Defaults to False.

heading_style

Defines how headings should be converted. Accepted values are ATX, ATX_CLOSED, SETEXT, and UNDERLINED (which is an alias for SETEXT). Defaults to UNDERLINED.

bullets

An iterable (string, list, or tuple) of bullet styles to be used. If the iterable only contains one item, it will be used regardless of how deeply lists are nested. Otherwise, the bullet will alternate based on nesting level. Defaults to '*+-'.

strong_em_symbol

In Markdown, both * and _ can mark strong or emphasized text. Select the symbol with ASTERISK (the default) or UNDERSCORE.

sub_symbol, sup_symbol

Define the chars that surround <sub> and <sup> text. Defaults to an empty string, because this is non-standard behavior. Could be something like ~ and ^ to result in ~sub~ and ^sup^.

newline_style

Defines how line breaks (<br>) are marked in Markdown. The default value, SPACES, uses the usual two trailing spaces and a newline, while BACKSLASH converts a line break to \\n (a backslash and a newline). While the latter convention is non-standard, it is commonly preferred and supported by many interpreters.

code_language

Defines the language that should be assumed for all <pre> sections. Useful if all code on a page is in the same programming language and should be annotated with ```python or similar. Defaults to '' (empty string) and can be any string.

escape_underscores

If set to False, do not escape _ to \_ in text. Defaults to True.

Options may be specified as kwargs to the markdownify function, or as a nested Options class in MarkdownConverter subclasses.

Converting BeautifulSoup objects

from markdownify import MarkdownConverter

# Create shorthand method for conversion
def md(soup, **options):
    return MarkdownConverter(**options).convert_soup(soup)

Creating Custom Converters

If you have a special use case that calls for a special conversion, you can always inherit from MarkdownConverter and override the method you want to change:

from markdownify import MarkdownConverter

class ImageBlockConverter(MarkdownConverter):
    """
    Create a custom MarkdownConverter that adds two newlines after an image
    """
    def convert_img(self, el, text, convert_as_inline):
        return super().convert_img(el, text, convert_as_inline) + '\n\n'

# Create shorthand method for conversion
def md(html, **options):
    return ImageBlockConverter(**options).convert(html)

Development

To run tests:

python setup.py test

To lint:

python setup.py lint
