
MWParserFromHell is a parser for MediaWiki wikicode.

Project description


mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with contributions from Σ, Legoktm, and others. Full documentation is available on ReadTheDocs. Development occurs on GitHub.

Installation

The easiest way to install the parser is through the Python Package Index; you can install the latest release with pip install mwparserfromhell. Make sure your pip is up-to-date first, especially on Windows.

Alternatively, get the latest development version:

git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

You can run the comprehensive unit testing suite with python setup.py test -q.

Usage

Normal usage is rather straightforward (where text is page text):

>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.Wikicode object, which acts like an ordinary str object (or unicode in Python 2) with some extra methods. For example:

>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam

Since nodes can contain other nodes, getting nested templates is trivial:

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass recursive=False to filter_templates() and explore templates manually. This is possible because nodes can contain additional Wikicode objects:

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template

Templates can be easily modified to add, remove, or alter params. Wikicode objects can be treated like lists, with append(), insert(), remove(), replace(), and more. They also have a matches() method for comparing page or template names, which takes care of capitalization and whitespace:

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
...     if template.name.matches("Cleanup") and not template.has("date"):
...         template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> print(code.filter_templates())
['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert code back into a regular str object (for saving the page!) by calling str() on it:

>>> text = str(code)
>>> print(text)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> text == code
True

Likewise, use unicode(code) in Python 2.

Limitations

While the MediaWiki parser generates HTML and has access to the contents of templates, among other things, mwparserfromhell acts as a direct interface to the source code only. This has several implications:

  • Syntax elements produced by a template transclusion cannot be detected. For example, imagine a hypothetical page "Template:End-bold" that contained the text </b>. While MediaWiki would correctly understand that <b>foobar{{end-bold}} translates to <b>foobar</b>, mwparserfromhell has no way of examining the contents of {{end-bold}}. Instead, it would treat the bold tag as unfinished, possibly extending further down the page.

  • Templates adjacent to external links, as in http://example.com{{foo}}, are considered part of the link. In reality, this would depend on the contents of the template.

  • When different syntax elements cross over each other, as in {{echo|''Hello}}, world!'', the parser gets confused because this cannot be represented by an ordinary syntax tree. Instead, the parser will treat the first syntax construct as plain text. In this case, only the italic tag would be properly parsed.

    Workaround: Since this commonly occurs with text formatting and text formatting is often not of interest to users, you may pass skip_style_tags=True to mwparserfromhell.parse(). This treats '' and ''' as plain text.

    A future version of mwparserfromhell may include multiple parsing modes to get around this restriction more sensibly.

Additionally, the parser lacks awareness of certain wiki-specific settings:

  • Word-ending links are not supported, since the linktrail rules are language-specific.

  • Localized namespace names aren’t recognized, so file links (such as [[File:...]]) are treated as regular wikilinks.

  • Anything that looks like an XML tag is treated as a tag, even if it is not a recognized tag name, since the list of valid tags depends on loaded MediaWiki extensions.

Integration

mwparserfromhell is used by and originally developed for EarwigBot; Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

If you’re using Pywikibot, your code might look like this:

import mwparserfromhell
import pywikibot

def parse(title):
    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    text = page.get()
    return mwparserfromhell.parse(text)

If you’re not using a library, you can parse any page using the following Python 3 code (via the API):

import json
from urllib.parse import urlencode
from urllib.request import urlopen

import mwparserfromhell

API_URL = "https://en.wikipedia.org/w/api.php"

def parse(title):
    # Ask the API for the content of the latest revision of a single page.
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    # Passing encoded data to urlopen() sends the request as a POST.
    raw = urlopen(API_URL, urlencode(data).encode()).read()
    res = json.loads(raw)
    # Pages are keyed by page ID; take the only page returned.
    text = list(res["query"]["pages"].values())[0]["revisions"][0]["*"]
    return mwparserfromhell.parse(text)
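The function above assumes the title exists. In this legacy JSON response format, a missing or invalid title comes back without a "revisions" entry, so the final lookup raises KeyError. A hedged sketch of a more forgiving extractor, kept separate from the network call so it can be checked against hand-written, simplified responses; extract_wikitext is a hypothetical name.

```python
def extract_wikitext(res):
    """Return the latest revision's text from a legacy-format query response.

    Returns None instead of raising KeyError when the title is missing or
    invalid, since such pages come back without a "revisions" list.
    """
    pages = res["query"]["pages"]      # keyed by page ID (negative if missing)
    page = next(iter(pages.values()))  # only one title was requested
    if "revisions" not in page:
        return None
    return page["revisions"][0]["*"]

# Simplified, hand-written responses in the shape returned for format=json:
found = {"query": {"pages": {"12": {"pageid": 12, "title": "Foo",
                                    "revisions": [{"*": "{{cleanup}} Foo"}]}}}}
missing = {"query": {"pages": {"-1": {"title": "No such page", "missing": ""}}}}

print(extract_wikitext(found))    # {{cleanup}} Foo
print(extract_wikitext(missing))  # None
```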



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • mwparserfromhell-0.5.2.tar.gz (132.1 kB): Source

Built Distributions

  • mwparserfromhell-0.5.2-cp37-cp37m-win_amd64.whl (97.4 kB): CPython 3.7m, Windows x86-64
  • mwparserfromhell-0.5.2-cp37-cp37m-win32.whl (93.7 kB): CPython 3.7m, Windows x86
  • mwparserfromhell-0.5.2-cp36-cp36m-win_amd64.whl (97.4 kB): CPython 3.6m, Windows x86-64
  • mwparserfromhell-0.5.2-cp36-cp36m-win32.whl (93.7 kB): CPython 3.6m, Windows x86
  • mwparserfromhell-0.5.2-cp35-cp35m-win_amd64.whl (97.4 kB): CPython 3.5m, Windows x86-64
  • mwparserfromhell-0.5.2-cp35-cp35m-win32.whl (93.7 kB): CPython 3.5m, Windows x86
  • mwparserfromhell-0.5.2-cp34-cp34m-win_amd64.whl (93.4 kB): CPython 3.4m, Windows x86-64
  • mwparserfromhell-0.5.2-cp34-cp34m-win32.whl (91.4 kB): CPython 3.4m, Windows x86
  • mwparserfromhell-0.5.2-cp27-cp27m-win_amd64.whl (93.1 kB): CPython 2.7m, Windows x86-64
  • mwparserfromhell-0.5.2-cp27-cp27m-win32.whl (90.8 kB): CPython 2.7m, Windows x86

File details

Details for the file mwparserfromhell-0.5.2.tar.gz.

File metadata

  • Download URL: mwparserfromhell-0.5.2.tar.gz
  • Upload date:
  • Size: 132.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.7.0

File hashes

Hashes for mwparserfromhell-0.5.2.tar.gz
Algorithm Hash digest
SHA256 f74cc77045b7e1a2e899c46374e7176294fca44ccb53b32bf44600a4b2c192ec
MD5 1169e0476301a5abfb948a93c8239a0d
BLAKE2b-256 bb8713c195cab36757e50a7fdb22f7bd8dfa8b8d14b62b0d2fe78d4c272e10ed

See more details on using hashes here.
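To check a downloaded file against the SHA256 digest listed above, hashing it in chunks with Python's standard hashlib module is enough. A minimal sketch; sha256_hex is a hypothetical helper name, and the file path in the commented usage is an assumption about where the download was saved.

```python
import hashlib

def sha256_hex(fileobj, chunk_size=65536):
    """Hash a binary file object in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# SHA256 listed above for mwparserfromhell-0.5.2.tar.gz
EXPECTED_SHA256 = "f74cc77045b7e1a2e899c46374e7176294fca44ccb53b32bf44600a4b2c192ec"

# with open("mwparserfromhell-0.5.2.tar.gz", "rb") as f:
#     if sha256_hex(f) != EXPECTED_SHA256:
#         raise ValueError("hash mismatch: do not install this file")
```

pip can perform the same check at install time in hash-checking mode, by listing the package in a requirements file as mwparserfromhell==0.5.2 --hash=sha256:<digest> and installing with --require-hashes.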

File details

Details for the file mwparserfromhell-0.5.2-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 97.4 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.0

File hashes

Hashes for mwparserfromhell-0.5.2-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 0e90de3831f495135896729be34a07bd402b6b3c2a3e4d6c88996d5fa83eda67
MD5 7636814cd76f0e81d7cb68cb0397e6c1
BLAKE2b-256 8dfeb21714d89928bdfdb17784ebe399c7d65c37f8959f812bff517e8ee4b073


File details

Details for the file mwparserfromhell-0.5.2-cp37-cp37m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp37-cp37m-win32.whl
  • Upload date:
  • Size: 93.7 kB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.0

File hashes

Hashes for mwparserfromhell-0.5.2-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 469b6f0eddf549853147c9b4330c28832ac24afb9c8478ad13bc538bd3c42753
MD5 e40fad30dafaacc8d8f86296e1f4ecbd
BLAKE2b-256 5726f5a3df20c3baadbfa3b52fca5ffbcb916fad84756082dfc01429aee47dcb


File details

Details for the file mwparserfromhell-0.5.2-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 97.4 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6

File hashes

Hashes for mwparserfromhell-0.5.2-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 59a012594365e0402f557ab0d8c2a17e2b12f7974c4325a4087712131abe236f
MD5 b3dfb1a5e1b91f342e6d3e556d07bdc8
BLAKE2b-256 9202490c0aa3db697ab50ab6689bf62e56cf92db4f8c55e5e1a6c8e14e3cc212


File details

Details for the file mwparserfromhell-0.5.2-cp36-cp36m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp36-cp36m-win32.whl
  • Upload date:
  • Size: 93.7 kB
  • Tags: CPython 3.6m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6

File hashes

Hashes for mwparserfromhell-0.5.2-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 fb86686f6bb2c6305a03599eaadb878df529557b62f56f6d72b3dd5f150231ab
MD5 1eca1f3c1838ab63c7c73e149e0f4399
BLAKE2b-256 4ccdd206fbab091a11c2de917c42c5377da62cd3a9038702d0dc576b804f1636


File details

Details for the file mwparserfromhell-0.5.2-cp35-cp35m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp35-cp35m-win_amd64.whl
  • Upload date:
  • Size: 97.4 kB
  • Tags: CPython 3.5m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/28.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.4

File hashes

Hashes for mwparserfromhell-0.5.2-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 0abc68fe60a38e68f3e155f86b4aad7af77cf74bfc7968eb6e73c07774530d1f
MD5 839fea68d936bf82ccd0118283c48fb4
BLAKE2b-256 7959a7ff071886f653db634b79a42d2d4a5843a96e2ad2746c89bf5ac8c76d3d


File details

Details for the file mwparserfromhell-0.5.2-cp35-cp35m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp35-cp35m-win32.whl
  • Upload date:
  • Size: 93.7 kB
  • Tags: CPython 3.5m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/28.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.4

File hashes

Hashes for mwparserfromhell-0.5.2-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 90a78c4df40f2d5f67c9f84300c08bf0418b2fee19e577f244f82b93ebf7e673
MD5 bf5fc6bc790660d77e2f6fac35f0f347
BLAKE2b-256 b7dadb8c4100d73035f417c92df55cc91775ab7335343551a209aedf25e4f8c8


File details

Details for the file mwparserfromhell-0.5.2-cp34-cp34m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp34-cp34m-win_amd64.whl
  • Upload date:
  • Size: 93.4 kB
  • Tags: CPython 3.4m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/18.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.4.4

File hashes

Hashes for mwparserfromhell-0.5.2-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 f589ad44fbbfb2a4d4540e6046fc090d9a9d8cc128f62a49137cd6ee02de51d8
MD5 e541e9099ff8029665f4c7ba0fa844c8
BLAKE2b-256 663806fa89bda491c21f77a00bf2b1b1e94fbae50afb68a4a9c545b8eb08f60b


File details

Details for the file mwparserfromhell-0.5.2-cp34-cp34m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp34-cp34m-win32.whl
  • Upload date:
  • Size: 91.4 kB
  • Tags: CPython 3.4m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/18.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.4.4

File hashes

Hashes for mwparserfromhell-0.5.2-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 c5da96c0f8a71dd922a3bf2567936bc848f38809d7edc1eb7759d1e2f9ef9046
MD5 3ccef7e6d9d372d371fdffdc42fe3b33
BLAKE2b-256 e7bb38d08e0000c706ebfbcd6cefee144e7ae3fc70d1249950c0844e56bf2136


File details

Details for the file mwparserfromhell-0.5.2-cp27-cp27m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp27-cp27m-win_amd64.whl
  • Upload date:
  • Size: 93.1 kB
  • Tags: CPython 2.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.15

File hashes

Hashes for mwparserfromhell-0.5.2-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 e8f70d800b7d2a608bd56fa3c2621184a471432ea45c0a9a56c0b48987004469
MD5 f63484e664d9089c130bcc7da89b296b
BLAKE2b-256 710fd1362ddcacd04fac6cf211157b8006da04c0f8589d5073c12a1a2c40939f


File details

Details for the file mwparserfromhell-0.5.2-cp27-cp27m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.2-cp27-cp27m-win32.whl
  • Upload date:
  • Size: 90.8 kB
  • Tags: CPython 2.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.15

File hashes

Hashes for mwparserfromhell-0.5.2-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 94757060d8434f4899ce90290ee2fb8a828552af899eb70d7d289fa188bb7c46
MD5 f267c1642287b9c7f4368616f7791146
BLAKE2b-256 74fa78cfb3c7f2281b96d3e0b92329e808f1e7eaa532fab35c96565ca1aab33b

