
MWParserFromHell is a parser for MediaWiki wikicode.

Project description


mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with contributions from Σ, Legoktm, and others. Full documentation is available on ReadTheDocs. Development occurs on GitHub.

Installation

The easiest way to install the parser is through the Python Package Index; you can install the latest release with pip install mwparserfromhell. Make sure your pip is up to date first, especially on Windows.

Alternatively, get the latest development version:

git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

You can run the comprehensive unit testing suite with python setup.py test -q.

Usage

Normal usage is rather straightforward (where text is a string containing a page's wikicode):

>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.Wikicode object, which acts like an ordinary str object (or unicode in Python 2) with some extra methods. For example:

>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam

Since nodes can contain other nodes, getting nested templates is trivial:

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass recursive=False to filter_templates() and explore templates manually. This is possible because nodes can contain additional Wikicode objects:

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template

Templates can be easily modified to add, remove, or alter params. Wikicode objects can be treated like lists, with append(), insert(), remove(), replace(), and more. They also have a matches() method for comparing page or template names, which takes care of capitalization and whitespace:

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
...     if template.name.matches("Cleanup") and not template.has("date"):
...         template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> print(code.filter_templates())
['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert code back into a regular str object (for saving the page!) by calling str() on it:

>>> text = str(code)
>>> print(text)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> text == code
True

Likewise, use unicode(code) in Python 2.

Limitations

While the MediaWiki parser generates HTML and has access to the contents of templates, among other things, mwparserfromhell acts as a direct interface to the source code only. This has several implications:

  • Syntax elements produced by a template transclusion cannot be detected. For example, imagine a hypothetical page "Template:End-bold" that contained the text </b>. While MediaWiki would correctly understand that <b>foobar{{end-bold}} translates to <b>foobar</b>, mwparserfromhell has no way of examining the contents of {{end-bold}}. Instead, it would treat the bold tag as unfinished, possibly extending further down the page.

  • Templates adjacent to external links, as in http://example.com{{foo}}, are considered part of the link. In reality, this would depend on the contents of the template.

  • When different syntax elements cross over each other, as in {{echo|''Hello}}, world!'', the parser gets confused because this cannot be represented by an ordinary syntax tree. Instead, the parser will treat the first syntax construct as plain text. In this case, only the italic tag would be properly parsed.

    Workaround: Since this commonly occurs with text formatting and text formatting is often not of interest to users, you may pass skip_style_tags=True to mwparserfromhell.parse(). This treats '' and ''' as plain text.

    A future version of mwparserfromhell may include multiple parsing modes to get around this restriction more sensibly.

Additionally, the parser lacks awareness of certain wiki-specific settings:

  • Word-ending links are not supported, since the linktrail rules are language-specific.

  • Localized namespace names aren’t recognized, so file links (such as [[File:...]]) are treated as regular wikilinks.

  • Anything that looks like an XML tag is treated as a tag, even if it is not a recognized tag name, since the list of valid tags depends on loaded MediaWiki extensions.

Integration

mwparserfromhell is used by, and was originally developed for, EarwigBot; its Page objects have a parse() method that essentially calls mwparserfromhell.parse() on page.get().

If you’re using Pywikibot, your code might look like this:

import mwparserfromhell
import pywikibot

def parse(title):
    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    text = page.get()
    return mwparserfromhell.parse(text)

If you’re not using a library, you can fetch and parse any page with the following Python 3 code, which queries the MediaWiki API directly:

import json
from urllib.parse import urlencode
from urllib.request import urlopen
import mwparserfromhell
API_URL = "https://en.wikipedia.org/w/api.php"

def parse(title):
    data = {"action": "query", "prop": "revisions", "rvprop": "content",
            "rvslots": "main", "rvlimit": 1, "titles": title,
            "format": "json", "formatversion": "2"}
    raw = urlopen(API_URL, urlencode(data).encode()).read()
    res = json.loads(raw)
    revision = res["query"]["pages"][0]["revisions"][0]
    text = revision["slots"]["main"]["content"]
    return mwparserfromhell.parse(text)
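The same request can be split up so that the JSON handling is testable offline and a descriptive User-Agent is sent, as the Wikimedia User-Agent policy asks of API clients. This is a sketch: USER_AGENT is a placeholder you should replace, and build_request(), extract_wikitext(), and fetch_wikitext() are hypothetical helpers. With formatversion=2, a nonexistent page comes back with "missing": true and no revisions, which the code above would turn into a KeyError.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder; identify your own tool and a way to contact you.
USER_AGENT = "ExampleBot/0.1 (https://example.org/examplebot; examplebot@example.org)"

def build_request(title):
    data = {"action": "query", "prop": "revisions", "rvprop": "content",
            "rvslots": "main", "rvlimit": 1, "titles": title,
            "format": "json", "formatversion": "2"}
    # Passing data makes this a POST request, as in the original code.
    return Request(API_URL, data=urlencode(data).encode(),
                   headers={"User-Agent": USER_AGENT})

def extract_wikitext(res):
    """Return the page's wikitext, or None if the page is missing or invalid."""
    page = res["query"]["pages"][0]
    if page.get("missing") or page.get("invalid"):
        return None
    return page["revisions"][0]["slots"]["main"]["content"]

def fetch_wikitext(title):
    with urlopen(build_request(title)) as resp:
        return extract_wikitext(json.load(resp))
```

The text returned by fetch_wikitext() can then be passed to mwparserfromhell.parse() as before.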

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • mwparserfromhell-0.5.4.tar.gz (135.5 kB)

Built Distributions

  • mwparserfromhell-0.5.4-cp37-cp37m-win_amd64.whl (97.6 kB): CPython 3.7m, Windows x86-64
  • mwparserfromhell-0.5.4-cp37-cp37m-win32.whl (93.8 kB): CPython 3.7m, Windows x86
  • mwparserfromhell-0.5.4-cp36-cp36m-win_amd64.whl (97.6 kB): CPython 3.6m, Windows x86-64
  • mwparserfromhell-0.5.4-cp36-cp36m-win32.whl (93.8 kB): CPython 3.6m, Windows x86
  • mwparserfromhell-0.5.4-cp35-cp35m-win_amd64.whl (97.6 kB): CPython 3.5m, Windows x86-64
  • mwparserfromhell-0.5.4-cp35-cp35m-win32.whl (93.8 kB): CPython 3.5m, Windows x86
  • mwparserfromhell-0.5.4-cp34-cp34m-win_amd64.whl (93.6 kB): CPython 3.4m, Windows x86-64
  • mwparserfromhell-0.5.4-cp34-cp34m-win32.whl (91.6 kB): CPython 3.4m, Windows x86
  • mwparserfromhell-0.5.4-cp27-cp27m-win_amd64.whl (93.3 kB): CPython 2.7m, Windows x86-64
  • mwparserfromhell-0.5.4-cp27-cp27m-win32.whl (91.0 kB): CPython 2.7m, Windows x86

File details

Details for the file mwparserfromhell-0.5.4.tar.gz.

File metadata

  • Download URL: mwparserfromhell-0.5.4.tar.gz
  • Size: 135.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for mwparserfromhell-0.5.4.tar.gz
Algorithm Hash digest
SHA256 aaf5416ab9b75e99e286f8a4216f77a2f7d834afd4c8f81731e701e59bf99305
MD5 fab96774908146e2b7b6d89e885f5bf2
BLAKE2b-256 23034fb04da533c7e237c0104151c028d8bff856293d34e51d208c529696fb79
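To check a download against the SHA256 digests listed here, a minimal sketch with hashlib; EXPECTED_SDIST_SHA256 is copied from the source distribution entry above, and the helper names are hypothetical:

```python
import hashlib

# SHA256 published above for mwparserfromhell-0.5.4.tar.gz
EXPECTED_SDIST_SHA256 = "aaf5416ab9b75e99e286f8a4216f77a2f7d834afd4c8f81731e701e59bf99305"

def sha256_hexdigest(chunks):
    """Hash an iterable of bytes chunks without holding the whole file in memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def iter_file(path, chunk_size=1 << 16):
    """Yield a file's contents in fixed-size binary chunks."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(chunk_size), b"")

def matches_published_hash(path, expected=EXPECTED_SDIST_SHA256):
    return sha256_hexdigest(iter_file(path)) == expected.lower()
```

pip can also enforce digests itself in its hash-checking mode (--require-hashes).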


File details

Details for the file mwparserfromhell-0.5.4-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp37-cp37m-win_amd64.whl
  • Size: 97.6 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for mwparserfromhell-0.5.4-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 2df9222bc5fc8f82ad68a3dd06ee38bde47eff521c4315b32523938ed834c3a4
MD5 d6cc997367218ed87641dd84241b6d3a
BLAKE2b-256 3d595616626f4b4122e60b8011a2f5f4a6c188669ede91defa1557b1d5c2f4da


File details

Details for the file mwparserfromhell-0.5.4-cp37-cp37m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp37-cp37m-win32.whl
  • Size: 93.8 kB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for mwparserfromhell-0.5.4-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 6377696e0fe0e12bdf9b94eeb2625edd3f30e373c72c593986c85eba60184575
MD5 acac15d6b85ddcdaafbb92dd4a0e49ae
BLAKE2b-256 32065ad3d16ceb86614a7f00d5babf19834b92002ad66ebd01adeef22ae6aa4a


File details

Details for the file mwparserfromhell-0.5.4-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp36-cp36m-win_amd64.whl
  • Size: 97.6 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.8

File hashes

Hashes for mwparserfromhell-0.5.4-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 696381fb96682a37655a8dfc35393d60433622e65618fd9e003cc81cb6e10554
MD5 c56b6870c77f24b89ec88230ddbc6da4
BLAKE2b-256 dc006f57b7bf94a0f4c78e76b084d3c1e88e9496600b80a7f1e22445c2b08abb


File details

Details for the file mwparserfromhell-0.5.4-cp36-cp36m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp36-cp36m-win32.whl
  • Size: 93.8 kB
  • Tags: CPython 3.6m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.8

File hashes

Hashes for mwparserfromhell-0.5.4-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 049b4de541f0e03c0d9b745e9482fcd72637661ac8af7f924a7005bd46a4e551
MD5 cd3c025d32c97a547f0b25fd39336718
BLAKE2b-256 96e8e6b4d98edb335e6f6e7044e0a1c06e2a33b74e85c55e1dec0df896ca0727


File details

Details for the file mwparserfromhell-0.5.4-cp35-cp35m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp35-cp35m-win_amd64.whl
  • Size: 97.6 kB
  • Tags: CPython 3.5m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/28.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.5.4

File hashes

Hashes for mwparserfromhell-0.5.4-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 301e7742c769be6a2dedc287be2ca6724b5e85ed6aadc788fc2ba0a17d11f509
MD5 8d25ff0da5c2432006ca9a7dfb961f5c
BLAKE2b-256 a212f37e9d8812778350442fbf3b0a9bb586b7652dacab753c82085fe7c6180e


File details

Details for the file mwparserfromhell-0.5.4-cp35-cp35m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp35-cp35m-win32.whl
  • Size: 93.8 kB
  • Tags: CPython 3.5m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/28.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.5.4

File hashes

Hashes for mwparserfromhell-0.5.4-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 e1419c058cd4f644cc58af9ea13304d415d6d282a7b64beb5226ee3c25eb2cbf
MD5 e9b034cc75e89d42f3d58e4d3840c43d
BLAKE2b-256 1b4dcacdd263e28bd02fa0a7dd96c255da715c8f116723bce132f57c6cbd0ef1


File details

Details for the file mwparserfromhell-0.5.4-cp34-cp34m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp34-cp34m-win_amd64.whl
  • Size: 93.6 kB
  • Tags: CPython 3.4m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/18.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.4.4

File hashes

Hashes for mwparserfromhell-0.5.4-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 78cf0686ea762ff06ede4d52af05abc9981fba881d8d6f1325d6b34fe3e5d295
MD5 42baea9dd62d19401ffb32de11d345d3
BLAKE2b-256 cca5ff1959eb47d34f7555ec11050d25bbf891a564a41744dab3ce6ca6bb4bf3


File details

Details for the file mwparserfromhell-0.5.4-cp34-cp34m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp34-cp34m-win32.whl
  • Size: 91.6 kB
  • Tags: CPython 3.4m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/18.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.4.4

File hashes

Hashes for mwparserfromhell-0.5.4-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 dda8bc7807b5b77acebc5ea0699f0c95ca9a8d8151a36a2121d6affcfcdfcf87
MD5 1f7825eb4e4837c643e324690ee5972e
BLAKE2b-256 8f2837b297ae5614fb38c568a27762976e1877711f9cc76f837252b47bb5dc76


File details

Details for the file mwparserfromhell-0.5.4-cp27-cp27m-win_amd64.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp27-cp27m-win_amd64.whl
  • Size: 93.3 kB
  • Tags: CPython 2.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.16

File hashes

Hashes for mwparserfromhell-0.5.4-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 b101d457343c948588de5c292267b273de7632227a47e564e5ab80a3b10c4df3
MD5 e1ccbb3ba871ff1db45365c839136be1
BLAKE2b-256 21097c361b506d94a98348c3be55a891d7f6899d6afafa2ae088ee44a6315560


File details

Details for the file mwparserfromhell-0.5.4-cp27-cp27m-win32.whl.

File metadata

  • Download URL: mwparserfromhell-0.5.4-cp27-cp27m-win32.whl
  • Size: 91.0 kB
  • Tags: CPython 2.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.16

File hashes

Hashes for mwparserfromhell-0.5.4-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 fb7000b444d11f850992dc7cb31c1d89fa544a28ab7f983a8aca3af29af5ea00
MD5 aa17722004fd2e5108b93844beeb73ad
BLAKE2b-256 3d80f08cba8748b292fcc8dd20599f36be7ba31d73132765b201259f9f975903
