MWParserFromHell is a parser for MediaWiki wikicode.

mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with contributions from Σ, Legoktm, and others. Full documentation is available on ReadTheDocs. Development occurs on GitHub.

Installation

The easiest way to install the parser is through the Python Package Index; you can install the latest release with pip install mwparserfromhell. Make sure your pip is up to date first, especially on Windows.

Alternatively, get the latest development version:

git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

You can run the comprehensive unit testing suite with python setup.py test -q.

Usage

Normal usage is rather straightforward (where text is page text):

>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.Wikicode object, which acts like an ordinary str object (or unicode in Python 2) with some extra methods. For example:

>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam

Since nodes can contain other nodes, getting nested templates is trivial:

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass recursive=False to filter_templates() and explore templates manually. This is possible because nodes can contain additional Wikicode objects:

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template

Templates can be easily modified to add, remove, or alter params. Wikicode objects can be treated like lists, with append(), insert(), remove(), replace(), and more. They also have a matches() method for comparing page or template names, which takes care of capitalization and whitespace:

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
...     if template.name.matches("Cleanup") and not template.has("date"):
...         template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> print(code.filter_templates())
['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert code back into a regular str object (for saving the page!) by calling str() on it:

>>> text = str(code)
>>> print(text)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> text == code
True

Likewise, use unicode(code) in Python 2.

Limitations

While the MediaWiki parser generates HTML and has access to the contents of templates, among other things, mwparserfromhell acts as a direct interface to the source code only. This has several implications:

  • Syntax elements produced by a template transclusion cannot be detected. For example, imagine a hypothetical page "Template:End-bold" that contained the text </b>. While MediaWiki would correctly understand that <b>foobar{{end-bold}} translates to <b>foobar</b>, mwparserfromhell has no way of examining the contents of {{end-bold}}. Instead, it would treat the bold tag as unfinished, possibly extending further down the page.

  • Templates adjacent to external links, as in http://example.com{{foo}}, are considered part of the link. In reality, this would depend on the contents of the template.

  • When different syntax elements cross over each other, as in {{echo|''Hello}}, world!'', the parser gets confused because this cannot be represented by an ordinary syntax tree. Instead, the parser will treat the first syntax construct as plain text. In this case, only the italic tag would be properly parsed.

    Workaround: Since this commonly occurs with text formatting and text formatting is often not of interest to users, you may pass skip_style_tags=True to mwparserfromhell.parse(). This treats '' and ''' as plain text.

    A future version of mwparserfromhell may include multiple parsing modes to get around this restriction more sensibly.

Additionally, the parser lacks awareness of certain wiki-specific settings:

  • Word-ending links are not supported, since the linktrail rules are language-specific.

  • Localized namespace names aren’t recognized, so file links (such as [[File:...]]) are treated as regular wikilinks.

  • Anything that looks like an XML tag is treated as a tag, even if it is not a recognized tag name, since the list of valid tags depends on loaded MediaWiki extensions.

Integration

mwparserfromhell is used by, and was originally developed for, EarwigBot; its Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

If you’re using Pywikibot, your code might look like this:

import mwparserfromhell
import pywikibot

def parse(title):
    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    text = page.get()
    return mwparserfromhell.parse(text)

If you’re not using a library, you can parse any page using the following Python 3 code (via the API):

import json
from urllib.parse import urlencode
from urllib.request import urlopen

import mwparserfromhell

API_URL = "https://en.wikipedia.org/w/api.php"

def parse(title):
    data = {"action": "query", "prop": "revisions", "rvprop": "content",
            "rvslots": "main", "rvlimit": 1, "titles": title,
            "format": "json", "formatversion": "2"}
    raw = urlopen(API_URL, urlencode(data).encode()).read()
    res = json.loads(raw)
    revision = res["query"]["pages"][0]["revisions"][0]
    text = revision["slots"]["main"]["content"]
    return mwparserfromhell.parse(text)
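
The function above indexes straight into the response, which fails with a KeyError or IndexError when the page has no revisions. A more defensive variant might pull the text out with a helper like the one below. This is a sketch only: extract_wikitext is a hypothetical name, and it assumes that with formatversion=2 the API flags a nonexistent page with a "missing" key and omits "revisions":

```python
def extract_wikitext(res):
    """Return main-slot wikitext from a decoded query response, or None."""
    pages = res.get("query", {}).get("pages", [])
    if not pages:
        return None
    page = pages[0]
    # Assumption: a nonexistent page carries "missing" and has no revisions.
    if page.get("missing") or not page.get("revisions"):
        return None
    return page["revisions"][0]["slots"]["main"]["content"]

# Hand-written stand-ins for decoded API responses, for illustration only.
found = {"query": {"pages": [
    {"title": "X", "revisions": [{"slots": {"main": {"content": "{{foo}}"}}}]}]}}
missing = {"query": {"pages": [{"title": "Nope", "missing": True}]}}

print(extract_wikitext(found))
print(extract_wikitext(missing))
```

The returned string (when not None) can then be passed to mwparserfromhell.parse() as before.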
