MWParserFromHell is a parser for MediaWiki wikicode.

mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with contributions from Σ, Legoktm, and others. Full documentation is available on ReadTheDocs. Development occurs on GitHub.

Installation

The easiest way to install the parser is through the Python Package Index: if you have pip, you can install the latest release with pip install mwparserfromhell. On Windows, make sure you have the latest version of pip installed by running pip install --upgrade pip.

Alternatively, get the latest development version:

git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

You can run the comprehensive unit testing suite with python setup.py test -q.

Usage

Normal usage is rather straightforward (where text is page text):

>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.Wikicode object, which acts like an ordinary str object (or unicode in Python 2) with some extra methods. For example:

>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam

Since nodes can contain other nodes, getting nested templates is trivial:

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass recursive=False to filter_templates() and explore templates manually. This is possible because nodes can contain additional Wikicode objects:

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template

Templates can be easily modified to add, remove, or alter params. Wikicode objects can be treated like lists, with append(), insert(), remove(), replace(), and more. They also have a matches() method for comparing page or template names, which takes care of capitalization and whitespace:

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
...     if template.name.matches("Cleanup") and not template.has("date"):
...         template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> print(code.filter_templates())
['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert code back into a regular str object (for saving the page!) by calling str() on it:

>>> text = str(code)
>>> print(text)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> text == code
True

Likewise, use unicode(code) in Python 2.

Integration

mwparserfromhell is used by and originally developed for EarwigBot; Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

If you’re using Pywikibot, your code might look like this:

import mwparserfromhell
import pywikibot

def parse(title):
    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    text = page.get()
    return mwparserfromhell.parse(text)

If you’re not using a library, you can parse any page using the following code (via the API):

import json
from urllib.parse import urlencode
from urllib.request import urlopen

import mwparserfromhell

API_URL = "https://en.wikipedia.org/w/api.php"

def parse(title):
    # Ask the API for the content of the page's latest revision.
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    raw = urlopen(API_URL, urlencode(data).encode()).read()
    res = json.loads(raw)
    # dict.values() is a view in Python 3, so wrap it in list() before indexing.
    text = list(res["query"]["pages"].values())[0]["revisions"][0]["*"]
    return mwparserfromhell.parse(text)
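
If you want to test the request and response handling without touching the network, one option is to split them into two small functions. This is only a sketch using the standard library: the names build_query and extract_text are invented for this example, and the two sample responses are hand-written stand-ins shaped like the JSON that the code above expects, not captured API output.

```python
import json
from urllib.parse import urlencode


def build_query(title):
    """Encode the POST body for a single-revision content query."""
    params = {"action": "query", "prop": "revisions", "rvlimit": 1,
              "rvprop": "content", "format": "json", "titles": title}
    return urlencode(params).encode()


def extract_text(raw):
    """Return the wikitext from a raw JSON response, or None if absent."""
    res = json.loads(raw)
    # The "pages" mapping is keyed by page ID; take its first (only) entry.
    page = next(iter(res["query"]["pages"].values()))
    if "revisions" not in page:
        return None  # Assumed shape of a reply for a page with no content.
    return page["revisions"][0]["*"]


# Hand-written sample responses (assumptions, for illustration only).
sample = b'{"query": {"pages": {"123": {"revisions": [{"*": "{{foo}} text"}]}}}}'
missing = b'{"query": {"pages": {"-1": {"title": "Nope", "missing": ""}}}}'

print(extract_text(sample))   # {{foo}} text
print(extract_text(missing))  # None
```

The string returned by extract_text can be passed straight to mwparserfromhell.parse(), and returning None instead of raising KeyError lets the caller decide how to treat a missing page.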
