MWParserFromHell is a parser for MediaWiki wikicode.

Project description

mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with contributions from Σ, Legoktm, and others. Full documentation is available on ReadTheDocs. Development occurs on GitHub.

Installation

The easiest way to install the parser is through the Python Package Index: you can install the latest release with pip install mwparserfromhell. Make sure your pip is up to date first, especially on Windows.

Alternatively, get the latest development version:

git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

You can run the comprehensive unit testing suite with python setup.py test -q.

Usage

Normal usage is rather straightforward (where text is page text):

>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.Wikicode object, which acts like an ordinary str object (or unicode in Python 2) with some extra methods. For example:

>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam

Since nodes can contain other nodes, getting nested templates is trivial:

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass recursive=False to filter_templates() and explore templates manually. This is possible because nodes can contain additional Wikicode objects:

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template

Templates can be easily modified to add, remove, or alter params. Wikicode objects can be treated like lists, with append(), insert(), remove(), replace(), and more. They also have a matches() method for comparing page or template names, which takes care of capitalization and whitespace:

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
...     if template.name.matches("Cleanup") and not template.has("date"):
...         template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> print(code.filter_templates())
['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert code back into a regular str object (for saving the page!) by calling str() on it:

>>> text = str(code)
>>> print(text)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> text == code
True

Likewise, use unicode(code) in Python 2.

Caveats

An inherent limitation in wikicode prevents us from generating complete parse trees in certain cases. For example, the string {{echo|''Hello}}, world!'' produces the valid output <i>Hello, world!</i> in MediaWiki, assuming {{echo}} is a template that returns its first parameter. But since representing this in mwparserfromhell’s node tree would be impossible, we compromise by treating the first node (i.e., the template) as plain text, parsing only the italics.

The current workaround for cases where you are not interested in text formatting is to pass skip_style_tags=True to mwparserfromhell.parse(). This treats '' and ''' like plain text.

A future version of mwparserfromhell will include multiple parsing modes to get around this restriction.

Integration

mwparserfromhell is used by and originally developed for EarwigBot; Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

If you’re using Pywikibot, your code might look like this:

import mwparserfromhell
import pywikibot

def parse(title):
    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    text = page.get()
    return mwparserfromhell.parse(text)

If you’re not using a library, you can parse any page using the following Python 3 code (via the API):

import json
from urllib.parse import urlencode
from urllib.request import urlopen

import mwparserfromhell

API_URL = "https://en.wikipedia.org/w/api.php"

def parse(title):
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    raw = urlopen(API_URL, urlencode(data).encode()).read()
    res = json.loads(raw.decode("utf-8"))
    # "pages" is keyed by page ID; dict views can't be indexed in Python 3,
    # so take the first (and only) page with next(iter(...)).
    page = next(iter(res["query"]["pages"].values()))
    text = page["revisions"][0]["*"]
    return mwparserfromhell.parse(text)
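The response-handling part of the script above can be isolated and checked offline, without touching the network. This is a sketch: build_query, extract_text and the sample payloads are hypothetical and not part of mwparserfromhell or the MediaWiki API. It assumes the response layout the script indexes into (query, then pages keyed by page ID, then revisions, then "*"). It also assumes that a missing page comes back without a "revisions" list. Note that in Python 3 dict.values() cannot be indexed directly, so the first page is taken with next(iter(...)).

```python
import json
from urllib.parse import urlencode


def build_query(title):
    """Encode the same query parameters the script above sends for *title*."""
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    return urlencode(data).encode()


def extract_text(raw):
    """Return the wikitext from a raw API response, or None if there is none.

    Assumes the response layout query -> pages -> <page ID> -> revisions -> "*".
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    res = json.loads(raw)
    # "pages" is keyed by page ID, which is not known in advance. In Python 3,
    # dict.values() is a view and cannot be indexed, so take the first item.
    page = next(iter(res["query"]["pages"].values()))
    revisions = page.get("revisions")
    if not revisions:
        # Assumed: missing pages come back without a "revisions" list.
        return None
    return revisions[0]["*"]


# Hypothetical, hand-written payloads shaped like the layout assumed above.
sample = b'{"query": {"pages": {"12345": {"title": "Foo", "revisions": [{"*": "{{foo|bar}}"}]}}}}'
missing = b'{"query": {"pages": {"-1": {"title": "Nope", "missing": ""}}}}'

print(extract_text(sample))
print(extract_text(missing))
```

In a real script, build_query(title) would be the POST body passed to urlopen(API_URL, ...). Any string that extract_text() returns can go straight to mwparserfromhell.parse().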

Download files

Download the file for your platform. If you're not sure which one to choose, install with pip as described under Installation.

Source Distribution

mwparserfromhell-0.4.4.tar.gz (118.4 kB)

Built Distributions

mwparserfromhell-0.4.4-cp35-cp35m-win_amd64.whl (96.9 kB): CPython 3.5m, Windows x86-64
mwparserfromhell-0.4.4-cp35-cp35m-win32.whl (93.5 kB): CPython 3.5m, Windows x86
mwparserfromhell-0.4.4-cp34-cp34m-win_amd64.whl (92.3 kB): CPython 3.4m, Windows x86-64
mwparserfromhell-0.4.4-cp34-cp34m-win32.whl (90.5 kB): CPython 3.4m, Windows x86
mwparserfromhell-0.4.4-cp33-cp33m-win_amd64.whl (92.3 kB): CPython 3.3m, Windows x86-64
mwparserfromhell-0.4.4-cp33-cp33m-win32.whl (90.5 kB): CPython 3.3m, Windows x86
mwparserfromhell-0.4.4-cp27-cp27m-win_amd64.whl (92.0 kB): CPython 2.7m, Windows x86-64
mwparserfromhell-0.4.4-cp27-cp27m-win32.whl (90.1 kB): CPython 2.7m, Windows x86

File details

Details for the file mwparserfromhell-0.4.4.tar.gz.

File hashes

Hashes for mwparserfromhell-0.4.4.tar.gz
Algorithm Hash digest
SHA256 e15079b832e47b076e811ee3cce391f7bf116fe8289a97ea1181577331e3e6b4
MD5 cd967b7d3c80f387e09bc5e0c915d956
BLAKE2b-256 e0a7d0162476b156b1bd7eda8295233d0e8d86aa8ca929c79302401fc13aeca4

File details

Details for the file mwparserfromhell-0.4.4-cp35-cp35m-win_amd64.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 cda5a59d9d07a9b7d1b3d5b696affa743b2894889dbb112ff417205b1baf9200
MD5 396e5e312891bc631b540a85a554c7d3
BLAKE2b-256 607cd7c5d7beccef4ab95d0ac5b1f14bb4bcd4cbae02a443d65ecc3de3bb3794

File details

Details for the file mwparserfromhell-0.4.4-cp35-cp35m-win32.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 a6916c8fc1443322b344409355c009cf1d72164a375d864d3ebdec897ba58d14
MD5 0c5ba1549c0da082378e486e6130af8a
BLAKE2b-256 7da3b3f7ba4518ec7a36ffa381daa73611031b9bea6a46165fd5af278adba841

File details

Details for the file mwparserfromhell-0.4.4-cp34-cp34m-win_amd64.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp34-cp34m-win_amd64.whl
Algorithm Hash digest
SHA256 a6e8e20668c3603c672f33bec644e9a1d7a5b561c2ff1bd8963c2cdb206692b6
MD5 a07664216b328ba67b3c3dad4711283c
BLAKE2b-256 db0f0bd715c30e3de2330d94806f8d77bb9646238b1c970e8b0a6b3428dd7aad

File details

Details for the file mwparserfromhell-0.4.4-cp34-cp34m-win32.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp34-cp34m-win32.whl
Algorithm Hash digest
SHA256 b28983e3c831ac9bc9ab29c61b39ce47238fd611ce705cc04310fc60f8bf05a7
MD5 8de5373c3221805804dc82152600c2cf
BLAKE2b-256 bf04983ffdf6ef6cf4a0144c923a85077b46b463e7cbbc87ccf43da7fc9252cb

File details

Details for the file mwparserfromhell-0.4.4-cp33-cp33m-win_amd64.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp33-cp33m-win_amd64.whl
Algorithm Hash digest
SHA256 db60ddb8d09c4473358d025d5a80eae07f95af96c77223176fc35906e9e8d8d1
MD5 5ea367405824f2b32f4c8be31cf16f80
BLAKE2b-256 5bb43c4fb2ae0ab97554b2c44853c939e8b5f4fbc1780685c522a6fb3f062231

File details

Details for the file mwparserfromhell-0.4.4-cp33-cp33m-win32.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp33-cp33m-win32.whl
Algorithm Hash digest
SHA256 eb959be6d9cd8359fb78252575fbddc3e8f995716be98eba80d8f327d0dcc2e3
MD5 f0dd6bbb1e471d71d737b22d9b9fea26
BLAKE2b-256 1211962c78ef6ff8649be47ed6ee1d04816640ebc7ac2572eeb4024cf30388a2

File details

Details for the file mwparserfromhell-0.4.4-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 02a5e9a9490359e73fbcbdd0c7825fb7fb484c8e41ff6ac7fc8657f665940a1e
MD5 49586f405131f307b515c9214ac72d82
BLAKE2b-256 519f4b9901948d9c0c8ea1a1c81c767e153eadcccad6aeabab4121285d824dec

File details

Details for the file mwparserfromhell-0.4.4-cp27-cp27m-win32.whl.

File hashes

Hashes for mwparserfromhell-0.4.4-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 b1647fa972cb46b87710536f665b1efa4b0089feba4ecf1a8b485b9637ba6913
MD5 4ef5b43e6962f67ca50a2a9462a09585
BLAKE2b-256 20314ac8705dacfe3929abe858cd11096a87f2b5ad63d6bf18278110e04b17aa
