MWParserFromHell is a parser for MediaWiki wikicode.

Project description

mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with contributions from Σ, Legoktm, and others. Full documentation is available on ReadTheDocs. Development occurs on GitHub.

Installation

The easiest way to install the parser is through the Python Package Index; you can install the latest release with pip install mwparserfromhell. On Windows, make sure you have the latest version of pip installed by running pip install --upgrade pip.

Alternatively, get the latest development version:

git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

You can run the comprehensive unit testing suite with python setup.py test -q.

Usage

Normal usage is rather straightforward (where text is page text):

>>> import mwparserfromhell
>>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.Wikicode object, which acts like an ordinary str object (or unicode in Python 2) with some extra methods. For example:

>>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
>>> wikicode = mwparserfromhell.parse(text)
>>> print(wikicode)
I has a template! {{foo|bar|baz|eggs=spam}} See it?
>>> templates = wikicode.filter_templates()
>>> print(templates)
['{{foo|bar|baz|eggs=spam}}']
>>> template = templates[0]
>>> print(template.name)
foo
>>> print(template.params)
['bar', 'baz', 'eggs=spam']
>>> print(template.get(1).value)
bar
>>> print(template.get("eggs").value)
spam
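
The get(1) and get("eggs") calls above rely on MediaWiki's parameter numbering: unnamed parameters are numbered 1, 2, ... in order, skipping named ones, and only named parameters have surrounding whitespace trimmed. The stdlib-only sketch below illustrates that rule on flat template text. split_params is a hypothetical helper, not part of mwparserfromhell, and it assumes no nested templates, links, or escaped pipes (the real parser handles all of those).

```python
def split_params(template_text):
    """Toy splitter for a flat "{{name|...}}" string (hypothetical helper).

    Assumes no nested templates, links, or escaped pipes inside.
    """
    body = template_text[2:-2]           # drop the outer "{{" and "}}"
    name, *pieces = body.split("|")
    params = {}
    position = 0                         # counts unnamed parameters only
    for piece in pieces:
        key, sep, value = piece.partition("=")
        if sep:
            # Named parameter: whitespace around key and value is trimmed
            params[key.strip()] = value.strip()
        else:
            # Unnamed parameter: takes the next positional number, untrimmed
            position += 1
            params[str(position)] = piece
    return name.strip(), params


name, params = split_params("{{foo|bar|baz|eggs=spam}}")
print(name)    # foo
print(params)  # {'1': 'bar', '2': 'baz', 'eggs': 'spam'}
```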

Since nodes can contain other nodes, getting nested templates is trivial:

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass recursive=False to filter_templates() and explore templates manually. This is possible because nodes can contain additional Wikicode objects:

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print(code.filter_templates(recursive=False))
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates(recursive=False)[0]
>>> print(foo.get(1).value)
this {{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0])
{{includes a|template}}
>>> print(foo.get(1).value.filter_templates()[0].get(1).value)
template
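
To see where the outer-before-inner order of the nested filter_templates() output comes from, here is a toy brace-matching sketch. find_templates is a hypothetical helper; mwparserfromhell uses a real tokenizer and node tree, not this. It tracks the depth of {{ and }}, records each outermost template, and, when recursive is true, descends into that template's body. It deliberately ignores triple-brace parameters, <nowiki>, and unbalanced input.

```python
def find_templates(text, recursive=True):
    """Return "{{...}}" substrings in document order, outer before inner.

    Toy illustration only (hypothetical helper): it matches "{{" and "}}"
    by depth and ignores "{{{params}}}", <nowiki>, and unbalanced braces.
    """
    found = []
    depth = 0
    start = 0
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i            # beginning of an outermost template
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                outer = text[start:i]
                found.append(outer)
                if recursive:
                    # Descend into the body (without the outer braces)
                    found.extend(find_templates(outer[2:-2], recursive=True))
        else:
            i += 1
    return found


text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
print(find_templates(text))
# ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']
print(find_templates(text, recursive=False))
# ['{{foo|{{bar}}={{baz|{{spam}}}}}}']
```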

Templates can be easily modified to add, remove, or alter params. Wikicode objects can be treated like lists, with append(), insert(), remove(), replace(), and more. They also have a matches() method for comparing page or template names, which takes care of capitalization and whitespace:

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
...     if template.name.matches("Cleanup") and not template.has("date"):
...         template.add("date", "July 2012")
...
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
>>> code.replace("{{uncategorized}}", "{{bar-stub}}")
>>> print(code)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> print(code.filter_templates())
['{{cleanup|date=July 2012}}', '{{bar-stub}}']
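
matches() saves you from normalizing names by hand. The sketch below roughly approximates the two rules mentioned above, assuming a wiki with the default first-letter capitalization: surrounding whitespace is ignored, and only the first character is compared case-insensitively. normalize_name and names_match are hypothetical helpers; the library's own implementation may differ in details (for example, it also strips markup from the name).

```python
def normalize_name(name):
    """Approximate page-name normalization (hypothetical helper).

    Strips surrounding whitespace and upper-cases only the first character,
    since (on default wiki settings) titles differ only by case after it.
    """
    name = name.strip()
    return name[:1].upper() + name[1:]


def names_match(a, b):
    return normalize_name(a) == normalize_name(b)


print(names_match("cleanup", "Cleanup"))     # True
print(names_match(" cleanup\n", "Cleanup"))  # True
print(names_match("cleanUp", "Cleanup"))     # False
```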

You can then convert code back into a regular str object (for saving the page!) by calling str() on it:

>>> text = str(code)
>>> print(text)
{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
>>> text == code
True

Likewise, use unicode(code) in Python 2.
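
The round trip works because Wikicode behaves like a string: it renders its nodes back to text, compares equal to that text, and still exposes its own methods. The toy class below sketches that idea in Python 3. StrLike is hypothetical and much less complete than the library's real string mix-in.

```python
class StrLike:
    """Toy stand-in for a Wikicode-like object (hypothetical class).

    Stores a list of nodes, renders them by concatenation, compares equal
    to its rendered text, and forwards unknown attributes to that text.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __str__(self):
        return "".join(str(node) for node in self.nodes)

    def __eq__(self, other):
        if isinstance(other, (str, StrLike)):
            return str(self) == str(other)
        return NotImplemented

    def __getattr__(self, attr):
        # Called only when normal lookup fails: delegate str methods
        if not hasattr(str, attr):
            raise AttributeError(attr)
        return getattr(str(self), attr)


code = StrLike(["{{cleanup|date=July 2012}}", " '''Foo''' is a [[bar]]."])
text = str(code)
print(text == code)           # True
print(code.startswith("{{"))  # True (forwarded to str)
print(code.upper()[:9])       # {{CLEANUP
```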

Integration

mwparserfromhell is used by and was originally developed for EarwigBot; EarwigBot's Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

If you’re using Pywikibot, your code might look like this:

import mwparserfromhell
import pywikibot

def parse(title):
    site = pywikibot.Site()
    page = pywikibot.Page(site, title)
    text = page.get()
    return mwparserfromhell.parse(text)

If you’re not using a library, you can parse any page using the following code (via the API):

import json
from urllib.parse import urlencode
from urllib.request import urlopen
import mwparserfromhell
API_URL = "https://en.wikipedia.org/w/api.php"

def parse(title):
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    raw = urlopen(API_URL, urlencode(data).encode()).read()
    res = json.loads(raw)
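    # "pages" is keyed by page ID, so take the first (and only) page;
    # dict views are not indexable in Python 3, hence next(iter(...))
    page = next(iter(res["query"]["pages"].values()))
    text = page["revisions"][0]["*"]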
    return mwparserfromhell.parse(text)
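
The response handling in parse() can be exercised offline. The sketch below pulls the wikitext out of a decoded API response and raises a clear error for pages that have no revisions (such as missing titles) instead of failing with a bare KeyError. extract_wikitext is a hypothetical helper, and sample is a hand-written response in the legacy JSON shape that the query above requests: pages keyed by page ID, with the revision text stored under "*".

```python
def extract_wikitext(res):
    """Return the latest revision's wikitext from a decoded API response.

    Hypothetical helper: assumes the legacy JSON format used above, where
    "pages" maps page IDs to page objects and the text is stored under "*".
    """
    pages = res["query"]["pages"]
    page = next(iter(pages.values()))  # only one title was requested
    if "revisions" not in page:
        # Missing or invalid titles come back without a "revisions" list
        raise LookupError("no revisions for page %r" % page.get("title"))
    return page["revisions"][0]["*"]


# Hand-written sample response (not real API output)
sample = {"query": {"pages": {"12345": {
    "pageid": 12345, "ns": 0, "title": "Foo",
    "revisions": [{"*": "{{cleanup}} '''Foo''' is a [[bar]]."}],
}}}}
print(extract_wikitext(sample))  # {{cleanup}} '''Foo''' is a [[bar]].
```

The returned string can then be passed to mwparserfromhell.parse() exactly as in the function above.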


Download files

Source Distribution

mwparserfromhell-0.4.2.tar.gz (116.4 kB)

Built Distributions

mwparserfromhell-0.4.2-cp34-none-win_amd64.whl (90.3 kB): CPython 3.4, Windows x86-64
mwparserfromhell-0.4.2-cp34-none-win32.whl (88.9 kB): CPython 3.4, Windows x86
mwparserfromhell-0.4.2-cp33-none-win_amd64.whl (90.4 kB): CPython 3.3, Windows x86-64
mwparserfromhell-0.4.2-cp33-none-win32.whl (88.9 kB): CPython 3.3, Windows x86
mwparserfromhell-0.4.2-cp27-none-win_amd64.whl (90.1 kB): CPython 2.7, Windows x86-64
mwparserfromhell-0.4.2-cp27-none-win32.whl (88.5 kB): CPython 2.7, Windows x86

File hashes

Hashes for mwparserfromhell-0.4.2.tar.gz
Algorithm Hash digest
SHA256 6aa77be28882fd64f16a0a534973ee0e27bc8d109e804870489704ee3af46038
MD5 5ec5af6376df44a692a927a3d0e8370a
BLAKE2b-256 4a0a66a01455c8d100e78af0734215dd8b5c72cb76df18b698feead7c9fb77b5

Hashes for mwparserfromhell-0.4.2-cp34-none-win_amd64.whl
Algorithm Hash digest
SHA256 8ad04004887362576bcffa00f1d7c096e4f89d93bcd8078e0a9a00323de20390
MD5 70a7d4f28aa1e12c32c6c00713d527a1
BLAKE2b-256 9e1b42015b7f590792f2335a2623a419871c0ebfde49dbb0adc56f6e79856995

Hashes for mwparserfromhell-0.4.2-cp34-none-win32.whl
Algorithm Hash digest
SHA256 45708aca03fcb9ca82b1158578b4470b3e3cd38ceb7f38155dc015e7290bc230
MD5 cc4d5c71cb30d399f014593d876aeb8d
BLAKE2b-256 22e85c1d2eae292fe303972d9b4c45eca94ca706c0686286f404fd32c937f1aa

Hashes for mwparserfromhell-0.4.2-cp33-none-win_amd64.whl
Algorithm Hash digest
SHA256 d8d42545a9ef23140dcb262f321070d1cbd818f808cb88b3f0eb6daf7bbb8b4a
MD5 88a8c795f12c2686143c04576eb78da6
BLAKE2b-256 1f341b0f8a60be3f7b8e740aa96cb4a591effb4ef616b151750dca59fc2eac84

Hashes for mwparserfromhell-0.4.2-cp33-none-win32.whl
Algorithm Hash digest
SHA256 543330985ae7a2c708d207931ab77507d460b132a6d0fdcbcd62b7c56477b914
MD5 4cd9ae66efa8513b3ef30680a8339aec
BLAKE2b-256 3eba918050f539d84a22f7975de817e93b645e8448a5b7349049e6f5d5d51f3a

Hashes for mwparserfromhell-0.4.2-cp27-none-win_amd64.whl
Algorithm Hash digest
SHA256 65e7752d48c5a72e4bacc9f501ae2f257b914594327c00ac7335706349aa6d5d
MD5 92630f0fb3694d24b054284a0ca4825a
BLAKE2b-256 6d29b2fa7ef0aa15c6dd9c50d8dfb09ab314283b832713ea0348c487728db99f

Hashes for mwparserfromhell-0.4.2-cp27-none-win32.whl
Algorithm Hash digest
SHA256 99f26450ab0abda4a6a3fd01aca3bf2c64d8249e17d7ce30e0b0599802db8a59
MD5 125f949b8d140c28af8eb7325023d88f
BLAKE2b-256 f54f0d8b3900d645f66b759a3ee2cf790074f5aaea26b46e99f173e24403ceb9
