A parser for wikipedia citation templates
Project description
This Python library wraps Wikipedia’s citation processing code (written in Lua) to parse citation templates. For instance, there are many different ways to specify the authors of a citation: this codes maps all of them to the same representation.
This is the underlying code of our “OAbot” (GitHub).
Distributed under the MIT license.
Dependencies: lupa, mwparserfromhell To install lupa from pypi, you will need libpython and liblua (libpython-dev and liblua5.2-dev on debian-based systems).
Example
Let’s parse the reference section of this article:
import mwparserfromhell from wikiciteparser.parser import parse_citation_template mwtext = """ ===Articles=== * {{Citation | last1=Lambek | first1=Joachim | author1-link=Joachim Lambek | last2=Moser | first2=L. | title=Inverse and Complementary Sequences of Natural Numbers| doi=10.2307/2308078 | mr=0062777 | journal=[[American Mathematical Monthly|The American Mathematical Monthly]] | issn=0002-9890 | volume=61 | issue=7 | pages=454–458 | year=1954 | jstor=2308078 | publisher=The American Mathematical Monthly, Vol. 61, No. 7}} * {{Citation | last1=Lambek | first1=J. | author1-link=Joachim Lambek | title=The Mathematics of Sentence Structure | year=1958 | journal=[[American Mathematical Monthly|The American Mathematical Monthly]] | issn=0002-9890 | volume=65 | pages=154–170 | doi=10.2307/2310058 | issue=3 | publisher=The American Mathematical Monthly, Vol. 65, No. 3 | jstor=1480361}} *{{Citation | last1=Lambek | first1=Joachim | author1-link=Joachim Lambek | title=Bicommutators of nice injectives | doi=10.1016/0021-8693(72)90034-8 | mr=0301052 | year=1972 | journal=Journal of Algebra | issn=0021-8693 | volume=21 | pages=60–73}} *{{Citation | last1=Lambek | first1=Joachim | author1-link=Joachim Lambek | title=Localization and completion | doi=10.1016/0022-4049(72)90011-4 | mr=0320047 | year=1972 | journal=Journal of Pure and Applied Algebra | issn=0022-4049 | volume=2 | pages=343–370 | issue=4}} *{{Citation | last1=Lambek | first1=Joachim | author1-link=Joachim Lambek | title=A mathematician looks at Latin conjugation | mr=589163 | year=1979 | journal=Theoretical Linguistics | issn=0301-4428 | volume=6 | issue=2 | pages=221–234 | doi=10.1515/thli.1979.6.1-3.221}} """ wikicode = mwparserfromhell.parse(mwtext) for tpl in wikicode.filter_templates(): parsed = parse_citation_template(tpl) print(parsed)
Here is what you get:
{u'PublisherName': u'The American Mathematical Monthly, Vol. 61, No. 7', u'Title': u'Inverse and Complementary Sequences of Natural Numbers', u'ID_list': {u'DOI': u'10.2307/2308078', u'ISSN': u'0002-9890', u'MR': u'0062777', u'JSTOR': u'2308078'}, u'Periodical': u'The American Mathematical Monthly', u'Authors': [{u'link': u'Joachim Lambek', u'last': u'Lambek', u'first': u'Joachim'}, {u'last': u'Moser', u'first': u'L.'}], u'Date': u'1954', u'Pages': u'454-458'} {u'PublisherName': u'The American Mathematical Monthly, Vol. 65, No. 3', u'Title': u'The Mathematics of Sentence Structure', u'ID_list': {u'DOI': u'10.2307/2310058', u'ISSN': u'0002-9890', u'JSTOR': u'1480361'}, u'Periodical': u'The American Mathematical Monthly', u'Authors': [{u'link': u'Joachim Lambek', u'last': u'Lambek', u'first': u'J.'}], u'Date': u'1958', u'Pages': u'154-170'} {u'Title': u'Bicommutators of nice injectives', u'ID_list': {u'DOI': u'10.1016/0021-8693(72)90034-8', u'ISSN': u'0021-8693', u'MR': u'0301052'}, u'Periodical': u'Journal of Algebra', u'Authors': [{u'link': u'Joachim Lambek', u'last': u'Lambek', u'first': u'Joachim'}], u'Date': u'1972', u'Pages': u'60-73'} {u'Title': u'Localization and completion', u'ID_list': {u'DOI': u'10.1016/0022-4049(72)90011-4', u'ISSN': u'0022-4049', u'MR': u'0320047'}, u'Periodical': u'Journal of Pure and Applied Algebra', u'Authors': [{u'link': u'Joachim Lambek', u'last': u'Lambek', u'first': u'Joachim'}], u'Date': u'1972', u'Pages': u'343-370'} {u'Title': u'A mathematician looks at Latin conjugation', u'ID_list': {u'DOI': u'10.1515/thli.1979.6.1-3.221', u'ISSN': u'0301-4428', u'MR': u'589163'}, u'Periodical': u'Theoretical Linguistics', u'Authors': [{u'link': u'Joachim Lambek', u'last': u'Lambek', u'first': u'Joachim'}], u'Date': u'1979', u'Pages': u'221-234'}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file wikiciteparser-0.2.1.tar.gz
.
File metadata
- Download URL: wikiciteparser-0.2.1.tar.gz
- Upload date:
- Size: 78.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.53.0 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2f0e8cc82d7334a37d925bea894caf71cbb11098e666b1a3fdde6514f1c9d9d5 |
|
MD5 | 5d79bc374f3938ba1d56aa0183c2806c |
|
BLAKE2b-256 | bc0175d6e8b83530899a263f8402adf99a587b313b604d9d679b0e20551d0b55 |
File details
Details for the file wikiciteparser-0.2.1-py3-none-any.whl
.
File metadata
- Download URL: wikiciteparser-0.2.1-py3-none-any.whl
- Upload date:
- Size: 77.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.53.0 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 72e37df32fb1a7328e039404758aaeffac33016befa917526d85dae8f4f92392 |
|
MD5 | 8a1f2a296b5fee88dfed484ffaf22495 |
|
BLAKE2b-256 | 130dab320fe5b73cf4d956c5747ca756720a12e933b2e77d5172c9fb0015526c |