A set of utilities for generating quality scores for MediaWiki revisions

Revision Scoring

A generic, machine learning-based revision scoring system designed to automatically differentiate damage from productive contributions on Wikipedia.

Example

Using a scorer_model to score a revision:

>>> import mwapi
>>> from revscoring import ScorerModel
>>> from revscoring.extractors import APIExtractor
>>>
>>> with open("models/enwiki.damaging.linear_svc.model", "rb") as f:
...     scorer_model = ScorerModel.load(f)
...
>>> extractor = APIExtractor(mwapi.Session(host="https://en.wikipedia.org",
...                                        user_agent="revscoring demo"))
>>>
>>> feature_values = extractor.extract(123456789, scorer_model.features)
>>>
>>> print(scorer_model.score(feature_values))
{'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
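
Because the score is a plain dictionary, calling code can apply its own decision threshold instead of relying on the 'prediction' field. The following is a minimal sketch that assumes the dictionary layout printed above; the helper name is_damaging and the 0.8 threshold are hypothetical and not part of revscoring.

```python
def is_damaging(score, threshold=0.8):
    """Return True when the probability of the True class meets the threshold.

    `score` is assumed to have the layout printed in the example above:
    {'prediction': bool, 'probability': {False: float, True: float}}.
    """
    return score["probability"][True] >= threshold


# The score dictionary copied from the example above.
score = {
    "prediction": True,
    "probability": {False: 0.4694409344514984, True: 0.5305590655485017},
}

print(is_damaging(score))                 # False: 0.53 is below the 0.8 threshold
print(is_damaging(score, threshold=0.5))  # True: 0.53 is at least 0.5
```

A stricter threshold trades recall for fewer false alarms; which value is appropriate depends on the model and the use case.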

Installation

The easiest way to install revscoring is via the Python package installer (pip).

pip install revscoring

You may find that some of revscoring's dependencies fail to compile (namely scipy, numpy and sklearn). In that case, you'll need to install some dependencies in your operating system.

Ubuntu & Debian:

Run sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev

Windows:

‘TODO’

MacOS:

‘TODO’
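
To see which of the compiled dependencies named above are still missing after installation, a quick check along the following lines may help. This is a sketch that assumes Python 3; the helper name missing_modules is hypothetical. It only tests whether each module can be located on the import path, not whether it was built correctly.

```python
import importlib.util

# Import names of the compiled dependencies named above.
# Note that scikit-learn is imported as `sklearn`.
COMPILED_DEPENDENCIES = ("numpy", "scipy", "sklearn")


def missing_modules(names=COMPILED_DEPENDENCIES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All compiled dependencies were found.")
```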

Finally, in order to make use of language features, you’ll need to download some NLTK data. The following command will get the necessary corpus.

python -m nltk.downloader stopwords
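
To confirm that the download succeeded, you can ask NLTK whether it can locate the corpus. The sketch below uses nltk.data.find, which raises LookupError when a resource is missing; the helper name stopwords_available is hypothetical.

```python
def stopwords_available():
    """Return True if NLTK can locate the stopwords corpus, else False."""
    try:
        import nltk
    except ImportError:
        return False  # NLTK itself is not installed
    try:
        nltk.data.find("corpora/stopwords")
    except LookupError:
        return False  # the corpus has not been downloaded yet
    return True


print("stopwords corpus found:", stopwords_available())
```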

You’ll also need to install enchant-compatible dictionaries for the languages you’d like to use. We recommend the following:

  • languages.dutch: myspell-nl

  • languages.english: myspell-en-us myspell-en-gb myspell-en-au

  • languages.french: myspell-fr

  • languages.german: myspell-de-at myspell-de-ch myspell-de-de

  • languages.indonesian: aspell-id

  • languages.italian: myspell-it

  • languages.hebrew: myspell-he

  • languages.portuguese: myspell-pt

  • languages.persian: myspell-fa

  • languages.spanish: myspell-es

  • languages.vietnamese: hunspell-vi
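
Once the packages are installed, you can check whether enchant actually sees the dictionaries. The following sketch assumes the pyenchant bindings are installed and uses their enchant.dict_exists function. The helper name installed_dictionaries is hypothetical, and the language tags (en_US, en_GB, en_AU) are assumptions about how the English packages register themselves; adjust them for the languages you installed.

```python
def installed_dictionaries(tags):
    """Map each language tag to True/False depending on whether enchant has
    a dictionary for it. Returns None if pyenchant cannot be imported."""
    try:
        import enchant
    except ImportError:
        return None  # pyenchant (or the underlying enchant library) is missing
    return {tag: enchant.dict_exists(tag) for tag in tags}


# Assumed tags for the English dictionaries listed above.
result = installed_dictionaries(["en_US", "en_GB", "en_AU"])
if result is None:
    print("pyenchant is not available")
else:
    for tag, found in result.items():
        print(tag, "found" if found else "missing")
```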

Authors

Aaron Halfaker:
  • http://halfaker.info

Helder:
  • https://github.com/he7d3r

Adam Roses Wight:
  • https://mediawiki.org/wiki/User:Adamw

This version

0.7.1
