A set of utilities for generating quality scores for MediaWiki revisions

Revision Scoring

A generic, machine learning-based revision scoring system designed to automatically distinguish damaging edits from productive contributions on Wikipedia.

Examples

Using a scorer_model to score a revision:

>>> import mwapi
>>> from revscoring import ScorerModel
>>> from revscoring.extractors import APIExtractor
>>>
>>> with open("models/enwiki.damaging.linear_svc.model", "rb") as f:
...     scorer_model = ScorerModel.load(f)
...
>>> extractor = APIExtractor(mwapi.Session(host="https://en.wikipedia.org",
...                                        user_agent="revscoring demo"))
>>>
>>> feature_values = extractor.extract(123456789, scorer_model.features)
>>>
>>> print(scorer_model.score(feature_values))
{'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
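
Because the score is a plain dictionary, downstream code can apply its own decision threshold instead of relying only on the model's `prediction`. The following is a minimal sketch that assumes the output shape printed above; `is_damaging` and its `threshold` parameter are hypothetical names for illustration, not part of revscoring.

```python
# Illustrative only: a dict shaped like the score printed above.
score = {'prediction': True,
         'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}


def is_damaging(score, threshold=0.5):
    """Return True when the probability of the True class meets the threshold.

    Hypothetical helper, not revscoring API.
    """
    return score['probability'][True] >= threshold


print(is_damaging(score))                 # True: 0.53 >= 0.5
print(is_damaging(score, threshold=0.8))  # False: 0.53 < 0.8
```

Raising the threshold trades recall for precision, which may suit tools that should only flag revisions the model is fairly sure about.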

Installation

The easiest way to install revscoring is via the Python package installer (pip).

pip install revscoring

You may find that some of revscoring’s dependencies (namely scipy, numpy and sklearn) fail to compile. In that case, you’ll need to install some build dependencies through your operating system’s package manager.

Ubuntu & Debian:

Run sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev

Windows:

‘TODO’

MacOS:

‘TODO’
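
After installation, one quick way to confirm that the compiled dependencies named above were actually installed is to ask Python whether it can locate them. This is a rough sketch for Python 3.4 or later; `missing_modules` is a hypothetical helper, and locating a module does not guarantee that it imports without errors.

```python
import importlib.util

# Import names of the compiled dependencies mentioned above.
COMPILED_DEPENDENCIES = ("numpy", "scipy", "sklearn")


def missing_modules(names=COMPILED_DEPENDENCIES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All compiled dependencies were found.")
```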

Finally, in order to make use of language features, you’ll need to download some NLTK data. The following command will get the necessary corpus.

python -m nltk.downloader stopwords
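
To verify that the download worked, you can ask NLTK to locate the corpus. The sketch below assumes the corpus is stored under NLTK's usual `corpora/stopwords` resource name; `stopwords_available` is a hypothetical helper and it only checks, it does not download anything.

```python
def stopwords_available():
    """Return True if NLTK can locate the stopwords corpus, else False."""
    try:
        import nltk
        # Raises LookupError when the resource is not in any NLTK data path.
        nltk.data.find("corpora/stopwords")
    except (ImportError, LookupError):
        return False
    return True


print(stopwords_available())
```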

You’ll also need to install [enchant](https://enchant.org)-compatible dictionaries for the languages you’d like to use. We recommend the following packages:

  • languages.english: myspell-en-us myspell-en-gb myspell-en-au

  • languages.french: myspell-fr

  • languages.spanish: myspell-es

  • languages.vietnamese: hunspell-vi

  • languages.hebrew: myspell-he

  • languages.portuguese: myspell-pt

  • languages.persian: myspell-fa
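
Once the dictionary packages are installed, a short check can confirm that enchant sees them. This sketch assumes the pyenchant bindings are available and uses their `enchant.dict_exists` function; the mapping from language modules to dictionary tags is an assumption made for illustration, so adjust the tags to whatever your system provides.

```python
# Hypothetical mapping from some of the language modules listed above to
# enchant dictionary tags. The tags are assumptions, not taken from revscoring.
DICTIONARY_TAGS = {
    "languages.english": ["en_US", "en_GB", "en_AU"],
    "languages.french": ["fr"],
    "languages.spanish": ["es"],
}


def check_dictionaries(tags_by_language=DICTIONARY_TAGS):
    """Map each language to the tags enchant cannot find.

    Returns None when the enchant bindings cannot be imported.
    """
    try:
        import enchant
    except ImportError:
        return None
    return {language: [tag for tag in tags if not enchant.dict_exists(tag)]
            for language, tags in tags_by_language.items()}


print(check_dictionaries())
```

An empty list for a language means every tag checked for it was found.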

Authors

Aaron Halfaker:
  • http://halfaker.info

Helder:
  • https://github.com/he7d3r

Adam Roses Wight:
  • https://mediawiki.org/wiki/User:Adamw

Download files

Source Distribution

revscoring-0.6.7.tar.gz (69.6 kB)

Built Distribution

revscoring-0.6.7-py2.py3-none-any.whl (166.4 kB, Python 2 and Python 3)

File hashes

Hashes for revscoring-0.6.7.tar.gz:

  • SHA256: b0269e84b6fd7c6f97a350443540b7ae22d4c075a05d230314a9ec9b58c56a90

  • MD5: 49570fb6e3f391e4072ccf9980ba2e99

  • BLAKE2b-256: 9c74bedbbf6dc7eb26378145b8945c569409ae7800dddf53eaaab864670eb2e4

Hashes for revscoring-0.6.7-py2.py3-none-any.whl:

  • SHA256: fbc751a8da0cf4ad58198d414c52c11978e7dc4091297d91db87009af58527c7

  • MD5: 54f9618d85407d8a7647bfb79422b959

  • BLAKE2b-256: 8afc43515dcfba41307463e3ef5ba184a9c2eaed547097caf5dfd21df3b92200
