A set of utilities for generating quality scores for MediaWiki revisions

Revision Scoring

A generic, machine-learning-based revision scoring system designed to automatically differentiate damaging edits from productive contributions on Wikipedia.

Examples

Scoring models:

>>> from mw.api import Session
>>>
>>> from revscoring.extractors import APIExtractor
>>> from revscoring.languages import english
>>> from revscoring.scorers import MLScorerModel
>>>
>>> api_session = Session("https://en.wikipedia.org/w/api.php")
Sending requests with default User-Agent.  Set 'user_agent' on api.Session to quiet this message.
>>> extractor = APIExtractor(api_session, english)
>>>
>>> filename = "models/reverts.halfak_mix.trained.model"
>>> model = MLScorerModel.load(open(filename, 'rb'))
>>>
>>> rev_ids = [105, 642215410, 638307884]
>>> feature_values = [extractor.extract(rev_id, model.features) for rev_id in rev_ids]

>>> scores = model.score(feature_values, probabilities=True)
>>> for rev_id, score in zip(rev_ids, scores):
...     print("{0}: {1}".format(rev_id, score))
...
105: {'probabilities': array([ 0.96441465,  0.03558535]), 'prediction': False}
642215410: {'probabilities': array([ 0.75884553,  0.24115447]), 'prediction': True}
638307884: {'probabilities': array([ 0.98441738,  0.01558262]), 'prediction': False}
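
The score dictionaries printed above can be post-processed with plain Python. The following is a minimal sketch, not part of revscoring: the helper names `true_class_probability` and `flag_for_review`, the 0.2 threshold, and the assumption that index 1 of `probabilities` belongs to the True class are all illustrative. The sample values are copied from the output above, with plain lists standing in for the arrays.

```python
def true_class_probability(score):
    """Return the probability at index 1 of the score's 'probabilities'.

    Assumption: index 1 corresponds to the True class.
    """
    return float(score["probabilities"][1])


def flag_for_review(rev_ids, scores, threshold=0.2):
    """Return (rev_id, probability) pairs at or above threshold, highest first."""
    pairs = [(rev_id, true_class_probability(score))
             for rev_id, score in zip(rev_ids, scores)]
    flagged = [pair for pair in pairs if pair[1] >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Values copied from the example output above.
rev_ids = [105, 642215410, 638307884]
scores = [
    {"probabilities": [0.96441465, 0.03558535], "prediction": False},
    {"probabilities": [0.75884553, 0.24115447], "prediction": True},
    {"probabilities": [0.98441738, 0.01558262], "prediction": False},
]

print(flag_for_review(rev_ids, scores))
# prints [(642215410, 0.24115447)]
```

Note that in the sample output, revision 642215410 has `'prediction': True` with a second-column probability of about 0.24, so the prediction does not appear to be a simple 0.5 cutoff on these probabilities. Choose any threshold with that in mind.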

Feature extraction:

>>> from mw.api import Session
>>>
>>> from revscoring.extractors import APIExtractor
>>> from revscoring.features import diff, parent_revision, revision, user
>>>
>>> api_extractor = APIExtractor(Session("https://en.wikipedia.org/w/api.php"))
Sending requests with default User-Agent.  Set 'user_agent' on api.Session to quiet this message.
>>>
>>> features = [revision.day_of_week,
...             revision.hour_of_day,
...             revision.has_custom_comment,
...             parent_revision.bytes_changed,
...             diff.chars_added,
...             user.age,
...             user.is_anon,
...             user.is_bot]
>>>
>>> values = api_extractor.extract(
...     624577024,
...     features
... )
>>> for feature, value in zip(features, values):
...     print("{0}: {1}".format(feature, value))
...
<revision.day_of_week>: 6
<revision.hour_of_day>: 19
<revision.has_custom_comment>: True
<(revision.bytes - parent_revision.bytes_changed)>: 3
<diff.chars_added>: 8
<user.age>: 71821407
<user.is_anon>: False
<user.is_bot>: False
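
For downstream use it can help to pair each extracted value with a label. The sketch below does this with plain strings and the values printed above, so it runs without network access. The helper `to_record` and the label strings are illustrative, and the conversion to days assumes that `user.age` is measured in seconds, which the output above does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Labels mirror the feature list above; values are copied from the output.
feature_names = [
    "revision.day_of_week",
    "revision.hour_of_day",
    "revision.has_custom_comment",
    "parent_revision.bytes_changed",
    "diff.chars_added",
    "user.age",
    "user.is_anon",
    "user.is_bot",
]
values = [6, 19, True, 3, 8, 71821407, False, False]


def to_record(names, vals):
    """Pair feature labels with values, refusing mismatched lengths."""
    if len(names) != len(vals):
        raise ValueError("expected one value per feature")
    return dict(zip(names, vals))


record = to_record(feature_names, values)

# Assumption: user.age is in seconds.
age_days = record["user.age"] / SECONDS_PER_DAY
print("account age in days: {0:.1f}".format(age_days))
```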

Installation

Packages

Before using revscoring, you need to install a few system packages. The exact set depends on your operating system. On Debian or Ubuntu, try:

sudo apt-get install python3-dev python3-numpy python3-scipy g++ gfortran liblapack-dev libopenblas-dev myspell-pt myspell-fa myspell-en-au myspell-en-gb myspell-en-us myspell-en-za myspell-fr myspell-es hunspell-vi myspell-he

If you’re on Ubuntu, you might also be able to install an Indonesian dictionary:

sudo apt-get install aspell-id

Virtualenv users, please note that you’ll have to use the --system-site-packages option if you install scipy and numpy via apt-get. Alternatively, you can install them with pip3 inside your virtualenv.
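
As an illustration of the system-site-packages setting, here is a minimal sketch that uses Python's standard-library `venv` module rather than the `virtualenv` tool mentioned above. The helper names and the `revscoring-env` directory are hypothetical. It creates an environment that can see system-wide packages, then confirms the setting by reading the generated `pyvenv.cfg`.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target_dir, with_pip=False):
    """Create an environment that can see system-wide packages."""
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=with_pip)
    builder.create(target_dir)
    return Path(target_dir) / "pyvenv.cfg"


def sees_system_packages(cfg_path):
    """Return True if pyvenv.cfg enables system site packages."""
    for line in Path(cfg_path).read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    # A temporary directory keeps this demonstration from leaving files behind.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "revscoring-env")
        print(sees_system_packages(cfg))
```

For a real environment, pass a permanent directory and `with_pip=True` so that pip is available inside it.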

Python modules

If you do not yet have pip, the Python package installer, install it first:

sudo easy_install3 pip

Then install this module:

pip3 install --user revscoring

To use the language features, you’ll also need to download the NLTK data:

python3 -m nltk.downloader wordnet omw stopwords
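
After installing, a quick check that the modules are importable can save debugging time later. The sketch below uses only the standard library. The helper `missing_modules` is illustrative, and the module list is inferred from the steps above. It checks importability only; it does not verify that the NLTK data was downloaded.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


# Module names inferred from the installation steps above.
REQUIRED = ["numpy", "scipy", "nltk", "revscoring"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable.")
```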

Authors

Aaron Halfaker:
  • http://halfaker.info

Helder:
  • https://github.com/he7d3r

Adam Roses Wight:
  • https://mediawiki.org/wiki/User:Adamw

This version

0.4.7

Download files

Source Distributions

revscoring-0.4.7.zip (88.4 kB)

Uploaded Source

revscoring-0.4.7.tar.gz (54.2 kB)

Uploaded Source

File details

Details for the file revscoring-0.4.7.zip.

File metadata

  • Download URL: revscoring-0.4.7.zip
  • Size: 88.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for revscoring-0.4.7.zip
Algorithm    Hash digest
SHA256       c28517f925782f45c5a238c1313343f8cbadc328766c2fd7fff1649235c8a8f8
MD5          3f1176ec49916a573e0850b330c646d0
BLAKE2b-256  4e7a76b7a6e1403b15bbaeec5ad464be68157dfd5383a5d1853d75bea4334ee1

File details

Details for the file revscoring-0.4.7.tar.gz.

File metadata

  • Download URL: revscoring-0.4.7.tar.gz
  • Size: 54.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for revscoring-0.4.7.tar.gz
Algorithm    Hash digest
SHA256       b99ec04b7208e57c358be70d15b887114f4eaee15bbe0319661d663bc4604198
MD5          9c154401440868df23c196050227d956
BLAKE2b-256  2951f940a1e0578806bb553cd224d550ff89ef70db1df7902ea6a0ab380aad7b
