
A set of utilities for generating quality scores for MediaWiki revisions


Revision Scoring

A generic, machine-learning-based revision scoring system designed to automatically distinguish damaging edits from productive contributions on Wikipedia.

Examples

Scoring models:

>>> from mw.api import Session
>>>
>>> from revscoring.extractors import APIExtractor
>>> from revscoring.languages import english
>>> from revscoring.scorers import MLScorerModel
>>>
>>> api_session = Session("https://en.wikipedia.org/w/api.php")
Sending requests with default User-Agent.  Set 'user_agent' on api.Session to quiet this message.
>>> extractor = APIExtractor(api_session, english)
>>>
>>> filename = "models/reverts.halfak_mix.trained.model"
>>> with open(filename, 'rb') as model_file:
...     model = MLScorerModel.load(model_file)
...
>>>
>>> rev_ids = [105, 642215410, 638307884]
>>> feature_values = [extractor.extract(rev_id, model.features) for rev_id in rev_ids]
>>>
>>> scores = model.score(feature_values, probabilities=True)
>>> for rev_id, score in zip(rev_ids, scores):
...     print("{0}: {1}".format(rev_id, score))
...
105: {'probabilities': array([ 0.96441465,  0.03558535]), 'prediction': False}
642215410: {'probabilities': array([ 0.75884553,  0.24115447]), 'prediction': True}
638307884: {'probabilities': array([ 0.98441738,  0.01558262]), 'prediction': False}
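To act on these scores, for example to queue likely-damaging edits for review, you might keep only the revisions the model predicts as True and rank them. The sketch below assumes each score is a dict with 'prediction' and 'probabilities' keys, as printed above; flag_revisions is a hypothetical helper, not part of revscoring. It uses the largest class probability as a confidence value, which avoids assuming the order in which the model lists its classes.

```python
from typing import Dict, List, Sequence, Tuple


def flag_revisions(rev_ids: Sequence[int],
                   scores: Sequence[Dict]) -> List[Tuple[int, float]]:
    """Return (rev_id, confidence) for every revision predicted True.

    Confidence is the largest class probability, which does not depend on
    the order in which the model lists its classes.
    """
    flagged = []
    for rev_id, score in zip(rev_ids, scores):
        if score["prediction"]:
            flagged.append((rev_id, max(score["probabilities"])))
    # Most confident predictions first.
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return flagged


# Sample scores copied from the output above (plain lists instead of arrays).
example_scores = [
    {"probabilities": [0.96441465, 0.03558535], "prediction": False},
    {"probabilities": [0.75884553, 0.24115447], "prediction": True},
    {"probabilities": [0.98441738, 0.01558262], "prediction": False},
]
print(flag_revisions([105, 642215410, 638307884], example_scores))
```

max() also works on the NumPy arrays that model.score returns, so the same helper should accept the real scores unchanged.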

Feature extraction:

>>> from mw.api import Session
>>>
>>> from revscoring.extractors import APIExtractor
>>> from revscoring.features import diff, parent_revision, revision, user
>>>
>>> api_extractor = APIExtractor(Session("https://en.wikipedia.org/w/api.php"))
Sending requests with default User-Agent.  Set 'user_agent' on api.Session to quiet this message.
>>>
>>> features = [revision.day_of_week,
...             revision.hour_of_day,
...             revision.has_custom_comment,
...             parent_revision.bytes_changed,
...             diff.chars_added,
...             user.age,
...             user.is_anon,
...             user.is_bot]
>>>
>>> values = api_extractor.extract(
...     624577024,
...     features
... )
>>> for feature, value in zip(features, values):
...     print("{0}: {1}".format(feature, value))
...
<revision.day_of_week>: 6
<revision.hour_of_day>: 19
<revision.has_custom_comment>: True
<(revision.bytes - parent_revision.bytes_changed)>: 3
<diff.chars_added>: 8
<user.age>: 71821407
<user.is_anon>: False
<user.is_bot>: False
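When extracting features for many revisions, for instance to build training data, it can help to save the values as a table. The following is a minimal sketch assuming a tab-separated file with one row per revision; write_feature_rows is a hypothetical helper built only on Python's standard csv module.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_feature_rows(out: TextIO, feature_names: Sequence[str],
                       rows: Iterable[Sequence]) -> None:
    """Write a header line, then one tab-separated row per revision.

    Each row is expected to start with the revision ID, followed by one
    value per feature, in the same order as feature_names.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["rev_id", *feature_names])
    for row in rows:
        writer.writerow(row)


# Values copied from the output above for revision 624577024.
buffer = io.StringIO()
write_feature_rows(buffer,
                   ["revision.day_of_week", "revision.hour_of_day", "user.is_anon"],
                   [[624577024, 6, 19, False]])
print(buffer.getvalue(), end="")
```

With a live extractor, each row could be built as [rev_id, *api_extractor.extract(rev_id, features)] and the header names as str(feature) for each feature, which prints in the <...> form shown above.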

Installation

Install the package from PyPI with pip:

pip install revscoring

To use the language features, you'll also need to download some NLTK data:

$ python
>>> import nltk
>>> nltk.download()
Downloader> d
Identifier> wordnet
Downloader> d
Identifier> omw
Downloader> d
Identifier> stopwords
Downloader> q
>>> exit()
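The same packages can also be fetched without the interactive prompt, which is handy in setup scripts. This sketch passes each identifier to nltk.download with quiet=True; download_nltk_data is a hypothetical helper, and it assumes the identifiers used in the session above are still valid in your NLTK version.

```python
# Corpora fetched in the interactive session above.
NLTK_PACKAGES = ["wordnet", "omw", "stopwords"]


def download_nltk_data(packages=NLTK_PACKAGES) -> bool:
    """Download each NLTK package without the interactive prompt.

    Returns True only if every download succeeded; stops at the first failure.
    """
    import nltk  # imported lazily so this module loads even without nltk installed

    return all(nltk.download(package, quiet=True) for package in packages)
```

Calling download_nltk_data() once after installing revscoring should leave the same data on disk as the interactive session.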

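Depending on your operating system, you may also need some system packages. The compilers and linear-algebra libraries (g++, gfortran, liblapack-dev, libopenblas-dev) and python3-dev are needed to build scipy and numpy; the myspell, aspell and hunspell packages install spell-checking dictionaries for the supported languages.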

Linux Mint 17.1:

  sudo apt-get install g++ gfortran liblapack-dev python3-dev myspell-pt myspell-fa myspell-en-au myspell-en-gb myspell-en-us myspell-en-za myspell-fr aspell-id myspell-es hunspell-vi

Ubuntu 14.04:

  sudo apt-get install g++ gfortran liblapack-dev libopenblas-dev python3-dev myspell-pt myspell-fa myspell-en-au myspell-en-gb myspell-en-us myspell-en-za myspell-fr aspell-id myspell-es hunspell-vi
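After installing, a quick check that Python can locate numpy, scipy and revscoring may save a confusing traceback later. This sketch uses importlib.util.find_spec from the standard library; check_modules is a hypothetical helper. Note that find_spec only confirms a package can be found, not that its compiled extensions load, so an actual import scipy remains the stronger test.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str] = ("numpy", "scipy", "revscoring")) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


missing = [name for name, found in check_modules().items() if not found]
print("missing:", missing or "none")
```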

Authors

Aaron Halfaker:
  • http://halfaker.info

Helder:
  • https://github.com/he7d3r




Download files


Source Distributions

revscoring-0.4.5.zip (85.4 kB, source)

revscoring-0.4.5.tar.gz (52.7 kB, source)

File details

Details for the file revscoring-0.4.5.zip.

File metadata

  • Download URL: revscoring-0.4.5.zip
  • Upload date:
  • Size: 85.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for revscoring-0.4.5.zip
Algorithm    Hash digest
SHA256       3fe1a7d4789eb258e5f76c4a3c67d3db816dad7e6eea41595210f3c1d7bbeb11
MD5          619b8249a2b6a8390acacfa8b594f67c
BLAKE2b-256  1a08a2f536f4ddae8e1d3dfd4bb2a0770354372f2133d340c5b73fda11906640


File details

Details for the file revscoring-0.4.5.tar.gz.

File metadata

  • Download URL: revscoring-0.4.5.tar.gz
  • Upload date:
  • Size: 52.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for revscoring-0.4.5.tar.gz
Algorithm    Hash digest
SHA256       21d132b95b4ee4166bec83bdb5836627ae181d6bcf388ddc873969a682439c8e
MD5          4fc062ee73dbde4da96f07416a7067eb
BLAKE2b-256  9d8456566159c4a0d1ba8aa6a7a155785c8a769b6870f2df70a872dc4cf0f121
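To confirm that a downloaded archive matches the published SHA256 digest, you can hash it locally. This is a minimal sketch using Python's hashlib; sha256_of and verify are hypothetical helpers, and the expected digests are copied from the tables above. The BLAKE2b-256 digests could be checked the same way with hashlib.blake2b(digest_size=32), and pip can also enforce hashes itself in its hash-checking mode.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the source distributions listed above.
EXPECTED_SHA256 = {
    "revscoring-0.4.5.zip": "3fe1a7d4789eb258e5f76c4a3c67d3db816dad7e6eea41595210f3c1d7bbeb11",
    "revscoring-0.4.5.tar.gz": "21d132b95b4ee4166bec83bdb5836627ae181d6bcf388ddc873969a682439c8e",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the digest published for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, verify(Path("revscoring-0.4.5.tar.gz")) run in the download directory should return True for an intact archive.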

