A set of utilities for generating quality scores for MediaWiki revisions

Revision Scoring

A generic, machine learning-based revision scoring system designed to help automate critical wiki-work — for example, vandalism detection and removal. This library powers ORES.

Example

Using a scorer_model to score a revision::

  import mwapi
  from revscoring import Model
  from revscoring.extractors.api.extractor import Extractor

  # Load a trained model from disk.
  with open("models/enwiki.damaging.linear_svc.model") as f:
      scorer_model = Model.load(f)

  # Build an extractor that fetches revision data from English Wikipedia.
  extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
                                      user_agent="revscoring demo"))

  # Extract the model's features for revision 123456789 and score them.
  feature_values = list(extractor.extract(123456789, scorer_model.features))

  print(scorer_model.score(feature_values))
  # Example output:
  # {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
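
The returned score is a plain dictionary, so downstream code can act on it directly. The following is a minimal sketch, assuming the score has the shape shown in the example output above. The helper name ``needs_review`` and the threshold values are hypothetical and not part of revscoring.

```python
def needs_review(score, threshold=0.5):
    """Return True when the probability of the True class meets the threshold.

    `score` is assumed to look like the example output above:
    {'prediction': bool, 'probability': {False: float, True: float}}.
    """
    return score["probability"][True] >= threshold


# Score copied from the example output above.
example_score = {
    "prediction": True,
    "probability": {False: 0.4694409344514984, True: 0.5305590655485017},
}

flag_default = needs_review(example_score)       # default threshold of 0.5
flag_strict = needs_review(example_score, 0.8)   # stricter, hypothetical threshold
print(flag_default, flag_strict)                 # prints: True False
```

A higher threshold trades fewer false alarms for more missed revisions. The right value depends on the model and the workflow.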

Installation

The easiest way to install is via the Python package installer (pip)::

  pip install revscoring
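
After pip finishes, it may help to confirm that the packages can be located. The following is a minimal sketch using only the standard library. The module list is an assumption (the import names for numpy, scipy, sklearn, enchant, nltk and revscoring), and the helper name ``find_missing`` is hypothetical.

```python
import importlib.util

# Assumed import names for revscoring and the dependencies it relies on.
MODULES = ("numpy", "scipy", "sklearn", "enchant", "nltk", "revscoring")


def find_missing(modules=MODULES):
    """Return the modules that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing()
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

This only checks that each module can be found. It does not import it, so a broken compiled extension could still fail at import time.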

You may find that some of the dependencies fail to compile (namely scipy, numpy and sklearn). In that case, you'll need to install some dependencies in your operating system.

Ubuntu & Debian:

  • Run sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev
  • Run sudo apt-get install aspell-ar aspell-bn aspell-el aspell-id aspell-is aspell-pl aspell-ro aspell-sv aspell-ta aspell-uk myspell-cs myspell-de-at myspell-de-ch myspell-de-de myspell-es myspell-et myspell-fa myspell-fr myspell-he myspell-hr myspell-hu myspell-lv myspell-nb myspell-nl myspell-pt-pt myspell-pt-br myspell-ru hunspell-bs hunspell-ca hunspell-en-au hunspell-en-us hunspell-en-gb hunspell-eu hunspell-gl hunspell-it hunspell-hi hunspell-sr hunspell-vi voikko-fi

MacOS:

Using Homebrew and pip, installing revscoring and enchant can be accomplished as follows::

  brew install aspell --with-all-languages
  brew install enchant
  pip install --no-binary pyenchant revscoring

Adding languages in aspell (MacOS only)

For example, to build and install the Portuguese dictionary::

  cd /tmp
  wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
  bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
  cd aspell-pt-0.50-2
  ./configure
  make
  sudo make install

Caveats:
Differences between the aspell and myspell dictionaries can cause some of the tests to fail.

Finally, to make use of language features, you'll need to download some NLTK data. The following command will get the necessary corpora::

  python -m nltk.downloader omw sentiwordnet stopwords wordnet
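
The same corpora can also be fetched from Python, which may be convenient in a setup script. The following is a minimal sketch. The helper name ``download_corpora`` is hypothetical, and it assumes the download callable returns a truthy value on success, as ``nltk.download`` does.

```python
# Corpus names taken from the command above.
REQUIRED_CORPORA = ("omw", "sentiwordnet", "stopwords", "wordnet")


def download_corpora(download, names=REQUIRED_CORPORA):
    """Call download(name) for each corpus and return the names that failed.

    `download` is expected to return a truthy value on success.
    """
    return [name for name in names if not download(name)]


# Typical use (requires nltk and network access):
#   import nltk
#   failed = download_corpora(nltk.download)
#   if failed:
#       raise SystemExit("Could not download: " + ", ".join(failed))
```

Passing the download function as an argument keeps the helper easy to exercise without network access.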

You'll also need to install enchant-compatible dictionaries of the languages you'd like to use. We recommend the following:

  • languages.arabic: aspell-ar
  • languages.basque: hunspell-eu
  • languages.bengali: aspell-bn
  • languages.bosnian: hunspell-bs
  • languages.catalan: myspell-ca
  • languages.czech: myspell-cs
  • languages.croatian: myspell-hr
  • languages.dutch: myspell-nl
  • languages.english: myspell-en-us myspell-en-gb myspell-en-au
  • languages.estonian: myspell-et
  • languages.finnish: voikko-fi
  • languages.french: myspell-fr
  • languages.galician: hunspell-gl
  • languages.german: myspell-de-at myspell-de-ch myspell-de-de
  • languages.greek: aspell-el
  • languages.hebrew: myspell-he
  • languages.hindi: aspell-hi
  • languages.hungarian: myspell-hu
  • languages.icelandic: aspell-is
  • languages.indonesian: aspell-id
  • languages.italian: myspell-it
  • languages.latvian: myspell-lv
  • languages.norwegian: myspell-nb
  • languages.persian: myspell-fa
  • languages.polish: aspell-pl
  • languages.portuguese: myspell-pt-pt myspell-pt-br
  • languages.serbian: hunspell-sr
  • languages.spanish: myspell-es
  • languages.swedish: aspell-sv
  • languages.tamil: aspell-ta
  • languages.russian: myspell-ru
  • languages.ukrainian: aspell-uk
  • languages.vietnamese: hunspell-vi
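
To see whether the dictionaries you need are visible to enchant, you can compare the language tags you want against those enchant reports. The following is a minimal sketch. The helper name ``missing_dictionaries`` and the tags in ``WANTED`` are hypothetical, and the tag a given dictionary package registers may differ between systems.

```python
def missing_dictionaries(wanted, available):
    """Return the wanted language tags that have no installed dictionary."""
    installed = set(available)
    return [tag for tag in wanted if tag not in installed]


# Hypothetical tags for illustration only.
WANTED = ["en_US", "pt_BR", "de_DE"]

# Typical use (requires pyenchant and the enchant C library):
#   import enchant
#   print(missing_dictionaries(WANTED, enchant.list_languages()))
```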

Development

To contribute, first install the dependencies::

  $ pip install -r requirements.txt

Install the necessary NLTK data::

  python -m nltk.downloader omw sentiwordnet stopwords wordnet

Running tests

Make sure you install the test dependencies::

  $ pip install -r test-requirements.txt

Then run::

  $ pytest . -vv

Authors
