A set of utilities for generating quality scores for MediaWiki revisions

[![Build Status](https://travis-ci.org/wiki-ai/revscoring.svg?branch=master)](https://travis-ci.org/wiki-ai/revscoring)
[![Test coverage](https://codecov.io/gh/wiki-ai/revscoring/branch/master/graph/badge.svg)](https://codecov.io/gh/wiki-ai/revscoring)
# Revision Scoring

A generic, machine learning-based revision scoring system designed to be used
to automatically differentiate damage from productive contributory behavior on
Wikipedia.

## Example


Using a `scorer_model` to score a revision:

```
import mwapi
from revscoring import Model
from revscoring.extractors.api.extractor import Extractor

# Load a previously trained model from disk
with open("models/enwiki.damaging.linear_svc.model") as f:
    scorer_model = Model.load(f)

# The extractor fetches revision data from the MediaWiki API
extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
                                    user_agent="revscoring demo"))

# Extract the features the model expects for revision 123456789
feature_values = list(extractor.extract(123456789, scorer_model.features))

print(scorer_model.score(feature_values))
# {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
```
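
The score is a plain dictionary, so it can be post-processed without any further `revscoring` calls. The sketch below works on the output shown above and applies a custom cut-off to the probability of the `True` class. The helper names and the `threshold` parameter are hypothetical and not part of `revscoring`; reading `True` as "damaging" is an assumption based on the model file name.

```python
def true_probability(score):
    """Return the probability the model assigned to the True class."""
    return score["probability"][True]


def flag_for_review(score, threshold=0.5):
    """Hypothetical helper: flag a revision when P(True) reaches the threshold."""
    return true_probability(score) >= threshold


# The score dictionary printed in the example above
score = {'prediction': True,
         'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}

print(flag_for_review(score))                 # True
print(flag_for_review(score, threshold=0.8))  # False
```

A stricter threshold trades recall for precision; a suitable value depends on the model and the wiki.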


# Installation

The easiest way to install is via the Python package installer
(pip).

``pip install revscoring``
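
After installing, it can help to confirm which dependencies are actually importable. This is a minimal sketch using only the standard library; the `REQUIRED` list is an assumption drawn from the packages named in this README, not an authoritative dependency list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages mentioned in this README
REQUIRED = ["revscoring", "numpy", "scipy", "sklearn", "enchant", "nltk"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All modules found")
```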

You may find that some of the dependencies fail to compile (namely
`scipy`, `numpy` and `sklearn`). In that case, you'll need to install some
dependencies in your operating system.

### Ubuntu & Debian:
* Run ``sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev``
* Run ``sudo apt-get install aspell-ar aspell-bn aspell-is myspell-cs myspell-nl myspell-en-us myspell-en-gb myspell-en-au myspell-et voikko-fi myspell-fr myspell-de-at myspell-de-ch myspell-de-de myspell-he myspell-hr myspell-hu aspell-id myspell-it myspell-nb myspell-fa aspell-pl myspell-pt myspell-es hunspell-sr aspell-sv aspell-ta myspell-ru myspell-uk hunspell-vi aspell-el myspell-lv aspell-ro myspell-ca``
### Windows:
<i>TODO</i>
### MacOS:
Using Homebrew and pip, `revscoring` and `enchant` can be installed as follows:

* Run ``brew install aspell --with-all-languages``
* Run ``brew install enchant``
* Run ``pip install --no-binary pyenchant revscoring``

#### Adding languages in aspell (MacOS only)
```
cd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install
```

**Caveat:** The differences between the `aspell` and `myspell` dictionaries can cause some of the tests to fail.


Finally, in order to make use of language features, you'll need to download
some NLTK data. The following command will get the necessary corpora.

``python -m nltk.downloader omw sentiwordnet stopwords wordnet``

You'll also need to install [enchant](https://en.wikipedia.org/wiki/Enchant_(software))-compatible
dictionaries for the languages you'd like to use. We recommend the following:

* languages.arabic: aspell-ar
* languages.bengali: aspell-bn
* languages.bosnian: hunspell-bs
* languages.catalan: myspell-ca
* languages.czech: myspell-cs
* languages.croatian: myspell-hr
* languages.dutch: myspell-nl
* languages.english: myspell-en-us myspell-en-gb myspell-en-au
* languages.estonian: myspell-et
* languages.finnish: voikko-fi
* languages.french: myspell-fr
* languages.german: myspell-de-at myspell-de-ch myspell-de-de
* languages.greek: aspell-el
* languages.hebrew: myspell-he
* languages.hungarian: myspell-hu
* languages.icelandic: aspell-is
* languages.indonesian: aspell-id
* languages.italian: myspell-it
* languages.latvian: myspell-lv
* languages.norwegian: myspell-nb
* languages.persian: myspell-fa
* languages.polish: aspell-pl
* languages.portuguese: myspell-pt
* languages.serbian: hunspell-sr
* languages.spanish: myspell-es
* languages.swedish: aspell-sv
* languages.tamil: aspell-ta
* languages.russian: myspell-ru
* languages.ukrainian: myspell-uk
* languages.vietnamese: hunspell-vi
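
If only a few languages are needed, the list above can be turned into an install command for just those dictionaries. The following is a sketch, not part of `revscoring`: the `DICTIONARIES` mapping copies a handful of entries from the list, and the `apt_command` helper is a hypothetical name that assumes an `apt-get` based system.

```python
# A few entries copied from the recommended dictionaries above
DICTIONARIES = {
    "english": ["myspell-en-us", "myspell-en-gb", "myspell-en-au"],
    "french": ["myspell-fr"],
    "german": ["myspell-de-at", "myspell-de-ch", "myspell-de-de"],
    "portuguese": ["myspell-pt"],
}


def apt_command(languages):
    """Build an apt-get command that installs dictionaries for the given languages."""
    packages = []
    for language in languages:
        packages.extend(DICTIONARIES[language])
    return ["sudo", "apt-get", "install"] + packages


print(" ".join(apt_command(["english", "french"])))
# sudo apt-get install myspell-en-us myspell-en-gb myspell-en-au myspell-fr
```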

# Authors

* [Aaron Halfaker](http://halfaker.info)
* [Helder](https://github.com/he7d3r)
* [Adam Roses Wight](https://mediawiki.org/wiki/User:Adamw)
* [Amir Sarabadani](https://github.com/Ladsgroup)


