A set of utilities for generating quality scores for MediaWiki revisions
Project description
[![Build Status](https://travis-ci.org/wikimedia/revscoring.svg?branch=master)](https://travis-ci.org/wikimedia/revscoring)
[![Test coverage](https://codecov.io/gh/wikimedia/revscoring/branch/master/graph/badge.svg)](https://codecov.io/gh/wikimedia/revscoring)
# Revision Scoring
A generic, machine learning-based revision scoring system designed to be used
to automatically differentiate damage from productive contributory behavior on
Wikipedia.
## Example
Using a scorer_model to score a revision::
```
import mwapi
from revscoring import Model
from revscoring.extractors.api.extractor import Extractor
with open("models/enwiki.damaging.linear_svc.model") as f:
scorer_model = Model.load(f)
extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
user_agent="revscoring demo"))
feature_values = list(extractor.extract(123456789, scorer_model.features))
print(scorer_model.score(feature_values))
{'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
```
# Installation
The easiest way to install is via the Python package installer
(pip).
``pip install revscoring``
You may find that some of the dependencies fail to compile (namely
`scipy`, `numpy` and `sklearn`). In that case, you'll need to install some
dependencies in your operating system.
### Ubuntu & Debian:
* Run ``sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev``
* Run ``apt-get install aspell-ar aspell-bn aspell-is myspell-cs myspell-nl myspell-en-us myspell-en-gb myspell-en-au myspell-et voikko-fi myspell-fr myspell-de-at myspell-de-ch myspell-de-de myspell-he myspell-hr myspell-hu aspell-id myspell-it myspell-nb myspell-fa aspell-pl myspell-pt myspell-es hunspell-sr aspell-sv aspell-ta myspell-ru myspell-uk hunspell-vi aspell-el myspell-lv aspell-ro myspell-ca hunspell-gl``
### Windows:
<i>TODO</i>
### MacOS:
Using Homebrew and pip, installing `revscoring` and `enchant` can be accomplished
as follows::
* brew install aspell --with-all-languages
* brew install enchant
* pip install --no-binary pyenchant revscoring
#### Adding languages in aspell (MacOS only)
```
cd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install
```
Caveats: <br>
<b><u> The differences between the `aspell` and `myspell` dictionaries can cause </b>
<b> <u>some of the tests to fail </b>
Finally, in order to make use of language features, you'll need to download
some NLTK data. The following command will get the necessary corpora.
``python -m nltk.downloader omw sentiwordnet stopwords wordnet``
You'll also need to install `enchant <https://en.wikipedia.org/wiki/Enchant_(software)>`_ compatible
dictionaries of the languages you'd like to use. We recommend the following:
* languages.arabic: aspell-ar
* languages.bengali: aspell-bn
* languages.bosnian: hunspell-bs
* languages.catalan: myspell-ca
* languages.czech: myspell-cs
* languages.croatian: myspell-hr
* languages.dutch: myspell-nl
* languages.english: myspell-en-us myspell-en-gb myspell-en-au
* languages.estonian: myspell-et
* languages.finnish: voikko-fi
* languages.french: myspell-fr
* languages.galician: hunspell-gl
* languages.german: myspell-de-at myspell-de-ch myspell-de-de
* languages.greek: aspell-el
* languages.hebrew: myspell-he
* languages.hungarian: myspell-hu
* languages.icelandic: aspell-is
* languages.indonesian: aspell-id
* languages.italian: myspell-it
* languages.latvian: myspell-lv
* languages.norwegian: myspell-nb
* languages.persian: myspell-fa
* languages.polish: aspell-pl
* languages.portuguese: myspell-pt
* languages.serbian: hunspell-sr
* languages.spanish: myspell-es
* languages.swedish: aspell-sv
* languages.tamil: aspell-ta
* languages.russian: myspell-ru
* languages.ukrainian: aspell-uk
* languages.vietnamese: hunspell-vi
# Authors
* [Aaron Halfaker](http://halfaker.info)
* [Helder](https://github.com/he7d3r)
* [Adam Roses Wight](https://mediawiki.org/wiki/User:Adamw)
* [Amir Sarabadani](https://github.com/Ladsgroup)
[![Test coverage](https://codecov.io/gh/wikimedia/revscoring/branch/master/graph/badge.svg)](https://codecov.io/gh/wikimedia/revscoring)
# Revision Scoring
A generic, machine learning-based revision scoring system designed to be used
to automatically differentiate damage from productive contributory behavior on
Wikipedia.
## Example
Using a scorer_model to score a revision::
```
import mwapi
from revscoring import Model
from revscoring.extractors.api.extractor import Extractor
with open("models/enwiki.damaging.linear_svc.model") as f:
scorer_model = Model.load(f)
extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
user_agent="revscoring demo"))
feature_values = list(extractor.extract(123456789, scorer_model.features))
print(scorer_model.score(feature_values))
{'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
```
# Installation
The easiest way to install is via the Python package installer
(pip).
``pip install revscoring``
You may find that some of the dependencies fail to compile (namely
`scipy`, `numpy` and `sklearn`). In that case, you'll need to install some
dependencies in your operating system.
### Ubuntu & Debian:
* Run ``sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev``
* Run ``apt-get install aspell-ar aspell-bn aspell-is myspell-cs myspell-nl myspell-en-us myspell-en-gb myspell-en-au myspell-et voikko-fi myspell-fr myspell-de-at myspell-de-ch myspell-de-de myspell-he myspell-hr myspell-hu aspell-id myspell-it myspell-nb myspell-fa aspell-pl myspell-pt myspell-es hunspell-sr aspell-sv aspell-ta myspell-ru myspell-uk hunspell-vi aspell-el myspell-lv aspell-ro myspell-ca hunspell-gl``
### Windows:
<i>TODO</i>
### MacOS:
Using Homebrew and pip, installing `revscoring` and `enchant` can be accomplished
as follows::
* brew install aspell --with-all-languages
* brew install enchant
* pip install --no-binary pyenchant revscoring
#### Adding languages in aspell (MacOS only)
```
cd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install
```
Caveats: <br>
<b><u> The differences between the `aspell` and `myspell` dictionaries can cause </b>
<b> <u>some of the tests to fail </b>
Finally, in order to make use of language features, you'll need to download
some NLTK data. The following command will get the necessary corpora.
``python -m nltk.downloader omw sentiwordnet stopwords wordnet``
You'll also need to install `enchant <https://en.wikipedia.org/wiki/Enchant_(software)>`_ compatible
dictionaries of the languages you'd like to use. We recommend the following:
* languages.arabic: aspell-ar
* languages.bengali: aspell-bn
* languages.bosnian: hunspell-bs
* languages.catalan: myspell-ca
* languages.czech: myspell-cs
* languages.croatian: myspell-hr
* languages.dutch: myspell-nl
* languages.english: myspell-en-us myspell-en-gb myspell-en-au
* languages.estonian: myspell-et
* languages.finnish: voikko-fi
* languages.french: myspell-fr
* languages.galician: hunspell-gl
* languages.german: myspell-de-at myspell-de-ch myspell-de-de
* languages.greek: aspell-el
* languages.hebrew: myspell-he
* languages.hungarian: myspell-hu
* languages.icelandic: aspell-is
* languages.indonesian: aspell-id
* languages.italian: myspell-it
* languages.latvian: myspell-lv
* languages.norwegian: myspell-nb
* languages.persian: myspell-fa
* languages.polish: aspell-pl
* languages.portuguese: myspell-pt
* languages.serbian: hunspell-sr
* languages.spanish: myspell-es
* languages.swedish: aspell-sv
* languages.tamil: aspell-ta
* languages.russian: myspell-ru
* languages.ukrainian: aspell-uk
* languages.vietnamese: hunspell-vi
# Authors
* [Aaron Halfaker](http://halfaker.info)
* [Helder](https://github.com/he7d3r)
* [Adam Roses Wight](https://mediawiki.org/wiki/User:Adamw)
* [Amir Sarabadani](https://github.com/Ladsgroup)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
revscoring-2.3.0.tar.gz
(220.0 kB
view details)
Built Distribution
File details
Details for the file revscoring-2.3.0.tar.gz
.
File metadata
- Download URL: revscoring-2.3.0.tar.gz
- Upload date:
- Size: 220.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c1e19c5e0d2a7a9fc67adeaf1c5e56cc8545cb7432d9e8e09b9e4f4628b7ec96 |
|
MD5 | 211508dbcc8ab2f4a8cc34b900c9678f |
|
BLAKE2b-256 | 330fcde343648d6b22e37c7d960f077c131e6c01085282029c4a004f84937309 |
File details
Details for the file revscoring-2.3.0-py2.py3-none-any.whl
.
File metadata
- Download URL: revscoring-2.3.0-py2.py3-none-any.whl
- Upload date:
- Size: 328.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | dc701d6c2936e35d138c9855a1064538f6f9966fc9ecb4b2202f6acaec2c73f8 |
|
MD5 | 0b6aaf1a0d7093ee0e0570ad3088b00e |
|
BLAKE2b-256 | 96c275b8bf45cc8bc10cc91f064a84cabd611e50f970522f61bd147b622284bf |