A set of utilities for generating quality scores for MediaWiki revisions
|travis|_ |codecov|_
Revision Scoring
================
A generic, machine learning-based revision scoring system designed to be used
to automatically differentiate damage from productive contributory behavior on
Wikipedia.
Example
========
Using a scorer_model to score a revision::

    >>> import mwapi
    >>> from revscoring import ScorerModel
    >>> from revscoring.extractors.api.extractor import Extractor
    >>>
    >>> with open("models/enwiki.damaging.linear_svc.model") as f:
    ...     scorer_model = ScorerModel.load(f)
    ...
    >>> extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
    ...                                     user_agent="revscoring demo"))
    >>>
    >>> feature_values = list(extractor.extract(123456789, scorer_model.features))
    >>>
    >>> print(scorer_model.score(feature_values))
    {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}

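
The score printed above is a plain dictionary, so it can be post-processed without any `revscoring` calls. The following is a minimal sketch, not part of the library: the helper names ``probability_of`` and ``is_flagged`` and the ``threshold`` parameter are hypothetical, and it assumes that the ``True`` label of a "damaging" model means "this revision is damaging".

```python
# A score dictionary copied from the example output above.
score = {
    'prediction': True,
    'probability': {False: 0.4694409344514984, True: 0.5305590655485017},
}


def probability_of(score, label=True):
    """Return the probability the model assigned to `label`."""
    return score['probability'][label]


def is_flagged(score, threshold=0.5):
    """Flag a revision only when P(True) reaches a caller-chosen threshold.

    Hypothetical helper: a stricter threshold trades recall for precision.
    """
    return probability_of(score, True) >= threshold


print(probability_of(score))             # 0.5305590655485017
print(is_flagged(score))                 # True
print(is_flagged(score, threshold=0.8))  # False
```

With probabilities this close to 0.5, the default prediction is ``True`` but a stricter threshold would not flag the revision, which is why a tool built on these scores may want to choose its own cut-off.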
Installation
============
The easiest way to install `revscoring` is via the Python package installer
(pip).
``pip install revscoring``
You may find that some of `revscoring`'s dependencies fail to compile (namely
`scipy`, `numpy` and `sklearn`). In that case, you'll need to install the
required build dependencies through your operating system.
Ubuntu & Debian:
  Run ``sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev``
Windows:
  'TODO'
MacOS:
  Using Homebrew and pip, installing `revscoring` and `enchant` can be accomplished
  as follows::

    brew install aspell --with-all-languages
    brew install enchant
    pip install --no-binary pyenchant revscoring

  Languages can be added to `aspell`::

    cd /tmp
    wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
    bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
    cd aspell-pt-0.50-2
    ./configure
    make
    sudo make install

  Caveats:
    * The differences between the `aspell` and `myspell` dictionaries can cause
      some of the tests to fail.

Finally, in order to make use of language features, you'll need to download
some NLTK data. The following command will get the necessary corpus.
``python -m nltk.downloader stopwords``
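
If the download needs to happen from a setup script rather than a shell, the same command can be launched from Python. This is a sketch under assumptions: the function names ``nltk_download_command`` and ``download_corpus`` are hypothetical, and using ``sys.executable`` is a design choice so that the corpus is fetched by the same interpreter (for example, the same virtualenv) that `revscoring` is installed into.

```python
import importlib.util
import subprocess
import sys


def nltk_download_command(corpus="stopwords"):
    """Build the downloader command shown above for the current interpreter."""
    return [sys.executable, "-m", "nltk.downloader", corpus]


def download_corpus(corpus="stopwords"):
    """Run the downloader; raises if nltk is missing or the command fails."""
    if importlib.util.find_spec("nltk") is None:
        raise RuntimeError("nltk is not installed in this environment")
    subprocess.run(nltk_download_command(corpus), check=True)


# Usage (needs network access and nltk installed):
#     download_corpus("stopwords")
```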
You'll also need to install `enchant <https://en.wikipedia.org/wiki/Enchant_(software)>`_ compatible
dictionaries for the languages you'd like to use. We recommend the following:
* ``languages.arabic``: aspell-ar
* ``languages.bengali``: aspell-bn
* ``languages.czech``: myspell-cs
* ``languages.dutch``: myspell-nl
* ``languages.english``: myspell-en-us myspell-en-gb myspell-en-au
* ``languages.estonian``: myspell-et
* ``languages.finnish``: voikko-fi
* ``languages.french``: myspell-fr
* ``languages.german``: myspell-de-at myspell-de-ch myspell-de-de
* ``languages.greek``: aspell-el
* ``languages.hebrew``: myspell-he
* ``languages.hungarian``: myspell-hu
* ``languages.indonesian``: aspell-id
* ``languages.italian``: myspell-it
* ``languages.norwegian``: myspell-nb
* ``languages.persian``: myspell-fa
* ``languages.polish``: aspell-pl
* ``languages.portuguese``: myspell-pt
* ``languages.spanish``: myspell-es
* ``languages.swedish``: aspell-sv
* ``languages.tamil``: aspell-ta
* ``languages.russian``: myspell-ru
* ``languages.ukrainian``: myspell-uk
* ``languages.vietnamese``: hunspell-vi
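
The recommendations above can be turned into a single install command for just the languages you need. The sketch below is illustrative only: the table copies the list above, while ``RECOMMENDED_DICTIONARIES`` and ``apt_install_command`` are hypothetical names, and it assumes the listed names are available as packages through ``apt-get`` on your Ubuntu or Debian release.

```python
# Recommended dictionary packages, keyed by the name after "languages.".
RECOMMENDED_DICTIONARIES = {
    "arabic": ["aspell-ar"],
    "bengali": ["aspell-bn"],
    "czech": ["myspell-cs"],
    "dutch": ["myspell-nl"],
    "english": ["myspell-en-us", "myspell-en-gb", "myspell-en-au"],
    "estonian": ["myspell-et"],
    "finnish": ["voikko-fi"],
    "french": ["myspell-fr"],
    "german": ["myspell-de-at", "myspell-de-ch", "myspell-de-de"],
    "greek": ["aspell-el"],
    "hebrew": ["myspell-he"],
    "hungarian": ["myspell-hu"],
    "indonesian": ["aspell-id"],
    "italian": ["myspell-it"],
    "norwegian": ["myspell-nb"],
    "persian": ["myspell-fa"],
    "polish": ["aspell-pl"],
    "portuguese": ["myspell-pt"],
    "spanish": ["myspell-es"],
    "swedish": ["aspell-sv"],
    "tamil": ["aspell-ta"],
    "russian": ["myspell-ru"],
    "ukrainian": ["myspell-uk"],
    "vietnamese": ["hunspell-vi"],
}


def apt_install_command(languages):
    """Return an apt-get command for the given languages (assumed package names)."""
    packages = []
    for language in languages:
        # A KeyError here means the language is not in the list above.
        for package in RECOMMENDED_DICTIONARIES[language]:
            if package not in packages:
                packages.append(package)
    return "sudo apt-get install " + " ".join(packages)


print(apt_install_command(["english", "portuguese"]))
# sudo apt-get install myspell-en-us myspell-en-gb myspell-en-au myspell-pt
```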
Authors
=======
Aaron Halfaker:
* `http://halfaker.info`
Helder:
* `https://github.com/he7d3r`
Adam Roses Wight:
* `https://mediawiki.org/wiki/User:Adamw`
Amir Sarabadani:
* `https://github.com/Ladsgroup`
.. |travis| image:: https://api.travis-ci.org/wiki-ai/revscoring.png
.. _travis: https://travis-ci.org/wiki-ai/revscoring
.. |codecov| image:: https://codecov.io/github/wiki-ai/revscoring/revscoring.svg
.. _codecov: https://codecov.io/github/wiki-ai/revscoring