A set of utilities for generating quality scores for MediaWiki revisions
Project description
Revision Scoring
A generic, machine learning-based revision scoring system designed to automatically differentiate damage from productive contributions on Wikipedia.
Examples
Scoring models:
    >>> from mw.api import Session
    >>>
    >>> from revscoring.extractors import APIExtractor
    >>> from revscoring.languages import english
    >>> from revscoring.scorers import MLScorerModel
    >>>
    >>> api_session = Session("https://en.wikipedia.org/w/api.php")
    Sending requests with default User-Agent. Set 'user_agent' on api.Session to quiet this message.
    >>> extractor = APIExtractor(api_session, english)
    >>>
    >>> filename = "models/reverts.halfak_mix.trained.model"
    >>> model = MLScorerModel.load(open(filename, 'rb'))
    >>>
    >>> rev_ids = [105, 642215410, 638307884]
    >>> feature_values = [extractor.extract(id, model.features) for id in rev_ids]
    >>> scores = model.score(feature_values, probabilities=True)
    >>> for rev_id, score in zip(rev_ids, scores):
    ...     print("{0}: {1}".format(rev_id, score))
    ...
    105: {'probabilities': array([ 0.96441465, 0.03558535]), 'prediction': False}
    642215410: {'probabilities': array([ 0.75884553, 0.24115447]), 'prediction': True}
    638307884: {'probabilities': array([ 0.98441738, 0.01558262]), 'prediction': False}
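Each score in the output above is a dictionary holding a 'prediction' and a pair of 'probabilities'. As a minimal sketch of how such scores might be ranked, the following assumes that the second probability belongs to the True class; the output above does not state the class order, so check it against the model you load. The helper name `rank_by_true_probability` is hypothetical, and plain lists stand in for the NumPy arrays so the example runs without revscoring installed.

```python
def rank_by_true_probability(rev_ids, scores):
    """Return (rev_id, probability) pairs, highest probability first.

    Assumption: index 1 of 'probabilities' is the probability of the
    True class. Verify this against the model you actually load.
    """
    pairs = [(rev_id, float(score['probabilities'][1]))
             for rev_id, score in zip(rev_ids, scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Scores copied from the example output above, with lists in place of arrays.
rev_ids = [105, 642215410, 638307884]
scores = [
    {'probabilities': [0.96441465, 0.03558535], 'prediction': False},
    {'probabilities': [0.75884553, 0.24115447], 'prediction': True},
    {'probabilities': [0.98441738, 0.01558262], 'prediction': False},
]

ranked = rank_by_true_probability(rev_ids, scores)
for rev_id, probability in ranked:
    print("{0}: {1:.3f}".format(rev_id, probability))
```

Sorting on the probability rather than on 'prediction' keeps the ordering informative even when several revisions share the same predicted label.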
Feature extraction:
    >>> from mw.api import Session
    >>>
    >>> from revscoring.extractors import APIExtractor
    >>> from revscoring.features import diff, parent_revision, revision, user
    >>>
    >>> api_extractor = APIExtractor(Session("https://en.wikipedia.org/w/api.php"))
    Sending requests with default User-Agent. Set 'user_agent' on api.Session to quiet this message.
    >>>
    >>> features = [revision.day_of_week,
    ...             revision.hour_of_day,
    ...             revision.has_custom_comment,
    ...             parent_revision.bytes_changed,
    ...             diff.chars_added,
    ...             user.age,
    ...             user.is_anon,
    ...             user.is_bot]
    >>>
    >>> values = api_extractor.extract(
    ...     624577024,
    ...     features
    ... )
    >>> for feature, value in zip(features, values):
    ...     print("{0}: {1}".format(feature, value))
    ...
    <revision.day_of_week>: 6
    <revision.hour_of_day>: 19
    <revision.has_custom_comment>: True
    <(revision.bytes - parent_revision.bytes_changed)>: 3
    <diff.chars_added>: 8
    <user.age>: 71821407
    <user.is_anon>: False
    <user.is_bot>: False
Installation
Packages
Before using revscoring, you need to install a few system packages. The exact set depends on your operating system; on systems that use apt-get, try:
sudo apt-get install python3-dev python3-numpy python3-scipy g++ gfortran liblapack-dev libopenblas-dev myspell-pt myspell-fa myspell-en-au myspell-en-gb myspell-en-us myspell-en-za myspell-fr myspell-es hunspell-vi myspell-he
If you’re on Ubuntu, you might also be able to install an Indonesian dictionary:
sudo apt-get install aspell-id
Virtualenv users, please note that you’ll have to use the --system-site-packages option if you install scipy and numpy via apt-get. Alternatively, you can install them with pip3 inside your virtualenv.
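The same setting can also be applied from Python. The sketch below swaps in the standard library's `venv` module for virtualenv itself, so it illustrates the option and is not a description of virtualenv's own interface. The environment directory name is hypothetical.

```python
import os
import tempfile
import venv

# Hypothetical location; a temporary directory keeps the sketch self-contained.
env_dir = os.path.join(tempfile.mkdtemp(), "revscoring-env")

# system_site_packages=True is the venv counterpart of the
# system-site-packages option, so apt-installed numpy and scipy stay visible.
# with_pip=False skips bootstrapping pip to keep the example quick.
builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
builder.create(env_dir)

# The setting is recorded in the environment's pyvenv.cfg file.
with open(os.path.join(env_dir, "pyvenv.cfg")) as cfg:
    config_text = cfg.read()
print(config_text)
```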
Python modules
If you don't already have pip, the Python package installer, install it first:
sudo easy_install3 pip
Then install this module:
pip3 install --user revscoring
To make use of the language features, you also need to download the NLTK stopwords data:
python3 -m nltk.downloader stopwords
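Once the steps above are done, a quick check can confirm that the Python modules are visible to the interpreter. This is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical, and the check only looks for the modules themselves, not for the NLTK data or the system dictionaries.

```python
import importlib.util


def find_missing(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Top-level modules named in the installation steps above.
required = ["numpy", "scipy", "nltk", "revscoring"]
missing = find_missing(required)
if missing:
    print("Not installed: " + ", ".join(missing))
else:
    print("All required modules were found.")
```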
Download files
Source Distributions
File details
Details for the file revscoring-0.6.3.zip.
File metadata
- Download URL: revscoring-0.6.3.zip
- Upload date:
- Size: 108.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | eb41773134641e67fe81f00fadfadd1b94e9304fa6d3b4feefe00db3f944e7da |
MD5 | d775d9b888b55c259322cc2c72dc4676 |
BLAKE2b-256 | 0640e76a6062b339175ea2870118788a0cc6a1f3df0a1b1d31eb835c80a1e44f |
File details
Details for the file revscoring-0.6.3.tar.gz.
File metadata
- Download URL: revscoring-0.6.3.tar.gz
- Upload date:
- Size: 68.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 5409f55bef7162d0565b1def6382adfcfedeabc13521719d9b6997067f27ae20 |
MD5 | 3c1b160117256e6dd719c71376134624 |
BLAKE2b-256 | 02eca1e862814402f06f0b594580a4a2cb5334e95736924d6e84f41746642d89 |
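The digests listed for each file can be used to check a download before installing from it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `verify_download` are hypothetical, and the example assumes the archive has already been saved locally.

```python
import hashlib

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "revscoring-0.6.3.zip":
        "eb41773134641e67fe81f00fadfadd1b94e9304fa6d3b4feefe00db3f944e7da",
    "revscoring-0.6.3.tar.gz":
        "5409f55bef7162d0565b1def6382adfcfedeabc13521719d9b6997067f27ae20",
}


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, filename):
    """Return True if the file at path matches the digest listed for filename."""
    return sha256_of_file(path) == EXPECTED_SHA256[filename]
```

For example, `verify_download("revscoring-0.6.3.tar.gz", "revscoring-0.6.3.tar.gz")` would return True only if the saved archive matches the listed SHA256 digest.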