A library for performing automatic detection of assessment classes of Wikipedia articles.
Project description
Wikipedia article quality classification
This library provides a set of utilities for performing automatic detection of assessment classes of Wikipedia articles. For more information, see the full documentation at https://articlequality.readthedocs.io.
Compatible with Python 3.x only. Sorry.
- Install: pip install articlequality
- Models: https://github.com/wikimedia/articlequality/tree/master/models
- Documentation: https://articlequality.readthedocs.io
Basic usage
>>> import articlequality
>>> from revscoring import Model
>>>
>>> scorer_model = Model.load(open("models/enwiki.nettrom_wp10.gradient_boosting.model", "rb"))
>>>
>>> text = "I am the text of a page. I have a <ref>word</ref>"
>>> articlequality.score(scorer_model, text)
{'prediction': 'stub',
'probability': {'stub': 0.27156163795807853,
'b': 0.14707452309674252,
'fa': 0.16844898943510833,
'c': 0.057668704007171959,
'ga': 0.21617801281707663,
'start': 0.13906813268582238}}
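The returned dictionary can be inspected with plain Python. The following is a minimal sketch that reuses the example output above as a literal, so it runs without a trained model. The helper name rank_classes is hypothetical and is not part of articlequality or revscoring.

```python
# Score dictionary copied from the example output above.
score = {
    "prediction": "stub",
    "probability": {
        "stub": 0.27156163795807853,
        "b": 0.14707452309674252,
        "fa": 0.16844898943510833,
        "c": 0.057668704007171959,
        "ga": 0.21617801281707663,
        "start": 0.13906813268582238,
    },
}


def rank_classes(score):
    """Return (class, probability) pairs, most probable first.

    Hypothetical helper for illustration; not a library function.
    """
    return sorted(
        score["probability"].items(),
        key=lambda pair: pair[1],
        reverse=True,
    )


ranked = rank_classes(score)
top_class, top_probability = ranked[0]
runner_up, runner_up_probability = ranked[1]

# A small margin between the top two classes suggests a less certain prediction.
margin = top_probability - runner_up_probability

for quality_class, probability in ranked:
    print(f"{quality_class:>5}  {probability:.3f}")
print(f"top class: {top_class}, margin over {runner_up}: {margin:.3f}")
```

In this example the predicted class leads the second most probable class by only a few percentage points, which is worth checking before relying on the prediction alone.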
Install
Requirements
- Python 3.5, 3.6 or 3.7
- All the system requirements of revscoring
Installation steps
- clone this repository
- install the package itself and its dependencies
python setup.py install
- You can verify that your installation worked by running
make enwiki_models
to build the English Wikipedia article quality model, or
make wikidatawiki_models
to build the item quality model for Wikidata
Retraining the models
To retrain a model, run make -B MODEL, e.g. make -B wikidatawiki_models. This will redownload the labels, re-extract the features from the revisions, and then retrain and rescore the model.
To skip re-downloading the training labels and re-extracting the features, it is enough to touch the files in the datasets/ directory and run the make command without the -B flag.
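The touch-then-make shortcut can also be scripted. The following is a minimal sketch, assuming it is run from the repository root and that datasets/ contains only files that are safe to touch. The function names touch_datasets and rebuild are hypothetical. Whether make then considers every dataset up to date depends on the dependency order in the Makefile, which this sketch does not inspect.

```python
import subprocess
from pathlib import Path


def touch_datasets(datasets_dir="datasets"):
    """Update the modification time of every regular file in datasets_dir.

    Files are touched in name order; this ordering is an assumption,
    not something the project documents.
    """
    touched = []
    for path in sorted(Path(datasets_dir).iterdir()):
        if path.is_file():
            path.touch()  # same effect as the `touch` shell command
            touched.append(path)
    return touched


def rebuild(target, datasets_dir="datasets", run_make=True):
    """Touch the dataset files, then run make for target without -B.

    Returns the command list so callers can log it or inspect it.
    """
    touch_datasets(datasets_dir)
    command = ["make", target]  # deliberately no -B flag
    if run_make:
        subprocess.run(command, check=True)
    return command
```

For example, rebuild("wikidatawiki_models") touches the dataset files and then runs make wikidatawiki_models. Passing run_make=False only touches the files and returns the command without executing it.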
Running tests
Example:
pytest -vv tests/feature_lists/test_wikidatawiki.py
Authors
- Aaron Halfaker -- https://github.com/halfak
- Morten Warncke-Wang -- https://github.com/nettrom
File details
Details for the file articlequality-0.4.3.tar.gz.
File metadata
- Download URL: articlequality-0.4.3.tar.gz
- Upload date:
- Size: 37.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11afc8ff855f9041128837b412986d24509b3ec4a43237c6b7b6555c9b638dab
MD5 | 3cb0de532445e2c4d296667c5d2fc97e
BLAKE2b-256 | 875912194a88759599b069b910152e59e3fed3da46a97f1c2ccc32461ecad67e
File details
Details for the file articlequality-0.4.3-py2.py3-none-any.whl.
File metadata
- Download URL: articlequality-0.4.3-py2.py3-none-any.whl
- Upload date:
- Size: 56.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 743bad3e7e916c435fc74c6a0c05b8a454d052034715f45978fd7ef3331412ea
MD5 | 60be143608808681f002a6f8a8df1a73
BLAKE2b-256 | 54f4e42d1960c707c0c2d297c9b9d95d050893356d17c4604cdbd7f92b1d6c2a