A library for performing automatic detection of assessment classes of Wikipedia articles.

Wikipedia article quality classification

This library provides a set of utilities for performing automatic detection of assessment classes of Wikipedia articles. For more information, see the full documentation at https://articlequality.readthedocs.io .

Compatible with Python 3.x only. Sorry.

Basic usage

>>> import articlequality
>>> from revscoring import Model
>>>
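>>> # Open the model file in a context manager so the handle is closed after loading.
>>> with open("models/enwiki.nettrom_wp10.gradient_boosting.model", "rb") as f:
...     scorer_model = Model.load(f)
...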
>>>
>>> text = "I am the text of a page.  I have a <ref>word</ref>"
>>> articlequality.score(scorer_model, text)
{'prediction': 'stub',
 'probability': {'stub': 0.27156163795807853,
                 'b': 0.14707452309674252,
                 'fa': 0.16844898943510833,
                 'c': 0.057668704007171959,
                 'ga': 0.21617801281707663,
                 'start': 0.13906813268582238}}
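
The returned dictionary can be post-processed with plain Python. The following is a minimal sketch, not part of the library: it copies the example output above into a literal, ranks the classes by probability, and checks that the probabilities sum to roughly 1. The helper name rank_classes is hypothetical.

```python
# The score dictionary copied verbatim from the example output above.
score = {
    "prediction": "stub",
    "probability": {
        "stub": 0.27156163795807853,
        "b": 0.14707452309674252,
        "fa": 0.16844898943510833,
        "c": 0.057668704007171959,
        "ga": 0.21617801281707663,
        "start": 0.13906813268582238,
    },
}


def rank_classes(score):
    """Return (class, probability) pairs, most likely class first.

    Hypothetical helper for illustration; it only reads the
    'probability' mapping shown in the example output.
    """
    return sorted(score["probability"].items(),
                  key=lambda pair: pair[1], reverse=True)


ranked = rank_classes(score)
total = sum(score["probability"].values())

for label, probability in ranked:
    print("{:<6} {:.3f}".format(label, probability))
print("sum of probabilities: {:.6f}".format(total))
```

In this example the top-ranked class matches the 'prediction' field, and the runner-up is 'ga', which shows how uncertain the model is about such a short text.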

Install

Requirements

  • Python 3.5, 3.6 or 3.7
  • All the system requirements of revscoring
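
Before installing, it may help to confirm that the interpreter is one of the versions listed above. This is a small sketch using only the standard library; the function name is_listed_python is hypothetical, and the check says nothing about whether other Python versions happen to work.

```python
import sys

# Minor versions listed under Requirements. Other versions are simply
# not mentioned by this README; this sketch treats them as unlisted.
LISTED_VERSIONS = {(3, 5), (3, 6), (3, 7)}


def is_listed_python(version_info=sys.version_info):
    """Return True if (major, minor) is one of the listed versions."""
    return (version_info[0], version_info[1]) in LISTED_VERSIONS


if is_listed_python():
    print("Python version is listed in the requirements.")
else:
    print("Python {}.{} is not listed in the requirements.".format(
        sys.version_info[0], sys.version_info[1]))
```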

Installation steps

  1. Clone this repository.
  2. Install the package itself and its dependencies: python setup.py install
  3. Verify that your installation worked by running make enwiki_models to build the English Wikipedia article quality model, or make wikidatawiki_models to build the item quality model for Wikidata.
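
Building a model can take a while, so a quicker first check is to see whether the packages can be found by the interpreter at all. The sketch below uses importlib from the standard library; is_installed is a hypothetical helper, and finding a package does not prove that it works, only that it is importable by name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be located."""
    return importlib.util.find_spec(package_name) is not None


# articlequality is this package; revscoring is imported in Basic usage.
for name in ("articlequality", "revscoring"):
    status = "found" if is_installed(name) else "NOT found"
    print("{}: {}".format(name, status))
```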

Retraining the models

To retrain a model, run make -B MODEL, e.g. make -B wikidatawiki_models. This re-downloads the labels, re-extracts the features from the revisions, and then retrains and rescores the model.

To skip re-downloading the training labels and re-extracting the features, it is enough to touch the files in the datasets/ directory and run the make command without the -B flag.
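
The touch-then-make shortcut can also be scripted. The following is a sketch under stated assumptions: the dataset files sit directly inside datasets/ (subdirectories are ignored), make is on the PATH, and the script is run from the repository root. The names touch_datasets and retrain are hypothetical.

```python
import subprocess
from pathlib import Path


def touch_datasets(directory="datasets"):
    """Update the modification time of every regular file in `directory`.

    Fresh timestamps make the existing dataset files look up to date to
    make, so it does not rebuild them.
    """
    touched = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            path.touch()
            touched.append(path)
    return touched


def retrain(target, directory="datasets"):
    """Touch the dataset files, then run make without the -B flag."""
    touch_datasets(directory)
    subprocess.run(["make", target], check=True)
```

For example, retrain("wikidatawiki_models") would reuse the existing files in datasets/ and rebuild only what depends on them.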

Running tests

Example:

pytest -vv tests/feature_lists/test_wikidatawiki.py

Authors

Download files

Source Distribution

articlequality-0.4.3.tar.gz (37.1 kB)

Uploaded Source

Built Distribution

articlequality-0.4.3-py2.py3-none-any.whl (56.0 kB)

Uploaded Python 2 Python 3

File details

Details for the file articlequality-0.4.3.tar.gz.

File metadata

  • Download URL: articlequality-0.4.3.tar.gz
  • Upload date:
  • Size: 37.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.7.3

File hashes

Hashes for articlequality-0.4.3.tar.gz
  • SHA256: 11afc8ff855f9041128837b412986d24509b3ec4a43237c6b7b6555c9b638dab
  • MD5: 3cb0de532445e2c4d296667c5d2fc97e
  • BLAKE2b-256: 875912194a88759599b069b910152e59e3fed3da46a97f1c2ccc32461ecad67e

File details

Details for the file articlequality-0.4.3-py2.py3-none-any.whl.

File hashes

Hashes for articlequality-0.4.3-py2.py3-none-any.whl
  • SHA256: 743bad3e7e916c435fc74c6a0c05b8a454d052034715f45978fd7ef3331412ea
  • MD5: 60be143608808681f002a6f8a8df1a73
  • BLAKE2b-256: 54f4e42d1960c707c0c2d297c9b9d95d050893356d17c4604cdbd7f92b1d6c2a
