A library for automatic detection of topics of new drafts on Wikipedia based on WikiProjects.

# Draft topic

Predicts topics for new drafts on English Wikipedia, based on WikiProjects.

## Setting up

Make sure you have a working Python 3 environment. Install the requirements using:

` pip install -r requirements `

Install the library using:

` python setup.py install `

## Generating machine-readable WikiProjects data

Run the following utility from the repository root to generate machine-readable WikiProjects data:

` ./utility fetch_wikiprojects --output <output_file_name.json> `

## Generating mid-level category to WikiProjects mapping

Run the following utility from the repository root to generate a mapping from each mid-level topic category to the list of WikiProjects it contains:

` ./utility trim_wikiprojects --wikiprojects wp --output outmid `
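To make the shape of such a mapping concrete, the sketch below inverts a category-to-WikiProjects mapping into a WikiProject-to-categories lookup. The dictionary layout and the category and project names are illustrative assumptions, not the actual schema or contents of the generated file.

```python
from collections import defaultdict

# Hypothetical mapping for illustration only; the real output file may
# use a different structure and different names.
MID_LEVEL = {
    "Category A": ["WikiProject One", "WikiProject Two"],
    "Category B": ["WikiProject Two", "WikiProject Three"],
}


def invert_mapping(category_to_projects):
    """Return {wikiproject: sorted list of categories that contain it}."""
    project_to_categories = defaultdict(set)
    for category, projects in category_to_projects.items():
        for project in projects:
            project_to_categories[project].add(category)
    return {p: sorted(c) for p, c in project_to_categories.items()}


PROJECT_TO_CATEGORIES = invert_mapping(MID_LEVEL)
```

A WikiProject listed under several categories ends up with several entries, so a lookup of this kind should not assume one category per project.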

## Labeling a list of page-ids with the wikiprojects and mid-level categories each page belongs to

Run the following utility from the repository root to label each page ID in a list with the WikiProjects and mid-level categories the page belongs to:

` ./utility fetch_page_wikiprojects --api-host=https://en.wikipedia.org/ --input=wikiproject_page_ids.json --output=enwiki.labeled_wikiprojects.json --mid_level_wp=outmid.json --verbose `

The input to the script should be a JSON file containing a list of observations, each with a `page_id: <page-id>` mapping. Also pass the mid-level WikiProjects JSON via `--mid_level_wp` so that the script can map each WikiProject to its mid-level categories. The script augments each observation with these fields and writes the result to the file given by `--output`.
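As a rough illustration of the input described above, the following sketch builds observations from a list of page IDs. Two things here are assumptions: the helper names are hypothetical, and the serialization as one JSON object per line is a guess at the layout. If the script expects a single JSON array instead, use `json.dumps(observations)`.

```python
import json


def build_observations(page_ids):
    """Wrap each page ID in an observation dict with a "page_id" key."""
    return [{"page_id": int(page_id)} for page_id in page_ids]


def to_json_lines(observations):
    """Serialize one observation per line (assumed layout, see above)."""
    return "".join(json.dumps(obs) + "\n" for obs in observations)


# Example usage with placeholder page IDs:
# with open("wikiproject_page_ids.json", "w", encoding="utf-8") as f:
#     f.write(to_json_lines(build_observations([12345, 67890])))
```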

## Generating predictions for a set of page-ids on Wikipedia

To generate topic predictions for a set of revision IDs, download the relevant model and use revscoring's [score](https://github.com/wikimedia/revscoring/blob/master/revscoring/utilities/score.py) utility. The revision IDs need to be in a file in the format that utility expects. For a good prediction, use the ID of the most recent revision of each page.
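One way to find the most recent revision ID for each page is to ask the MediaWiki Action API, as in the minimal sketch below. It is not part of this library: the function names and the `User-Agent` string are hypothetical, and it does not write the file format that the score utility expects.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(page_ids):
    """Build API parameters requesting the latest revision ID of each page."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids",
        "pageids": "|".join(str(p) for p in page_ids),
        "format": "json",
        "formatversion": "2",
    }


def parse_latest_rev_ids(response):
    """Map page ID -> latest revision ID, skipping pages with no revisions."""
    result = {}
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions")
        if revisions:
            result[page["pageid"]] = revisions[0]["revid"]
    return result


def fetch_latest_rev_ids(page_ids):
    """Query the API over the network; keep batches small (e.g. 50 IDs)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_query(page_ids))
    request = urllib.request.Request(
        url, headers={"User-Agent": "drafttopic-example/0.1"}  # hypothetical
    )
    with urllib.request.urlopen(request) as resp:
        return parse_latest_rev_ids(json.load(resp))
```

Parsing is kept separate from fetching so that the response handling can be checked without network access.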
