A library for automatic detection of topics of new drafts on Wikipedia based on WikiProjects.
# Draft topic
Predicts topics for new drafts based on WikiProjects on English Wikipedia.
## Setting up
Make sure you have a working Python 3 environment. Install the requirements using:
` pip install -r requirements `
Install the library using:
` python setup.py install `
## Generating machine-readable WikiProjects data
Use the following utility from the root directory to generate machine-readable WikiProjects data:
` ./utility fetch_wikiprojects --output <output_file_name.json> `
## Generating mid-level category to WikiProjects mapping
Use the following utility from the root directory to generate a mapping from mid-level topic categories to the list of WikiProjects each one contains:
` ./utility trim_wikiprojects --wikiprojects wp --output outmid `
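The labeling step needs the opposite direction of this mapping: from a WikiProject to the mid-level categories that contain it. The sketch below shows that inversion on a plain dictionary. It assumes the mapping has the shape `{category: [wikiproject, ...]}`; the real schema of the output file may differ, and the category and WikiProject names here are made-up sample data.

```python
from collections import defaultdict


def invert_mapping(mid_level):
    """Turn {category: [wikiproject, ...]} into {wikiproject: [category, ...]}."""
    inverted = defaultdict(list)
    for category, wikiprojects in mid_level.items():
        for wikiproject in wikiprojects:
            # A WikiProject may sit under several categories; keep each once.
            if category not in inverted[wikiproject]:
                inverted[wikiproject].append(category)
    return dict(inverted)


# Hypothetical sample data, not taken from the real output file.
sample = {
    "Culture.Music": ["WikiProject Jazz", "WikiProject Opera"],
    "Culture.Performing arts": ["WikiProject Opera"],
}
inverted = invert_mapping(sample)
```

With this sample, a WikiProject listed under two categories ends up with both of them, in the order the categories appear in the input.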
## Labeling a list of page-ids with the WikiProjects and mid-level categories each page belongs to
Use the following utility from the root directory to label a list of page-ids with the WikiProjects and the mid-level categories each page belongs to:
` ./utility fetch_page_wikiprojects --api-host=https://en.wikipedia.org/ --input=wikiproject_page_ids.json --output=enwiki.labeled_wikiprojects.json --mid_level_wp=outmid.json --verbose `
In the command above, the input should be a JSON file containing a list of observations, each with a `page_id: <page-id>` mapping. Also pass the mid-level WikiProjects JSON (`--mid_level_wp`) so that the script can build the mapping from WikiProjects to mid-level categories. The script augments each observation with these fields and writes the result to the file given by `--output`.
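As an illustration of the input described above, the following sketch builds a list of observations from page-ids and writes it to a file. It assumes the file is a single JSON array of `{"page_id": ...}` objects; the text does not say whether the utility instead expects one JSON object per line, so check that against the script before relying on it. The function names and page-ids are hypothetical.

```python
import json


def build_observations(page_ids):
    """Wrap each page-id in an observation dict with a `page_id` key."""
    return [{"page_id": int(page_id)} for page_id in page_ids]


def write_observations(page_ids, path):
    """Write the observations to `path` as one JSON array (assumed layout)."""
    with open(path, "w") as f:
        json.dump(build_observations(page_ids), f)


# Hypothetical page-ids; strings are accepted and converted to integers.
observations = build_observations([12, 25, "39"])
```

Calling `write_observations([12, 25, 39], "wikiproject_page_ids.json")` would produce a file that could be passed as `--input` under the stated assumption.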
## Generating predictions for a set of page-ids on Wikipedia
To generate topic predictions for a set of revision-ids, download the relevant model and use revscoring's [score](https://github.com/wikimedia/revscoring/blob/master/revscoring/utilities/score.py) utility. The revision-ids must be in a file in the format that utility expects. For a good prediction, use the ID of the most recent revision of each page.
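One way to find the most recent revision ID for each page is the MediaWiki Action API (`action=query` with `prop=revisions`). The sketch below is not part of this library: the function names and the User-Agent string are made up, and it assumes the default JSON response layout, where pages are keyed under `query.pages`. The API limits how many page-ids one request may carry (50 for ordinary clients), so longer lists would need batching.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(api_host, page_ids):
    """Build an Action API URL asking for the latest revision of each page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids",
        "pageids": "|".join(str(p) for p in page_ids),
        "format": "json",
    }
    return api_host.rstrip("/") + "/w/api.php?" + urlencode(params)


def extract_latest_rev_ids(response):
    """Map page-id -> latest revision-id from a parsed API response."""
    latest = {}
    for page in response.get("query", {}).get("pages", {}).values():
        revisions = page.get("revisions")
        if revisions:  # missing pages come back without revisions
            latest[page["pageid"]] = revisions[0]["revid"]
    return latest


def fetch_latest_rev_ids(api_host, page_ids):
    """Perform the request (needs network access) and parse the result."""
    request = Request(
        build_query_url(api_host, page_ids),
        headers={"User-Agent": "drafttopic-readme-example/0.1"},  # placeholder
    )
    with urlopen(request) as resp:
        return extract_latest_rev_ids(json.load(resp))
```

For example, `fetch_latest_rev_ids("https://en.wikipedia.org/", [12, 25])` would return a dictionary from page-id to revision-id, which can then be written out in whatever file format the score utility requires.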