Scraping grapheme-to-phoneme data from Wiktionary.
WikiPron
WikiPron is a command line toolkit for scraping grapheme-to-phoneme (G2P) data from Wiktionary.
Installation
WikiPron requires Python 3.6+. It is available through pip:
pip install wikipron
Usage
After installation, the terminal command wikipron will be available.
As a basic example, the following command scrapes G2P data for French (with the ISO language code fr):
wikipron fr
By default, the results are printed to the terminal, one entry per line: the orthography of a word, followed by a tab and then the word's pronunciation in IPA.
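If the terminal output is saved or piped into another program, the tab-separated lines can be split back into pairs. The following is a minimal sketch, assuming exactly the layout described above (word, tab, pronunciation). The helper name parse_wikipron_lines is hypothetical and not part of WikiPron, and the sample lines are hand-written for illustration, not real scraper output.

```python
from typing import Iterable, Iterator, Tuple


def parse_wikipron_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (word, pronunciation) pairs from tab-separated lines.

    Hypothetical helper, not part of WikiPron. It assumes each
    non-empty line is: orthography, a tab, then the IPA pronunciation.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # Skip blank lines.
        # Split on the first tab only, so the pronunciation is kept whole.
        word, pron = line.split("\t", 1)
        yield word, pron


# Hand-written sample lines (an assumption for illustration only).
sample = ["chat\tʃa\n", "eau\to\n"]
pairs = list(parse_wikipron_lines(sample))
print(pairs)
```

The same function accepts any iterable of lines, such as an open file or sys.stdin.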
For example commands that use advanced options, see the languages/wikipron/scrape script, which shows how a multilingual G2P dataset can be created.
For a full list of command-line options, please run wikipron -h.
The underlying module can also be used from Python. A standard workflow looks like:
import wikipron
config = wikipron.Config(key="fr")  # French, with default options.
for word, pron in wikipron.scrape(config):
    ...
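One thing the loop body might do is write each pair out in the same tab-separated layout that the command line tool prints. The sketch below is one possible way, not WikiPron's own code: write_tsv is a hypothetical helper, and a small hand-written list stands in for wikipron.scrape(config) so that the example runs without network access.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_tsv(pairs: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (word, pronunciation) pairs as tab-separated lines.

    Hypothetical helper, not part of WikiPron. Returns the number
    of lines written.
    """
    count = 0
    for word, pron in pairs:
        out.write(f"{word}\t{pron}\n")
        count += 1
    return count


# Stand-in data (an assumption for illustration). In real use, the pairs
# would come from: wikipron.scrape(wikipron.Config(key="fr"))
fake_pairs = [("chat", "ʃa"), ("eau", "o")]

buffer = io.StringIO()
written = write_tsv(fake_pairs, buffer)
print(written)
print(buffer.getvalue(), end="")
```

To write to disk instead of an in-memory buffer, pass a file opened with encoding="utf-8" so that the IPA characters are stored correctly regardless of the platform default.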
Development and Contribution
For questions, bug reports, and feature requests, please file an issue.
If you would like to contribute to the wikipron codebase, please see CONTRIBUTING.md.
License
Apache 2.0. Please see LICENSE.txt for details.
Source Distribution
File details
Details for the file wikipron-0.1.0.tar.gz.
File metadata
- Download URL: wikipron-0.1.0.tar.gz
- Upload date:
- Size: 9.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | e89e43b3fee35ebcf4c921e49a9b0c3fa26d0c07170d0982bc6c1ee14496bf25
MD5 | 5bfdd8256007c0cb263e41498755fd13
BLAKE2b-256 | 1e835fb6cca757151404e67c3a2075df50bb235689568dbc39ede32a56338b42