
Scraping grapheme-to-phoneme data from Wiktionary.


WikiPron


WikiPron is a command-line toolkit for scraping grapheme-to-phoneme (G2P) data from Wiktionary.

Installation

WikiPron requires Python 3.6+. It is available through pip:

pip install wikipron

Usage

After installation, the terminal command wikipron will be available. As a basic example, the following command scrapes G2P data for French (with the ISO language code fr):

wikipron fr

By default, the results are printed to the terminal, one entry per line: the orthography of a word, followed by a tab and then the word's pronunciation in IPA.
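To make the output format concrete, here is a minimal sketch of reading that tab-separated output back into Python. The function name `parse_g2p_lines` and the sample entries are hypothetical illustrations, not part of WikiPron; the sketch assumes only the word, tab, pronunciation layout described above.

```python
from typing import Dict, Iterable, List


def parse_g2p_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group pronunciations by word from 'word<TAB>pronunciation' lines."""
    lexicon: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # Skip blank lines.
        # Split on the first tab only: the word comes before it,
        # and everything after it is treated as the pronunciation.
        word, pron = line.split("\t", 1)
        # A word may have several pronunciations, so collect them in a list.
        lexicon.setdefault(word, []).append(pron)
    return lexicon


# Illustrative sample lines (hypothetical, not real scraped output).
sample = ["chat\tʃa\n", "eau\to\n"]
lexicon = parse_g2p_lines(sample)
```

Because the function accepts any iterable of lines, an open file object holding saved output could be passed in place of the sample list.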

For examples of commands that use advanced options, see the languages/wikipron/scrape script, which shows how a multilingual G2P dataset can be created.

For a full list of command-line options, please run wikipron -h.

The underlying module can also be used from Python. A standard workflow looks like:

import wikipron

config = wikipron.Config(key="fr")  # French, with default options.
for word, pron in wikipron.scrape(config):
    ...
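As one possible way to fill in the loop body, the sketch below writes word and pronunciation pairs to a tab-separated text stream. `write_tsv` is a hypothetical helper, not part of the wikipron API; it assumes only that the iterable yields `(word, pron)` string pairs, as the loop over `wikipron.scrape(config)` above suggests. The demo uses stand-in pairs so that it runs without network access.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_tsv(pairs: Iterable[Tuple[str, str]], sink: TextIO) -> int:
    """Write (word, pron) pairs as 'word<TAB>pron' lines; return the line count."""
    count = 0
    for word, pron in pairs:
        sink.write(f"{word}\t{pron}\n")
        count += 1
    return count


# Stand-in pairs for illustration only (hypothetical data).
# In practice, the iterable would be wikipron.scrape(config).
demo_pairs = [("chat", "ʃa"), ("eau", "o")]
buffer = io.StringIO()
written = write_tsv(demo_pairs, buffer)
```

Taking a generic iterable and a text stream keeps the helper independent of the scraper, so the same function can target an in-memory buffer, as here, or a file opened with `encoding="utf-8"`.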

Development and Contribution

For questions, bug reports, and feature requests, please file an issue.

If you would like to contribute to the wikipron codebase, please see CONTRIBUTING.md.

License

Apache 2.0. Please see LICENSE.txt for details.

