Scraping grapheme-to-phoneme data from Wiktionary.

WikiPron

WikiPron is a command-line toolkit for scraping grapheme-to-phoneme (G2P) data from Wiktionary.

Installation

WikiPron requires Python 3.6+. It is available through pip:

pip install wikipron

Usage

After installation, the terminal command wikipron will be available. As a basic example, the following command scrapes G2P data for French (with the ISO language code fr):

wikipron fr

By default, the results are printed to the terminal, one entry per line: the orthography of a word, followed by a tab and then the word's pronunciation in IPA.
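The tab-separated format described above is straightforward to read back in. The following is a minimal sketch, not part of WikiPron itself: the helper name parse_g2p_lines and the sample rows are hypothetical, and the rows are made up for illustration and are not actual scraped output.

```python
from typing import Dict, Iterable, List


def parse_g2p_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group pronunciations by word from 'word<TAB>pronunciation' lines."""
    entries: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        # Split on the first tab only; skip blank or malformed lines.
        word, sep, pron = line.partition("\t")
        if not sep or not word:
            continue
        entries.setdefault(word, []).append(pron)
    return entries


# Hypothetical sample rows in the documented format (not real scraped data).
sample = ["chat\tʃa\n", "eau\to\n", "\n", "chat\tʃat\n"]
lexicon = parse_g2p_lines(sample)
print(lexicon)
```

A word can appear on more than one line, so the sketch maps each word to a list of pronunciations instead of a single string.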

For examples of commands that use advanced options, see the languages/wikipron/scrape script, which shows how a multilingual G2P dataset can be created.

For a full list of command-line options, please run wikipron -h.

The underlying module can also be used from Python. A standard workflow looks like:

import wikipron

config = wikipron.Config(key="fr")  # French, with default options.
for word, pron in wikipron.scrape(config):
    ...
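As one possible way to fill in the loop body above, the sketch below writes (word, pronunciation) pairs in the same tab-separated layout that the command-line tool prints. The helper write_pairs is hypothetical and not part of WikiPron, and the pairs are stand-in data so that the example runs without network access.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_pairs(pairs: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write each (word, pron) pair as 'word<TAB>pron' and return the count."""
    count = 0
    for word, pron in pairs:
        out.write(f"{word}\t{pron}\n")
        count += 1
    return count


# Stand-in pairs (made up); a real run would supply scraped pairs instead.
buffer = io.StringIO()
written = write_pairs([("chat", "ʃa"), ("eau", "o")], buffer)
print(written)
print(buffer.getvalue(), end="")
```

Since the loop above iterates over wikipron.scrape(config) as (word, pron) pairs, that iterator could presumably be passed in place of the stand-in list, with an open file in place of the in-memory buffer.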

Development and Contribution

For questions, bug reports, and feature requests, please file an issue.

If you would like to contribute to the wikipron codebase, please see CONTRIBUTING.md.

We keep track of notable changes in CHANGELOG.md.

License

Apache 2.0. Please see LICENSE.txt for details.
