PyCantonese: Cantonese Linguistics and NLP in Python

Full Documentation: https://pycantonese.org


PyCantonese is a Python library for Cantonese linguistics and natural language processing (NLP). Currently implemented features (more to come!):

  • Accessing and searching corpus data

  • Parsing and conversion tools for Jyutping romanization

  • Stop words

  • Word segmentation

  • Part-of-speech tagging
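Once the library is installed, several of the features listed above can be tried in a few lines. The following is a minimal sketch rather than an official example: it assumes a PyCantonese 3.x release, the helper name `feature_tour` and the sample sentence are made up for illustration, and corpus access is left out to keep the snippet quick to run.

```python
# Minimal sketch, assuming PyCantonese 3.x is installed.
# `feature_tour` is a hypothetical helper, not part of the library.
try:
    import pycantonese
except ImportError:  # Let the snippet load even if the library is missing.
    pycantonese = None


def feature_tour(text="廣東話好難學"):
    """Run a few PyCantonese functions on `text` and collect the results.

    Returns None when PyCantonese is not installed.
    """
    if pycantonese is None:
        return None
    # Word segmentation: a string of characters becomes a list of words.
    words = pycantonese.segment(text)
    return {
        "words": words,
        # Part-of-speech tagging works on the segmented words.
        "tags": pycantonese.pos_tag(words),
        # Characters-to-Jyutping conversion of the raw text.
        "jyutping": pycantonese.characters_to_jyutping(text),
        # Parsing a Jyutping string into its syllable components.
        "parsed": pycantonese.parse_jyutping("gwong2dung1waa2"),
        # The stop word collection; only its size is reported here.
        "n_stop_words": len(pycantonese.stop_words()),
    }


if __name__ == "__main__":
    print(feature_tour())
```

The exact output depends on the installed version and its bundled data, so no expected values are shown here.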

Download and Install

To download and install the most recent stable version:

$ pip install --upgrade pycantonese
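To confirm that the installation worked, the installed version can be looked up from Python. This is a small sketch using only the standard library (`importlib.metadata`, available in Python 3.8 and later); the function name `installed_version` is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(package="pycantonese"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("pycantonese is not installed in this environment")
    else:
        print(f"pycantonese {version} is installed")
```

If the function reports that the package is missing, the pip command was probably run with a different Python environment than the one executing the script.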

Ready for more? Check out the Quickstart page.

Consulting

If your team would like professional assistance in using PyCantonese, technical consulting and training services are available. Please email Jackson L. Lee.

Support

If you have found PyCantonese useful and would like to offer support, buying me a coffee would go a long way!

How to Cite

PyCantonese is authored and maintained by Jackson L. Lee.

A talk introducing PyCantonese:

Lee, Jackson L. 2015. PyCantonese: Cantonese linguistic research in the age of big data. Talk at the Childhood Bilingualism Research Centre, Chinese University of Hong Kong. September 15, 2015. Notes+slides

License

MIT License. Please see LICENSE.txt in the GitHub source code for details.

The HKCanCor dataset included in PyCantonese is substantially modified from its source in terms of format. The original dataset has a CC BY license. Please see pycantonese/data/hkcancor/README.md in the GitHub source code for details.

The rime-cantonese data (release 2020.09.09) is incorporated into PyCantonese for word segmentation and characters-to-Jyutping conversion. This data has a CC BY 4.0 license. Please see pycantonese/data/rime_cantonese/README.md in the GitHub source code for details.

Acknowledgments

Wonderful resources with a permissive license that have been incorporated into PyCantonese:

  • HKCanCor

  • rime-cantonese

Individuals who have contributed feedback, bug reports, etc. (in alphabetical order of last names):

  • @cathug

  • Litong Chen

  • Jenny Chim

  • @g-traveller

  • Rachel Han

  • Ryan Lai

  • Charles Lam

  • Chaak Ming Lau

  • Hill Ma

  • @richielo

  • @rylanchiu

  • Stephan Stiller

  • Tsz-Him Tsui

  • Robin Yuen

Changelog

Please see CHANGELOG.md.

Setting up a Development Environment

The latest code under development is available on GitHub at jacksonllee/pycantonese. Git LFS must be installed on your system. To obtain this version for experimental features or for development:

$ git clone https://github.com/jacksonllee/pycantonese.git
$ cd pycantonese
$ git lfs pull
$ pip install -r dev-requirements.txt
$ pip install -e .

To run tests and style checks:

$ pytest -vv --doctest-modules --cov=pycantonese pycantonese docs/source
$ flake8 pycantonese
$ black --check pycantonese

To build the documentation website files:

$ python docs/source/build_docs.py
