Zhon provides constants used in Chinese text processing.
Project description
Zhon is a Python library that provides constants commonly used in Chinese text processing.
Documentation: http://zhon.rtfd.org
GitHub: https://github.com/tsroten/zhon
Free software: MIT license
About
Zhon’s constants can be used in Chinese text processing, for example:
Find CJK characters in a string:
>>> import re
>>> import zhon.hanzi
>>> re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.')
['我', '打', '破', '了', '一', '个', '盘', '子']
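In the example above, zhon.hanzi.characters goes inside '[%s]', so it serves as the body of a regex character class. The sketch below shows the same idea using only the standard library. It assumes a hypothetical CJK_RANGES constant that covers only the CJK Unified Ideographs block (U+4E00 to U+9FFF). zhon's constant covers additional blocks, so use it in real code.

```python
import re

# Hypothetical stand-in for zhon.hanzi.characters: only the basic
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
CJK_RANGES = '\u4e00-\u9fff'

# Wrapping the ranges in [...] makes a character class; '+' groups runs.
CJK_RUN = re.compile('[%s]+' % CJK_RANGES)


def cjk_runs(text):
    """Return each contiguous run of CJK ideographs in text."""
    return CJK_RUN.findall(text)


def strip_non_cjk(text):
    """Drop every character that falls outside CJK_RANGES."""
    return re.sub('[^%s]' % CJK_RANGES, '', text)


print(cjk_runs('I broke a plate: 我打破了一个盘子.'))
print(strip_non_cjk('I broke a plate: 我打破了一个盘子.'))
```

Negating the class with [^...] is what turns a "find" constant into a "clean" operation.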
Validate Pinyin syllables, words, or sentences:
>>> import zhon.pinyin
>>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']
>>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']
>>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
['Yuànzi lǐ tíngzhe yí liàng chē.']
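The Pinyin patterns are passed to re.findall together with flags, which means they are plain regex strings that can be composed. The toy sketch below illustrates this. The names SYLLABLES, SYLLABLE, WORD, and is_word are hypothetical, and the inventory contains only the eight syllables from the sample sentence. It builds a word pattern as one or more consecutive syllables. It then uses re.fullmatch (Python 3) to validate a whole string, rather than search within it. zhon's own patterns are far more complete than this.

```python
import re

# Hypothetical mini-inventory: only the syllables in the sample sentence.
# None of them contain regex metacharacters, so no escaping is needed.
SYLLABLES = ['yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']

# Longest alternatives first so a short syllable never pre-empts a longer one.
SYLLABLE = '(?:%s)' % '|'.join(sorted(SYLLABLES, key=len, reverse=True))

# Toy rule: a word is one or more consecutive syllables.
WORD = '(?:%s)+' % SYLLABLE


def is_word(text):
    """True if the whole of text is a single toy Pinyin word."""
    return re.fullmatch(WORD, text, re.IGNORECASE) is not None


sentence = 'Yuànzi lǐ tíngzhe yí liàng chē.'
print(re.findall(WORD, sentence, re.IGNORECASE))
print(is_word('tíngzhe'), is_word('tíngzhex'))
```

findall searches for matches anywhere in the string, while fullmatch rejects a string unless it matches in its entirety. The second behavior is what "validate" means here.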
Features
- Includes commonly used constants:
  - CJK characters and radicals
  - Chinese punctuation marks
  - Chinese sentence regular expression pattern
  - Pinyin vowels, consonants, lowercase, uppercase, and punctuation
  - Pinyin syllable, word, and sentence regular expression patterns
  - Zhuyin characters and marks
  - Zhuyin syllable regular expression pattern
  - CC-CEDICT characters
- Runs on Python 2.7 and 3
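Punctuation and sentence constants are typically used to split or clean running text. Here is a minimal sketch of sentence splitting. It assumes a hypothetical STOPS string that holds only three full-width sentence-ending marks (。！？) and a hand-rolled SENTENCE pattern. This is not zhon's sentence pattern, and zhon's punctuation constants cover many more marks.

```python
import re

# Hypothetical subset of Chinese sentence-ending punctuation.
STOPS = '。！？'

# A sentence: a run of non-stop characters plus any trailing stops.
SENTENCE = re.compile('[^{0}]+[{0}]*'.format(STOPS))


def split_sentences(text):
    """Split text after each run of sentence-ending marks."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


print(split_sentences('我打破了一个盘子。你呢？好！'))
```

Trailing text without a stop is still returned as a final sentence, because the stop class is optional ([...]*).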
Getting Started
- Read Zhon’s introduction
- Learn from the API documentation
- Contribute documentation, code, or feedback
Download files
Source Distribution
File details
Details for the file zhon-1.1.5.tar.gz
File metadata
- Download URL: zhon-1.1.5.tar.gz
- Upload date:
- Size: 99.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 793723575c46f10ace8846c579ce740b04c73e2aa583e04e000aedbd4a47f87f
MD5 | 6955ba5a3b28ee2945247c5ed5ba0509
BLAKE2b-256 | 9fb0c56c6079ad47c35a2341440818b6620de8c46a265ed690a51b1a4e5591bc
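The published digests let you check a downloaded archive before installing it. The sketch below uses Python's standard hashlib module to compare a local file against the SHA256 value listed for zhon-1.1.5.tar.gz. The helper names sha256_of and verify are illustrative, not part of zhon.

```python
import hashlib

# SHA256 digest published for zhon-1.1.5.tar.gz.
EXPECTED_SHA256 = '793723575c46f10ace8846c579ce740b04c73e2aa583e04e000aedbd4a47f87f'


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps reading until read() returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify('zhon-1.1.5.tar.gz') returns True only if the archive in the current directory matches the published digest. Reading the file in chunks means large archives never need to fit in memory.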