Zhon provides constants used in Chinese text processing.
Project description
Zhon is a Python library that provides constants commonly used in Chinese text processing:
- CJK characters and radicals
- Chinese punctuation marks
- Chinese sentence regular expression pattern
- Pinyin vowels, consonants, lowercase, uppercase, and punctuation
- Pinyin syllable, word, and sentence regular expression patterns
- Zhuyin characters and marks
- Zhuyin syllable regular expression pattern
- CC-CEDICT characters
Some quick examples:
Find CJK characters in a string:
>>> import re
>>> import zhon.hanzi
>>> re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.')
['我', '打', '破', '了', '一', '个', '盘', '子']
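The example above works because the constant is a plain string of characters and ranges that can be interpolated into a regular expression character class. The sketch below illustrates that idea with the standard library only. `CJK_IDEOGRAPHS` is a hypothetical stand-in that covers just the CJK Unified Ideographs block (U+4E00 to U+9FFF); it is an assumption for illustration and not the value of `zhon.hanzi.characters`, which may cover more.

```python
import re

# Hypothetical stand-in for a zhon-style constant: a string of character
# ranges meant to be placed inside a regex character class. It covers only
# the CJK Unified Ideographs block (U+4E00-U+9FFF) and is NOT the actual
# value of zhon.hanzi.characters.
CJK_IDEOGRAPHS = '\u4e00-\u9fff'

text = 'I broke a plate: 我打破了一个盘子.'

# One match per character, as in the example above.
single = re.findall('[%s]' % CJK_IDEOGRAPHS, text)

# A '+' after the class groups adjacent characters into runs.
runs = re.findall('[%s]+' % CJK_IDEOGRAPHS, text)

# re.sub with the same class removes the matched characters instead.
stripped = re.sub('[%s]' % CJK_IDEOGRAPHS, '', text)

print(single)
print(runs)
print(stripped)
```

Swapping the stand-in for `zhon.hanzi.characters` should leave the pattern-building code unchanged, since only the contents of the character class differ.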
Validate Pinyin syllables, words, or sentences:
>>> import re
>>> import zhon.pinyin
>>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']
>>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']
>>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
['Yuànzi lǐ tíngzhe yí liàng chē.']
Documentation
Zhon has complete documentation. Check it out if you want to find out how to use Zhon.
Name
Zhon is short for ZHongwen cONstants. It is pronounced like the name ‘John’.
Install
Zhon supports Python 2.7 and 3. Install using pip:
$ pip install zhon
Bugs and Feature Requests
Zhon uses its GitHub Issues page to track bugs, feature requests, and support questions.
License
Zhon is released under the OSI-approved MIT License. See the file LICENSE.txt for more information.
Download files
Source Distribution
File details
Details for the file zhon-1.1.3.tar.gz.
File metadata
- Download URL: zhon-1.1.3.tar.gz
- Upload date:
- Size: 98.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | bfd9964352fc3753d60d0b8185b514aa8c9b9f431afae0b08181496f30633878
MD5 | 947eb06795a883751a158729d1d21149
BLAKE2b-256 | dd40753cc1b050149ee52ceefa7098bf1e947eb06b496aceaa1be959bfc1b07d