Improves the accuracy of pypinyin using g2pW.
pypinyin-g2pW
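Improves the accuracy of pypinyin using g2pW 0.0.6.
Pros: pinyin accuracy can be improved by training the model.
Cons: it has a relatively large number of dependencies and runs relatively slowly.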
Usage
Installing dependencies
- Install PyTorch.
- Download and extract G2PWModel:
    mkdir G2PWModel
    cd G2PWModel
    wget https://storage.googleapis.com/esun-ai/g2pW/G2PWModel-v2.zip
    unzip G2PWModel-v2.zip
    cd ../
- Install git-lfs.
- Download the bert-base-chinese model:
    git lfs install
    git clone https://huggingface.co/bert-base-chinese
- Install this project:
    pip install pypinyin-g2pw
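Before constructing `G2PWPinyin`, it can help to confirm that the two model downloads are where the code will look for them. The following is a minimal sketch and not part of pypinyin-g2pw: `missing_model_paths` is a hypothetical helper, and its default paths are assumptions that mirror the `model_dir` and `model_source` values used in this README's usage example. It only checks that the directories exist and does not validate their contents.

```python
from pathlib import Path


def missing_model_paths(model_dir="G2PWModel/G2PWModel-v2/",
                        model_source="bert-base-chinese/"):
    """Return the names of the arguments whose directory does not exist.

    Hypothetical helper for illustration only. The defaults are assumed
    to match the paths passed to G2PWPinyin in the usage example.
    """
    candidates = {"model_dir": Path(model_dir), "model_source": Path(model_source)}
    return [name for name, path in candidates.items() if not path.is_dir()]


if __name__ == "__main__":
    missing = missing_model_paths()
    if missing:
        print("Missing model directories for:", ", ".join(missing))
    else:
        print("Both model directories were found.")
```

If the returned list is non-empty, the download steps for the named directories probably need to be repeated, or the paths adjusted to wherever the data was actually extracted.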
Example
>>> from pypinyin import Style
>>> from pypinyin_g2pw import G2PWPinyin
>>> # model_dir and model_source must point to the downloaded model data directories
>>> g2pw = G2PWPinyin(model_dir='G2PWModel/G2PWModel-v2/',
...                   model_source='bert-base-chinese/',
...                   v_to_u=False, neutral_tone_with_five=True)
>>> han = '然而,他红了20年以后,他竟退出了大家的视线。'
>>> g2pw.lazy_pinyin(han, style=Style.TONE)
['rán', 'ér', ',', 'tā', 'hóng', 'le', '20', 'nián', 'yǐ', 'hòu', ',', 'tā', 'jìng', 'tuì', 'chū', 'le', 'dà', 'jiā', 'de', 'shì', 'xiàn', '。']
>>> g2pw.lazy_pinyin(han, style=Style.TONE3)
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', '。']
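With `Style.TONE3` and `neutral_tone_with_five=True`, each syllable in the output above ends in a tone digit from 1 to 5, while punctuation and numerals such as '20' pass through unchanged. As an illustration of consuming that output, here is a small sketch that separates syllables from tone numbers. `split_tone3` is a hypothetical helper written for this example, not part of pypinyin or pypinyin-g2pw, and it assumes tokens shaped like the ones shown above.

```python
def split_tone3(token):
    """Split a TONE3-style token such as 'ran2' into ('ran', 2).

    Hypothetical helper for illustration only. Tokens that are not
    letters followed by a single tone digit (1-5), such as punctuation
    or the numeral '20', are returned unchanged as (token, None).
    """
    if len(token) >= 2 and token[:-1].isalpha() and token[-1] in "12345":
        return token[:-1], int(token[-1])
    return token, None


if __name__ == "__main__":
    # A few tokens copied from the TONE3 output above.
    tokens = ['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2']
    for token in tokens:
        print(split_tone3(token))
```

Checking that the leading part is alphabetic is what keeps the numeral '20' from being misread as a syllable with a tone digit.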
Model training
See the instructions in the official g2pW documentation.
File details
Details for the file pypinyin-g2pw-0.1.0.tar.gz.
File metadata
- Download URL: pypinyin-g2pw-0.1.0.tar.gz
- Upload date:
- Size: 3.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 17d0868e498a542fb308b86db0572d9f5ec6b45ee26c0b431a6c921bbae39da3
MD5 | 2f8421e1ed9f9005050a0a38c1f17cde
BLAKE2b-256 | 3bd5bc4e5b3c937e49ab7cafdff24edeb406d7e896cc8bec8f91d88239890d71
File details
Details for the file pypinyin_g2pw-0.1.0-py2.py3-none-any.whl.
File metadata
- Download URL: pypinyin_g2pw-0.1.0-py2.py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | b65f097018a42d68b8cf3f79a3c5bfe73e7077983f3997f53af90458f6543966
MD5 | 55b0dc23c5017b09812981a5569ccfff
BLAKE2b-256 | e012b9b82b4b84119c8622b9e07ed21ccc6e38d8a173c0acf65098bfa00e7147