Improve the accuracy of pypinyin using g2pW.

pypinyin-g2pW

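Improves the accuracy of pypinyin using g2pW 0.0.6.

Advantage: pinyin accuracy can be improved by training the model.

Disadvantages: it has many dependencies and runs slowly.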

Usage

Install dependencies

  1. Install PyTorch

  2. Download and extract G2PWModel:

    mkdir G2PWModel
    cd G2PWModel
    wget https://storage.googleapis.com/esun-ai/g2pW/G2PWModel-v2.zip
    unzip G2PWModel-v2.zip
    cd ../
    
  3. Install git-lfs

  4. Download bert-base-chinese:

    git lfs install
    git clone https://huggingface.co/bert-base-chinese
    
  5. Install this project:

    pip install pypinyin-g2pw
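
Before loading the model, it can help to confirm that the two downloaded directories are where the usage example expects them. The following is a minimal sketch using only the standard library. The helper name `missing_model_paths` is hypothetical and not part of pypinyin-g2pW, and the directory names assume the download steps above were run from the current working directory.

```python
from pathlib import Path


def missing_model_paths(model_dir, model_source):
    """Return the given directories that do not exist (hypothetical helper)."""
    candidates = [Path(model_dir), Path(model_source)]
    return [str(path) for path in candidates if not path.is_dir()]


if __name__ == "__main__":
    # Directory names taken from the download steps and the usage example.
    missing = missing_model_paths("G2PWModel/G2PWModel-v2/", "bert-base-chinese/")
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("Both model directories are present.")
```

This only checks that the directories exist. It does not verify that the files inside them are complete.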
    

Usage example

>>> from pypinyin import Style
>>> from pypinyin_g2pw import G2PWPinyin

# Point model_dir and model_source at the directories containing the downloaded model data
>>> g2pw = G2PWPinyin(model_dir='G2PWModel/G2PWModel-v2/',
...                   model_source='bert-base-chinese/',
...                   v_to_u=False, neutral_tone_with_five=True)
>>> han = '然而,他红了20年以后,他竟退出了大家的视线。'
>>> g2pw.lazy_pinyin(han, style=Style.TONE)
['rán', 'ér', ',', 'tā', 'hóng', 'le', '20', 'nián', 'yǐ', 'hòu', ',', 'tā', 'jìng', 'tuì', 'chū', 'le', 'dà', 'jiā', 'de', 'shì', 'xiàn', '。']
>>> g2pw.lazy_pinyin(han, style=Style.TONE3)
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', '。']
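
The output above mixes pinyin syllables with tokens that pass through unchanged, such as '20' and punctuation. As a rough sketch of separating the two in `Style.TONE3` output, the snippet below applies a simple pattern to the result list shown above. The function name `split_tone3_tokens` is hypothetical, and the pattern is a heuristic that assumes `v_to_u=False` and `neutral_tone_with_five=True` as in the example. An ASCII token such as 'abc1' in the input would be misclassified as a syllable.

```python
import re

# Heuristic: a TONE3 syllable is lowercase ASCII letters followed by a tone digit 1-5.
_TONE3_SYLLABLE = re.compile(r"[a-z]+[1-5]")


def split_tone3_tokens(tokens):
    """Split tokens into (syllables, passthrough) using the heuristic above."""
    syllables, passthrough = [], []
    for token in tokens:
        if _TONE3_SYLLABLE.fullmatch(token):
            syllables.append(token)
        else:
            passthrough.append(token)
    return syllables, passthrough


# The TONE3 result copied from the usage example.
tokens = ['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3',
          'hou4', ',', 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1',
          'de5', 'shi4', 'xian4', '。']

syllables, passthrough = split_tone3_tokens(tokens)
print(len(syllables))   # 18
print(passthrough)      # [',', '20', ',', '。']
```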

Model training

See the official g2pW documentation for details.
