pypinyin-g2pW

Improves the accuracy of pypinyin by using g2pW.

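Features:

  • Pinyin accuracy can be improved further by training the model.
  • Functionality and usage stay largely consistent with pypinyin, and multiple pinyin styles are supported.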

Usage

Install dependencies

  1. Install PyTorch

  2. Download and extract G2PWModel:

    wget https://storage.googleapis.com/esun-ai/g2pW/G2PWModel-v2-onnx.zip
    unzip G2PWModel-v2-onnx.zip
    
  3. Install git-lfs

  4. Download bert-base-chinese:

    git lfs install
    git clone https://huggingface.co/bert-base-chinese
    
  5. Install this project:

    pip install pypinyin-g2pw
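
Before constructing G2PWPinyin, it can help to confirm that the two downloaded directories are where the code expects them. The following is a minimal sketch and not part of pypinyin-g2pw: the helper name missing_model_dirs is hypothetical, and the directory names G2PWModel and bert-base-chinese are assumed from the usage example in this document. Adjust them if the archives were extracted elsewhere.

```python
from pathlib import Path


def missing_model_dirs(*dirs):
    """Return the given paths that are not existing directories.

    Hypothetical helper for illustration only; it is not provided by
    pypinyin or pypinyin-g2pw.
    """
    return [str(d) for d in dirs if not Path(d).is_dir()]


if __name__ == "__main__":
    # Assumed directory names, taken from the usage example in this README.
    missing = missing_model_dirs("G2PWModel", "bert-base-chinese")
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("Both model directories were found.")
```

Running the check from the directory where the downloads were made should report both directories as found once steps 2 and 4 are complete.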
    

Usage example

>>> from pypinyin import Style
>>> from pypinyin_g2pw import G2PWPinyin

# Point model_dir and model_source at the directories that hold the downloaded model data.
>>> g2pw = G2PWPinyin(model_dir='G2PWModel/',
...                   model_source='bert-base-chinese/',
...                   v_to_u=False, neutral_tone_with_five=True)
>>> han = '然而,他红了20年以后,他竟退出了大家的视线。'

# def lazy_pinyin(self, hans, style=Style.NORMAL, errors='default', strict=True, **kwargs)
# Call lazy_pinyin to get the pinyin; its parameters have the same meaning and effect as in pypinyin.
# v_to_u and neutral_tone_with_five can only be set when initializing G2PWPinyin.

>>> g2pw.lazy_pinyin(han)
['ran', 'er', ',', 'ta', 'hong', 'le', '20', 'nian', 'yi', 'hou', ',', 'ta', 'jing', 'tui', 'chu', 'le', 'da', 'jia', 'de', 'shi', 'xian', '。']

>>> g2pw.lazy_pinyin(han, style=Style.TONE)
['rán', 'ér', ',', 'tā', 'hóng', 'le', '20', 'nián', 'yǐ', 'hòu', ',', 'tā', 'jìng', 'tuì', 'chū', 'le', 'dà', 'jiā', 'de', 'shì', 'xiàn', '。']

>>> g2pw.lazy_pinyin(han, style=Style.TONE3)
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', '。']
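
The lists returned above keep punctuation and digits as separate items. If only the syllables are needed, they can be filtered afterwards. The following is a minimal sketch that assumes Style.TONE3 output with v_to_u=False, as in the example above; the helper only_syllables is hypothetical and is not part of pypinyin or pypinyin-g2pw.

```python
import re

# A TONE3-style syllable: lowercase ASCII letters followed by a tone digit 1-5.
# This pattern assumes v_to_u=False, so "ü" appears as "v".
_TONE3_SYLLABLE = re.compile(r"[a-z]+[1-5]")


def only_syllables(items):
    """Keep only items that look like TONE3 syllables (hypothetical helper)."""
    return [item for item in items if _TONE3_SYLLABLE.fullmatch(item)]


# Output of g2pw.lazy_pinyin(han, style=Style.TONE3), copied from the example above.
tone3 = ['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3',
         'hou4', ',', 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1',
         'de5', 'shi4', 'xian4', '。']

print(" ".join(only_syllables(tone3)))
```

The filter drops the two commas, the number 20, and the final full stop, leaving the 18 syllables in their original order.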

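Offline use

The current version does not yet support fully offline use; see GitYCC/g2pW/pull/15 for details.

Model training

See the instructions in the official g2pW documentation.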
