Skip to main content

Download and export UNIHAN to Python, CSV, JSON and YAML

Project description

*unihan-tabular* - tool to build `UNIHAN`_ into tabular-friendly formats
like python, JSON, CSV and YAML. Part of the `cihai`_ project.

|pypi| |docs| |build-status| |coverage| |license|

Unihan's data is dispersed across multiple files in the format of::

U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn

``unihan_tabular/process.py`` will download Unihan.zip and build all files into a
single tabular friendly format.

CSV (default output: *./data/unihan.csv*)::

char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn

JSON (default output: *./data/unihan.json*):

.. code-block:: json

[
{
"char": "㐀",
"ucn": "U+3400",
"kCantonese": "jau1",
"kDefinition": "(same as U+4E18 丘) hillock or mound",
"kHanyuPinyin": null,
"kMandarin": "qiū"
},
{
"char": "㐁",
"ucn": "U+3401",
"kCantonese": "tim2",
"kDefinition": "to lick; to taste, a mat, bamboo bark",
"kHanyuPinyin": "10019.020:tiàn",
"kMandarin": "tiàn"
}
]

YAML (default output: *./data/unihan.yaml*):

.. code-block:: yaml

- char: 㐀
kCantonese: jau1
kDefinition: (same as U+4E18 丘) hillock or mound
kHanyuPinyin: null
kMandarin: qiū
ucn: U+3400
- char: 㐁
kCantonese: tim2
kDefinition: to lick; to taste, a mat, bamboo bark
kHanyuPinyin: 10019.020:tiàn
kMandarin: tiàn
ucn: U+3401

``process.py`` supports command line arguments. See `unihan_tabular/process.py CLI
arguments`_ for information on how you can specify custom columns, files,
download URL's and output destinations.

.. _cihai: https://cihai.git-pull.com
.. _cihai-handbook: https://github.com/cihai/cihai-handbook
.. _cihai team: https://github.com/cihai?tab=members
.. _cihai-python: https://github.com/cihai/cihai-python
.. _unihan-tabular on github: https://github.com/cihai/unihan-tabular

Usage
-----

To download and build your own ``unihan.csv``:

.. code-block:: bash

$ pip install unihan-tabular

.. code-block:: bash

$ unihan-tabular

Creates ``data/unihan.json``.

To output CSV::

$ unihan-tabular -F csv

To output YAML::

$ pip install pyyaml
$ unihan-tabular -F yaml

To only output the kDefinition field in a csv::

$ unihan-tabular -F csv -f kDefinition

See `unihan_tabular/process.py CLI arguments`_ for advanced usage examples.

.. _unihan_tabular/process.py CLI arguments: http://unihan-tabular.readthedocs.org/en/latest/cli.html

Structure
---------

.. code-block:: bash

# output (JSON)
data/unihan.json

# output (CSV)
data/unihan.csv

# script to download + build a SDF csv of unihan.
unihan_tabular/process.py

# unit tests to verify behavior / consistency of builder
tests/*

# python 2/3 compatibility modules
unihan_tabular/_compat.py
unihan_tabular/unicodecsv.py

# utility / helper functions
unihan_tabular/util.py

- ``data/unihan.csv`` - CSV export file.
- ``unihan_tabular/process.py`` - create a ``data/unihan.csv``.

.. _MIT: http://opensource.org/licenses/MIT
.. _API: http://cihai.readthedocs.org/en/latest/api.html
.. _UNIHAN: http://www.unicode.org/charts/unihan.html

.. |pypi| image:: https://img.shields.io/pypi/v/unihan-tabular.svg
:alt: Python Package
:target: http://badge.fury.io/py/unihan-tabular

.. |build-status| image:: https://img.shields.io/travis/cihai/unihan-tabular.svg
:alt: Build Status
:target: https://travis-ci.org/cihai/unihan-tabular

.. |coverage| image:: https://codecov.io/gh/cihai/unihan-tabular/branch/master/graph/badge.svg
:alt: Code Coverage
:target: https://codecov.io/gh/cihai/unihan-tabular

.. |license| image:: https://img.shields.io/github/license/cihai/unihan-tabular.svg
:alt: License

.. |docs| image:: https://readthedocs.org/projects/unihan-tabular/badge/?version=latest
:alt: Documentation Status
:scale: 100%
:target: https://readthedocs.org/projects/unihan-tabular/

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unihan-tabular-0.6.2.tar.gz (15.6 kB view details)

Uploaded Source

File details

Details for the file unihan-tabular-0.6.2.tar.gz.

File metadata

File hashes

Hashes for unihan-tabular-0.6.2.tar.gz
Algorithm Hash digest
SHA256 25c9cceb76d1deae9e9fa62988e9072992a2dfef1ade25938e604dedd1dee161
MD5 662b7a31cffcd7ffa96bbdeb50032d12
BLAKE2b-256 aab72f7338ba29325ee9745fe356912ae81153c87a90c31939577c22bd375929

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page