Export UNIHAN to Python, Data Package, CSV, JSON and YAML
unihan-tabular is a tool that builds UNIHAN into tabular-friendly formats such as Python, JSON, CSV and YAML. It is part of the cihai project.
UNIHAN’s data is dispersed across multiple files in the format of:
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
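To make the shape of that raw data concrete, here is a minimal Python 3 sketch that groups such lines into one record per code point. It assumes the fields are tab-separated and that comment lines start with `#`, as in the files Unicode distributes. `parse_unihan` is a hypothetical helper for illustration, not a function exported by unihan-tabular.

```python
from collections import defaultdict

# A few sample lines; tab separators and '#' comments are assumptions
# about the raw Unihan file layout.
RAW = (
    "# comment lines start with '#'\n"
    "U+3400\tkCantonese\tjau1\n"
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound\n"
    "U+3400\tkMandarin\tqiū\n"
    "U+3401\tkCantonese\ttim2\n"
    "U+3401\tkHanyuPinyin\t10019.020:tiàn\n"
)


def parse_unihan(text):
    """Group (ucn, field, value) triples into one dict per code point."""
    records = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at most twice so a value may itself contain tabs.
        ucn, field, value = line.split("\t", 2)
        records[ucn][field] = value
    return dict(records)


records = parse_unihan(RAW)
print(records["U+3400"]["kMandarin"])
```

Each code point ends up with only the fields it actually has, which is why a tabular export needs empty cells or nulls for the missing ones.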
Running $ unihan-tabular downloads Unihan.zip and builds all files into a single tabular-friendly format.
CSV (default), $ unihan-tabular:
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
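A CSV export like the one above can be read back with Python's standard csv module. The sketch below (Python 3) parses the sample rows from a string. For a real export you would open the file with encoding="utf-8" and newline="" instead; the export is assumed here to be UTF-8.

```python
import csv
import io

# The sample export shown above, inlined so the sketch is self-contained.
SAMPLE = (
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",'
    "10019.020:tiàn,tiàn\n"
)

# DictReader maps each row to the header names on the first line.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

for row in rows:
    # Missing fields come back as empty strings in CSV.
    print(row["char"], row["ucn"], row["kHanyuPinyin"] or "(none)")
```

The quoted kDefinition value shows why a CSV parser is preferable to splitting on commas by hand.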
JSON, $ unihan-tabular -F json:
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kCantonese": "jau1",
    "kDefinition": "(same as U+4E18 丘) hillock or mound",
    "kHanyuPinyin": null,
    "kMandarin": "qiū"
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kCantonese": "tim2",
    "kDefinition": "to lick; to taste, a mat, bamboo bark",
    "kHanyuPinyin": "10019.020:tiàn",
    "kMandarin": "tiàn"
  }
]
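Because the JSON export is a list of records, looking up a single character means scanning the list unless you index it first. A minimal Python 3 sketch, using the two sample records above, builds a dictionary keyed by character. The name `by_char` is only illustrative.

```python
import json

# The sample JSON export shown above, inlined for a self-contained sketch.
SAMPLE = """
[
  {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
   "kDefinition": "(same as U+4E18 丘) hillock or mound",
   "kHanyuPinyin": null, "kMandarin": "qiū"},
  {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
   "kDefinition": "to lick; to taste, a mat, bamboo bark",
   "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"}
]
"""

records = json.loads(SAMPLE)

# Index once, then look up characters in constant time.
by_char = {record["char"]: record for record in records}

print(by_char["㐁"]["kMandarin"])
# JSON null becomes Python None for fields a character lacks.
print(by_char["㐀"]["kHanyuPinyin"] is None)
```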
YAML, $ unihan-tabular -F yaml:
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
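Every record in the exports above carries both a char and a ucn column. The relationship between the two can be sketched in a few lines of Python 3. `ucn_to_char` and `char_to_ucn` are hypothetical helper names for illustration, not part of unihan-tabular's documented API.

```python
def ucn_to_char(ucn):
    """Convert a Unicode code point notation like 'U+3400' to its character."""
    if not ucn.startswith("U+"):
        raise ValueError("expected a value like 'U+3400', got %r" % ucn)
    # The part after 'U+' is the code point in hexadecimal.
    return chr(int(ucn[2:], 16))


def char_to_ucn(char):
    """Convert a single character to 'U+XXXX' notation (at least 4 hex digits)."""
    return "U+{:04X}".format(ord(char))


print(ucn_to_char("U+3400"))
print(char_to_ucn("㐁"))
```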
Features
automatically downloads UNIHAN from the internet
export to JSON, CSV and YAML (requires pyyaml) via -F
configurable to export specific fields via -f
accounts for encoding conflicts due to the Unicode-heavy content
designed as a technical proof for future CJK (Chinese, Japanese, Korean) datasets
core component and dependency of cihai, a CJK library
data package support
supports Python 2.7, >= 3.5 and PyPy
If you encounter a problem or have a question, please create an issue.
Usage
unihan-tabular supports command line arguments. See unihan-tabular CLI arguments for information on how you can specify custom columns, files, download URLs and output destinations.
To download and build your own UNIHAN export:
$ pip install unihan-tabular
To output CSV, the default format:
$ unihan-tabular
To output JSON:
$ unihan-tabular -F json
To output YAML:
$ pip install pyyaml
$ unihan-tabular -F yaml
To output only the kDefinition field as CSV:
$ unihan-tabular -f kDefinition
To output multiple fields, separate with spaces:
$ unihan-tabular -f kCantonese kDefinition
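If you already have a full export loaded in Python, a similar narrowing can be done after the fact. The sketch below is a rough illustration rather than the tool's own logic. It assumes that char and ucn are always kept as index columns, which the sample output suggests but this document does not state. `select_fields` and `INDEX_FIELDS` are hypothetical names.

```python
# Assumption: these two columns identify a record and are always kept.
INDEX_FIELDS = ("char", "ucn")


def select_fields(rows, fields):
    """Return copies of rows limited to the index columns plus `fields`."""
    keep = list(INDEX_FIELDS) + [f for f in fields if f not in INDEX_FIELDS]
    # row.get() yields None for a field the record does not have.
    return [{key: row.get(key) for key in keep} for row in rows]


rows = [
    {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
     "kDefinition": "(same as U+4E18 丘) hillock or mound",
     "kMandarin": "qiū"},
]

narrowed = select_fields(rows, ["kCantonese", "kDefinition"])
print(sorted(narrowed[0]))
```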
To output to a custom file:
$ unihan-tabular --destination ./exported.csv
To output to a custom file (templated file extension):
$ unihan-tabular --destination ./exported.{ext}
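The {ext} placeholder follows the same syntax as Python's str.format, so one plausible way such a template could be expanded is sketched below. This is an assumption about the mechanism, not a copy of the tool's code, and `resolve_destination` is a hypothetical helper.

```python
def resolve_destination(template, fmt):
    """Fill an '{ext}' placeholder with the chosen output format."""
    # A path without the placeholder is returned unchanged.
    return template.format(ext=fmt)


print(resolve_destination("./exported.{ext}", "json"))
print(resolve_destination("./exported.csv", "json"))
```

With a templated destination, the same command line can be reused for every format, and only the -F value changes.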
See unihan-tabular CLI arguments for advanced usage examples.
Structure
# output w/ JSON
{XDG data dir}/unihan_tabular/unihan.json
# output w/ CSV
{XDG data dir}/unihan_tabular/unihan.csv
# output w/ yaml (requires pyyaml)
{XDG data dir}/unihan_tabular/unihan.yaml
# script to download + build a SDF csv of unihan.
unihan_tabular/process.py
# unit tests to verify behavior / consistency of builder
tests/*
# python 2/3 compatibility module
unihan_tabular/_compat.py
# utility / helper functions
unihan_tabular/util.py
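The {XDG data dir} prefix in the output paths above refers to the per-user data directory. As a rough sketch, assuming the XDG Base Directory convention of $XDG_DATA_HOME with a fallback to ~/.local/share, the default export path could be computed as follows. The tool itself may resolve this differently, particularly on non-Linux platforms, and `default_export_path` is a hypothetical helper.

```python
import os


def default_export_path(fmt, environ=None):
    """Guess the default export location for a given format extension."""
    environ = os.environ if environ is None else environ
    # XDG convention: use $XDG_DATA_HOME, else fall back to ~/.local/share.
    data_home = environ.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    return os.path.join(data_home, "unihan_tabular", "unihan.{}".format(fmt))


print(default_export_path("csv"))
```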