Download and export UNIHAN to Python, CSV, JSON and YAML
Project description
unihan-tabular is a tool for building UNIHAN into tabular-friendly formats such as Python, JSON, CSV and YAML. It is part of the cihai project.
Unihan’s data is dispersed across multiple files in the format of:
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
unihan_tabular/process.py downloads Unihan.zip and builds all of its files into a single tabular-friendly format.
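The pivot from one-field-per-line to one-row-per-character can be sketched in a few lines of Python. This is an illustration only, not the code in process.py: the function names `pivot` and `ucn_to_char` are hypothetical, and the sample assumes the tab-separated layout used by the Unihan source files.

```python
# Hypothetical sketch: group Unihan "UCN<TAB>field<TAB>value" lines by character.
SAMPLE = (
    "U+3400\tkCantonese\tjau1\n"
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound\n"
    "U+3400\tkMandarin\tqiū\n"
    "U+3401\tkCantonese\ttim2\n"
    "U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark\n"
    "U+3401\tkHanyuPinyin\t10019.020:tiàn\n"
    "U+3401\tkMandarin\ttiàn\n"
)

FIELDS = ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]


def ucn_to_char(ucn):
    """Convert a code point label such as 'U+3400' to its character."""
    return chr(int(ucn[2:], 16))


def pivot(lines, fields):
    """Collect the requested fields into one dict per code point."""
    rows = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ucn, field, value = line.split("\t", 2)
        if field not in fields:
            continue
        if ucn not in rows:
            # Start every row with all fields empty so each row has the same columns.
            row = {"char": ucn_to_char(ucn), "ucn": ucn}
            row.update({name: None for name in fields})
            rows[ucn] = row
        rows[ucn][field] = value
    return list(rows.values())


rows = pivot(SAMPLE.splitlines(), FIELDS)
```

Fields that a character lacks, such as kHanyuPinyin for U+3400, stay `None`, which corresponds to the empty CSV cell and the `null` values in the JSON and YAML examples.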
CSV (default output: ./data/unihan.csv):
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
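As a rough sketch of consuming this output, the standard library's `csv.DictReader` handles the quoted kDefinition value (which contains commas) without extra work. The helper name `load_rows` is hypothetical, and mapping empty cells to `None` is a choice made here for illustration, not something the tool prescribes.

```python
import csv
import io

# The same three lines as the CSV sample, embedded so the sketch is self-contained.
CSV_TEXT = (
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn\n'
)


def load_rows(fileobj):
    """Read CSV rows into dicts, turning empty cells into None."""
    reader = csv.DictReader(fileobj)
    return [{key: (value or None) for key, value in row.items()} for row in reader]


csv_rows = load_rows(io.StringIO(CSV_TEXT))
```

For a real export, the `io.StringIO` object would be replaced by a file opened with `open("data/unihan.csv", encoding="utf-8", newline="")`.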
JSON (default output: ./data/unihan.json):
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kCantonese": "jau1",
    "kDefinition": "(same as U+4E18 丘) hillock or mound",
    "kHanyuPinyin": null,
    "kMandarin": "qiū"
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kCantonese": "tim2",
    "kDefinition": "to lick; to taste, a mat, bamboo bark",
    "kHanyuPinyin": "10019.020:tiàn",
    "kMandarin": "tiàn"
  }
]
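Because the JSON output is a flat list of records, a lookup table keyed by character takes one dict comprehension. The snippet below is a minimal sketch using an abbreviated copy of the sample above; the variable names are illustrative.

```python
import json

# Abbreviated copy of the JSON sample, embedded so the sketch is self-contained.
JSON_TEXT = """
[
  {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
   "kDefinition": "(same as U+4E18 丘) hillock or mound",
   "kHanyuPinyin": null, "kMandarin": "qiū"},
  {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
   "kDefinition": "to lick; to taste, a mat, bamboo bark",
   "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"}
]
"""

records = json.loads(JSON_TEXT)

# Index the records by character for direct lookups.
by_char = {record["char"]: record for record in records}

mandarin = by_char["㐀"]["kMandarin"]
```

JSON `null` becomes Python `None`, so a missing field can be tested with `by_char["㐀"]["kHanyuPinyin"] is None`.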
YAML (default output: ./data/unihan.yaml):
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
process.py supports command-line arguments. See unihan_tabular/process.py CLI arguments for information on how to specify custom columns, files, download URLs and output destinations.
Usage
To download UNIHAN and build your own export with the default settings:
$ pip install unihan-tabular
$ unihan-tabular
Creates data/unihan.json.
To output CSV:
$ unihan-tabular -F csv
To output YAML:
$ pip install pyyaml
$ unihan-tabular -F yaml
To output only the kDefinition field as CSV:
$ unihan-tabular -F csv -f kDefinition
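The effect of restricting the output to one field can be approximated on already-built rows. This is a hypothetical sketch, not the tool's implementation: the helper `select_fields` is invented for illustration, and keeping the `char` and `ucn` columns alongside the selected field is an assumption about the output.

```python
# Hypothetical helper: keep only the chosen fields (plus identifying columns).
ROWS = [
    {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
     "kDefinition": "(same as U+4E18 丘) hillock or mound",
     "kHanyuPinyin": None, "kMandarin": "qiū"},
    {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
     "kDefinition": "to lick; to taste, a mat, bamboo bark",
     "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"},
]


def select_fields(rows, fields, keep=("char", "ucn")):
    """Return copies of rows limited to `keep` columns plus `fields`.

    Keeping char and ucn is an assumption made for this sketch.
    """
    columns = list(keep) + [name for name in fields if name not in keep]
    return [{column: row.get(column) for column in columns} for row in rows]


definitions = select_fields(ROWS, ["kDefinition"])
```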
See unihan_tabular/process.py CLI arguments for advanced usage examples.
Structure
# output (JSON)
data/unihan.json
# output (CSV)
data/unihan.csv
# script to download + build a SDF csv of unihan.
unihan_tabular/process.py
# unit tests to verify behavior / consistency of builder
tests/*
# python 2/3 compatibility modules
unihan_tabular/_compat.py
unihan_tabular/unicodecsv.py
# utility / helper functions
unihan_tabular/util.py
data/unihan.csv - CSV export file.
unihan_tabular/process.py - create a data/unihan.csv.
Download files
Source Distribution
File details
Details for the file unihan-tabular-0.6.1b0.tar.gz.
File metadata
- Download URL: unihan-tabular-0.6.1b0.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9ce56a1480b9ca887c6b5e21a83ec4e0bb6dc7f1ba84c6917c10d011eff24ee1
MD5 | 092e74bfefd25bd938eb4515d3952a53
BLAKE2b-256 | 605ee89ab67df91d60297e0b48c7fcb4234d0a7ee44b41a02da95cc410e09422