Download and export UNIHAN to Python, CSV, JSON and YAML

unihan-tabular is a tool that builds UNIHAN into tabular-friendly formats such as Python, JSON, CSV and YAML. It is part of the cihai project.

Unihan’s data is dispersed across multiple files in the format of:

U+3400      kCantonese      jau1
U+3400      kDefinition     (same as U+4E18 丘) hillock or mound
U+3400      kMandarin       qiū
U+3401      kCantonese      tim2
U+3401      kDefinition     to lick; to taste, a mat, bamboo bark
U+3401      kHanyuPinyin    10019.020:tiàn
U+3401      kMandarin       tiàn

unihan_tabular/process.py downloads Unihan.zip and builds all of its files into a single tabular-friendly format.
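The reshaping described above can be sketched in a few lines. This is only an illustration of the idea, not the code in unihan_tabular/process.py: the names RAW, FIELDS, ucn_to_char and pivot are hypothetical, the sketch targets Python 3, and it assumes the raw Unihan files separate their three columns with tabs and mark comment lines with "#".

```python
# Illustrative sketch only; not the project's actual implementation.
# Assumption: raw Unihan lines are tab-delimited (ucn, field, value).
RAW = """\
U+3400\tkCantonese\tjau1
U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound
U+3400\tkMandarin\tqiū
U+3401\tkCantonese\ttim2
U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark
U+3401\tkHanyuPinyin\t10019.020:tiàn
U+3401\tkMandarin\ttiàn
"""

# Hypothetical column selection, matching the sample output columns.
FIELDS = ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]


def ucn_to_char(ucn):
    """Convert 'U+XXXX' notation into the character it names."""
    return chr(int(ucn[2:], 16))


def pivot(lines, fields):
    """Collapse (ucn, field, value) lines into one dict per character."""
    rows = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ucn, field, value = line.split("\t", 2)
        if field not in fields:
            continue  # ignore fields that were not requested
        if ucn not in rows:
            # Start every row with all requested fields set to None.
            row = {"char": ucn_to_char(ucn), "ucn": ucn}
            row.update(dict.fromkeys(fields))
            rows[ucn] = row
        rows[ucn][field] = value
    return list(rows.values())


rows = pivot(RAW.splitlines(), FIELDS)
```

Fields that never appear for a character stay None, which corresponds to the empty CSV cell and the JSON null in the sample outputs.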

CSV (default output: ./data/unihan.csv):

char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
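A CSV export in this shape can be read back with the standard library. The following is a minimal sketch for Python 3; SAMPLE and read_rows are hypothetical names, and the commented-out file access assumes the export is UTF-8 encoded.

```python
import csv
import io

# The two sample rows shown above, inlined so the sketch is self-contained.
SAMPLE = (
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",'
    "10019.020:tiàn,tiàn\n"
)


def read_rows(fileobj):
    """Yield one dict per character, turning empty cells into None."""
    for row in csv.DictReader(fileobj):
        yield {key: (value or None) for key, value in row.items()}


rows = list(read_rows(io.StringIO(SAMPLE)))

# For a real export (assumption: the file is UTF-8 encoded):
# with open("data/unihan.csv", encoding="utf-8", newline="") as f:
#     rows = list(read_rows(f))
```

The quoted kDefinition cell shows why a CSV parser is preferable to splitting on commas: the definition itself contains a comma.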

JSON (default output: ./data/unihan.json):

[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kCantonese": "jau1",
    "kDefinition": "(same as U+4E18 丘) hillock or mound",
    "kHanyuPinyin": null,
    "kMandarin": "qiū"
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kCantonese": "tim2",
    "kDefinition": "to lick; to taste, a mat, bamboo bark",
    "kHanyuPinyin": "10019.020:tiàn",
    "kMandarin": "tiàn"
  }
]
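Because the JSON export is a flat list of records, it is convenient to index it by character once it is loaded. A minimal sketch, assuming Python 3; SAMPLE, by_char and lookup are hypothetical names used only for illustration.

```python
import json

# The sample records shown above, inlined so the sketch is self-contained.
SAMPLE = """
[
  {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
   "kDefinition": "(same as U+4E18 丘) hillock or mound",
   "kHanyuPinyin": null, "kMandarin": "qiū"},
  {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
   "kDefinition": "to lick; to taste, a mat, bamboo bark",
   "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"}
]
"""

records = json.loads(SAMPLE)

# Build a dict keyed by character for constant-time lookups.
by_char = {record["char"]: record for record in records}


def lookup(char, field):
    """Return a field for a character, or None if either is missing."""
    record = by_char.get(char)
    return None if record is None else record.get(field)


# For a real export (assumption: the file is UTF-8 encoded):
# with open("data/unihan.json", encoding="utf-8") as f:
#     records = json.load(f)
```

A JSON null loads as None, so a missing character and a character without the requested field both come back as None from lookup.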

YAML (default output: ./data/unihan.yaml):

- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401

process.py supports command-line arguments. See unihan_tabular/process.py CLI arguments for information on how to specify custom columns, files, download URLs and output destinations.

Usage

To download UNIHAN and build the default export:

$ pip install unihan-tabular
$ unihan-tabular

Creates data/unihan.json.

To output CSV:

$ unihan-tabular -F csv

To output YAML:

$ pip install pyyaml
$ unihan-tabular -F yaml

To output only the kDefinition field in a CSV:

$ unihan-tabular -F csv -f kDefinition

See unihan_tabular/process.py CLI arguments for advanced usage examples.

Structure

# output (JSON)
data/unihan.json

# output (CSV)
data/unihan.csv

# script to download + build a SDF csv of unihan.
unihan_tabular/process.py

# unit tests to verify behavior / consistency of builder
tests/*

# python 2/3 compatibility modules
unihan_tabular/_compat.py
unihan_tabular/unicodecsv.py

# utility / helper functions
unihan_tabular/util.py
  • data/unihan.csv - CSV export file.

  • unihan_tabular/process.py - creates data/unihan.csv.
