Tool to build UNIHAN dataset into datapackage / simple data format.

Project description

cihaidata-unihan - tool to build Unihan into a simple data format (SDF) CSV file. Part of the cihai project.

Unihan’s data is dispersed across multiple files in the following format:

U+3400      kCantonese      jau1
U+3400      kDefinition     (same as U+4E18 丘) hillock or mound
U+3400      kMandarin       qiū
U+3401      kCantonese      tim2
U+3401      kDefinition     to lick; to taste, a mat, bamboo bark
U+3401      kHanyuPinyin    10019.020:tiàn
U+3401      kMandarin       tiàn
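Here is a minimal Python 3 sketch of how such lines can be grouped by code point. It is not taken from process.py. It assumes the three columns are separated by tabs, as they are in the Unihan .txt files, and that lines starting with # are comments. The function name parse_unihan_lines is hypothetical.

```python
from collections import defaultdict


def parse_unihan_lines(lines):
    """Group Unihan "U+XXXX<TAB>field<TAB>value" lines by code point.

    Hypothetical helper for illustration. Blank lines and comment lines
    (starting with '#') are skipped. Returns {"U+XXXX": {field: value}}.
    """
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        # Split on at most two tabs so the value column stays intact.
        ucn, field, value = line.split("\t", 2)
        records[ucn][field] = value
    return dict(records)


# Usage: field values for the same code point end up in one dict.
sample = [
    "U+3400\tkCantonese\tjau1",
    "U+3400\tkMandarin\tqiū",
    "U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark",
]
records = parse_unihan_lines(sample)
print(records["U+3400"])  # {'kCantonese': 'jau1', 'kMandarin': 'qiū'}
```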

scripts/process.py will download Unihan.zip and build all of its files into a single tabular CSV (default output: ./data/unihan.csv):

char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
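Below is a hypothetical Python 3 sketch of the pivot from per-line records to one row per character. It is not process.py's implementation, and to_csv and its arguments are made up for this example. The char column is derived from the code point with chr(), so U+3400 becomes 㐀. A field that a code point lacks is written as an empty cell. Python's csv module quotes only values that need it, such as a definition containing a comma.

```python
import csv
import io


def to_csv(records, fields, out):
    """Write one CSV row per code point: char, ucn, then each requested field.

    Hypothetical helper. `records` maps "U+XXXX" -> {field: value};
    fields missing for a code point are written as empty cells.
    """
    writer = csv.writer(out)
    writer.writerow(["char", "ucn"] + list(fields))
    # Sort by numeric code point so U+3400 precedes U+3401, and so on.
    for ucn in sorted(records, key=lambda u: int(u[2:], 16)):
        char = chr(int(ucn[2:], 16))  # "U+3400" -> the character itself
        writer.writerow([char, ucn] + [records[ucn].get(f, "") for f in fields])


# Usage: write to an in-memory buffer instead of data/unihan.csv.
records = {
    "U+3401": {"kCantonese": "tim2", "kHanyuPinyin": "10019.020:tiàn"},
    "U+3400": {"kCantonese": "jau1"},
}
buf = io.StringIO()
to_csv(records, ["kCantonese", "kHanyuPinyin"], buf)
print(buf.getvalue())
```

To write a real file instead, open it with newline="" and encoding="utf-8". The csv module documentation recommends newline="" for files it writes.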

process.py supports command-line arguments. See scripts/process.py CLI arguments for information on how to specify custom columns, files, download URLs and output destinations.

The builder is developed against unit tests. See the Travis builds and revision history.

Usage

To download and build your own unihan.csv:

$ ./scripts/process.py

Creates data/unihan.csv.
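Once data/unihan.csv exists, it can be read with the Python 3 standard library alone. The sketch below assumes the file is UTF-8 encoded and has the char header column shown in the sample above. load_unihan is a hypothetical helper and is not part of this package.

```python
import csv


def load_unihan(path="data/unihan.csv"):
    """Return a dict mapping each character to its row (a dict of columns).

    Hypothetical helper; assumes a UTF-8 CSV with a "char" header column.
    """
    # The csv module expects files opened with newline="".
    with open(path, newline="", encoding="utf-8") as f:
        return {row["char"]: row for row in csv.DictReader(f)}


# Usage, after running ./scripts/process.py:
# chars = load_unihan()
# print(chars["\u3401"]["kMandarin"])
```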

See scripts/process.py CLI arguments for advanced usage examples.

Structure

# dataset metadata, schema information.
datapackage.json

# (future) when this package is stable, unihan.csv will be provided
data/unihan.csv

# stores downloaded Unihan.zip and its txt file contents (.gitignore'd)
data/build_files/

# script to download + build an SDF CSV of Unihan.
scripts/process.py

# unit tests to verify behavior / consistency of builder
tests/*

# python 2/3 compatibility modules
scripts/_compat.py
scripts/unicodecsv.py

# python module, public-facing python API.
__init__.py
scripts/__init__.py

# utility / helper functions
scripts/util.py
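The data/build_files/ step, which unpacks Unihan.zip into its .txt files, could look roughly like the Python 3 sketch below. It rests on a few assumptions. extract_unihan is a hypothetical name. The zip is assumed to be downloaded already, for example with urllib.request.urlretrieve; the download is left out so the sketch runs offline. Only archive members ending in .txt are extracted.

```python
import os
import zipfile


def extract_unihan(zip_path, dest="data/build_files"):
    """Extract the .txt members of zip_path into dest (hypothetical helper).

    Creates dest if needed and returns the sorted list of extracted paths.
    """
    os.makedirs(dest, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".txt"):
                # ZipFile.extract returns the path of the file it created.
                extracted.append(zf.extract(name, dest))
    return sorted(extracted)
```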

Cihai is not required for:

  • data/unihan.csv - simple data format compatible csv file.

  • scripts/process.py - creates data/unihan.csv.

When this module is stable, data/unihan.csv will be published in prepared releases, so running scripts/process.py will not be required. process.py will not require external libraries.

Examples

Related links:

Python support

Python 2.7, >= 3.3, pypy/pypy3

Source

https://github.com/cihai/cihaidata-unihan

Docs

https://cihaidata-unihan.git-pull.com

Changelog

https://cihaidata-unihan.git-pull.com/en/latest/history.html

API

https://cihaidata-unihan.git-pull.com/en/latest/api.html

Issues

https://github.com/cihai/cihaidata-unihan/issues

Travis

https://travis-ci.org/cihai/cihaidata-unihan

Test coverage

https://codecov.io/gh/cihai/cihaidata-unihan

pypi

https://pypi-hypernode.com/pypi/cihaidata-unihan

OpenHub

https://www.openhub.net/p/cihaidata-unihan

License

MIT.

git repo

$ git clone https://github.com/cihai/cihaidata-unihan.git

install dev

$ git clone https://github.com/cihai/cihaidata-unihan.git cihai
$ cd ./cihai
$ virtualenv .env
$ source .env/bin/activate
$ pip install -e .

tests

$ python setup.py test


Download files

Source Distribution

cihaidata-unihan-0.4.1.tar.gz (11.7 kB)

File details

Details for the file cihaidata-unihan-0.4.1.tar.gz.

File hashes

Hashes for cihaidata-unihan-0.4.1.tar.gz
Algorithm     Hash digest
SHA256        733372f53b69f87a1f30b2d5eb6b2d0c6bab3693bad69ef70a631b55b506708e
MD5           1d7817551cd9771b5e3b4fc73d08441c
BLAKE2b-256   00cd79e8b9c930f108ec548bd25c96a5c7e448fdef5a475c410dc69850dd20f2
