Tool to build UNIHAN dataset into datapackage / simple data format.
cihaidata-unihan - tool to build Unihan into a Simple Data Format (SDF) CSV file. Part of the cihai project.
Unihan’s data is dispersed across multiple files, each line holding one code point, one field name and one value:
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
scripts/process.py will download Unihan.zip and build all files into a single tabular CSV (default output: ./data/unihan.csv):
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
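The pivot from one-value-per-line input to one-row-per-character output can be sketched with the standard library alone. This is a minimal illustration, not the code in scripts/process.py: the names `pivot`, `to_csv` and `ucn_to_char` are hypothetical, the input is assumed to be tab-separated with `#` comment lines (as in the Unihan text files), and Python 3 is assumed.

```python
import csv
import io

# Sample input, assumed to be tab-separated: code point, field name, value.
RAW = """\
# comment lines start with '#'
U+3400\tkCantonese\tjau1
U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound
U+3400\tkMandarin\tqiū
U+3401\tkCantonese\ttim2
U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark
U+3401\tkHanyuPinyin\t10019.020:tiàn
U+3401\tkMandarin\ttiàn
"""

FIELDS = ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]


def ucn_to_char(ucn):
    """Convert a 'U+3400' style code point into the character itself."""
    return chr(int(ucn[2:], 16))


def pivot(lines, fields=FIELDS):
    """Group 'UCN<TAB>field<TAB>value' lines into one dict per character."""
    rows = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        ucn, field, value = line.split("\t", 2)
        if field not in fields:
            continue  # ignore fields that were not requested
        row = rows.setdefault(ucn, {"char": ucn_to_char(ucn), "ucn": ucn})
        row[field] = value
    return list(rows.values())


def to_csv(rows, fields=FIELDS):
    """Serialize the rows to CSV text; fields missing from a row stay empty."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["char", "ucn"] + fields, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = pivot(RAW.splitlines())
csv_text = to_csv(rows)
```

Here `csv_text` holds a header line followed by one line per code point. The csv module quotes the kDefinition value for U+3401 because it contains a comma.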
process.py supports command line arguments. See scripts/process.py CLI arguments for information on how you can specify custom columns, files, download URLs and output destinations.
The builder is developed against unit tests. See the Travis Builds and Revision History.
Usage
To download and build your own unihan.csv:
$ ./scripts/process.py
Creates data/unihan.csv.
See scripts/process.py CLI arguments for advanced usage examples.
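Once data/unihan.csv exists, it can be consumed without cihai. The following is a minimal sketch, assuming Python 3 and the column layout of the sample output earlier (char, ucn, then one column per Unihan field). `load_unihan` is a hypothetical helper, not part of this package, and the in-memory sample stands in for the real file.

```python
import csv
import io

# Two sample rows standing in for data/unihan.csv.
SAMPLE = (
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn\n'
)


def load_unihan(fileobj):
    """Return a dict mapping each character to its CSV row (a dict of columns)."""
    return {row["char"]: row for row in csv.DictReader(fileobj)}


# With a real build, open the generated file instead:
#   with open("data/unihan.csv", encoding="utf-8", newline="") as f:
#       table = load_unihan(f)
table = load_unihan(io.StringIO(SAMPLE))

mandarin = table["㐁"]["kMandarin"]      # reading for U+3401
definition = table["㐀"]["kDefinition"]  # definition for U+3400
```

Fields that Unihan does not define for a character, such as kHanyuPinyin for U+3400 in this sample, come back as empty strings.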
Structure
# dataset metadata, schema information.
datapackage.json
# (future) when this package is stable, unihan.csv will be provided
data/unihan.csv
# stores downloaded Unihan.zip and its txt file contents (.gitignore'd)
data/build_files/
# script to download + build an SDF CSV of Unihan.
scripts/process.py
# unit tests to verify behavior / consistency of builder
tests/*
# python 2/3 compatibility modules
scripts/_compat.py
scripts/unicodecsv.py
# python module, public-facing python API.
__init__.py
scripts/__init__.py
# utility / helper functions
scripts/util.py
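For orientation, a datapackage.json for a single CSV resource can be sketched as below. This is not the descriptor shipped with the project; it is a hypothetical example that follows the general Data Package layout (a name plus a list of resources, each with a path and a schema of fields). The column names are taken from the sample output, and the "string" types are an assumption.

```python
import json

# Columns from the sample CSV; the real build may include more fields.
FIELDS = ["char", "ucn", "kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]

# Hypothetical descriptor: one CSV resource with a flat schema of string fields.
descriptor = {
    "name": "cihaidata-unihan",
    "resources": [
        {
            "path": "data/unihan.csv",
            "schema": {
                "fields": [{"name": name, "type": "string"} for name in FIELDS]
            },
        }
    ],
}

# ensure_ascii=False keeps any non-ASCII text readable in the output.
datapackage_json = json.dumps(descriptor, indent=2, ensure_ascii=False)
```

Writing `datapackage_json` next to data/unihan.csv would let Data Package tooling discover the CSV and its columns.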
Cihai is not required for:
data/unihan.csv - a Simple Data Format-compatible CSV file.
scripts/process.py - creates data/unihan.csv.
When this module is stable, prepared releases of data/unihan.csv will be provided, so running scripts/process.py will not be required. process.py will not require external libraries.
Examples
Related links:
CSV Simple Data Format (SDF): http://data.okfn.org/standards/simple-data-format
Tools: http://data.okfn.org/tools
Python support | Python 2.7, >= 3.3, pypy/pypy3 |
Changelog | https://cihaidata-unihan.git-pull.com/en/latest/history.html |
License | MIT |