Data transformation framework for LinkML data models

Koza - a data transformation framework

Disclaimer: Koza is in beta; we are looking for beta testers

Overview

  • Transform csv, json, yaml, jsonl, and xml files into a target csv, json, or jsonl format based on your dataclass model
  • Koza can also output data in the KGX format
  • Write data transforms in semi-declarative Python
  • Configure source files, expected columns/json properties, path filters, field filters, and metadata in yaml
  • Create or import mapping files to be used in ingests (e.g., ID mappings, type mappings)
  • Create and use translation tables to map between source and target vocabularies
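A translation table, as described in the last bullet, is essentially a lookup from source-vocabulary terms to target-vocabulary terms. The minimal Python sketch below illustrates the idea only; it is not Koza's API or the format of its translation_table.yaml. The `translate` function, the split into a shared "global" table and an ingest-specific "local" table (with local entries taking precedence), and the example terms and `EX:` identifiers are all hypothetical.

```python
from typing import Mapping, Optional


def translate(
    term: str,
    global_table: Mapping[str, str],
    local_table: Optional[Mapping[str, str]] = None,
) -> str:
    """Map a source-vocabulary term to a target term (hypothetical helper).

    Assumption: an ingest-specific local table overrides the shared global table.
    """
    if local_table is not None and term in local_table:
        return local_table[term]
    try:
        return global_table[term]
    except KeyError:
        # Fail loudly so unmapped source terms are noticed during an ingest.
        raise KeyError(f"no translation for source term {term!r}") from None


# Hypothetical tables; real entries would come from your own mapping files.
GLOBAL = {"female": "EX:0001", "male": "EX:0002"}
LOCAL = {"F": "EX:0001", "M": "EX:0002"}
```

With these tables, `translate("F", GLOBAL, LOCAL)` returns `"EX:0001"` and `translate("male", GLOBAL)` returns `"EX:0002"`. Raising on a missing term, rather than passing it through unchanged, is a choice of this sketch that keeps unmapped source values from slipping silently into the output.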

Installation

Koza is available on PyPI and can be installed via pip or pipx:

[pip|pipx] install koza

Usage

NOTE: As of version 0.2.0, there is a new method for getting your ingest's KozaApp instance. Please see the updated documentation for details.

See the Koza documentation for usage information

Try the Examples

Validate

Give Koza a local or remote csv file and get some basic information about it (headers, number of rows):

koza validate \
  --file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
  --delimiter ' '
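To make concrete what that summary involves, here is a minimal Python sketch (not Koza's implementation) that reads a delimited file from a local path or URL and reports its header row and the number of data rows. The helper names `open_lines` and `summarize_delimited` are hypothetical, and the sketch assumes the first row is the header and the file is uncompressed UTF-8 text.

```python
import csv
import io
import urllib.request
from typing import Iterable, List, Tuple


def open_lines(source: str) -> Iterable[str]:
    """Hypothetical helper: yield text lines from a URL or a local path (UTF-8, uncompressed)."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            # Wrap the binary HTTP response so it can be iterated line by line as text.
            yield from io.TextIOWrapper(resp, encoding="utf-8", newline="")
    else:
        with open(source, "r", encoding="utf-8", newline="") as fh:
            yield from fh


def summarize_delimited(lines: Iterable[str], delimiter: str = ",") -> Tuple[List[str], int]:
    """Return (headers, number of data rows), treating the first row as the header."""
    reader = csv.reader(lines, delimiter=delimiter)
    headers = next(reader, [])  # an empty input yields no headers
    n_rows = sum(1 for _ in reader)  # count the remaining (data) rows
    return headers, n_rows
```

For the STRING example above, you would call `summarize_delimited(open_lines(url), delimiter=" ")` with the same raw.githubusercontent.com URL. Koza's own report may count or present rows differently.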

Passing a json or jsonl file confirms whether the file is valid json or jsonl:

koza validate \
  --file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
  --format jsonl
koza validate \
  --file ./examples/data/ddpheno.json.gz \
  --format json \
  --compression gzip
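The validity check itself can be sketched in Python as follows. This is an illustration, not Koza's implementation: `open_text`, `jsonl_is_valid`, `json_is_valid`, and `validate_file` are hypothetical names, gzip input is detected from a `.gz` suffix (an assumption; the commands above pass `--compression` explicitly), and blank lines in jsonl are tolerated as a design choice of this sketch.

```python
import gzip
import json
from pathlib import Path
from typing import IO, Iterable


def open_text(path: str) -> IO[str]:
    """Open a local file as UTF-8 text, decompressing transparently if it ends in .gz."""
    if Path(path).suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def jsonl_is_valid(lines: Iterable[str]) -> bool:
    """jsonl: every non-blank line must parse as a standalone JSON value."""
    for line in lines:
        if not line.strip():
            continue  # lenient: skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return False
    return True


def json_is_valid(text: str) -> bool:
    """json: the whole document must parse as a single JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def validate_file(path: str, fmt: str) -> bool:
    """Dispatch on the declared format ("json" or "jsonl")."""
    if fmt not in ("json", "jsonl"):
        raise ValueError(f"unsupported format: {fmt}")
    with open_text(path) as fh:
        return jsonl_is_valid(fh) if fmt == "jsonl" else json_is_valid(fh.read())
```

Streaming jsonl line by line keeps memory use flat for large files, whereas a json document has to be parsed as a whole.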

Transform

Run the example ingest "string/protein-links-detailed":

koza transform --source examples/string/protein-links-detailed.yaml --global-table examples/translation_table.yaml

A declarative variant of the same ingest is in examples/string-declarative:

koza transform --source examples/string-declarative/protein-links-detailed.yaml --global-table examples/translation_table.yaml

Note: Koza expects the directory structure used in the examples above (examples/ingest_name/ingest.yaml).


Download files

Source Distribution

koza-0.3.0.tar.gz (23.9 kB)

Built Distribution

koza-0.3.0-py3-none-any.whl (31.6 kB)

File details

Details for the file koza-0.3.0.tar.gz.

File metadata

  • Download URL: koza-0.3.0.tar.gz
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.10 Linux/5.15.0-1034-azure

File hashes

Hashes for koza-0.3.0.tar.gz
Algorithm Hash digest
SHA256 786150a3963654fef82bf36cb5f03acac22b74b5da4f8da707262ab924d70eed
MD5 c2930bc159442005849f0ff8315b4337
BLAKE2b-256 90aa6072f120c1cc81bf66e64972a6bb4b5509030ab9ca819ceff67a42937447

File details

Details for the file koza-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: koza-0.3.0-py3-none-any.whl
  • Size: 31.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.10 Linux/5.15.0-1034-azure

File hashes

Hashes for koza-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 144ff777a9c3e14154e12d1f1875c557530b22bf57abe9d2a3a1dc3f06624266
MD5 e3aea1eb5c0a6721b43f0c0fd6daf461
BLAKE2b-256 81f94e3a7f344e56cfa6d7e8ab0921013ff09ca0a1d3ebb66d9f7587066f7626
