Kartothek

Kartothek is a Python library to manage (create, read, update, delete) large amounts of tabular data in a blob store. It stores data as datasets, which it presents to the user as pandas DataFrames. A dataset is a collection of files with the same schema that reside in a blob store. Kartothek uses a metadata definition to handle these datasets efficiently. For distributed access to and manipulation of datasets, Kartothek offers a Dask interface.

Storing data distributed over multiple files in a blob store (S3, ABS, GCS, etc.) allows for a fast, cost-efficient and highly scalable data infrastructure. A downside of storing data solely in an object store is that the store itself gives little to no guarantees beyond the consistency of a single file. In particular, it cannot guarantee the consistency of a dataset that spans many files. If we demand a consistent state of our dataset at all times, we need to track that state ourselves. Kartothek frees us from having to do this manually.
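To make the consistency problem concrete, here is a minimal sketch of the general idea of tracking dataset state through a single metadata file. It is not Kartothek's implementation or API: the InMemoryBlobStore class, the metadata.json layout and the function names are all hypothetical and exist only for illustration. The one assumption taken from the text above is that a write to a single key is consistent, while nothing ties several keys together.

```python
import json


class InMemoryBlobStore:
    """Hypothetical stand-in for a blob store; only single-key writes are atomic."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]  # raises KeyError if the key is missing


def visible_files(store, dataset):
    """Return only the files that the metadata file declares part of the dataset."""
    try:
        return json.loads(store.get(f"{dataset}/metadata.json"))["files"]
    except KeyError:
        return []  # no metadata yet means an empty dataset


def commit_partitions(store, dataset, partitions):
    """Write the partition files first, then publish them with ONE metadata write."""
    already_listed = visible_files(store, dataset)
    new_keys = []
    for name, payload in partitions.items():
        key = f"{dataset}/{name}"
        store.put(key, payload)
        new_keys.append(key)
    # The single-file write below is the commit point for the whole update.
    metadata = {"files": already_listed + new_keys}
    store.put(f"{dataset}/metadata.json", json.dumps(metadata).encode())


store = InMemoryBlobStore()
commit_partitions(store, "sales", {"part-0.bin": b"a", "part-1.bin": b"b"})
# Simulate an interrupted update: the file exists but was never committed.
store.put("sales/part-2.bin", b"c")
print(visible_files(store, "sales"))
# prints ['sales/part-0.bin', 'sales/part-1.bin']
```

Readers that go through the metadata file never see the orphaned part-2.bin, so the dataset looks consistent even though the store itself holds a half-finished update. This is the kind of bookkeeping the paragraph above says Kartothek takes off your hands.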

The kartothek.io module provides building blocks to create and modify these datasets in data pipelines. Kartothek handles I/O, tracks dataset partitions and selects subsets of data transparently.

Installation

Installers for the latest released version are available on the Python Package Index (PyPI) and via conda.

# Install with pip
pip install kartothek
# Install with conda
conda install -c conda-forge kartothek
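
After installing, one way to confirm which version ended up in the active environment is to ask the standard library's importlib.metadata. This is a small sketch; the helper name installed_version is made up for this example and is not part of Kartothek.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if Kartothek is installed, otherwise None.
print(installed_version("kartothek"))
```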

What is a (real) Kartothek?

A Kartothek (or more modern: Zettelkasten/Katalogkasten) is a tool to organize (high-level) information extracted from a source of information.

Download files

Source Distribution

kartothek-5.0.0rc1.tar.gz (972.7 kB)

Uploaded Source

Built Distribution

kartothek-5.0.0rc1-py3-none-any.whl (240.2 kB)

Uploaded Python 3

File details

Details for the file kartothek-5.0.0rc1.tar.gz.

File metadata

  • Download URL: kartothek-5.0.0rc1.tar.gz
  • Size: 972.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5

File hashes

Hashes for kartothek-5.0.0rc1.tar.gz

Algorithm    Hash digest
SHA256       ce2ab5180d426a0ba4d9da911df80437be8dfa0e20c41079b04d8865aee92904
MD5          87494598e5d9d60f980c41414ccd29a4
BLAKE2b-256  b523405d8b91a851dd70a845e8f16d7fd82e1759c5d6b523ef41de5bf9091067
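
A downloaded file can be checked against a published digest with Python's hashlib. The sketch below is one way to do it; the helper names file_digest and matches_published_hash are invented for this example, and it assumes the archive has been saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path, expected_hex, algorithm="sha256"):
    """Compare the file's digest with a published hex digest, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Assumed local path; adjust to wherever the download was saved.
archive = Path("kartothek-5.0.0rc1.tar.gz")
published_sha256 = "ce2ab5180d426a0ba4d9da911df80437be8dfa0e20c41079b04d8865aee92904"
if archive.exists():
    print(matches_published_hash(archive, published_sha256))
```

Passing "md5" as the algorithm checks the MD5 row the same way. SHA256 is the better default because MD5 is no longer collision resistant.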

File details

Details for the file kartothek-5.0.0rc1-py3-none-any.whl.

File metadata

  • Download URL: kartothek-5.0.0rc1-py3-none-any.whl
  • Size: 240.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5

File hashes

Hashes for kartothek-5.0.0rc1-py3-none-any.whl

Algorithm    Hash digest
SHA256       1bf4932a3611ddd6a1e44597c8db4e6f51bfaa8ae03ce74cd64174bff6e852f7
MD5          570989fbda1209804720e5fe47120758
BLAKE2b-256  3678922e3e5fd1889282a1284dfaf2a171c0a488b225c117e433d7872309218e
