Functions to make reference descriptions for ReferenceFileSystem

Project description

kerchunk

Cloud-friendly access to archival data

Kerchunk is a library that provides a unified way to represent a variety of chunked, compressed data formats (e.g. NetCDF, HDF5, GRIB), allowing efficient access to the data from traditional file systems or cloud object storage. It also provides a flexible way to create virtual datasets from multiple files. It does this by extracting the byte ranges, compression information and other information about the data and storing this metadata in a new, separate object. This means that you can create a virtual aggregate dataset over potentially many source files, for efficient, parallel and cloud-friendly in-situ access without having to copy or translate the originals. It is a gateway to in-the-cloud massive data processing while the data providers still insist on using legacy formats for archival storage.
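The idea of storing byte ranges in a separate object can be sketched with the standard library alone. This is an illustration, not Kerchunk's own code: the two binary files, the `HEADER--` prefix, the key names and the `read_key` helper are all hypothetical, and the dictionary layout only loosely follows the reference layout that ReferenceFileSystem reads (each key maps either to an inline string or to a `[location, offset, length]` triple).

```python
import json
import os
import tempfile

# Two stand-in "archival" files, each holding a header followed by one data chunk.
tmpdir = tempfile.mkdtemp()
sources = {}
for name, payload in [("a.bin", b"chunk-from-a"), ("b.bin", b"chunk-from-b")]:
    path = os.path.join(tmpdir, name)
    header = b"HEADER--"  # bytes that a reader of the chunk never needs
    with open(path, "wb") as f:
        f.write(header + payload)
    sources[path] = (len(header), len(payload))

# The reference set: each key maps to [location, offset, length] in an original file.
# Short values such as metadata are stored inline as strings.
refs = {"version": 1, "refs": {".zgroup": json.dumps({"zarr_format": 2})}}
for i, (path, (offset, length)) in enumerate(sources.items()):
    refs["refs"][f"data/{i}"] = [path, offset, length]


def read_key(references, key):
    """Resolve a key to bytes: inline text, or a byte range of a source file."""
    entry = references["refs"][key]
    if isinstance(entry, str):
        return entry.encode()
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# The references survive a JSON round trip, so they can live in a separate object.
refs = json.loads(json.dumps(refs))
chunks = [read_key(refs, "data/0"), read_key(refs, "data/1")]
```

Both chunks are read in place from the original files; nothing is copied or converted, and one reference set spans both sources.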

Why Kerchunk:

Kerchunk provides the following:

  • completely serverless architecture
  • metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read
  • read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http, cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)
  • loading of various file types (currently NetCDF4/HDF5, GRIB2, TIFF, FITS, Zarr), potentially heterogeneous within a single dataset, without needing to go through the format-specific driver (e.g., no need for h5py)
  • asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency
  • parallel access with a library like zarr without any locks
  • logical datasets spanning many (potentially millions of) data files, with direct access to and subselection of them via coordinate indexing across an arbitrary number of dimensions
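The benefit of fetching many chunks concurrently can be illustrated with a small simulation. This is a sketch under stated assumptions, not Kerchunk's or fsspec's implementation: `fetch_chunk` is a hypothetical stand-in for a remote byte-range request, and `LATENCY` is an assumed round-trip time.

```python
import asyncio
import time

LATENCY = 0.05  # assumed per-request round-trip time, in seconds


async def fetch_chunk(index):
    """Hypothetical stand-in for one remote byte-range request."""
    await asyncio.sleep(LATENCY)  # simulate waiting on the network
    return f"chunk-{index}".encode()


async def fetch_all(n):
    # Issue every request at once; the waits overlap instead of adding up.
    return await asyncio.gather(*(fetch_chunk(i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(fetch_all(20))
elapsed = time.perf_counter() - start

# Fetching the same chunks one at a time would take about this long.
serial_estimate = 20 * LATENCY
```

Because the requests overlap, `elapsed` stays close to a single round trip, while `serial_estimate` grows with the number of chunks.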
For further information, please see the documentation.

Download files

Source Distribution

kerchunk-0.0.8.tar.gz (1.4 MB)

Uploaded Source

Built Distribution

kerchunk-0.0.8-py3-none-any.whl (44.3 kB)

Uploaded Python 3

File details

Details for the file kerchunk-0.0.8.tar.gz.

File metadata

  • Download URL: kerchunk-0.0.8.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.12

File hashes

Hashes for kerchunk-0.0.8.tar.gz

  • SHA256: 5846037e659a138763067c2266509b6584fda64c83bed786b247c82ad2e8640e
  • MD5: e094caa0d279dbf33359d3ef08373846
  • BLAKE2b-256: e2e50bc12f7035ee2c7aaeef0fe81a1f1843e16f6df9827c37e03c0096a97770

File details

Details for the file kerchunk-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: kerchunk-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 44.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.12

File hashes

Hashes for kerchunk-0.0.8-py3-none-any.whl

  • SHA256: 76de5a979e61ea0c771af2193629a4bd44c3f73af6b0d3a1a0cfa5e16207a5c1
  • MD5: db3bf0bdac2128d5e4a6452db67baa57
  • BLAKE2b-256: b27e2e597439f9a51fc56ce215dc8a9e5db8b32d03f5c0fd1b3833c7a961ac30
