
Transparent optimized reading of n-dimensional Blosc2 slices for h5py

Project description

b2h5py provides h5py with transparent, automatic, optimized reading of n-dimensional slices of Blosc2-compressed datasets. This optimized slicing combines direct chunk access (skipping the slow HDF5 filter pipeline) with two-level partitioning of the data into chunks and, within each chunk, smaller blocks, so that less data needs to be decompressed.
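To illustrate why the second partitioning level reduces decompression work, here is a minimal pure-Python sketch (not b2h5py's code) that lists the chunks a step-1 slice touches and, within each chunk, the blocks it overlaps. It assumes a regular grid in which blocks evenly tile each chunk and ignores partial chunks at the dataset edges. The function names are hypothetical.

```python
from itertools import product
from math import prod


def touched(start, stop, size):
    """Indices of the size-`size` partitions along one axis that [start, stop) overlaps."""
    if stop <= start:
        return range(0)  # an empty selection touches nothing
    return range(start // size, (stop - 1) // size + 1)


def blocks_to_decompress(slices, chunkshape, blockshape):
    """Map each chunk touched by `slices` to the chunk-local blocks it overlaps.

    `slices` holds one (start, stop) pair per dimension (step 1 assumed).
    Hypothetical simplification: blocks evenly tile chunks.
    """
    plan = {}
    per_axis_chunks = [touched(a, b, c) for (a, b), c in zip(slices, chunkshape)]
    for chunk in product(*per_axis_chunks):
        per_axis_blocks = []
        for (a, b), c, bs, ci in zip(slices, chunkshape, blockshape, chunk):
            # Clip the selection to this chunk and shift it to chunk-local coordinates.
            lo = max(a, ci * c) - ci * c
            hi = min(b, (ci + 1) * c) - ci * c
            per_axis_blocks.append(touched(lo, hi, bs))
        plan[chunk] = list(product(*per_axis_blocks))
    return plan


def elements_decompressed(plan, blockshape):
    """Number of elements decoded if only the listed blocks are decompressed."""
    return sum(len(blocks) for blocks in plan.values()) * prod(blockshape)


# A 10x10 selection straddling two 50x50 chunks made of 10x10 blocks.
plan = blocks_to_decompress(((5, 15), (45, 55)), chunkshape=(50, 50), blockshape=(10, 10))
print(plan)  # {(0, 0): [(0, 4), (1, 4)], (0, 1): [(0, 0), (1, 0)]}
print(elements_decompressed(plan, (10, 10)))  # 400
```

In this example only 4 blocks (400 elements) are decoded, whereas decompressing the two touched chunks in full would decode 2 × 50 × 50 = 5000 elements to return a 100-element selection.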

Benchmarks of this technique show 2x-5x speedups compared with normal filter-based access. Comparable results are obtained with a similar technique in PyTables; see "Optimized Hyper-slicing in PyTables with Blosc2 NDim".

Figure: benchmark results (doc/benchmark.png)

Usage

This optimized access works for slices with step 1 on Blosc2-compressed datasets using the native byte order. It is enabled by monkey-patching the h5py.Dataset class to extend the slicing operation. This is done on module import, so the only thing you need to do is:

import b2h5py

After that, optimization will be attempted for any slicing of a dataset (of the form dataset[...] or dataset.__getitem__(...)). If the optimization is not possible in a particular case, normal h5py slicing code will be used (which performs HDF5 filter-based access, backed by hdf5plugin to support Blosc2).

Even if the module is imported and the Dataset class is patched, you may still force-disable the optimization by setting BLOSC2_FILTER=1 in the environment.
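A minimal sketch of how such an environment switch can be honored, assuming a hypothetical helper named `optimization_enabled` that treats only the value "1" as "force the filter pipeline". b2h5py's actual check may differ in name, in the values it accepts and in when it reads the variable, so setting BLOSC2_FILTER=1 in the shell before starting Python is the safest way to use it.

```python
import os


def optimization_enabled(environ=None):
    """Return False when BLOSC2_FILTER=1 asks for HDF5 filter-based access.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and is read at call time.
    """
    env = os.environ if environ is None else environ
    return env.get("BLOSC2_FILTER", "0") != "1"


print(optimization_enabled({}))                       # True
print(optimization_enabled({"BLOSC2_FILTER": "1"}))   # False
```

Passing the mapping explicitly keeps the check easy to test without mutating the process environment.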

Building

Just install PyPA build (e.g. pip install build), enter the source code directory and run pyproject-build to get a source tarball and a wheel under the dist directory.

Installing

To install as a wheel from PyPI, run pip install b2h5py.

You may also install the wheel that you built in the previous section, or enter the source code directory and run pip install . from there.

Running tests

If you have installed b2h5py, just run python -m unittest discover b2h5py.tests.

Otherwise, just enter its source code directory and run python -m unittest.

Download files


Source Distribution

b2h5py-0.1.1.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

b2h5py-0.1.1-py3-none-any.whl (11.5 kB)

Uploaded Python 3

File details

Details for the file b2h5py-0.1.1.tar.gz.

File metadata

  • Download URL: b2h5py-0.1.1.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.9.6 requests/2.28.1 setuptools/63.2.0 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.7

File hashes

Hashes for b2h5py-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        6d6dd01594f3b17df0decbd1f241648fd20c15028b7e5cc5bd43252d9612c516
MD5           46e006245c2b47c2338da9973567f7b2
BLAKE2b-256   3ab01000a6a5a50df4fe7f2f88c35230235bde5f621673d72c643cb12071da2b
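A downloaded file can be checked against the published SHA256 digest with the standard library's hashlib. The sketch below streams the file in fixed-size pieces so large files need not fit in memory. It assumes the source tarball was saved under its original name in the current directory; `sha256_of` is an illustrative helper name.

```python
import hashlib

# Published SHA256 digest for b2h5py-0.1.1.tar.gz (from the table above).
EXPECTED_SHA256 = "6d6dd01594f3b17df0decbd1f241648fd20c15028b7e5cc5bd43252d9612c516"


def sha256_of(path, bufsize=1 << 16):
    """Return the hex SHA256 digest of the file at `path`, read in `bufsize` pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(bufsize), b""):
            digest.update(piece)
    return digest.hexdigest()


# Usage (assumes the tarball is in the current directory):
# assert sha256_of("b2h5py-0.1.1.tar.gz") == EXPECTED_SHA256
```

The same helper works for the wheel by comparing against the wheel's SHA256 digest instead.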


File details

Details for the file b2h5py-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: b2h5py-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.9.6 requests/2.28.1 setuptools/63.2.0 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.7

File hashes

Hashes for b2h5py-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        cb636a50f37d2c05f743f25f047978113e109775a360e022618acf098c7579da
MD5           e5e9b29682aa1c23cdef9fe8413a9237
BLAKE2b-256   2d04990b441f7d1021fd565c4e6c481d09a4de3ff39e07c7a3dd452b52afc0ef

