
Transparent optimized reading of n-dimensional Blosc2 slices for h5py

Project description

b2h5py provides h5py with transparent, automatic, optimized reading of n-dimensional slices of Blosc2-compressed datasets. This optimized slicing leverages direct chunk access (skipping the slow HDF5 filter pipeline) and two-level partitioning of the data into chunks and then into smaller blocks (so that less data is actually decompressed).
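To see why the second level of partitioning reduces the amount of decompressed data, here is a minimal, stdlib-only sketch (not b2h5py's implementation) that lists which chunks and which blocks a step-1 slice intersects. The helper names, the dataset, and the chunk and block shapes are hypothetical. The sketch assumes slices with explicit start and stop, and a chunk shape that is a multiple of the block shape, so that blocks tile every chunk on one global grid.

```python
from itertools import product
from math import prod


def overlapping(start, stop, size):
    """Indices of the size-`size` partitions along one axis that intersect [start, stop)."""
    if start >= stop:
        return range(0)
    return range(start // size, (stop - 1) // size + 1)


def partitions_touched(selection, part_shape):
    """All n-dimensional partition coordinates touched by a tuple of step-1 slices."""
    per_axis = [overlapping(s.start, s.stop, n) for s, n in zip(selection, part_shape)]
    return list(product(*per_axis))


# Hypothetical 2-D dataset: 100 x 100 chunks, each split into 25 x 25 blocks.
chunk_shape, block_shape = (100, 100), (25, 25)
selection = (slice(10, 30), slice(500, 510))  # 20 x 10 = 200 requested items

chunks = partitions_touched(selection, chunk_shape)
blocks = partitions_touched(selection, block_shape)
print(chunks)  # [(0, 5)]
print(blocks)  # [(0, 20), (1, 20)]
# Items decoded when the whole chunk vs. only the touched blocks are decompressed:
print(len(chunks) * prod(chunk_shape), len(blocks) * prod(block_shape))  # 10000 1250
```

In this toy case a filter-based read would decompress the whole 10000-item chunk, while block-level access only needs the two 625-item blocks that overlap the selection.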

Benchmarks of this technique show 2x-5x speed-ups compared with normal filter-based access. Comparable results are obtained with a similar technique in PyTables; see "Optimized Hyper-slicing in PyTables with Blosc2 NDim".

[Figure: benchmark plot, doc/benchmark.png]

Usage

This optimized access works for slices with step 1 on Blosc2-compressed datasets using the native byte order. It is enabled by monkey-patching the h5py.Dataset class to extend the slicing operation. The easiest way to do this is:

import b2h5py.auto

After that, optimization will be attempted for any slicing of a dataset (of the form dataset[...] or dataset.__getitem__(...)). If the optimization is not possible in a particular case, normal h5py slicing code will be used (which performs HDF5 filter-based access, backed by hdf5plugin to support Blosc2).
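The patching-with-fallback behavior described above can be pictured with a small, self-contained sketch. It is not b2h5py's code: Dataset here is a hypothetical stand-in for h5py.Dataset, the "fast" branch only tags its result instead of reading chunks directly, and the eligibility test covers just the step-1 rule (the Blosc2 and byte-order checks are left out).

```python
class Dataset:
    """Hypothetical stand-in for h5py.Dataset: every read takes the 'filter' path."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return ("filter", self.data[key])


_original_getitem = Dataset.__getitem__


def _is_fast_candidate(key):
    # Assumed rule for this sketch: only slices with step 1 (or no step) qualify.
    # The real condition also requires Blosc2 compression and native byte order.
    keys = key if isinstance(key, tuple) else (key,)
    return all(isinstance(k, slice) and k.step in (None, 1) for k in keys)


def _patched_getitem(self, key):
    if _is_fast_candidate(key):
        return ("fast", self.data[key])  # placeholder for direct chunk access
    return _original_getitem(self, key)  # fall back to the original slicing code


def enable_patch():
    Dataset.__getitem__ = _patched_getitem


def disable_patch():
    Dataset.__getitem__ = _original_getitem


enable_patch()
d = Dataset(list(range(10)))
print(d[2:5])   # ('fast', [2, 3, 4])
print(d[::2])   # ('filter', [0, 2, 4, 6, 8])
disable_patch()
print(d[2:5])   # ('filter', [2, 3, 4])
```

Because the patch replaces the method on the class itself, every existing and future instance picks it up, which is what makes the optimization transparent to code that only does dataset[...].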

You may instead just import b2h5py and explicitly enable the optimization globally by calling b2h5py.enable_fast_slicing(), and disable it again with b2h5py.disable_fast_slicing(). You may also enable it temporarily by using a context manager:

import b2h5py

with b2h5py.fast_slicing():
    ...  # code here will use Blosc2 optimized slicing

Finally, you may explicitly enable optimizations for a given h5py dataset by wrapping it in a B2Dataset instance:

b2dset = b2h5py.B2Dataset(dset)
# ... slicing ``b2dset`` will use Blosc2 optimization ...

Please note that, for the moment, plain iteration over B2Dataset instances is not optimized (it falls back to plain Dataset slicing). This does not affect the other approaches described above. Instead of for row in b2dset: loops, you may prefer to use slicing like:

for i in range(len(b2dset)):
    row = b2dset[i]  # or ``b2dset[i, ...]``; operate with ``row`` here
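The note above can be illustrated with a self-contained sketch. It does not reproduce B2Dataset: SlowDataset and FastWrapper are hypothetical stand-ins, and delegating __iter__ to the wrapped object is only one plausible way such a fallback can arise. Explicit indexing goes through the wrapper's own __getitem__, while iteration is handed to the wrapped object and never reaches the optimized path.

```python
class SlowDataset:
    """Hypothetical stand-in for h5py.Dataset that logs every plain read."""

    def __init__(self, rows):
        self.rows = rows
        self.log = []

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, key):
        self.log.append("plain")
        return self.rows[key]

    def __iter__(self):
        # Iteration is built on the class's own (plain) __getitem__.
        for i in range(len(self)):
            yield self[i]


class FastWrapper:
    """Hypothetical wrapper in the spirit of B2Dataset: only __getitem__ is optimized."""

    def __init__(self, dset):
        self._dset = dset

    def __getattr__(self, name):
        # Attributes not defined here are looked up on the wrapped dataset.
        return getattr(self._dset, name)

    def __len__(self):
        return len(self._dset)

    def __getitem__(self, key):
        self._dset.log.append("fast")  # placeholder for the optimized read
        return self._dset.rows[key]

    def __iter__(self):
        # Delegating iteration hands every row read to the plain path.
        return iter(self._dset)


w = FastWrapper(SlowDataset([10, 20, 30]))
rows = [row for row in w]
print(rows, w.log)  # [10, 20, 30] ['plain', 'plain', 'plain']
w.log.clear()
rows = [w[i] for i in range(len(w))]
print(rows, w.log)  # [10, 20, 30] ['fast', 'fast', 'fast']
```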

We recommend that you test which approach works better for your datasets. This limitation may be fixed in the future.
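For that comparison, a minimal timing harness may help. time_loops is a hypothetical helper, not part of b2h5py; it only assumes the object supports len(), integer indexing and iteration, as b2dset does. It reports the best of several runs to reduce noise, although HDF5 and operating-system caches may still make later runs faster than the first one.

```python
import time


def time_loops(dset, repeat=3):
    """Best-of-`repeat` wall-clock seconds for iteration vs. explicit indexing."""

    def iterate():
        for _row in dset:
            pass

    def index():
        for i in range(len(dset)):
            dset[i]

    results = {}
    for name, loop in (("iteration", iterate), ("indexing", index)):
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            loop()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# With a real file you would pass ``b2dset``; a plain list keeps this runnable.
print(time_loops(list(range(100_000))))
```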

Building

Just install PyPA build (e.g. pip install build), enter the source code directory and run pyproject-build to get a source tarball and a wheel under the dist directory.

Installing

To install as a wheel from PyPI, run pip install b2h5py.

You may also install the wheel that you built in the previous section, or enter the source code directory and run pip install . from there.

Running tests

If you have installed b2h5py, just run python -m unittest discover b2h5py.tests.

Otherwise, just enter its source code directory and run python -m unittest.

You can also run the h5py tests with the patched Dataset class to check that patching does not break anything. To do so, install the h5py-test extra (e.g. pip install b2h5py[h5py-test]) and run python -m b2h5py.tests.test_patched_h5py.

Download files

Source Distribution

b2h5py-0.4.0.tar.gz (14.9 kB)

Built Distribution

b2h5py-0.4.0-py3-none-any.whl (15.9 kB)

File details

Details for the file b2h5py-0.4.0.tar.gz.

File metadata

  • Download URL: b2h5py-0.4.0.tar.gz
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.9.6 requests/2.28.1 setuptools/63.2.0 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.7

File hashes

Hashes for b2h5py-0.4.0.tar.gz
Algorithm    Hash digest
SHA256       c0943b22e8132f680b3fb682186473ce502779635ce3dd73e9cd617d84f68c2a
MD5          b1c5d13baf4064e9faf61610a83afdc5
BLAKE2b-256  865e5ed90857cd91735c16a53359ecb03f49611404909327068be7547bf8c59f
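To check a downloaded file against the SHA256 digest listed above, the standard-library hashlib module is enough. This minimal sketch streams the file in pieces so it does not need to fit in memory; the function name file_sha256 and the download path are assumptions.

```python
import hashlib
import os


def file_sha256(path, bufsize=1 << 20):
    """Hex SHA256 digest of the file at `path`, read in `bufsize`-byte pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(bufsize), b""):
            digest.update(piece)
    return digest.hexdigest()


EXPECTED = "c0943b22e8132f680b3fb682186473ce502779635ce3dd73e9cd617d84f68c2a"
path = "b2h5py-0.4.0.tar.gz"  # assumed location of the downloaded source tarball
if os.path.exists(path):
    print("OK" if file_sha256(path) == EXPECTED else "MISMATCH")
```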

File details

Details for the file b2h5py-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: b2h5py-0.4.0-py3-none-any.whl
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.9.6 requests/2.28.1 setuptools/63.2.0 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.7

File hashes

Hashes for b2h5py-0.4.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       32714e490b0a576e38a8ad0296bde942c0b8a84a26ed6682400d300d3cf08c55
MD5          0938c81f63110f228b569448ddcb7e1d
BLAKE2b-256  f1b278a82a52dee405061d03922632d201b47a1e379696a4b7afe1d3a6d54155
