Transparent optimized reading of n-dimensional Blosc2 slices for h5py
b2h5py provides h5py with transparent, automatic optimized reading of n-dimensional slices of Blosc2-compressed datasets. This optimized slicing leverages direct chunk access (skipping the slow HDF5 filter pipeline) and 2-level partitioning into chunks and then smaller blocks (so that less data is actually decompressed).
Benchmarks of this technique show 2x-5x speed-ups compared with normal filter-based access. A similar technique in PyTables gives comparable results; see "Optimized Hyper-slicing in PyTables with Blosc2 NDim".
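To illustrate why the 2-level partitioning described above reduces the amount of data decompressed, here is a minimal pure-Python sketch. It is not b2h5py's code: the helper names (`touched_parts`, `count_parts`) and the 100x100 chunk / 10x10 block layout are hypothetical, and it assumes the block shape evenly divides the chunk shape so that the block grid aligns with the chunk grid.

```python
def touched_parts(start, stop, part):
    """Indices of the size-`part` partitions that overlap [start, stop)."""
    return range(start // part, (stop - 1) // part + 1)


def count_parts(bounds, part_shape):
    """Count partitions of shape `part_shape` overlapping an n-dim slice.

    `bounds` holds one (start, stop) pair per dimension, with start < stop.
    """
    count = 1
    for (start, stop), part in zip(bounds, part_shape):
        count *= len(touched_parts(start, stop, part))
    return count


# Hypothetical layout: 2-D dataset, 100x100 chunks split into 10x10 blocks.
chunk_shape = (100, 100)
block_shape = (10, 10)
wanted = [(95, 105), (0, 10)]  # rows 95..104, columns 0..9 (100 items)

n_chunks = count_parts(wanted, chunk_shape)
n_blocks = count_parts(wanted, block_shape)

# Items decompressed if whole chunks are processed vs. only touched blocks.
items_whole_chunks = n_chunks * chunk_shape[0] * chunk_shape[1]
items_blocks_only = n_blocks * block_shape[0] * block_shape[1]
print(n_chunks, n_blocks, items_whole_chunks, items_blocks_only)
```

In this toy layout the slice straddles two chunks, so chunk-level access would decompress 20000 items, while block-level access touches only 200.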
Usage
This optimized access works for slices with step 1 on Blosc2-compressed datasets using the native byte order. It is enabled by monkey-patching the h5py.Dataset class to extend the slicing operation. This is done on module import, so the only thing you need to do is:
import b2h5py
After that, optimization will be attempted for any slicing of a dataset (of the form dataset[...] or dataset.__getitem__(...)). If the optimization is not possible in a particular case, normal h5py slicing code will be used (which performs HDF5 filter-based access, backed by hdf5plugin to support Blosc2).
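The patch-on-import pattern with fallback described above can be sketched in plain Python. This is only an illustration of the general technique, not b2h5py's implementation: `Dataset` here is a hypothetical stand-in for `h5py.Dataset`, and `_fast_read`, `patch` and `path_log` are made-up names. The eligibility check covers only the step-1 condition mentioned in the text.

```python
import functools

path_log = []  # records which path served each request (for illustration)


class Dataset:
    """Hypothetical stand-in for h5py.Dataset; not the real class."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, key):
        # Plays the role of the normal, filter-based slicing path.
        return self._data[key]


def _fast_read(dataset, key):
    """Hypothetical optimized path; returns None if the request is not eligible."""
    if not isinstance(key, slice) or key.step not in (None, 1):
        return None
    return dataset._data[key]


def patch(cls):
    """Wrap cls.__getitem__ so the optimized path is tried first."""
    original = cls.__getitem__

    @functools.wraps(original)
    def getitem(self, key):
        result = _fast_read(self, key)
        if result is None:
            path_log.append("fallback")
            return original(self, key)  # normal slicing code
        path_log.append("fast")
        return result

    cls.__getitem__ = getitem


patch(Dataset)  # a module can run this at import time

ds = Dataset(range(10))
print(ds[2:5], ds[::2], path_log)
```

Because the wrapper replaces `__getitem__` on the class itself, both `ds[...]` and `ds.__getitem__(...)` go through it, and callers never need to know which path was taken.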
Even if the module is imported and the Dataset class is patched, you may still force-disable the optimization by setting BLOSC2_FILTER=1 in the environment.
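One way to set the variable from Python is to launch the workload in a child process with a modified environment. The sketch below assumes nothing about when b2h5py reads `BLOSC2_FILTER` (the text does not say), so it sets the variable before the child interpreter starts. The child program here is a stand-in that only echoes the variable back; a real script would import b2h5py and slice datasets instead.

```python
import os
import subprocess
import sys

# Copy the current environment and add the switch named in the text.
env = dict(os.environ, BLOSC2_FILTER="1")

# Stand-in child program (hypothetical): it only prints the variable.
child_code = "import os; print(os.environ.get('BLOSC2_FILTER'))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = result.stdout.strip()
print(seen_by_child)
```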
Building
Just install PyPA build (e.g. pip install build), enter the source code directory and run pyproject-build to get a source tarball and a wheel under the dist directory.
Installing
Either install the wheel from the previous section, or enter the source code directory and run pip install . from there. There are no published wheels yet.
Running tests
If you have installed b2h5py, just run python -m unittest discover b2h5py.tests.
Otherwise, just enter its source code directory and run python -m unittest.
File details
Details for the file b2h5py-0.1.0.tar.gz.
File metadata
- Download URL: b2h5py-0.1.0.tar.gz
- Upload date:
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.9.6 requests/2.28.1 setuptools/63.2.0 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | bfbd831868b1b267ffa57afaaf2cfe343068086ba9261ac78ba3307933ea56e5
MD5 | 2a06a641275156924c4997837411f879
BLAKE2b-256 | cf0250661eb3cc87fbffbc4ad1f0355f24f4b411b41023d0dc5bbb3cb8ed2d2c
File details
Details for the file b2h5py-0.1.0-py3-none-any.whl.
File metadata
- Download URL: b2h5py-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.9.6 requests/2.28.1 setuptools/63.2.0 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1f1bd9ea031a4bb8423f12b542ed0350f2b17a6da619b474c8538b07646e2441
MD5 | 10d53886fe3efeecf91bf32e6845aaa9
BLAKE2b-256 | 2ead723510b3a40f91d6813e7ad06ea76f8ac56f46d7749031bc6560258fcbce