Python wrapper for the C-Blosc2 library

A Python wrapper for the extremely fast Blosc2 compression library

Author:

The Blosc development team

Contact:

blosc@blosc.org

Github:

https://github.com/Blosc/python-blosc2

Actions:

actions

PyPi:

version

NumFOCUS:

numfocus

Code of Conduct:

Contributor Covenant

What it is

C-Blosc2 is the new major version of C-Blosc, and is backward compatible with both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package that wraps C-Blosc2, the newest version of the Blosc compressor.

Starting with version 3.0.0, Python-Blosc2 includes a powerful computing engine that can operate on compressed data stored in memory, on disk or on the network. This engine also supports advanced features like reductions, filters, user-defined functions and broadcasting (still in beta). You can read our tutorials on how to use this new feature at: https://github.com/Blosc/python-blosc2/blob/main/doc/getting_started/tutorials/03.lazyarray-expressions.ipynb and https://github.com/Blosc/python-blosc2/blob/main/doc/getting_started/tutorials/03.lazyarray-udf.ipynb

In addition, Python-Blosc2 aims to leverage the full C-Blosc2 functionality to support super-chunks (SChunk), multi-dimensional arrays (NDArray), metadata, serialization and other bells and whistles introduced in C-Blosc2.

Note: Blosc2 is meant to be backward compatible with Blosc(1) data. That means that it can read data generated with Blosc, but the opposite is not true (i.e. there is no forward compatibility).

NDArray: an N-Dimensional store

One of the most useful abstractions in Python-Blosc2 is the NDArray object. It can write and read n-dimensional datasets in an extremely efficient way thanks to an n-dimensional 2-level partitioning, which allows slicing and dicing arbitrarily large compressed data in a more fine-grained way:

https://github.com/Blosc/python-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true
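The benefit of the 2-level partitioning can be sketched with plain index arithmetic. The helper below is hypothetical (it is not part of Python-Blosc2, nor a description of its internals); it only counts which chunks, and which blocks inside a chunk, overlap a requested slice for an assumed 2-D layout.

```python
from itertools import product


def touched_partitions(slices, part_shape):
    """Return the grid coordinates of every partition overlapping `slices`.

    `slices` holds one (start, stop) pair per dimension and `part_shape`
    is the partition shape (chunk or block) in items.
    """
    ranges = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(slices, part_shape)
    ]
    return list(product(*ranges))


# Assumed layout: 200x200 chunks, each split into 50x50 blocks
chunks = (200, 200)
blocks = (50, 50)

# Requested region: rows 210..219, columns 0..99
wanted = ((210, 220), (0, 100))

# First level: which chunks are involved?
hit_chunks = touched_partitions(wanted, chunks)

# Second level: translate the region to coordinates local to that chunk
chunk = hit_chunks[0]
local = tuple(
    (start - c * size, stop - c * size)
    for (start, stop), c, size in zip(wanted, chunk, chunks)
)
hit_blocks = touched_partitions(local, blocks)

print(hit_chunks)
print(hit_blocks)
```

In this layout each chunk holds 16 blocks, yet the requested region overlaps only 2 of them, so most of the chunk would not need to be decompressed.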

To whet your appetite, here is how the NDArray object performs when getting slices orthogonal to the different axes of a 4-dim dataset:

https://github.com/Blosc/python-blosc2/blob/main/images/Read-Partial-Slices-B2ND.png?raw=true

We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro

We also have a ~2 min explanatory video on why slicing in a pineapple-style (aka double partition) is useful:

Slicing a dataset in pineapple-style

Operating with NDArrays

NDArray objects are very easy to operate with inside Python-Blosc2. Here is a simple example:

import numpy as np
import blosc2

N = 10_000
na = np.linspace(0, 1, N * N, dtype=np.float32).reshape(N, N)
nb = np.linspace(1, 2, N * N).reshape(N, N)
nc = np.linspace(-10, 10, N * N).reshape(N, N)

# Convert to blosc2
a = blosc2.asarray(na)
b = blosc2.asarray(nb)
c = blosc2.asarray(nc)

# Expression
expr = ((a ** 3 + blosc2.sin(c * 2)) < b) & (c > 0)

# Evaluate and get a NDArray as result
out = expr.eval()
print(out.info)

As you can see, NDArray instances are very similar to NumPy arrays, but behind the scenes they hold compressed data that can be operated on very efficiently by the new computing engine included in Python-Blosc2.

To whet your appetite, here is the performance (on a MacBook Air M2 with 24 GB of RAM) that you can reach when the operands fit comfortably in memory:

Performance when operands fit in-memory

In this case, performance is somewhat behind top-level libraries like Numexpr or Numba, but it is still pretty nice (and CPUs with more cores than the M2 would probably close the gap further). One important thing to know is that memory consumption when using the LazyArray.eval() method is very low, because the output is an NDArray object that is compressed and in-memory by default. On the other hand, the LazyArray.__getitem__() method returns an actual NumPy array, so it is not recommended for large datasets, as it will consume quite a bit of memory (but it can still be convenient for small outputs).

It is important to note that the NDArray object can use memory-mapped files as well, and the benchmark above is actually using a memory-mapped file as the storage for the operands. Memory-mapped files are very useful when the operands do not fit in-memory, while keeping good performance.

And here is the performance when the operands do not fit well in memory:

Performance when operands do not fit in-memory

In the latter case, the memory consumption lines look a bit crazy, but this is because what is displayed is the real memory consumption, not the virtual one (so, during the evaluation the OS has to swap out some memory to disk). In this case, the performance when compared with top-level libraries like Numexpr or Numba is very competitive.

You can find the benchmark for the above examples at: https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/lazyarray-expr.ipynb

Installing

Blosc now offers Python wheels for the main operating systems (Windows, macOS and Linux) and platforms. You can install binary packages from PyPI using pip:

pip install blosc2

Documentation

The documentation is here:

https://blosc.org/python-blosc2/python-blosc2.html

Also, some examples are available on:

https://github.com/Blosc/python-blosc2/tree/main/examples

Building from sources

python-blosc2 ships with the C-Blosc2 sources and can be built in-place:

git clone https://github.com/Blosc/python-blosc2/
cd python-blosc2
git submodule update --init --recursive
python -m pip install -r requirements-build.txt
python setup.py build_ext --inplace

That’s all. You can now proceed to the testing section.

Testing

After compiling, you can quickly check that the package is sane by running the tests:

python -m pip install -r requirements-tests.txt
python -m pytest  # add -v for verbose mode

Benchmarking

If curious, you may want to run a small benchmark that compares a plain NumPy array copy against compression through different compressors in your Blosc build:

PYTHONPATH=. python bench/pack_compress.py

License

The software is licensed under a 3-Clause BSD license. A copy of the python-blosc2 license can be found in LICENSE.txt.

Mailing list

Discussion about this module is welcome in the Blosc list:

blosc@googlegroups.com

https://groups.google.es/group/blosc

Mastodon

Please follow @Blosc2 to stay informed about the latest developments. We recently moved from Twitter to Mastodon.

Citing Blosc

You can cite our work on the different libraries under the Blosc umbrella as:

@ONLINE{blosc,
  author = {{Blosc Development Team}},
  title = "{A fast, compressed and persistent data store library}",
  year = {2009-2024},
  note = {https://blosc.org}
}

Make compression better!

