Python wrapper for the C-Blosc2 library

A Python wrapper for the extremely fast Blosc2 compression library

Author:

The Blosc development team

Contact:

blosc@blosc.org

Github:

https://github.com/Blosc/python-blosc2

Code of Conduct:

Contributor Covenant

What it is

C-Blosc2 is the new major version of C-Blosc, and is backward compatible with both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package that wraps C-Blosc2, the newest version of the Blosc compressor.

Starting with version 3.0.0, Python-Blosc2 includes a powerful computing engine that can operate on compressed data stored in memory, on disk or on the network. This engine also supports advanced features like reductions, filters, user-defined functions and broadcasting (still in beta).

You can read some of our tutorials on how to perform advanced computations at:

In addition, Python-Blosc2 aims to leverage the full C-Blosc2 functionality to support super-chunks (SChunk), multi-dimensional arrays (NDArray), metadata, serialization and other bells and whistles introduced in C-Blosc2.

Note: Blosc2 is meant to be backward compatible with Blosc(1) data. That means that it can read data generated with Blosc, but the opposite is not true (i.e. there is no forward compatibility).

NDArray: an N-Dimensional store

One of the most useful abstractions in Python-Blosc2 is the NDArray object. It can write and read n-dimensional datasets very efficiently thanks to an n-dimensional, 2-level partitioning, which makes it possible to slice and dice arbitrarily large compressed data in a more fine-grained way:

https://github.com/Blosc/python-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true

To whet your appetite, here is how the NDArray object performs when getting slices orthogonal to the different axes of a 4-dim dataset:

https://github.com/Blosc/python-blosc2/blob/main/images/Read-Partial-Slices-B2ND.png?raw=true

We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro

We also have a ~2 min explanatory video on why pineapple-style slicing (aka double partition) is useful:

Slicing a dataset in pineapple-style

Operating with NDArrays

NDArray objects are easy to operate with inside Python-Blosc2. Here is a simple example:

import numpy as np
import blosc2

N = 10_000
na = np.linspace(0, 1, N * N, dtype=np.float32).reshape(N, N)
nb = np.linspace(1, 2, N * N).reshape(N, N)
nc = np.linspace(-10, 10, N * N).reshape(N, N)

# Convert to blosc2
a = blosc2.asarray(na)
b = blosc2.asarray(nb)
c = blosc2.asarray(nc)

# Expression
expr = ((a ** 3 + blosc2.sin(c * 2)) < b) & (c > 0)

# Evaluate and get a NDArray as result
out = expr.eval()
print(out.info)

As you can see, NDArray instances are very similar to NumPy arrays, but behind the scenes they hold compressed data that can be operated on very efficiently with the new computing engine included in Python-Blosc2.

To give an idea of what to expect, here is the performance (on a MacBook Air M2 with 24 GB of RAM) that you can reach when the operands fit comfortably in memory:

Performance when operands fit in-memory

In this case, performance is somewhat behind top-level libraries like Numexpr or Numba, but it is still pretty good (and CPUs with more cores than the M2 would probably close the gap further). One important thing to know is that memory consumption when using the LazyArray.eval() method is very low, because the output is an NDArray object that is compressed and in memory by default. In contrast, the LazyArray.__getitem__() method returns an actual NumPy array, so it is not recommended for large datasets, as it will consume quite a bit of memory (but it can still be convenient for small outputs).

It is important to note that the NDArray object can use memory-mapped files as well, and the benchmark above actually uses a memory-mapped file as storage for the operands. Memory-mapped files are very useful when the operands do not fit in memory, as they still allow good performance.

And here is the performance when the operands do not fit well in memory:

Performance when operands do not fit in-memory

In the latter case, the memory consumption lines look a bit erratic, but this is because what is displayed is the real memory consumption, not the virtual one (so, during the evaluation, the OS has to swap some memory out to disk). In this case, performance is very competitive with top-level libraries like Numexpr or Numba.

You can find the benchmark for the above examples at: https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/lazyarray-expr.ipynb

Installing

Blosc2 now offers Python wheels for the main operating systems (Windows, macOS and Linux) and platforms. You can install binary packages from PyPI using pip:

pip install blosc2

We are in the process of releasing 3.0.0, and we will be releasing wheels for different beta versions. For example, to install the first beta version, you can do:

pip install blosc2==3.0.0b1

Documentation

The documentation is here:

https://blosc.org/python-blosc2/python-blosc2.html

Also, some examples are available on:

https://github.com/Blosc/python-blosc2/tree/main/examples

Building from sources

python-blosc2 ships with the C-Blosc2 sources and can be built in place:

git clone https://github.com/Blosc/python-blosc2/
cd python-blosc2
pip install .   # add -e for editable mode

That’s all. You can now proceed to the testing section.

Testing

After compiling, you can quickly check that the package is sane by running the tests:

pip install .[test]
pytest  # add -v for verbose mode

Benchmarking

If curious, you may want to run a small benchmark that compares a plain NumPy array copy against compression through different compressors in your Blosc build:

python bench/pack_compress.py

License

The software is licensed under a 3-Clause BSD license. A copy of the python-blosc2 license can be found in LICENSE.txt.

Mailing list

Discussion about this module is welcome in the Blosc list:

blosc@googlegroups.com

https://groups.google.es/group/blosc

Mastodon

Please follow @Blosc2 to stay informed about the latest developments. We recently moved from Twitter to Mastodon.

Citing Blosc

You can cite our work on the different libraries under the Blosc umbrella as:

@ONLINE{blosc,
  author = {{Blosc Development Team}},
  title = "{A fast, compressed and persistent data store library}",
  year = {2009-2024},
  note = {https://blosc.org}
}

Make compression better!
