Python wrapper for the C-Blosc2 library

Project description

A Python wrapper for the extremely fast Blosc2 compression library

Author:

The Blosc development team

Contact:

blosc@blosc.org

Github:

https://github.com/Blosc/python-blosc2

NumFOCUS:

numfocus

Code of Conduct:

Contributor Covenant

What it is

C-Blosc2 is the new major version of C-Blosc, and is backward compatible with both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package that wraps C-Blosc2, the newest version of the Blosc compressor.

Starting with version 3.0.0, Python-Blosc2 includes a powerful computing engine that can operate on compressed data stored in memory, on disk or over the network. This engine also supports advanced features like reductions, filters, user-defined functions and broadcasting (still in beta).

You can read some of our tutorials on how to perform advanced computations at:

In addition, Python-Blosc2 aims to leverage the full C-Blosc2 functionality to support super-chunks (SChunk), multi-dimensional arrays (NDArray), metadata, serialization and other bells and whistles introduced in C-Blosc2.

Note: Blosc2 is meant to be backward compatible with Blosc(1) data. That means that it can read data generated with Blosc, but the opposite is not true (i.e. there is no forward compatibility).

NDArray: an N-Dimensional store

One of the most useful abstractions in Python-Blosc2 is the NDArray object. It can write and read n-dimensional datasets very efficiently thanks to an n-dimensional two-level partitioning, which allows slicing and dicing arbitrarily large compressed data in a more fine-grained way:

https://github.com/Blosc/python-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true

To whet your appetite, here is how the NDArray object performs when getting slices orthogonal to the different axes of a 4-dim dataset:

https://github.com/Blosc/python-blosc2/blob/main/images/Read-Partial-Slices-B2ND.png?raw=true

We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro

We also have a ~2 min explanatory video on why slicing in pineapple style (aka double partitioning) is useful:

Slicing a dataset in pineapple-style

Operating with NDArrays

NDArray objects are very easy to operate with in Python-Blosc2. Here is a simple example:

import numpy as np
import blosc2

N = 10_000  # 10_000 x 10_000 elements: ~400 MB per operand as float32, ~800 MB as float64
na = np.linspace(0, 1, N * N, dtype=np.float32).reshape(N, N)
nb = np.linspace(1, 2, N * N).reshape(N, N)
nc = np.linspace(-10, 10, N * N).reshape(N, N)

# Convert to blosc2
a = blosc2.asarray(na)
b = blosc2.asarray(nb)
c = blosc2.asarray(nc)

# Expression
expr = ((a ** 3 + blosc2.sin(c * 2)) < b) & (c > 0)

# Evaluate and get a NDArray as result
out = expr.eval()
print(out.info)

As you can see, NDArray instances are very similar to NumPy arrays, but behind the scenes they hold compressed data that can be operated on very efficiently by the new computing engine included in Python-Blosc2.

Here is the performance (on a MacBook Air M2 with 24 GB of RAM) that you can reach when the operands fit comfortably in memory:

Performance when operands fit in-memory

In this case, performance lags a bit behind top-level libraries like Numexpr or Numba, but it is still quite good (and CPUs with more cores than the M2 would probably close the gap even further). One important thing to know is that memory consumption when using the LazyArray.eval() method is very low, because the output is an NDArray object that is compressed and kept in memory by default. On the other hand, the LazyArray.__getitem__() method returns an actual NumPy array, so it is not recommended for large datasets, as it can consume quite a bit of memory (though it is still convenient for small outputs).

It is important to note that the NDArray object can use memory-mapped files as well, and the benchmark above actually uses a memory-mapped file as the storage for the operands. Memory-mapped files are very useful when the operands do not fit in memory, while still keeping good performance.

And here is the performance when the operands do not fit well in memory:

Performance when operands do not fit in-memory

In this case, the memory consumption lines look a bit erratic, but this is because they display the real memory consumption, not the virtual one (during the evaluation, the OS has to swap some memory out to disk). Here, performance is very competitive with top-level libraries like Numexpr or Numba.

You can find the benchmark for the above examples at: https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/lazyarray-expr.ipynb

Installing

Blosc2 now offers Python wheels for the main operating systems (Windows, macOS and Linux) and platforms. You can install binary packages from PyPI using pip:

pip install blosc2

We are in the process of releasing 3.0.0, and we are publishing wheels for the different beta versions along the way. For example, to install the first beta version, run:

pip install blosc2==3.0.0b1

Documentation

The documentation is here:

https://blosc.org/python-blosc2/python-blosc2.html

Also, some examples are available on:

https://github.com/Blosc/python-blosc2/tree/main/examples

Building from sources

python-blosc2 bundles the C-Blosc2 sources and can be built in place:

git clone https://github.com/Blosc/python-blosc2/
cd python-blosc2
pip install .   # add -e for editable mode

That’s all. You can now proceed to the Testing section.

Testing

After compiling, you can quickly check that the package is sane by running the tests:

pip install ".[test]"
pytest   # add -v for verbose mode

Benchmarking

If curious, you may want to run a small benchmark that compares a plain NumPy array copy against compression through different compressors in your Blosc build:

python bench/pack_compress.py

License

The software is licensed under a 3-Clause BSD license. A copy of the python-blosc2 license can be found in LICENSE.txt.

Mailing list

Discussion about this module is welcome in the Blosc list:

blosc@googlegroups.com

https://groups.google.es/group/blosc

Mastodon

Please follow @Blosc2 to stay informed about the latest developments. We recently moved from Twitter to Mastodon.

Citing Blosc

You can cite our work on the different libraries under the Blosc umbrella as:

@ONLINE{blosc,
  author = {{Blosc Development Team}},
  title = "{A fast, compressed and persistent data store library}",
  year = {2009-2024},
  note = {https://blosc.org}
}

Make compression better!

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

blosc2-3.0.0b3-cp312-cp312-win_amd64.whl (2.1 MB)

Uploaded CPython 3.12 Windows x86-64

blosc2-3.0.0b3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

blosc2-3.0.0b3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.2 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

blosc2-3.0.0b3-cp312-cp312-macosx_11_0_arm64.whl (3.3 MB)

Uploaded CPython 3.12 macOS 11.0+ ARM64

blosc2-3.0.0b3-cp312-cp312-macosx_10_9_x86_64.whl (3.9 MB)

Uploaded CPython 3.12 macOS 10.9+ x86-64

blosc2-3.0.0b3-cp311-cp311-win_amd64.whl (2.1 MB)

Uploaded CPython 3.11 Windows x86-64

blosc2-3.0.0b3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

blosc2-3.0.0b3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.2 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

blosc2-3.0.0b3-cp311-cp311-macosx_11_0_arm64.whl (3.3 MB)

Uploaded CPython 3.11 macOS 11.0+ ARM64

blosc2-3.0.0b3-cp311-cp311-macosx_10_9_x86_64.whl (3.9 MB)

Uploaded CPython 3.11 macOS 10.9+ x86-64

blosc2-3.0.0b3-cp310-cp310-win_amd64.whl (2.1 MB)

Uploaded CPython 3.10 Windows x86-64

blosc2-3.0.0b3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

blosc2-3.0.0b3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.2 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

blosc2-3.0.0b3-cp310-cp310-macosx_11_0_arm64.whl (3.3 MB)

Uploaded CPython 3.10 macOS 11.0+ ARM64

blosc2-3.0.0b3-cp310-cp310-macosx_10_9_x86_64.whl (3.9 MB)

Uploaded CPython 3.10 macOS 10.9+ x86-64

File details

Hashes for blosc2-3.0.0b3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 c4577f707e7856139f9d7529a54eedb16743d1b94a8398b610a97852aaf9fc46
MD5 0270afb1df4b997bc64cb58fd56dc9f7
BLAKE2b-256 249655ecdc6120ca11f7907bebe37233b7da1307b55291123acd88fae0080020

Hashes for blosc2-3.0.0b3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2799e9c10d939000540d9a08f580bc9c75589f0c7036c3fb3de0020d70941cbd
MD5 6cf09b8e89b7d7f8895d7de3034e7853
BLAKE2b-256 dcee806fa2402557d8ec2e193a86b21c5588f33341ed183ada3448245b52a9f7

Hashes for blosc2-3.0.0b3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 0376c6e5abf2f2de7e9d9b4e325572ad9bf58c10faa8dbaa17b1f4190e4cd349
MD5 b21a655f84efe3f44544b68a4ba9270d
BLAKE2b-256 7369e78830d0c5b603a5de51e691b27108e538bbfa0098ea25dbb2d2fe1d1a17

Hashes for blosc2-3.0.0b3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1ac549c1482d98b331546ce42d899a8a325b6144dade05b34afe139534ae3951
MD5 d2d09eeccbe91853083df89148c75ef4
BLAKE2b-256 3d2da7a93beec20bdd125e89c123e8545769596ddec6f156b52b2e32fa7a6564

Hashes for blosc2-3.0.0b3-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 ce0c2c04f4ecb6477394a5f797c06654546027ccd41ae5cdb4592882bc1c7aa7
MD5 513b94a2779fd62e17cbc7514618bbe5
BLAKE2b-256 b966afcff91dc41a5cde78603f7475491e057f1a889d2676c0d9fd530aa2493f

Hashes for blosc2-3.0.0b3-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 e0c53f7b78cf1ac404470410649228b8a1486031919c7a75bdbbd2df795e5362
MD5 84a42c64a33d1e04952bb788c77bc619
BLAKE2b-256 b0c6a1b5743a592b3348f9340fa8452e7c8f878702c9f8981b7b88fb3248d542

Hashes for blosc2-3.0.0b3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7563be4a7e521a124e35b42de2c6f5aeb52d61ab5fc9e681a59613e6a042085d
MD5 c0050a0ac215c891446640b47c39809f
BLAKE2b-256 5e75a77a4a7bca517431139046efa213de784536dce9f4c3e5b9727e384361fe

Hashes for blosc2-3.0.0b3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 26d778cf63222ecb3a36c0b5ca0d0e5d4cb00211ca59a3502a9fe50af02874e6
MD5 afb3e8febf754ded53c24d4df32f9a06
BLAKE2b-256 b5d08943c5a9d8a950c5cb329d2d7a31149d98dbad7dd3df5038a100b644a9b3

Hashes for blosc2-3.0.0b3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6069ebdef01ef3ad1bd17078ad10f43988b9523cef295aa0d5afee634d4df4ae
MD5 71c7d25df4b41c8f131d4537c666a76b
BLAKE2b-256 9e3b333be1ac95edeb125749e5a38fea5e9e5970f1f5080fba2879be4a8180cc

Hashes for blosc2-3.0.0b3-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 502a9ec3c884680adaa1559d757d7b67451b0aafad64ea555ac7f3eecd711a12
MD5 1e44b5ac22a932d187a0b50d871cac1e
BLAKE2b-256 c7b8f0d1883ff189fec3348f13904b7392c8ef5dbe8f9ef1d13214f6e967e4db

Hashes for blosc2-3.0.0b3-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 e335b3573534060495bf87436569b0761c77b5f2c88c5f3f75e291b4d8554b25
MD5 08b32b45a11ec77e617fa42bdd050942
BLAKE2b-256 9eae4b879614bceb95b7f87a265c8220e4ea93084b670cc7a75e8a13de46241d

Hashes for blosc2-3.0.0b3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 458d40de339d605af3876bed324f153dca737c7790d272e4c1f283680d560924
MD5 12e6f8b56111f26b1bec80e96a32e4ce
BLAKE2b-256 4312c76a17051fc37d398b87a1c26a2f24b110447685bc9b6659e221e608beeb

Hashes for blosc2-3.0.0b3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ee0726262994367b6371ded4e67b8a262ccedff32e26980c03497a6d6dc49865
MD5 806541aee257e39bb20840fde8140bc0
BLAKE2b-256 cd3d038e7fcc6bce4e80001535b7be7be686586529b22c83d748a4efbe69f9ea

Hashes for blosc2-3.0.0b3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 42bce03f0cb1084d07dac9d7406eadc393d9c9376731e01af9f76380d146d330
MD5 b9643b5458cdf4284684ae05daa7ae1d
BLAKE2b-256 87b25e4eeb777c52cc28b0861b32480e741bcfa15ba064ac037929eb524263f3

Hashes for blosc2-3.0.0b3-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 da7e2a31941d524b9f13e5dd15440fd227b3a3e8b70587fd8aec24ed5cc07354
MD5 806ddb528331d1a87ae7b763cea47880
BLAKE2b-256 85c8004b6282f3ad0c7630feedc10f13c2395fa96e3484916e9f44290db831de
