A minimal implementation of chunked, compressed, N-dimensional arrays for Python.
Source code: https://github.com/alimanfoo/zarr
Download: https://pypi-hypernode.com/pypi/zarr
Release notes: https://github.com/alimanfoo/zarr/releases
Installation
Installation requires NumPy and Cython to be installed first. Currently, zarr can only be installed on Linux.
Install from PyPI:
$ pip install -U zarr
Install from GitHub:
$ pip install -U git+https://github.com/alimanfoo/zarr.git@master
Status
Experimental, proof-of-concept. This is alpha-quality software. Things may break, change or disappear without warning.
Bug reports and suggestions welcome.
Design goals
Chunking in multiple dimensions
Resize any dimension
Concurrent reads
Concurrent writes
Release the GIL during compression and decompression
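For example, because the GIL is released during decompression, slices of an array can be read from multiple threads. The following is a minimal sketch using the standard library's ThreadPoolExecutor; the worker function and thread count are illustrative only:

>>> from concurrent.futures import ThreadPoolExecutor
>>> import numpy as np
>>> import zarr
>>> z = zarr.empty(shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
>>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> def read_block(i):
...     # each task reads and decompresses one 1000-row slice
...     return z[i*1000:(i+1)*1000].sum()
>>> with ThreadPoolExecutor(max_workers=4) as pool:
...     totals = list(pool.map(read_block, range(10)))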
Usage
Create an array:
>>> import numpy as np
>>> import zarr
>>> z = zarr.empty(shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
>>> z
zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 38.1M; cbytes: 0; initialized: 0/100
Fill it with some data:
>>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.SynchronizedArray((10000, 1000), int32, chunks=(1000, 100))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
Obtain a NumPy array by slicing:
>>> z[:]
array([[ 0, 1, 2, ..., 997, 998, 999],
[ 1000, 1001, 1002, ..., 1997, 1998, 1999],
[ 2000, 2001, 2002, ..., 2997, 2998, 2999],
...,
[9997000, 9997001, 9997002, ..., 9997997, 9997998, 9997999],
[9998000, 9998001, 9998002, ..., 9998997, 9998998, 9998999],
[9999000, 9999001, 9999002, ..., 9999997, 9999998, 9999999]], dtype=int32)
>>> z[:100]
array([[ 0, 1, 2, ..., 997, 998, 999],
[ 1000, 1001, 1002, ..., 1997, 1998, 1999],
[ 2000, 2001, 2002, ..., 2997, 2998, 2999],
...,
[97000, 97001, 97002, ..., 97997, 97998, 97999],
[98000, 98001, 98002, ..., 98997, 98998, 98999],
[99000, 99001, 99002, ..., 99997, 99998, 99999]], dtype=int32)
>>> z[:, :100]
array([[ 0, 1, 2, ..., 97, 98, 99],
[ 1000, 1001, 1002, ..., 1097, 1098, 1099],
[ 2000, 2001, 2002, ..., 2097, 2098, 2099],
...,
[9997000, 9997001, 9997002, ..., 9997097, 9997098, 9997099],
[9998000, 9998001, 9998002, ..., 9998097, 9998098, 9998099],
[9999000, 9999001, 9999002, ..., 9999097, 9999098, 9999099]], dtype=int32)
Resize the array and add more data:
>>> z.resize(20000, 1000)
>>> z
zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 76.3M; cbytes: 2.0M; ratio: 38.5; initialized: 100/200
>>> z[10000:, :] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 76.3M; cbytes: 4.0M; ratio: 19.3; initialized: 200/200
For convenience, an append() method is also available, which can be used to append data to any axis:
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z.append(a+a)
>>> z
zarr.ext.SynchronizedArray((20000, 1000), int32, chunks=(1000, 100))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 76.3M; cbytes: 3.6M; ratio: 21.2; initialized: 200/200
>>> z.append(np.vstack([a, a]), axis=1)
>>> z
zarr.ext.SynchronizedArray((20000, 2000), int32, chunks=(1000, 100))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 152.6M; cbytes: 7.6M; ratio: 20.2; initialized: 400/400
Persistence
Create a persistent array (data stored on disk):
>>> path = 'example.zarr'
>>> z = zarr.open(path, mode='w', shape=(10000, 1000), dtype='i4', chunks=(1000, 100))
>>> z[:] = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z
zarr.ext.SynchronizedPersistentArray((10000, 1000), int32, chunks=(1000, 100))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 38.1M; cbytes: 2.0M; ratio: 19.3; initialized: 100/100
mode: w; path: example.zarr
There is no need to close a persistent array. Data are automatically flushed to disk.
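A persistent array can be re-opened later from the same path. The sketch below assumes that passing mode='r' to zarr.open opens an existing array read-only:

>>> z2 = zarr.open(path, mode='r')  # assumption: mode 'r' re-opens the existing array read-only
>>> np.array_equal(z2[:100], z[:100])
True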
If you’re working with really big arrays, try the ‘lazy’ option:
>>> path = 'big.zarr'
>>> z = zarr.open(path, mode='w', shape=(1e8, 1e7), dtype='i4', chunks=(1000, 1000), lazy=True)
>>> z
zarr.ext.SynchronizedLazyPersistentArray((100000000, 10000000), int32, chunks=(1000, 1000))
cname: blosclz; clevel: 5; shuffle: 1 (BYTESHUFFLE)
nbytes: 3.6P; cbytes: 0; initialized: 0/1000000000
mode: w; path: big.zarr
See the persistence documentation for more details of the file format.
Tuning
zarr is optimised for accessing and storing data in contiguous slices the same size as, or larger than, the chunks. It is not, and probably never will be, optimised for single-item access.
Chunk sizes >= 1M are generally good. The optimal chunk shape will depend on the correlation structure in your data.
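As a rough guide, the uncompressed size of a chunk is just the product of the chunk shape and the item size; the sketch below does that arithmetic for the (1000, 1000) chunks used in the example above:

>>> itemsize = np.dtype('i4').itemsize  # 4 bytes per item
>>> 1000 * 1000 * itemsize  # uncompressed bytes per (1000, 1000) chunk, ~4M
4000000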
zarr is designed for use in parallel computations working chunk-wise over data. Try it with dask.array. If using zarr in a multi-threaded program, set zarr to use blosc in contextual mode:
>>> zarr.set_blosc_options(use_context=True)
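For example, a zarr array can be handed to dask.array for out-of-core, chunk-wise computation. The following is a minimal sketch assuming dask is installed; da.from_array is standard dask API, and the chunks argument is set to match the zarr chunks:

>>> import dask.array as da
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> d = da.from_array(z, chunks=(1000, 100))  # align dask chunks with zarr chunks
>>> int(d.sum().compute())
49999995000000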
Acknowledgments
zarr uses c-blosc internally for compression and decompression and borrows code heavily from bcolz.