netCDF4 via h5py

A Python interface for the netCDF4 file format that reads and writes local or remote HDF5 files directly via h5py or h5pyd, without relying on the Unidata netCDF library.

Why h5netcdf?

  • It has one fewer binary dependency (netCDF C). If you already have h5py installed, reading netCDF4 with h5netcdf may be much easier than installing netCDF4-Python.

  • We’ve seen occasional reports of better performance with h5py than netCDF4-python, though in many cases performance is identical. For one workflow, h5netcdf was reported to be almost 4x faster than netCDF4-python.

  • Anecdotally, HDF5 users seem to be unexcited about switching to netCDF – hopefully this will convince them that netCDF4 is actually quite sane!

  • Finally, side-stepping the netCDF C library (and Cython bindings to it) gives us an easier way to identify the source of performance issues and bugs in the netCDF libraries/specification.

Install

Ensure you have a recent version of h5py installed (I recommend using conda). At least version 2.1 is required (for dimension scales); versions 2.3 and newer have been verified to work, though some tests only pass on h5py 2.6. Then install h5netcdf with pip:

pip install h5netcdf

Usage

h5netcdf has two APIs, a new API and a legacy API. Both interfaces currently reproduce most of the features of the netCDF interface, with the notable exception of support for operations that rename or delete existing objects. We simply haven’t gotten around to implementing this yet. Patches would be very welcome.

New API

The new API supports direct hierarchical access of variables and groups. Its design is an adaptation of h5py to the netCDF data model. For example:

import h5netcdf
import numpy as np

with h5netcdf.File('mydata.nc', 'w') as f:
    # set dimensions with a dictionary
    f.dimensions = {'x': 5}
    # and update them with a dict-like interface
    # f.dimensions['x'] = 5
    # f.dimensions.update({'x': 5})

    v = f.create_variable('hello', ('x',), float)
    v[:] = np.ones(5)

    # you don't need to create groups first
    # you also don't need to create dimensions first if you supply data
    # with the new variable
    v = f.create_variable('/grouped/data', ('y',), data=np.arange(10))

    # access and modify attributes with a dict-like interface
    v.attrs['foo'] = 'bar'

    # you can access variables and groups directly using hierarchical
    # keys, as in h5py
    print(f['/grouped/data'])

    # add an unlimited dimension
    f.dimensions['z'] = None
    # explicitly resize a dimension and all variables using it
    f.resize_dimension('z', 3)
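
To check what ended up on disk, the file can be reopened in read mode. The following is a minimal sketch, assuming h5netcdf and NumPy are installed; the temporary path and the file name roundtrip.nc are made up for illustration, and it only uses calls shown above plus slicing and the dimensions property of a variable:

```python
import os
import tempfile

import h5netcdf
import numpy as np

# hypothetical file name inside a throwaway directory
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.nc')

# write a small file using the new API
with h5netcdf.File(path, 'w') as f:
    f.dimensions = {'x': 5}
    v = f.create_variable('hello', ('x',), float)
    v[:] = np.ones(5)
    g = f.create_variable('/grouped/data', ('y',), data=np.arange(10))
    g.attrs['foo'] = 'bar'

# reopen read-only and pull the contents back out
with h5netcdf.File(path, 'r') as f:
    hello = f['hello'][:]               # slicing returns a NumPy array
    hello_dims = f['hello'].dimensions  # tuple of dimension names
    data = f['grouped']['data'][:]      # groups can be walked key by key
    foo = f['/grouped/data'].attrs['foo']
```

Values are copied out inside the with block because the variables are only usable while the file is open.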

Legacy API

The legacy API is designed for compatibility with netCDF4-python. To use it, import h5netcdf.legacyapi:

import h5netcdf.legacyapi as netCDF4
# everything here would also work with this instead:
# import netCDF4
import numpy as np

with netCDF4.Dataset('mydata.nc', 'w') as ds:
    ds.createDimension('x', 5)
    v = ds.createVariable('hello', float, ('x',))
    v[:] = np.ones(5)

    g = ds.createGroup('grouped')
    g.createDimension('y', 10)
    g.createVariable('data', 'i8', ('y',))
    v = g['data']
    v[:] = np.arange(10)
    v.foo = 'bar'
    print(ds.groups['grouped'].variables['data'])

The legacy API is designed to be easy to try out for netCDF4-python users, but it is not an exact match. Here is an incomplete list of functionality we don’t include:

  • Utility functions chartostring, num2date, etc., that are not directly necessary for writing netCDF files.

  • We don’t support the endian argument to createVariable yet (see GitHub issue).

  • h5netcdf variables do not support automatic masking or scaling (e.g., of values matching the _FillValue attribute). We prefer to leave this functionality to client libraries (e.g., xarray), which can implement their exact desired scaling behavior.

  • No support yet for automatic resizing of unlimited dimensions with array indexing. This would be a welcome pull request. For now, dimensions can be manually resized with Group.resize_dimension(dimension, size).
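
Since masking and scaling are left to the caller, here is a rough sketch of what a client might do by hand. It assumes the common CF attribute names scale_factor and add_offset (not mentioned above) alongside _FillValue, uses a plain dict as a stand-in for a variable's attrs, and decode_values is a hypothetical helper, not part of h5netcdf:

```python
import numpy as np


def decode_values(raw, attrs):
    """Mask values equal to _FillValue, then apply scale_factor and add_offset."""
    raw = np.asarray(raw)
    values = raw.astype(float)
    fill = attrs.get('_FillValue')
    if fill is not None:
        # compare against the raw (unscaled) values, then mask with NaN
        values = np.where(raw == fill, np.nan, values)
    return values * attrs.get('scale_factor', 1.0) + attrs.get('add_offset', 0.0)


# made-up packed data and attributes, for illustration only
raw = np.array([0, 10, -999, 25], dtype='i2')
attrs = {'_FillValue': -999, 'scale_factor': 0.1, 'add_offset': 20.0}
decoded = decode_values(raw, attrs)
```

Libraries such as xarray implement this far more carefully (dtype handling, missing_value, and so on); the point here is only that the raw values and attributes h5netcdf returns are enough to do it.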

Invalid netCDF files

h5py implements some features that do not (yet) result in valid netCDF files:

  • Data types:
    • Booleans

    • Complex values

    • Non-string variable length types

    • Enum types

    • Reference types

  • Arbitrary filters:
    • Scale-offset filters

By default, h5netcdf will not allow writing files using any of these features, as files with such features are not readable by other netCDF tools.

However, these are still valid HDF5 files. If you don’t care about netCDF compatibility, you can use these features by setting invalid_netcdf=True when creating a file:

# avoid the .nc extension for non-netcdf files
f = h5netcdf.File('mydata.h5', invalid_netcdf=True)
...

# works with the legacy API, too, though compression options are not exposed
ds = h5netcdf.legacyapi.Dataset('mydata.h5', invalid_netcdf=True)
...
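
As a slightly fuller sketch of the same idea, the snippet below writes a boolean array (one of the data types listed above) and reads it back. It assumes h5netcdf and NumPy are installed; the path and the variable name flags are made up for illustration:

```python
import os
import tempfile

import h5netcdf
import numpy as np

# hypothetical file name; note the .h5 extension for a non-netCDF file
path = os.path.join(tempfile.mkdtemp(), 'flags.h5')
flags = np.array([True, False, True])

# booleans are not a netCDF data type, so opt out of the validity check
with h5netcdf.File(path, 'w', invalid_netcdf=True) as f:
    f.dimensions = {'x': 3}
    f.create_variable('flags', ('x',), data=flags)

# the result is still a valid HDF5 file that h5netcdf can read back
with h5netcdf.File(path, 'r') as f:
    flags_back = f['flags'][:]
```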

Decoding variable length strings

h5py 3.0 introduced new behavior for handling variable length strings. Instead of being automatically decoded with UTF-8 into NumPy arrays of str, they are returned as arrays of bytes.

The legacy API preserves the old behavior of h5py (which matches netCDF4), and automatically decodes strings.

The new API also currently preserves the old behavior of h5py, but issues a warning that it will change in the future to match h5py. Explicitly set decode_vlen_strings=False in the h5netcdf.File constructor to opt in to the new behavior early, or set decode_vlen_strings=True to keep automatic decoding.
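
With decode_vlen_strings=False, decoding becomes the caller's job. A minimal sketch of that step follows, using a hand-built object array as a stand-in for data read from a file; the array contents and the helper name decode_strings are made up for illustration:

```python
import numpy as np


def decode_strings(values, encoding='utf-8'):
    """Return an object array in which every bytes item is decoded to str."""
    out = np.empty(values.shape, dtype=object)
    for index, item in np.ndenumerate(values):
        # leave anything that is already a str (or not bytes) untouched
        out[index] = item.decode(encoding) if isinstance(item, bytes) else item
    return out


# stand-in for an array of undecoded variable length strings
raw = np.array([b'caf\xc3\xa9', b'netCDF'], dtype=object)
decoded = decode_strings(raw)
```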

Datasets with missing dimension scales

By default, h5netcdf raises a ValueError when a variable is accessed that has no dimension scale associated with one of its axes. You can set phony_dims='sort' when opening a file to let h5netcdf invent phony dimensions according to netCDF behaviour.

# mimic netCDF-behaviour for non-netcdf files
f = h5netcdf.File('mydata.h5', mode='r', phony_dims='sort')
...

Note that this iterates once over the whole group hierarchy, which affects performance if you rely on lazy group access. You can set phony_dims='access' instead to defer phony dimension creation to group access time. The created phony dimension naming will then differ from netCDF behaviour.

f = h5netcdf.File('mydata.h5', mode='r', phony_dims='access')
...
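
To try this out without an existing file, a plain HDF5 dataset can be written with h5py (which attaches no dimension scales) and then opened with h5netcdf. This is a sketch under the assumption that both libraries are installed; the path and dataset name are made up, and the exact phony dimension names are left unchecked on purpose:

```python
import os
import tempfile

import h5netcdf
import h5py
import numpy as np

# hypothetical file written with plain h5py, so no dimension scales exist
path = os.path.join(tempfile.mkdtemp(), 'plain.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('data', data=np.zeros((3, 4)))

# let h5netcdf invent one phony dimension per axis
with h5netcdf.File(path, mode='r', phony_dims='sort') as f:
    dim_names = f['data'].dimensions  # tuple of invented dimension names
    shape = f['data'].shape
    print(dim_names, shape)
```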

Changelog

License

3-clause BSD
