
An intake plugin for parsing an ESM (Earth System Model) Collection/catalog and loading assets (netCDF files and/or Zarr stores) into xarray datasets.


Intake-esm

Motivation

Computer simulations of the Earth’s climate and weather generate huge amounts of data. These data are often persisted on HPC systems or in the cloud across multiple data assets in a variety of formats (netCDF, Zarr, etc.). Finding, investigating, and loading these data assets into compute-ready data containers costs time and effort. Before loading and analyzing a specific data set, the data user needs to know which data sets are available and which attributes describe each one.

Finding, investigating, and loading these assets into data containers such as xarray datasets can be a daunting task because of the large number of files a user may be interested in. Intake-esm aims to address these issues by providing the necessary functionality for searching, discovery, and data access/loading.

Overview

intake-esm is a data cataloging utility built on top of intake, pandas, and xarray, and it's pretty awesome!

  • Opening an ESM collection definition file: An ESM (Earth System Model) collection file is a JSON file that conforms to the ESM Collection Specification. When given a link or path to an ESM collection file, intake-esm establishes a link to a database (a CSV file) that contains data asset locations and associated metadata (e.g., which experiment and model they come from). The collection JSON file can be stored on a local filesystem or hosted on a remote server.

    In [1]: import intake
    
    In [2]: col_url = "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json"
    
    In [3]: col = intake.open_esm_datastore(col_url)
    
    In [4]: col
    Out[4]: <pangeo-cmip6 catalog with 4287 dataset(s) from 282905 asset(s)>
    
  • Search and Discovery: intake-esm provides functionality to execute queries against the catalog:

    In [5]: col_subset = col.search(
       ...:     experiment_id=["historical", "ssp585"],
       ...:     table_id="Oyr",
       ...:     variable_id="o2",
       ...:     grid_label="gn",
       ...: )
    
    In [6]: col_subset
    Out[6]: <pangeo-cmip6 catalog with 18 dataset(s) from 138 asset(s)>
    
  • Access: when the user is satisfied with the results of their query, they can ask intake-esm to load data assets (netCDF/HDF files and/or Zarr stores) into xarray datasets:

    In [7]: dset_dict = col_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})
    
    --> The keys in the returned dictionary of datasets are constructed as follows:
            'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
    |███████████████████████████████████████████████████████████████| 100.00% [18/18 00:10<00:00]
    
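Opening a collection, as described in the first step above, comes down to reading a JSON definition and the CSV catalog it points to. The following is a minimal standard-library sketch of that idea, not intake-esm's own code. The collection and catalog are hypothetical and embedded as strings. The top-level field names (`esmcat_version`, `catalog_file`, `attributes`, `assets`, `aggregation_control`) follow ESM Collection Specification files such as the pangeo-cmip6 collection, but all values and asset paths are invented for illustration.

```python
import csv
import io
import json

# Hypothetical, trimmed-down ESM collection definition. Field names follow
# the ESM Collection Specification; every value here is invented.
COLLECTION_JSON = """
{
  "esmcat_version": "0.1.0",
  "id": "example-cmip6",
  "description": "Hypothetical example collection",
  "catalog_file": "example-cmip6.csv",
  "attributes": [
    {"column_name": "experiment_id", "vocabulary": ""},
    {"column_name": "variable_id", "vocabulary": ""}
  ],
  "assets": {"column_name": "zstore", "format": "zarr"},
  "aggregation_control": {
    "variable_column_name": "variable_id",
    "groupby_attrs": ["source_id", "experiment_id", "table_id"],
    "aggregations": [{"type": "union", "attribute_name": "variable_id"}]
  }
}
"""

# Hypothetical contents of the CSV catalog named by "catalog_file":
# one row per data asset plus the metadata columns describing it.
CATALOG_CSV = """source_id,experiment_id,table_id,variable_id,zstore
ModelA,historical,Oyr,o2,gs://bucket/ModelA/historical/Oyr/o2
ModelA,ssp585,Oyr,o2,gs://bucket/ModelA/ssp585/Oyr/o2
ModelB,historical,Omon,tos,gs://bucket/ModelB/historical/Omon/tos
"""


def load_collection(collection_text, catalog_text):
    """Parse the JSON collection definition and the CSV catalog rows."""
    collection = json.loads(collection_text)
    rows = list(csv.DictReader(io.StringIO(catalog_text)))
    return collection, rows


collection, rows = load_collection(COLLECTION_JSON, CATALOG_CSV)
# The "assets" entry tells us which CSV column holds the asset location.
asset_column = collection["assets"]["column_name"]
print(f"{collection['id']}: {len(rows)} asset(s)")
for row in rows:
    print(row["experiment_id"], row["variable_id"], "->", row[asset_column])
```

In intake-esm itself the catalog is held in a pandas DataFrame, and `catalog_file` may be a remote URL. The sketch only shows how the JSON file and the CSV catalog relate to each other.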

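The search and access steps can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not intake-esm's implementation, which works on a pandas DataFrame and uses xarray to combine each group's assets. It assumes the semantics suggested by the examples: a list value (such as `experiment_id=["historical", "ssp585"]`) matches any of its items, and all keys in a query must match. Each entry in the dictionary returned by `to_dataset_dict` corresponds to one group of assets sharing the grouping attributes, whose values are joined with `.` to form the key, as printed above. The rows, the helper names `search` and `group_into_datasets`, and the shortened `GROUPBY_ATTRS` list are all hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog rows (in intake-esm these come from the CSV catalog).
ROWS = [
    {"source_id": "ModelA", "experiment_id": "historical", "table_id": "Oyr",
     "variable_id": "o2", "member_id": "r1i1p1f1"},
    {"source_id": "ModelA", "experiment_id": "historical", "table_id": "Oyr",
     "variable_id": "o2", "member_id": "r2i1p1f1"},
    {"source_id": "ModelA", "experiment_id": "ssp585", "table_id": "Oyr",
     "variable_id": "o2", "member_id": "r1i1p1f1"},
    {"source_id": "ModelB", "experiment_id": "historical", "table_id": "Omon",
     "variable_id": "tos", "member_id": "r1i1p1f1"},
]

# Assumed grouping attributes; the pangeo-cmip6 keys use a longer list.
GROUPBY_ATTRS = ["source_id", "experiment_id", "table_id"]


def search(rows, **query):
    """Keep rows matching every key; a list value matches any of its items."""
    def matches(row):
        for key, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if row.get(key) not in allowed:
                return False
        return True
    return [row for row in rows if matches(row)]


def group_into_datasets(rows, groupby_attrs):
    """Group assets into datasets keyed by 'value1.value2...' strings."""
    datasets = defaultdict(list)
    for row in rows:
        key = ".".join(row[attr] for attr in groupby_attrs)
        datasets[key].append(row)
    return dict(datasets)


subset = search(ROWS, experiment_id=["historical", "ssp585"], table_id="Oyr")
datasets = group_into_datasets(subset, GROUPBY_ATTRS)
print(f"{len(datasets)} dataset(s) from {len(subset)} asset(s)")
for key, assets in datasets.items():
    print(key, len(assets))
```

Here three matching assets collapse into two datasets, because the two `historical` ensemble members share the same grouping key. This mirrors how a query can report fewer datasets than assets (18 from 138 in the example above).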
See the documentation for more information.

Installation

Intake-esm can be installed from PyPI with pip:

python -m pip install intake-esm

It is also available from conda-forge for conda installations:

conda install -c conda-forge intake-esm



Download files


Source Distribution

intake-esm-2021.1.15.tar.gz (403.6 kB)

Uploaded Source

Built Distribution

intake_esm-2021.1.15-py3-none-any.whl (26.2 kB)

Uploaded Python 3

File details

Details for the file intake-esm-2021.1.15.tar.gz.

File metadata

  • Download URL: intake-esm-2021.1.15.tar.gz
  • Upload date:
  • Size: 403.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for intake-esm-2021.1.15.tar.gz
  • SHA256: 6cd1aaecc1df68f0a7b94ab405db347b9592354a7b29b037dc8935f74ee09a1b
  • MD5: f75ba22d468d13ac285f3bc99d1a5c0e
  • BLAKE2b-256: 07a84d27562ddbd80f4d1a43384915bf302dffbb18be9698eb1a9701d3037082


File details

Details for the file intake_esm-2021.1.15-py3-none-any.whl.

File metadata

  • Download URL: intake_esm-2021.1.15-py3-none-any.whl
  • Upload date:
  • Size: 26.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for intake_esm-2021.1.15-py3-none-any.whl
  • SHA256: d62ae327855021497917d8d6b123cd86210f80943fadadecd9e57a6d4c8e087d
  • MD5: 386c6f5ca78c152e428f02455a8a47b2
  • BLAKE2b-256: a2d601e340735883b2636781f07aae88bf56d484bf006400d074f5cbb95c7b10
