A collection of curated climate data sets

Bookshelf

bookshelf is one way Climate Resource reuses datasets across projects.

The bookshelf represents a shared collection of curated datasets, or Books. Each Book is a preprocessed, versioned dataset that includes the notebooks used to produce it. As the underlying datasets or processing are updated, new Books can be created (with a new version if the data changed, or a new edition if the processing changed). A single dataset may produce multiple Resources if different representations are useful. These Books can be deployed to a shared Bookshelf so that they are accessible to other users.

Users can use specific Books within other projects. The dataset and its associated metadata are fetched and cached locally. Specific versions of Books can also be pinned for reproducibility.
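
The idea of pinning can be illustrated with a minimal sketch. This is not bookshelf's own code: the helper names `parse_version` and `resolve_version` and the list of version tags are hypothetical, and the tag format (`v0.0.2`) is assumed from the cache paths shown in the usage example. The sketch returns the newest version unless a specific one is pinned.

```python
from typing import Optional, Sequence


def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'v0.0.2' into a comparable tuple (0, 0, 2)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def resolve_version(available: Sequence[str], pinned: Optional[str] = None) -> str:
    """Pick the pinned version if one is given, otherwise the newest one."""
    if pinned is not None:
        # A pin must match an existing version exactly, so results stay reproducible
        if pinned not in available:
            raise KeyError(f"version {pinned!r} is not available")
        return pinned
    # Compare numerically so that v0.0.10 sorts after v0.0.2
    return max(available, key=parse_version)
```

Without a pin, the result changes whenever a newer version is published; with a pin, the same data is selected every time.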

This repository contains the notebooks that are used to generate the Books as well as a CLI tool for managing these datasets.

This is a prototype and will likely change in the future. Other potential ideas:

  • Deployed data are made available via api.climateresource.com.au so that they can be consumed and queried smartly
  • Simple web page to allow querying the data

Each Book consists of a datapackage description of the metadata. This datapackage contains the associated Resources and their hashes. Each Resource is fetched when it is first used and then cached for later use.
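
The fetch-then-cache behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not bookshelf's internals: `fetch_cached`, `sha256_of`, and the `download` callable are hypothetical names, and SHA-256 is assumed as the hash algorithm. The cached file is reused only if its hash matches the one recorded in the metadata.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fetch_cached(
    name: str,
    expected_hash: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
) -> Path:
    """Return the local path of a resource, downloading it only if needed."""
    target = cache_dir / name
    # Reuse the cached copy when it exists and matches the recorded hash
    if target.exists() and sha256_of(target) == expected_hash:
        return target
    # Otherwise fetch the bytes and verify them before writing to the cache
    content = download(name)
    if hashlib.sha256(content).hexdigest() != expected_hash:
        raise ValueError(f"hash mismatch for {name}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```

Passing the downloader in as a callable keeps the sketch independent of any particular server or HTTP library.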

Full documentation can be found at: https://climate-resource.github.io/bookshelf. We recommend reading the docs there because the internal documentation links don't render correctly on GitHub's viewer.

Getting Started

For data consumers

bookshelf can be installed via pip:

pip install bookshelf

Very little setup is needed to fetch a Book and start working with its data.

>>> import bookshelf
>>> shelf = bookshelf.BookShelf()
# Load the latest version of the MAGICC-specific RCMIP emissions
>>> book = shelf.load("rcmip-emissions")
INFO:/home/user/.cache/bookshelf/v0.1.0/rcmip-emissions/volume.json downloaded from https://cr-prod-datasets-bookshelf.s3.us-west-2.amazonaws.com/v0.1.0/rcmip-emissions/volume.json
# On the first call this will fetch the data from the server and cache it locally
>>> book.timeseries("magicc")
INFO:/home/user/.cache/bookshelf/v0.1.0/rcmip-emissions/v0.0.2/magicc.csv downloaded from https://cr-prod-datasets-bookshelf.s3.us-west-2.amazonaws.com/v0.1.0/rcmip-emissions/v0.0.2/magicc.csv
<ScmRun (timeseries: 1683, timepoints: 751)>
Time:
        Start: 1750-01-01T00:00:00
        End: 2500-01-01T00:00:00
Meta:
                 activity_id mip_era        model region          scenario       unit                    variable
        0     not_applicable   CMIP5          AIM  World             rcp60   Mt BC/yr                Emissions|BC
        1     not_applicable   CMIP5          AIM  World             rcp60  Mt CH4/yr               Emissions|CH4
        2     not_applicable   CMIP5          AIM  World             rcp60   Mt CO/yr                Emissions|CO
        3     not_applicable   CMIP5          AIM  World             rcp60  Mt CO2/yr               Emissions|CO2
        4     not_applicable   CMIP5          AIM  World             rcp60  Mt CO2/yr  Emissions|CO2|MAGICC AFOLU
        ...              ...     ...          ...    ...               ...        ...                         ...
        1678  not_applicable   CMIP5  unspecified  World  historical-cmip5  Mt NH3/yr               Emissions|NH3
        1679  not_applicable   CMIP5  unspecified  World  historical-cmip5  Mt NOx/yr               Emissions|NOx
        1680  not_applicable   CMIP5  unspecified  World  historical-cmip5   Mt OC/yr                Emissions|OC
        1681  not_applicable   CMIP5  unspecified  World  historical-cmip5  Mt SO2/yr            Emissions|Sulfur
        1682  not_applicable   CMIP5  unspecified  World  historical-cmip5  Mt VOC/yr               Emissions|VOC

        [1683 rows x 7 columns]

# Subsequent calls use the result from the cache
>>> book.timeseries("magicc")

For data curators

If you wish to build or modify Books, some additional dependencies are required. These can be installed using:

pip install bookshelf-producer

Building and deploying datasets is managed via Jupyter notebooks and a small YAML file that contains metadata about the dataset. These notebooks are stored as plain-text Python files using the jupytext plugin for Jupyter. See notebooks/example.py for an example dataset.

Once the dataset has been developed, it can be deployed to the remote Bookshelf so that other users can consume it.

The dataset can be deployed using the publish CLI as shown below:

bookshelf publish my-dataset

Note: Publishing to the remote bookshelf requires valid credentials. Creating or obtaining these credentials is not covered in this documentation.

For developers

For development, we rely on uv for all our dependency management. To get started, make sure that uv is installed (see the uv installation instructions).

This project is a uv workspace, which means that it contains more than one Python package. By default, uv commands target the root bookshelf package; to target another package, use the --package flag.

For all of our work, we use our Makefile. You can read the commands out of the Makefile and run them by hand if you wish, but we generally discourage this because it can be error-prone. To create your environment, run make virtual-environment.

If there are any issues, the messages from the Makefile should guide you through them. If not, please raise an issue in the issue tracker.
