Apache Beam pipelines to make weather data accessible and useful.

weather-tools

Introduction

This project contributes a series of command-line tools to make common data engineering tasks easier for researchers in climate and weather. These solutions were born out of the need to improve repeated work performed by research teams across Alphabet.

The first tool created was the weather downloader (weather-dl). It makes it easier to ingest data from the European Centre for Medium-Range Weather Forecasts (ECMWF). weather-dl lets users describe precisely which data they'd like to ingest from ECMWF's catalogs. It also gives them control over how requests are parallelized, so data can be retrieved efficiently. Downloads are driven by a configuration file, which can be reviewed (and version-controlled) independently of pipeline or analysis code.

We also provide two additional tools to aid climate and weather researchers: the weather mover (weather-mv) and the weather splitter (weather-sp). These CLIs are still in the alpha stage of development, yet several partner teams have already used them in production workflows.

We created the weather mover (weather-mv) to load geospatial data from cloud buckets into Google BigQuery. This enables rapid exploratory analysis and visualization of weather data: from BigQuery, scientists can load arbitrary climate data fields into a pandas DataFrame or an xarray Dataset via a simple SQL query.

The weather splitter (weather-sp) helps normalize how archival weather data is stored in cloud buckets. Whether you're trying to merge two datasets with overlapping variables or simply need to open GRIB data from xarray, it's useful to split datasets into their component variables.

Installing

We recommend creating a local Python environment (for example, with Anaconda) before installing. The tools themselves are installed with pip:

pip install google-weather-tools
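After installing, it can help to confirm that the three command-line tools named in the introduction are visible in the active environment. The following is a minimal sketch that relies only on the shell's `command -v` builtin; the function name `check_weather_tools` is a hypothetical helper, not part of the package.

```shell
# Report whether each weather-tools CLI is on PATH in the active environment.
# check_weather_tools is an illustrative helper, not something the package ships.
check_weather_tools() {
  for tool in weather-dl weather-mv weather-sp; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: not found (is the right environment active?)"
    fi
  done
}

check_weather_tools
```

If any tool is reported as not found, the most likely cause is that pip installed into a different Python environment than the one currently active.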

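From here, you can use the weather-* tools from your Python environment. Currently, the following tools are available:

  • weather-dl: the weather downloader
  • weather-mv: the weather mover
  • weather-sp: the weather splitter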

Quickstart

Together, let's download ERA5 pressure level data and ingest it into Google BigQuery.

Pre-requisites:

  1. Acquire and install a license from ECMWF's Copernicus (CDS) API.
  2. Create an empty BigQuery Dataset. This can be done in the console or via the bq CLI. For example:
    bq mk --project_id=$PROJECT $DATASET_ID
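The commands in this quickstart reference the shell variables `$PROJECT`, `$DATASET_ID`, `$TABLE_ID`, and `$BUCKET` without defining them. A minimal sketch of setting them beforehand follows; every value shown is a hypothetical placeholder to be replaced with your own.

```shell
# Hypothetical placeholder values; substitute your own project, dataset, table, and bucket.
PROJECT="my-gcp-project"
DATASET_ID="weather"
TABLE_ID="era5_pressure_levels"
BUCKET="my-bucket"

# Preview the values the later commands will be given.
echo "BigQuery table: $PROJECT.$DATASET_ID.$TABLE_ID"
echo "Temp location:  gs://$BUCKET/tmp"
```

Setting these once in the current shell session keeps the later commands copy-pasteable as written.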
    

Steps:

  1. Use weather-dl to acquire the ERA5 pressure level data.

    For simplicity, let's run everything on your local machine. For the downloader, this means we'll use the --local-run option:

    weather-dl configs/era5_example_config_local_run.cfg --local-run
    

    Recommendation: Pass the -d, --dry-run flag to any of these commands to preview effects.

    Generally, weather-dl is designed to ingest weather data to cloud storage. To learn how to configure downloads, see the weather-dl configuration documentation; detailed usage is covered in the weather-dl docs.

  2. (optional) Split your downloaded dataset up by variable with weather-sp:

     weather-sp --input-pattern "./local_run/era5-*.nc" --output-dir "split_data" 
    

    Consult the weather-sp docs for more.

  3. Use weather-mv to upload this data to Google BigQuery.

    # Use --uris "./split_data/**.nc" instead if you split the data in step 2.
    # --temp_location is needed for batch writes to BigQuery.
    weather-mv --uris "./local_run/**.nc" \
       --output_table "$PROJECT.$DATASET_ID.$TABLE_ID" \
       --temp_location "gs://$BUCKET/tmp" \
       --direct_num_workers 2
    

    See these docs for more about this tool.

    Warning: weather-mv does not currently support dry runs. See #22.

That's it! Soon, you'll have your weather data ready for analysis in BigQuery.
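The three steps above can be chained into one small script. The sketch below uses only the commands and flags shown in the steps. The `run` and `quickstart` functions and the `PREVIEW` variable are hypothetical helpers, not features of the tools. With `PREVIEW=1` each command is printed rather than executed, a shell-level stand-in for a dry run.

```shell
# Sketch: chain the quickstart steps. PREVIEW, run, and quickstart are
# illustrative names introduced here, not part of weather-tools.
PREVIEW="${PREVIEW:-1}"   # 1 = print commands only; 0 = execute them

run() {
  if [ "$PREVIEW" = "1" ]; then
    # Print the command instead of running it (original quoting is not preserved).
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

quickstart() {
  # Fail early with a clear message if a required variable is unset.
  : "${PROJECT:?set PROJECT first}"
  : "${DATASET_ID:?set DATASET_ID first}"
  : "${TABLE_ID:?set TABLE_ID first}"
  : "${BUCKET:?set BUCKET first}"

  # Step 1: download locally.
  run weather-dl configs/era5_example_config_local_run.cfg --local-run
  # Step 2 (optional): split the download by variable.
  run weather-sp --input-pattern "./local_run/era5-*.nc" --output-dir "split_data"
  # Step 3: load the split files into BigQuery.
  run weather-mv --uris "./split_data/**.nc" \
    --output_table "$PROJECT.$DATASET_ID.$TABLE_ID" \
    --temp_location "gs://$BUCKET/tmp" \
    --direct_num_workers 2
}
```

Once the four variables are set, calling `quickstart` prints each command prefixed with `+`. Setting `PREVIEW=0` beforehand runs the commands for real.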

Note: The exact interfaces for these CLIs are subject to change. For example, we plan to make the CLIs have more uniform arguments (#21).

Contributing

The weather tools are under active development, and contributions are welcome! Please check out our guide to get started.

License

This is not an official Google product.

Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
