Python package for S2S forecasts with AI

s2spy

A high-level Python package integrating expert knowledge and artificial intelligence to boost (sub)seasonal forecasting.

Why s2spy?

Producing reliable sub-seasonal to seasonal (S2S) forecasts with machine learning techniques remains a challenge. Currently, these data-driven S2S forecasts generally suffer from a lack of trust because of:

  • Non-transparent data processing and poorly reproducible scientific outcomes
  • Technical pitfalls related to machine learning-based predictability (e.g. overfitting)
  • Black-box methods without sufficient explanation

To tackle these challenges, we are building s2spy, an open-source, high-level Python package. It provides an interface between artificial intelligence and expert knowledge to improve the predictability and physical understanding of S2S processes. By building on efficient data-handling and parallel-computing packages, it can run efficiently across different Big Climate Data platforms. Key components will be explainable AI and causal discovery, which support the classical scientific interplay between theory, hypothesis generation and data-driven hypothesis testing, enabling knowledge mining from data.

Developing this tool will be a community effort. It helps us achieve trustworthy data-driven forecasts by providing:

  • Transparent and reproducible analyses
  • Best practices in model verification
  • An understanding of the sources of predictability

Installation

To install the latest release of s2spy, do:

python3 -m pip install s2spy

To install the in-development version from the GitHub repository, do:

python3 -m pip install git+https://github.com/AI4S2S/s2spy.git

Configure the package for development and testing

The testing framework used here is pytest. Before running the tests, get a local copy of the source code and install s2spy in editable mode:

git clone https://github.com/AI4S2S/s2spy.git
cd s2spy
python3 -m pip install -e .

Then, run tests:

python3 -m pytest

Getting started

s2spy provides end-to-end solutions for machine learning (ML) based S2S forecasting.

Datetime operations & Data processing

In a typical ML-based S2S project, the first step is always data processing. The calendar-based datetime module time handles time operations. For instance, suppose a user is looking for predictors of winter climate at seasonal timescales (~180 days). First, a calendar object is created using AdventCalendar:

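import s2spy.time

# Intervals are 180 days long and count backwards from the anchor date (30 November)
calendar = s2spy.time.AdventCalendar(anchor=(11, 30), freq='180d')
# Map the calendar onto the anchor years 2020 and 2021
calendar = calendar.map_years(2020, 2021)
calendar.show()
>>>    i_interval                 (target) 0                         1
>>>    anchor_year
>>>    2021         (2021-06-03, 2021-11-30]  (2020-12-05, 2021-06-03]
>>>    2020         (2020-06-03, 2020-11-30]  (2019-12-06, 2020-06-03]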

Now, the user can load the data input_data (e.g. a pandas DataFrame) and resample it to the timescales configured in the calendar:

calendar = calendar.map_to_data(input_data)
bins = s2spy.time.resample(calendar, input_data)
bins
>>>       anchor_year  i_interval                  interval  mean_data  target
>>>     0        2020           0  (2020-06-03, 2020-11-30]      275.5    True
>>>     1        2020           1  (2019-12-06, 2020-06-03]       95.5   False
>>>     2        2021           0  (2021-06-03, 2021-11-30]      640.5    True
>>>     3        2021           1  (2020-12-05, 2021-06-03]      460.5   False
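
To illustrate what this resampling step computes, here is a minimal pandas-only sketch that does not call s2spy. It assumes a hypothetical input: one value per day, counting up from 0 on 2019-12-01. The helper intervals_before_anchor and the name bins_sketch are invented for illustration. With this assumed input, the sketch gives the same interval bounds and means as the table above.

```python
import numpy as np
import pandas as pd

# Assumed input: one value per day, counting up from 0 on 2019-12-01.
index = pd.date_range("2019-12-01", "2021-11-30", freq="D")
input_data = pd.Series(np.arange(len(index), dtype=float), index=index)


def intervals_before_anchor(anchor_year, month=11, day=30, length_days=180, n_intervals=2):
    """Hypothetical helper: left-open, right-closed intervals counting back from the anchor date."""
    right = pd.Timestamp(year=anchor_year, month=month, day=day)
    step = pd.Timedelta(days=length_days)
    return [
        pd.Interval(right - (i + 1) * step, right - i * step, closed="right")
        for i in range(n_intervals)
    ]


rows = []
for anchor_year in (2020, 2021):
    for i_interval, interval in enumerate(intervals_before_anchor(anchor_year)):
        # Keep the days with interval.left < day <= interval.right.
        in_interval = (input_data.index > interval.left) & (input_data.index <= interval.right)
        rows.append(
            {
                "anchor_year": anchor_year,
                "i_interval": i_interval,
                "interval": interval,
                "mean_data": input_data[in_interval].mean(),
                "target": i_interval == 0,  # interval 0 ends at the anchor date
            }
        )

bins_sketch = pd.DataFrame(rows)
print(bins_sketch)
```

Each interval covers exactly 180 daily values, so the mean is simply the average of 180 consecutive integers.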

Depending on how the data is prepared, other calendar types can be used, e.g. MonthlyCalendar and WeeklyCalendar.

Cross-validation

Using s2spy, we can generate train/test splits and perform cross-validation. To do so, a splitter class from sklearn.model_selection, e.g. ShuffleSplit, is instantiated and used to split the resampled data:

import s2spy.traintest
from sklearn.model_selection import ShuffleSplit

# Three random train/test splits of the resampled data
splitter = ShuffleSplit(n_splits=3)
s2spy.traintest.split_groups(splitter, bins)

All splitter classes from scikit-learn are supported; a list is available here. Users should follow the scikit-learn documentation on how to use a different splitter class.
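
As a rough illustration of group-wise splitting, the sketch below uses only scikit-learn and pandas. It assumes the goal is to keep all intervals of one anchor year in the same fold. The table contents and the split_0, split_1, ... column names are hypothetical and are not meant to reproduce the output format of s2spy.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical resampled table: two intervals for each of eight anchor years.
years = np.arange(2013, 2021)
bins = pd.DataFrame(
    {
        "anchor_year": np.repeat(years, 2),
        "i_interval": np.tile([0, 1], years.size),
    }
)

splitter = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)

# Split the unique anchor years, then give every row the label of its year.
for fold, (train_idx, test_idx) in enumerate(splitter.split(years)):
    is_test = bins["anchor_year"].isin(years[test_idx])
    bins[f"split_{fold}"] = np.where(is_test, "test", "train")

print(bins)
```

Splitting on the unique years rather than on the rows keeps a target interval and its precursor intervals from ending up on opposite sides of a split.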

Dimensionality reduction

In s2spy, we can perform dimensionality reduction on data. For instance, to perform the Response Guided Dimensionality Reduction (RGDR), we configure the RGDR operator and fit it to a precursor field. The fitted operator can then be used to transform the data into the reduced clusters:

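from s2spy import RGDR  # assumed import path; check the API reference

# Configure the operator, fit it to the precursor field and the target, then reduce the field
rgdr = RGDR(eps_km=600, alpha=0.05, min_area_km2=3000**2)
rgdr.fit(precursor_field, target_timeseries)
clustered_data = rgdr.transform(precursor_field)
# Plot the clusters found at lag 1
_ = rgdr.plot_clusters(precursor_field, target_timeseries, lag=1)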

For more information about precursor_field and target_timeseries, see the complete example in this notebook.
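
The general idea of a response-guided reduction can be sketched as follows: correlate each grid cell with the target, keep the significant cells, group neighbouring cells into clusters, and average the field over each cluster. The sketch below is a simplified illustration on synthetic data, not the s2spy implementation. The use of Pearson correlation and of DBSCAN with a haversine distance is an assumption made for this example, and all variable names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
n_time = 40
lats = np.arange(20.0, 28.0)  # 8 latitudes, 1 degree apart
lons = np.arange(30.0, 38.0)  # 8 longitudes, 1 degree apart

# Synthetic target and field; a 3x3 patch of the field follows the target.
target = rng.normal(size=n_time)
field = rng.normal(size=(n_time, lats.size, lons.size))
field[:, 2:5, 2:5] += 3.0 * target[:, None, None]

alpha, eps_km, earth_radius_km = 0.05, 150.0, 6371.0

# Step 1: p-value of the correlation between each grid cell and the target.
pvalues = np.array(
    [[pearsonr(field[:, i, j], target)[1] for j in range(lons.size)] for i in range(lats.size)]
)
significant = pvalues < alpha

# Step 2: cluster the significant cells by great-circle distance (input in radians).
lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
coords = np.radians(np.column_stack([lat_grid[significant], lon_grid[significant]]))
dbscan = DBSCAN(eps=eps_km / earth_radius_km, min_samples=3, metric="haversine", algorithm="ball_tree")
labels = dbscan.fit_predict(coords)

cluster_map = np.full(significant.shape, -1)  # -1 marks cells outside any cluster
cluster_map[significant] = labels

# Step 3: reduce the field to one time series per cluster.
reduced = {
    int(label): field[:, cluster_map == label].mean(axis=1)
    for label in np.unique(labels)
    if label >= 0
}
print(cluster_map)
```

Isolated cells that pass the significance test by chance have too few neighbours to form a cluster, so DBSCAN labels them as noise and they are excluded from the reduced data.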

Currently, s2spy supports dimensionality reduction approaches from scikit-learn.

Train a model

More information will follow soon.

eXplainable AI (XAI) analysis

More information will follow soon.

Tutorials

s2spy supports operations that are common in machine learning pipelines for sub-seasonal to seasonal forecasting research. Tutorials covering the supported methods and functionalities are listed in notebooks. To run these notebooks, users need to install JupyterLab. More details about each method can be found in the API reference documentation.

Documentation

For detailed information on using the s2spy package, visit the documentation page hosted at Read the Docs.

Contributing

If you want to contribute to the development of s2spy, have a look at the contribution guidelines.

How to cite us

More information will follow soon.

Credits

This package was created with Cookiecutter and the NLeSC/python-template.
