mlforecast
Scalable machine learning based time series forecasting.
Install
PyPI
pip install mlforecast
Optional dependencies
If you want more functionality you can instead use pip install mlforecast[extra1,extra2,...]. The current extra dependencies are:
- aws: adds the functionality to use S3 as the storage in the CLI.
- cli: includes the validations necessary to use the CLI.
- distributed: installs dask to perform distributed training. Note that you'll also need to install either LightGBM or XGBoost.
For example, if you want to perform distributed training through the CLI using S3 as your storage you'll need all three extras, which you can get using: pip install mlforecast[aws,cli,distributed].
conda-forge
conda install -c conda-forge mlforecast
Note that this installation comes with the required dependencies for the local interface. If you want to:
- Use S3 as storage:
conda install -c conda-forge s3path
- Perform distributed training:
conda install -c conda-forge dask
and either LightGBM or XGBoost.
How to use
The following provides a very basic overview; for a more detailed description see the documentation.
Programmatic API
Store your time series in a pandas dataframe with an index named unique_id that identifies each time series, a column ds that contains the datestamps and a column y with the values.
from mlforecast.utils import generate_daily_series
series = generate_daily_series(20)
display_df(series.head())
| unique_id | ds | y |
|---|---|---|
| id_00 | 2000-01-01 00:00:00 | 0.264447 |
| id_00 | 2000-01-02 00:00:00 | 1.28402 |
| id_00 | 2000-01-03 00:00:00 | 2.4628 |
| id_00 | 2000-01-04 00:00:00 | 3.03552 |
| id_00 | 2000-01-05 00:00:00 | 4.04356 |
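If your data isn't already in this shape, plain pandas gets it there; a minimal sketch (the raw column names here are hypothetical, the target layout follows the convention above):

from mlforecast.utils import generate_daily_series  # as used above
import pandas as pd

# Hypothetical raw data: one row per (series, date) observation.
raw = pd.DataFrame({
    'unique_id': ['id_00', 'id_00', 'id_01', 'id_01'],
    'ds': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-01', '2000-01-02']),
    'y': [0.3, 1.2, 0.5, 1.4],
})

# mlforecast expects the series identifier as the index.
series_custom = raw.set_index('unique_id')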
Then create a TimeSeries object with the features that you want to use. These include lags, transformations on the lags and date features. The lag transformations are defined as numba jitted functions that transform an array; if they take additional arguments you supply a tuple (transform_func, arg1, arg2, ...).
from mlforecast.core import TimeSeries
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean
ts = TimeSeries(
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)]
    },
    date_features=['dayofweek', 'month']
)
ts
TimeSeries(freq=<Day>, transforms=['lag-7', 'lag-14', 'expanding_mean_lag-1', 'rolling_mean_lag-7_window_size-7', 'rolling_mean_lag-7_window_size-14'], date_features=['dayofweek', 'month'], num_threads=8)
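Custom transformations follow the same convention. A hypothetical example (the function below is not part of the library) of a jitted transform that takes an extra argument, passed as a tuple:

import numpy as np
from numba import njit

@njit
def diff_over(x, offset):
    """Hypothetical transform: difference between each value and the one `offset` steps before."""
    out = np.full_like(x, np.nan)
    for i in range(offset, x.size):
        out[i] = x[i] - x[i - offset]
    return out

# The extra argument is supplied as (transform_func, arg1, ...).
ts_custom = TimeSeries(
    lags=[7],
    lag_transforms={7: [(diff_over, 2)]},
)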
Next define a model. If you want to use the local interface this can be any regressor that follows the scikit-learn API. For distributed training there are LGBMForecast and XGBForecast.
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(random_state=0)
Now instantiate your forecast object with the model and the time series. There are two types of forecasters: Forecast, which is local, and DistributedForecast, which performs the whole process in a distributed way.
from mlforecast.forecast import Forecast
fcst = Forecast(model, ts)
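The distributed variant follows the same pattern but operates on partitioned data. A rough sketch, where everything outside the dask calls is an assumption to be checked against the documentation:

import dask.dataframe as dd
from dask.distributed import Client

client = Client()  # spins up a local dask cluster by default

# Partition the series for distributed processing.
distributed_series = dd.from_pandas(series, npartitions=4)

# Assumed usage, mirroring the local API: LGBMForecast/XGBForecast as
# the model and DistributedForecast in place of Forecast (exact import
# paths are not shown in this overview):
# fcst = DistributedForecast(LGBMForecast(), ts)
# fcst.fit(distributed_series)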
To compute the features and train the model using them call .fit on your Forecast object.
fcst.fit(series)
Forecast(model=RandomForestRegressor(random_state=0), ts=TimeSeries(freq=<Day>, transforms=['lag-7', 'lag-14', 'expanding_mean_lag-1', 'rolling_mean_lag-7_window_size-7', 'rolling_mean_lag-7_window_size-14'], date_features=['dayofweek', 'month'], num_threads=8))
To get the forecasts for the next 14 days call .predict(14) on the forecaster. This will update the target with each prediction and recompute the features to get the next one.
predictions = fcst.predict(14)
display_df(predictions.head())
unique_id | ds | y_pred |
---|---|---|
id_00 | 2000-08-10 00:00:00 | 5.24484 |
id_00 | 2000-08-11 00:00:00 | 6.25861 |
id_00 | 2000-08-12 00:00:00 | 0.225484 |
id_00 | 2000-08-13 00:00:00 | 1.22896 |
id_00 | 2000-08-14 00:00:00 | 2.30246 |
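The mechanics of this recursive scheme can be illustrated independently of the library; a schematic sketch of the idea, not mlforecast's actual implementation:

def recursive_predict(model, history, compute_features, horizon):
    """Schematic recursive multi-step forecasting (illustration only)."""
    predictions = []
    for _ in range(horizon):
        features = compute_features(history)   # lag/date features for the next step
        next_value = model.predict([features])[0]
        predictions.append(next_value)
        history = history + [next_value]       # the prediction joins the history
    return predictions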
CLI
If you're looking to compute quick baselines, want to avoid some boilerplate or just prefer using CLIs, you can use the mlforecast binary with a configuration file like the following:
!cat sample_configs/local.yaml
data:
  prefix: data
  input: train
  output: outputs
  format: parquet
features:
  freq: D
  lags: [7, 14]
  lag_transforms:
    1:
      - expanding_mean
    7:
      - rolling_mean:
          window_size: 7
      - rolling_mean:
          window_size: 14
  date_features: ["dayofweek", "month", "year"]
  num_threads: 2
backtest:
  n_windows: 2
  window_size: 7
forecast:
  horizon: 7
local:
  model:
    name: sklearn.ensemble.RandomForestRegressor
    params:
      n_estimators: 10
      max_depth: 7
The configuration is validated using FlowConfig.
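The features section mirrors the programmatic API; presumably the configuration above corresponds to a TimeSeries like the following (the exact construction performed by the CLI is an assumption):

from mlforecast.core import TimeSeries
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

# Assumed programmatic equivalent of the `features` section above.
ts = TimeSeries(
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)],
    },
    date_features=['dayofweek', 'month', 'year'],
    num_threads=2,
)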
This configuration will use the data in data.prefix/data.input to train and write the results to data.prefix/data.output, both in data.format.
from pathlib import Path

data_path = Path('data')
data_path.mkdir()
series.to_parquet(data_path/'train')
!mlforecast sample_configs/local.yaml
Split 1 MSE: 0.0251
Split 2 MSE: 0.0180
list((data_path/'outputs').iterdir())
[PosixPath('data/outputs/valid_1.parquet'),
PosixPath('data/outputs/valid_0.parquet'),
PosixPath('data/outputs/forecast.parquet')]
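The outputs are regular parquet files, so they can be inspected with pandas:

import pandas as pd

forecast = pd.read_parquet(data_path / 'outputs' / 'forecast.parquet')
forecast.head()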