mlforecast
Scalable machine learning based time series forecasting.
Install
pip install mlforecast
Optional dependencies
If you want more functionality, you can instead use pip install "mlforecast[extra1,extra2,...]". The current extra dependencies are:
- aws: adds the functionality to use S3 as the storage in the CLI.
- cli: includes the validations necessary to use the CLI.
- distributed: installs dask to perform distributed training. Note that you'll also need to install either lightgbm or xgboost.
For example, if you want to perform distributed training through the CLI using S3 as your storage, you'll need all three extras, which you can get with: pip install "mlforecast[aws,cli,distributed]"
How to use
Programmatic API
Store your time series in a pandas DataFrame with an index named unique_id that identifies each series, a column ds that contains the datestamps, and a column y with the values.
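If you bring your own data instead of calling generate_daily_series, the layout described above can be built with plain pandas. A minimal sketch, assuming two made-up series (the ids and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Two toy daily series of 5 days each; ids and values are made up.
n_days = 5
dates = pd.date_range("2000-01-01", periods=n_days, freq="D")
series = pd.DataFrame(
    {
        "unique_id": ["id_00"] * n_days + ["id_01"] * n_days,
        "ds": list(dates) * 2,  # datestamps, repeated for each series
        "y": np.arange(2 * n_days, dtype=float),  # target values
    }
).set_index("unique_id")  # unique_id is the index, not a regular column

print(series.head())
```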
from mlforecast.utils import generate_daily_series
series = generate_daily_series(20)
print(series.head())
unique_id | ds | y |
---|---|---|
id_00 | 2000-01-01 00:00:00 | 0.264447 |
id_00 | 2000-01-02 00:00:00 | 1.28402 |
id_00 | 2000-01-03 00:00:00 | 2.4628 |
id_00 | 2000-01-04 00:00:00 | 3.03552 |
id_00 | 2000-01-05 00:00:00 | 4.04356 |
Then you define your flow configuration, which includes lags, transformations on the lags, and date features. The transformations are defined as numba-jitted functions that transform an array. If they take additional arguments, you supply a tuple (transform_func, arg1, arg2, ...).
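To make the tuple convention concrete, here is a minimal sketch of a custom transform. The function my_rolling_max is hypothetical (it is not part of window_ops or mlforecast); it takes the array plus one extra argument, so it would be passed as a tuple. If numba isn't installed, the sketch falls back to plain Python so it still runs, but mlforecast itself expects a jitted function.

```python
import numpy as np

try:
    from numba import njit  # transforms are expected to be numba-jitted
except ImportError:  # fallback only so this sketch runs without numba
    def njit(func):
        return func


@njit
def my_rolling_max(x, window_size):
    """Hypothetical transform: max over the last `window_size` values.

    Positions without a full window are left as NaN.
    """
    n = x.size
    out = np.full(n, np.nan)
    for i in range(window_size - 1, n):
        out[i] = np.max(x[i - window_size + 1 : i + 1])
    return out


# The extra argument goes in a tuple after the function: (transform_func, arg1)
lag_transforms = {7: [(my_rolling_max, 7)]}

print(my_rolling_max(np.array([1.0, 3.0, 2.0, 5.0, 4.0]), 3))
```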
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean
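flow_config = dict(
    lags=[7, 14],  # values from 7 and 14 days before are used as features
    lag_transforms={
        1: [expanding_mean],  # expanding mean of lag 1
        7: [(rolling_mean, 7), (rolling_mean, 14)]  # rolling means of lag 7, windows of 7 and 14
    },
    date_features=['dayofweek', 'month']  # datetime attributes of ds
)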
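To see roughly which features this configuration describes, the sketch below computes comparable columns with plain pandas on one toy series. It is an approximation, not mlforecast's implementation: it assumes the window_ops functions behave like pandas' rolling/expanding defaults (NaN until a full window is available), and the column names are made up, so mlforecast's own feature names may differ.

```python
import numpy as np
import pandas as pd

# Toy single series whose values are 0, 1, ..., 29.
df = pd.DataFrame(
    {
        "unique_id": "id_00",
        "ds": pd.date_range("2000-01-01", periods=30, freq="D"),
        "y": np.arange(30, dtype=float),
    }
)

# Group by series so shifts and windows never cross series boundaries.
y = df.groupby("unique_id")["y"]
features = pd.DataFrame(index=df.index)
features["lag7"] = y.shift(7)
features["lag14"] = y.shift(14)
# Each transform is applied to the lagged values of its key.
features["expanding_mean_lag1"] = y.transform(lambda s: s.shift(1).expanding().mean())
features["rolling_mean_lag7_window7"] = y.transform(lambda s: s.shift(7).rolling(7).mean())
features["rolling_mean_lag7_window14"] = y.transform(lambda s: s.shift(7).rolling(14).mean())
# Date features come straight from the ds column.
features["dayofweek"] = df["ds"].dt.dayofweek
features["month"] = df["ds"].dt.month

# Rows with incomplete history have NaNs; drop them for a quick look.
print(features.dropna().head())
```

The longest requirement here (lag 7 plus a 14-day window) is why the first complete row only appears 20 days into the series.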
Next, define a model. If you're on a single machine, this can be any regressor that follows the scikit-learn API. For distributed training, there are LGBMForecast and XGBForecast.
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
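As an illustration of what "follows the scikit-learn API" usually means, here is a minimal sketch of a hypothetical baseline regressor with just fit(X, y) returning self and predict(X). We haven't verified that Forecast accepts a class this bare; a fuller implementation would normally subclass sklearn.base.BaseEstimator and RegressorMixin to get get_params, set_params and a proper repr.

```python
import numpy as np


class MeanRegressor:
    """Hypothetical baseline that always predicts the training mean of y."""

    def fit(self, X, y):
        # Store the learned state with a trailing underscore, sklearn-style.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One prediction per row of X.
        return np.full(len(X), self.mean_)


model = MeanRegressor().fit(np.zeros((4, 2)), np.array([1.0, 2.0, 3.0, 4.0]))
print(model.predict(np.zeros((3, 2))))  # [2.5 2.5 2.5]
```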
Now instantiate your forecast object with the model and the flow configuration. There are two types of forecasters, Forecast and DistributedForecast. Since this is a single-machine example, we'll use the first.
from mlforecast.forecast import Forecast
fcst = Forecast(model, flow_config)
To compute the transformations and train the model on the data, call .fit on your Forecast object.
fcst.fit(series)
Forecast(model=RandomForestRegressor(), flow_config={'lags': [7, 14], 'lag_transforms': {1: [CPUDispatcher(<function expanding_mean at 0x7fac9f73f280>)], 7: [(CPUDispatcher(<function rolling_mean at 0x7fac9f7a8f70>), 7), (CPUDispatcher(<function rolling_mean at 0x7fac9f7a8f70>), 14)]}, 'date_features': ['dayofweek', 'month']})
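Conceptually, training on lag features turns each series into an ordinary regression problem: past values become the columns of X and the current value becomes the target. A minimal numpy sketch of that reformulation on one made-up series, with least squares standing in for the RandomForestRegressor; it is not how mlforecast builds its matrices internally, and here the leading rows without a complete lag history are simply dropped.

```python
import numpy as np

# Made-up series following y_t = 1 + 0.5 * y_{t-1} + 0.3 * y_{t-2}.
y = np.zeros(20)
y[1] = 1.0
for t in range(2, 20):
    y[t] = 1.0 + 0.5 * y[t - 1] + 0.3 * y[t - 2]

lags = [1, 2]
max_lag = max(lags)

# Row j predicts y[max_lag + j] from y[max_lag + j - lag] for each lag;
# the first max_lag values have no complete history and are skipped.
X = np.column_stack([y[max_lag - lag : len(y) - lag] for lag in lags])
target = y[max_lag:]

# Least squares with an intercept column stands in for any regressor.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
print(coef.round(3))  # intercept, lag-1 weight, lag-2 weight
```

Because the toy series is generated without noise, the fitted weights recover the coefficients used to build it.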
To get the forecasts for the next 14 days, call .predict(14) on the forecaster.
predictions = fcst.predict(14)
print(predictions.head())
unique_id | ds | y_pred |
---|---|---|
id_00 | 2000-08-10 00:00:00 | 5.2325 |
id_00 | 2000-08-11 00:00:00 | 6.26395 |
id_00 | 2000-08-12 00:00:00 | 0.196386 |
id_00 | 2000-08-13 00:00:00 | 1.25263 |
id_00 | 2000-08-14 00:00:00 | 2.2988 |
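Forecasting 14 days ahead with lag 7 needs values that don't exist yet. A common way to handle this, and as far as we understand the strategy predict follows, is to forecast recursively: predict one step, append the prediction to the history, and recompute the features for the next step. The sketch below shows that loop with plain lags only (mlforecast would also update lag transforms and date features); recursive_forecast and the toy model are hypothetical.

```python
import numpy as np


def recursive_forecast(history, predict_one, lags, horizon):
    """Forecast `horizon` steps by feeding each prediction back as history.

    predict_one maps a 1-D array of lag values (ordered like `lags`)
    to a single prediction. Every lag must be >= 1.
    """
    values = list(history)
    preds = []
    for _ in range(horizon):
        features = np.array([values[-lag] for lag in lags])
        y_hat = float(predict_one(features))
        preds.append(y_hat)
        values.append(y_hat)  # the prediction becomes the most recent value
    return np.array(preds)


# Toy "model": repeat the value from 7 steps ago (a seasonal naive rule).
history = np.arange(14, dtype=float)
print(recursive_forecast(history, lambda f: f[0], lags=[7], horizon=10))
```

With a horizon longer than the lag, steps 8 to 10 are built entirely from earlier predictions, which is why errors can compound over long horizons.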
CLI
If you're looking to compute quick baselines, want to avoid some boilerplate, or simply prefer CLIs, you can use the mlforecast binary with a configuration file like the following:
cat sample_configs/local.yaml
data:
  prefix: data
  input: train
  output: outputs
  format: parquet
features:
  freq: D
  lags: [7, 14]
  lag_transforms:
    1:
    - expanding_mean
    7:
    - rolling_mean:
        window_size: 7
    - rolling_min:
        window_size: 7
  date_features: ["dayofweek", "month", "year"]
  num_threads: 2
backtest:
  n_windows: 2
  window_size: 7
forecast:
  horizon: 7
local:
  model:
    name: sklearn.ensemble.RandomForestRegressor
    params:
      n_estimators: 10
      max_depth: 7
This will use the data in prefix/input and write the results to prefix/output (with the configuration above, data/train and data/outputs).
mlforecast sample_configs/local.yaml
Split 1 MSE: 0.0240
Split 2 MSE: 0.0187
ls data/outputs/
forecast.parquet valid_0.parquet valid_1.parquet
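The backtest section of the config (n_windows: 2, window_size: 7) is what produces the two MSE values and the valid_0/valid_1 files. A minimal sketch of how such windows can be laid out, assuming they tile the last n_windows * window_size periods without overlap, oldest first, each training on everything before it. We haven't checked this against the CLI's exact splitting logic, and backtest_windows is a hypothetical helper.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def backtest_windows(last_date, n_windows, window_size, freq="D"):
    """Return (train_end, valid_start, valid_end) per window, oldest first."""
    offset = to_offset(freq)
    windows = []
    for i in range(n_windows):
        # Window i starts (n_windows - i) * window_size periods before the end.
        steps_back = (n_windows - i) * window_size
        train_end = last_date - steps_back * offset
        valid_start = train_end + offset
        valid_end = train_end + window_size * offset
        windows.append((train_end, valid_start, valid_end))
    return windows


for train_end, valid_start, valid_end in backtest_windows(
    pd.Timestamp("2000-01-31"), n_windows=2, window_size=7
):
    print(train_end.date(), valid_start.date(), valid_end.date())
```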
Hashes for mlforecast-0.0.4-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6ea5681ff4fa2f8ebee0b8a01c5bb2a1583e015eda26150f51eb61fbe23f8e14 |
MD5 | e7e649ac13880b4ec0f0a7ebb49668fe |
BLAKE2b-256 | 1d759045adde5498b1e942240ba05c4f6beb6cedb4604d183345de372fdb2fff |