mlforecast
Scalable machine learning based time series forecasting
Install
PyPI
pip install mlforecast
If you want to perform distributed training, you can instead use pip install mlforecast[distributed], which will also install dask. Note that you'll also need to install either LightGBM or XGBoost.
conda-forge
conda install -c conda-forge mlforecast
Note that this installation comes with the required dependencies for the local interface. If you want to perform distributed training, you must install dask (conda install -c conda-forge dask) and either LightGBM or XGBoost.
How to use
The following provides a very basic overview; for a more detailed description, see the documentation.
Store your time series in a pandas dataframe with an index named unique_id that identifies each time series, a column ds that contains the datestamps and a column y with the values.
from mlforecast.utils import generate_daily_series
series = generate_daily_series(20)
series.head()
unique_id | ds | y
---|---|---
id_00 | 2000-01-01 | 0.264447
id_00 | 2000-01-02 | 1.284022
id_00 | 2000-01-03 | 2.462798
id_00 | 2000-01-04 | 3.035518
id_00 | 2000-01-05 | 4.043565
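If your data starts out with other column names, a minimal sketch of reshaping it into this format could look like the following (raw, store, date and sales are made-up names used only for illustration):

import pandas as pd

# Made-up input data with arbitrary column names.
raw = pd.DataFrame({
    'store': ['A', 'A', 'B'],
    'date': ['2000-01-01', '2000-01-02', '2000-01-01'],
    'sales': [10.0, 12.5, 3.2],
})

# Rename to the expected columns, parse the dates and move unique_id to the index.
my_series = (
    raw.rename(columns={'store': 'unique_id', 'date': 'ds', 'sales': 'y'})
       .assign(ds=lambda df: pd.to_datetime(df['ds']))
       .set_index('unique_id')
)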
Then create a TimeSeries object with the features that you want to use. These include lags, transformations on the lags and date features. The lag transformations are defined as numba-jitted functions that transform an array; if they take additional arguments, you supply a tuple (transform_func, arg1, arg2, …).
from mlforecast.core import TimeSeries
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean
ts = TimeSeries(
lags=[7, 14],
lag_transforms={
1: [expanding_mean],
7: [(rolling_mean, 7), (rolling_mean, 14)]
},
date_features=['dayofweek', 'month']
)
ts
TimeSeries(freq=<Day>, transforms=['lag-7', 'lag-14', 'expanding_mean_lag-1', 'rolling_mean_lag-7_window_size-7', 'rolling_mean_lag-7_window_size-14'], date_features=['dayofweek', 'month'], num_threads=1)
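The tuple form also works for your own transformations. As a sketch, ratio_over_previous below is a made-up numba-jitted function (not part of mlforecast or window_ops) that takes the array plus one extra argument:

import numpy as np
from numba import njit
from mlforecast.core import TimeSeries

@njit
def ratio_over_previous(x, offset):
    """Illustrative transform: ratio of each value to the value `offset` steps earlier."""
    out = np.full(x.shape[0], np.nan)
    for i in range(offset, x.shape[0]):
        out[i] = x[i] / x[i - offset]
    return out

# The extra argument (offset=2 here) goes in the tuple after the function.
ts_custom = TimeSeries(
    lags=[7],
    lag_transforms={1: [(ratio_over_previous, 2)]},
)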
Next define a model. If you want to use the local interface, this can be any regressor that follows the scikit-learn API. For distributed training there are LGBMForecast and XGBForecast.
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(random_state=0)
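Any other regressor with a scikit-learn compatible interface can be dropped in the same way. For example, assuming LightGBM is installed, its sklearn wrapper works too (the rest of this walkthrough keeps the random forest defined above):

from lightgbm import LGBMRegressor

alternative_model = LGBMRegressor(random_state=0)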
Now instantiate your forecast object with the model and the time series. There are two types of forecasters: Forecast, which is local, and DistributedForecast, which performs the whole process in a distributed way.
from mlforecast.forecast import Forecast
fcst = Forecast(model, ts)
To compute the features and train the model using them, call .fit on your Forecast object.
fcst.fit(series)
Forecast(model=RandomForestRegressor(random_state=0), ts=TimeSeries(freq=<Day>, transforms=['lag-7', 'lag-14', 'expanding_mean_lag-1', 'rolling_mean_lag-7_window_size-7', 'rolling_mean_lag-7_window_size-14'], date_features=['dayofweek', 'month'], num_threads=1))
To get the forecasts for the next 14 days, call .predict(14) on the forecaster. This will automatically handle the updates required by the features.
predictions = fcst.predict(14)
predictions.head()
unique_id | ds | y_pred
---|---|---
id_00 | 2000-08-10 | 5.244840
id_00 | 2000-08-11 | 6.258609
id_00 | 2000-08-12 | 0.225484
id_00 | 2000-08-13 | 1.228957
id_00 | 2000-08-14 | 2.302455
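Since the forecaster only needs fit and predict, a simple holdout evaluation can be sketched with plain pandas. The 14-day split, the fresh TimeSeries and the mean absolute error below are illustrative choices, reusing the series generated above:

from sklearn.ensemble import RandomForestRegressor
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean
from mlforecast.core import TimeSeries
from mlforecast.forecast import Forecast

# Hold out the last 14 days of every series for validation.
train = series.groupby('unique_id', group_keys=False).apply(lambda g: g.iloc[:-14])
valid = series.groupby('unique_id', group_keys=False).apply(lambda g: g.iloc[-14:])

# Fresh TimeSeries with the same features as above, so no fitted state is reused.
ts_bt = TimeSeries(
    lags=[7, 14],
    lag_transforms={1: [expanding_mean], 7: [(rolling_mean, 7), (rolling_mean, 14)]},
    date_features=['dayofweek', 'month'],
)
fcst_bt = Forecast(RandomForestRegressor(random_state=0), ts_bt)
fcst_bt.fit(train)
preds = fcst_bt.predict(14)

# Align actuals and predictions on (unique_id, ds) and compute the mean absolute error.
merged = valid.reset_index().merge(preds.reset_index(), on=['unique_id', 'ds'])
print((merged['y'] - merged['y_pred']).abs().mean())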