
A Python library for drift detection in Machine Learning problems



Frouros is a Python library for drift detection in machine learning systems that provides a combination of classical and more recent algorithms for both concept and data drift detection.

"Everything changes and nothing stands still"

"You could not step twice into the same river"

Heraclitus of Ephesus (535-475 BCE)


⚡️ Quickstart

Concept drift

As a quick example, we can induce concept drift in the wine dataset to show the use of a concept drift detector such as DDM (Drift Detection Method).

import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from frouros.detectors.concept_drift import DDM, DDMConfig

np.random.seed(seed=31)

# Load wine dataset
X, y = load_wine(return_X_y=True)

# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)

# IMPORTANT: Induce/simulate concept drift in the last part (20%)
# of y_test by modifying some labels (approx. 50%), thereby changing P(y|X)
drift_size = int(y_test.shape[0] * 0.2)
y_test_drift = y_test[-drift_size:]
modify_idx = np.random.rand(*y_test_drift.shape) <= 0.5
y_test_drift[modify_idx] = (y_test_drift[modify_idx] + 1) % len(np.unique(y_test))
y_test[-drift_size:] = y_test_drift

# Define and fit model
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("model", LogisticRegression()),
    ]
)
pipeline.fit(X=X_train, y=y_train)

# Detector configuration and instantiation
config = DDMConfig(warning_level=2.0,
                   drift_level=3.0,
                   min_num_instances=30,)
detector = DDM(config=config)

# Simulate data stream (assuming test label available after prediction)
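for i, (X_i, y_i) in enumerate(zip(X_test, y_test)):
    # Use per-sample names so the full X and y arrays are not overwritten
    y_pred = pipeline.predict(X_i.reshape(1, -1))
    # .item() extracts the scalar prediction before comparing with the label
    error = 1 - int(y_pred.item() == y_i)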
    detector.update(value=error)
    status = detector.status
    if status["drift"]:
        print(f"Drift detected at index {i}")
        break

>> Drift detected at index 44

More concept drift examples can be found here.
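
To make the roles of `warning_level`, `drift_level` and `min_num_instances` more concrete, here is a minimal sketch of the DDM rule as described by Gama et al. (2004). It is not the Frouros implementation: the class name `SimpleDDM`, its attributes and the synthetic error stream are assumptions made for illustration. The sketch tracks the running error rate p and its standard deviation s, remembers the point where p + s was lowest, and flags drift once p + s rises more than `drift_level` standard deviations above that minimum.

```python
import math


class SimpleDDM:
    """Illustrative DDM-style detector (hypothetical, not the Frouros class)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_num_instances=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_num_instances = min_num_instances
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf
        self.warning = False
        self.drift = False

    def update(self, error):
        """Consume one prediction outcome: 1 for a mistake, 0 for a hit."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n              # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)  # its standard deviation
        if self.n < self.min_num_instances:
            return  # too few samples for a stable estimate
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s     # best operating point so far
        # Strict inequality (a choice of this sketch) so that a zero-variance
        # estimate does not trigger a flag on its own.
        self.warning = p + s > self.p_min + self.warning_level * self.s_min
        self.drift = p + s > self.p_min + self.drift_level * self.s_min


# Synthetic stream: 10% errors for 200 samples, then 50% errors for 100 samples.
stream = [1 if i % 10 == 0 else 0 for i in range(200)]
stream += [1 if i % 2 == 0 else 0 for i in range(100)]

detector = SimpleDDM()
drift_index = None
for i, error in enumerate(stream):
    detector.update(error)
    if detector.drift:
        drift_index = i
        break

print(f"Drift detected at index {drift_index}")
```

With this stream the flag can only be raised after the error rate jumps at index 200, which mirrors how the detector above reacts to the modified labels at the end of `y_test`.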

Data drift

As a quick example, we can induce data drift in the iris dataset to show the use of a data drift detector such as the Kolmogorov-Smirnov test.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from frouros.detectors.data_drift import KSTest

np.random.seed(seed=31)

# Load iris dataset
X, y = load_iris(return_X_y=True)

# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)

# Set the feature index to which detector is applied
dim_idx = 0

# IMPORTANT: Induce/simulate data drift in the selected feature of X_test by
# applying some Gaussian noise, thereby changing P(X)
X_test[:, dim_idx] += np.random.normal(
    loc=0.0,
    scale=3.0,
    size=X_test.shape[0],
)

# Define and fit model
model = DecisionTreeClassifier(random_state=31)
model.fit(X=X_train, y=y_train)

# Set significance level for hypothesis testing
alpha = 0.001
# Define and fit detector
detector = KSTest()
detector.fit(X=X_train[:, dim_idx])

# Apply detector to the selected feature of X_test
result = detector.compare(X=X_test[:, dim_idx])

# Check if drift is taking place
result[0].p_value < alpha
>> True # Data drift detected.
# Therefore, we can reject H0 (both samples come from the same distribution).

More data drift examples can be found here.
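
For intuition about what the test compares, the following standard-library sketch computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) and an asymptotic p-value. It is an approximation written for illustration and is not how Frouros computes its result; the function name `ks_two_sample`, the cutoff of 0.2, the 100-term truncation and the synthetic samples are all assumptions.

```python
import bisect
import math
import random


def ks_two_sample(reference, test):
    """Return (statistic, approximate p-value) of a two-sample KS test."""
    ref, new = sorted(reference), sorted(test)
    n, m = len(ref), len(new)
    # The largest gap between the two empirical CDFs occurs at an observed value.
    statistic = max(
        abs(bisect.bisect_right(ref, v) / n - bisect.bisect_right(new, v) / m)
        for v in ref + new
    )
    lam = statistic * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series below converges slowly here; the p-value is effectively 1.
        return statistic, 1.0
    # Truncated series of the asymptotic Kolmogorov distribution.
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    return statistic, min(1.0, max(0.0, 2.0 * series))


random.seed(31)
reference = [random.gauss(0.0, 1.0) for _ in range(200)]
shifted = [random.gauss(5.0, 1.0) for _ in range(200)]  # strongly shifted mean

alpha = 0.001
_, p_same = ks_two_sample(reference, reference)
_, p_shifted = ks_two_sample(reference, shifted)
print("same sample rejects H0:", p_same < alpha)
print("shifted sample rejects H0:", p_shifted < alpha)
```

The decision rule is the same as above: H0 is rejected only when the p-value falls below `alpha`.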

🛠 Installation

Frouros can be installed via pip:

pip install frouros
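
One way to confirm the installation from Python is to query the package metadata with the standard library. This is a small sketch; the helper name `installed_version` is hypothetical, and it returns `None` when the package cannot be found.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("frouros"))
```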

🕵🏻‍♂️️ Drift detection methods

The currently implemented detectors are listed in the following table.

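| Drift detector | Type | Family | Univariate (U) / Multivariate (M) | Numerical (N) / Categorical (C) | Method | Reference |
|---|---|---|---|---|---|---|
| Concept drift | Streaming | CUSUM | U | N | CUSUM | Page (1954) |
| Concept drift | Streaming | CUSUM | U | N | Geometric moving average | Roberts (1959) |
| Concept drift | Streaming | CUSUM | U | N | Page Hinkley | Page (1954) |
| Concept drift | Streaming | Statistical process control | U | N | DDM | Gama et al. (2004) |
| Concept drift | Streaming | Statistical process control | U | N | ECDD-WT | Ross et al. (2012) |
| Concept drift | Streaming | Statistical process control | U | N | EDDM | Baena-García et al. (2006) |
| Concept drift | Streaming | Statistical process control | U | N | HDDM-A | Frias-Blanco et al. (2014) |
| Concept drift | Streaming | Statistical process control | U | N | HDDM-W | Frias-Blanco et al. (2014) |
| Concept drift | Streaming | Statistical process control | U | N | RDDM | Barros et al. (2017) |
| Concept drift | Streaming | Window based | U | N | ADWIN | Bifet and Gavalda (2007) |
| Concept drift | Streaming | Window based | U | N | KSWIN | Raab et al. (2020) |
| Concept drift | Streaming | Window based | U | N | STEPD | Nishida and Yamauchi (2007) |
| Data drift | Batch | Distance based | U | N | Bhattacharyya distance | Bhattacharyya (1946) |
| Data drift | Batch | Distance based | U | N | Earth Mover's distance | Rubner et al. (2000) |
| Data drift | Batch | Distance based | U | N | Hellinger distance | Hellinger (1909) |
| Data drift | Batch | Distance based | U | N | Histogram intersection normalized complement | Swain and Ballard (1991) |
| Data drift | Batch | Distance based | U | N | Jensen-Shannon distance | Lin (1991) |
| Data drift | Batch | Distance based | U | N | Kullback-Leibler divergence | Kullback and Leibler (1951) |
| Data drift | Batch | Distance based | M | N | MMD | Gretton et al. (2012) |
| Data drift | Batch | Distance based | U | N | PSI | Wu and Olson (2010) |
| Data drift | Batch | Statistical test | U | C | Chi-square test | Pearson (1900) |
| Data drift | Batch | Statistical test | U | N | Cramér-von Mises test | Cramér (1928) |
| Data drift | Batch | Statistical test | U | N | Kolmogorov-Smirnov test | Massey Jr (1951) |
| Data drift | Batch | Statistical test | U | N | Welch's T-Test | Welch (1947) |
| Data drift | Streaming | Distance based | M | N | MMD | Gretton et al. (2012) |
| Data drift | Streaming | Statistical test | U | N | Incremental Kolmogorov-Smirnov test | dos Reis et al. (2016) |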
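
As a rough illustration of what the distance-based batch family measures, the sketch below bins a reference sample and a new sample into histograms and computes the Hellinger distance between them. It is plain Python rather than the Frouros API; the helper names `histogram` and `hellinger_distance`, the bin edges and the toy samples are assumptions. The distance is 0 for identical histograms and 1 for histograms with no overlap.

```python
import math


def histogram(values, bin_edges):
    """Relative frequencies over the bins; values are assumed to lie in range."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            # The right edge of the final bin is treated as inclusive.
            if in_bin or (i == last and v == bin_edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.8]
drifted = [0.55, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1.0]

p = histogram(reference, edges)
q = histogram(drifted, edges)
print(f"Hellinger distance: {hellinger_distance(p, q):.3f}")
```

A batch detector of this kind fits the reference histogram once and then reports the distance for each new batch; how large a distance counts as drift is a threshold the user has to choose.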

✅ Who is using Frouros?

Frouros is actively being used by the following projects to implement drift detection in machine learning pipelines:

If you want your project listed here, do not hesitate to send us a pull request.

👍 Contributing

Check out the contribution section.

💬 Citation

Although the Frouros paper is still a preprint, you can cite the preprint version for now (to be replaced by the paper once it is published).

@article{cespedes2022frouros,
  title={Frouros: A Python library for drift detection in Machine Learning problems},
  author={C{\'e}spedes Sisniega, Jaime and L{\'o}pez Garc{\'\i}a, {\'A}lvaro },
  journal={arXiv preprint arXiv:2208.06868},
  year={2022}
}

📝 License

Frouros is open-source software licensed under the BSD-3-Clause license.

🙏 Acknowledgements

Frouros has received funding from the Agencia Estatal de Investigación, Unidad de Excelencia María de Maeztu, ref. MDM-2017-0765.

