
A scikit-learn based AutoML tool


skplumber


A package for automatically sampling, training, and scoring machine learning pipelines on classification or regression problems. The base constructs (pipelines, primitives, etc.) borrow heavily from the Data Driven Discovery of Models (D3M) core package.

Getting Started

Installation

pip install skplumber

Usage

SKPlumber.crank

The top-level API of the package is the SKPlumber class. You instantiate the class, then call its crank method to search for an optimal machine learning (ML) pipeline for your input data X and y (a pandas.DataFrame and a pandas.Series, respectively). Here is an example using the classic iris dataset:

from skplumber import SKPlumber
import pandas as pd
from sklearn.datasets import load_iris

dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])

plumber = SKPlumber()
best_pipeline, best_score = plumber.crank(X, y, problem="classification")
print(f"The best cross validated score the model found was: {best_score}")

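# To use the best pipeline on unseen data, pass a DataFrame with the
# same columns as X. Here the first five rows stand in for new data.
unseen_X = X.iloc[:5]
predictions = best_pipeline.predict(unseen_X)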
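This README does not describe how crank searches the pipeline space. As a rough mental model only, the sketch below runs a plain random search in scikit-learn: sample a preprocessor and a classifier, score the pair by cross validation, and keep the best one. The search space, the `random_pipeline_search` helper, and its parameters are hypothetical and are not skplumber's implementation.

```python
import random

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search space: candidate preprocessors and classifiers.
TRANSFORMERS = [StandardScaler, MinMaxScaler]
CLASSIFIERS = [LinearDiscriminantAnalysis, DecisionTreeClassifier, KNeighborsClassifier]


def random_pipeline_search(X, y, n_samples=10, cv=5, seed=0):
    """Sample random (transformer, classifier) pipelines; keep the best CV score."""
    rng = random.Random(seed)
    best_pipeline, best_score = None, float("-inf")
    for _ in range(n_samples):
        pipeline = make_pipeline(rng.choice(TRANSFORMERS)(), rng.choice(CLASSIFIERS)())
        # Mean accuracy over the cross validation folds.
        score = cross_val_score(pipeline, X, y, cv=cv).mean()
        if score > best_score:
            best_pipeline, best_score = pipeline, score
    # Refit the winner on all the data so it can predict on new rows.
    best_pipeline.fit(X, y)
    return best_pipeline, best_score


dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])
best_pipeline, best_score = random_pipeline_search(X, y)
print(f"Best cross validated accuracy: {best_score:.3f}")
```

Random search is the simplest baseline for this kind of problem; a real AutoML tool may sample deeper pipelines or tune hyperparameters as well.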

Pipeline

The Pipeline class is a lower-level API that can be used to build, fit, and predict with arbitrarily shaped machine learning pipelines. For example, we can build a basic single-level stacking pipeline, in which the outputs of several predictors are fed into another predictor that learns how to combine them:

from skplumber import Pipeline
from skplumber.primitives import transformers, classifiers
import pandas as pd
from sklearn.datasets import load_iris

dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])

# A random missing-value imputation step and a one-hot encoding step for
# non-numeric features are added automatically.
pipeline = Pipeline()
# Preprocess the inputs
pipeline.add_step(transformers["StandardScalerPrimitive"])
# Save the pipeline step index of the preprocessor's outputs
stack_input = pipeline.curr_step_i
# Add three classifiers to the pipeline that all take the
# preprocessor's outputs as inputs
stack_outputs = []
for clf_name in [
    "LinearDiscriminantAnalysisPrimitive",
    "DecisionTreeClassifierPrimitive",
    "KNeighborsClassifierPrimitive"
]:
    pipeline.add_step(classifiers[clf_name], [stack_input])
    stack_outputs.append(pipeline.curr_step_i)
# Add a final classifier that takes the outputs of all the previous
# three classifiers as inputs
pipeline.add_step(classifiers["RandomForestClassifierPrimitive"], stack_outputs)

# Train the pipeline
pipeline.fit(X, y)

# Have the fitted pipeline make predictions
predictions = pipeline.predict(X)
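For comparison, roughly the same stacking shape can be written with scikit-learn's own StackingClassifier. This is a plain scikit-learn analogue, not skplumber code. One difference worth knowing: StackingClassifier trains the final estimator on cross-validated predictions of the base models, and this README does not say whether skplumber's Pipeline does the same. Passing `stack_method="predict"` (an assumption chosen to mirror "outputs from predictors") feeds class labels rather than probabilities to the final model.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])

# Three base classifiers whose predicted labels feed a final random forest.
stack = StackingClassifier(
    estimators=[
        ("lda", LinearDiscriminantAnalysis()),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=RandomForestClassifier(random_state=0),
    stack_method="predict",
)
# Scale the inputs once, then pass them to every base classifier.
model = make_pipeline(StandardScaler(), stack)
model.fit(X, y)
predictions = model.predict(X)
```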

Package Opinions

  • A pipeline's final step must be the step that produces the pipeline's final output.
  • All missing values are imputed.
  • All columns of type object and category are one-hot encoded.
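The last two opinions can be approximated in plain scikit-learn with a ColumnTransformer. This is a sketch, not skplumber's code: the README does not say which imputation strategy skplumber uses, so the mean (numeric) and most-frequent (object) strategies below are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy frame with a missing number and a missing string value.
df = pd.DataFrame({
    "height": [1.0, np.nan, 3.0],
    "color": ["red", np.nan, "blue"],
})

preprocess = ColumnTransformer(
    [
        # Numeric columns: fill missing values with the column mean.
        ("num", SimpleImputer(strategy="mean"),
         make_column_selector(dtype_include="number")),
        # object/category columns: fill with the most frequent value, then one-hot encode.
        ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                              OneHotEncoder(handle_unknown="ignore")),
         make_column_selector(dtype_include=["object", "category"])),
    ],
    sparse_threshold=0,  # always return a dense array
)
out = preprocess.fit_transform(df)
print(out)
```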

Download files

Source Distribution

skplumber-0.4.5.dev0.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

skplumber-0.4.5.dev0-py3-none-any.whl (15.9 kB)

Uploaded Python 3

File details

Details for the file skplumber-0.4.5.dev0.tar.gz.

File metadata

  • Download URL: skplumber-0.4.5.dev0.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.8

File hashes

Hashes for skplumber-0.4.5.dev0.tar.gz
Algorithm Hash digest
SHA256 109894f3926b6a3e44ba48e1522ca9c9519be53f217d3f19a979cc2f4438cc54
MD5 9c66e1a99109dfc31699dba9d3de9ea0
BLAKE2b-256 f5e5ed2f1d5db38c6a6a97cc9c67ae9b09b9224f772bedecd9294c32b38aac14


File details

Details for the file skplumber-0.4.5.dev0-py3-none-any.whl.

File metadata

  • Download URL: skplumber-0.4.5.dev0-py3-none-any.whl
  • Upload date:
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.8

File hashes

Hashes for skplumber-0.4.5.dev0-py3-none-any.whl
Algorithm Hash digest
SHA256 fe522dc38e83d53dc57ba15ffcec5c0ff2f49633eaf0e0937eda00f0c2e3c19b
MD5 394e4fa74609d7bbecf45f9b78e91548
BLAKE2b-256 9660a08913dd37592c087b429f5003b4c8382eea332b6d9c8370b3f088183468
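To check a downloaded file against the SHA256 digests listed above, a small standard-library script is enough. The file name and expected digest are copied from this page; `sha256_of` is a helper written for this example and assumes the wheel sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed for the wheel on this page.
EXPECTED = "fe522dc38e83d53dc57ba15ffcec5c0ff2f49633eaf0e0937eda00f0c2e3c19b"
wheel = Path("skplumber-0.4.5.dev0-py3-none-any.whl")
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED else "Hash mismatch")
```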

