A scikit-learn based AutoML tool
skplumber
A package for automatically sampling, training, and scoring machine learning pipelines on classification or regression problems. The base constructs (pipelines, primitives, etc.) borrow heavily from the Data Driven Discovery of Models (D3M) core package.
Getting Started
Installation
pip install skplumber
Usage
The SKPlumber AutoML System
The top-level API of the package is the SKPlumber class. You instantiate the class, then use its fit method to search for an optimal machine learning (ML) pipeline, given your input data X and y (a pandas.DataFrame and pandas.Series respectively). Here is an example using the classic iris dataset:
from skplumber import SKPlumber
import pandas as pd
from sklearn.datasets import load_iris
dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])
# Ask plumber to find the best machine learning pipeline it
# can for the problem in 60 seconds.
plumber = SKPlumber(problem="classification", budget=60)
plumber.fit(X, y)
# To use the best found machine learning pipeline on unseen data
# (unseen_X is a placeholder for a DataFrame with the same columns as X):
predictions = plumber.predict(unseen_X)
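One way to obtain unseen data like the placeholder above is to hold out part of the dataset before fitting. The sketch below is an illustration under stated assumptions, not part of skplumber: `holdout_accuracy` is a hypothetical helper, and a scikit-learn `DecisionTreeClassifier` stands in for the model so the snippet runs without skplumber installed. Because SKPlumber exposes `fit` and `predict` as shown above, an `SKPlumber(problem="classification", budget=60)` instance could presumably be passed in its place, though that is not verified here.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def holdout_accuracy(model, X, y, test_size=0.25, seed=0):
    """Fit `model` on a training split and score it on held-out rows.

    Hypothetical helper: `model` is any object with fit(X, y) and predict(X).
    """
    X_train, unseen_X, y_train, unseen_y = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model.fit(X_train, y_train)
    predictions = model.predict(unseen_X)
    return accuracy_score(unseen_y, predictions)


dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])

# Stand-in model (an assumption) so the sketch runs without skplumber.
score = holdout_accuracy(DecisionTreeClassifier(random_state=0), X, y)
print(f"held-out accuracy: {score:.3f}")
```

Scoring on rows the model never saw during fitting gives a more honest estimate of how a found pipeline will behave on new data than predicting on the training set.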
Pipeline
The Pipeline class is a slightly lower-level API for the package that can be used to build, fit, and predict with arbitrarily shaped machine learning pipelines. For example, we can create a basic single-level stacking pipeline, where the outputs of several predictors are fed into another predictor that learns how to ensemble them:
from skplumber import Pipeline
from skplumber.primitives import transformers, classifiers
import pandas as pd
from sklearn.datasets import load_iris
dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])
# A random imputation of missing values step and one hot encoding of
# non-numeric features step are automatically added.
pipeline = Pipeline()
# Preprocess the inputs
pipeline.add_step(transformers["StandardScalerPrimitive"])
# Save the pipeline step index of the preprocessor's outputs
stack_input = pipeline.curr_step_i
# Add three classifiers to the pipeline that all take the
# preprocessor's outputs as inputs
stack_outputs = []
for clf_name in [
    "LinearDiscriminantAnalysisPrimitive",
    "DecisionTreeClassifierPrimitive",
    "KNeighborsClassifierPrimitive",
]:
    pipeline.add_step(classifiers[clf_name], [stack_input])
    stack_outputs.append(pipeline.curr_step_i)
# Add a final classifier that takes the outputs of all the previous
# three classifiers as inputs
pipeline.add_step(classifiers["RandomForestClassifierPrimitive"], stack_outputs)
# Train the pipeline
pipeline.fit(X, y)
# Have fitted pipeline make predictions
pipeline.predict(X)
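To make the index-based wiring in the example above more concrete, here is a toy, dependency-free sketch of how steps that reference earlier step indices form a small graph. It is not skplumber's implementation: `ToyPipeline` and its `run` method are hypothetical, the "steps" are plain functions rather than primitives, and only the names `add_step` and `curr_step_i` are borrowed from the example.

```python
class ToyPipeline:
    """Hypothetical model of index-based step wiring (not skplumber code)."""

    def __init__(self):
        # Each entry is (function, indices of the steps whose outputs it consumes).
        self.steps = []

    @property
    def curr_step_i(self):
        # Index of the most recently added step; -1 while the pipeline is empty.
        return len(self.steps) - 1

    def add_step(self, func, inputs=None):
        # By default a step consumes the previous step's output,
        # or the raw pipeline input (index -1) if it is the first step.
        if inputs is None:
            inputs = [self.curr_step_i]
        self.steps.append((func, list(inputs)))

    def run(self, data):
        outputs = []
        for func, inputs in self.steps:
            args = [data if i == -1 else outputs[i] for i in inputs]
            outputs.append(func(*args))
        # The final step's output is the pipeline's output.
        return outputs[-1]


pipe = ToyPipeline()
pipe.add_step(lambda xs: [x / 10 for x in xs])  # "preprocessor"
stack_input = pipe.curr_step_i

stack_outputs = []
branches = [
    lambda xs: [x + 1 for x in xs],
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x ** 2 for x in xs],
]
for branch in branches:
    # Every branch reads the preprocessor's output, not the previous branch's.
    pipe.add_step(branch, [stack_input])
    stack_outputs.append(pipe.curr_step_i)

# The final step combines the three branch outputs element-wise.
pipe.add_step(lambda a, b, c: [sum(t) for t in zip(a, b, c)], stack_outputs)
result = pipe.run([10, 20])
print(stack_outputs, result)
```

Recording `curr_step_i` right after adding a step is what lets later steps name it as an input, which is how the three classifiers above all branch from the same preprocessor.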
Package Opinions
- A pipeline's final step must be the step that produces the pipeline's final output.
- All missing values are imputed.
- All columns of type object and category are one-hot encoded.
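As a rough picture of what the last two opinions mean for a dataset, the sketch below imputes missing values and one-hot encodes non-numeric columns using plain pandas and NumPy. It is an assumption about the general technique, not skplumber's own preprocessing code: `random_impute` is a hypothetical helper that fills each gap with a randomly chosen observed value from the same column, and the toy DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd


def random_impute(df, seed=0):
    """Fill each missing value with a random observed value from its column.

    Hypothetical helper; columns that are entirely missing are left untouched.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in out.columns:
        missing = out[col].isna()
        if missing.any() and not missing.all():
            observed = out.loc[~missing, col].to_numpy()
            out.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))
    return out


# Made-up data with one numeric, one object, and one category column.
df = pd.DataFrame(
    {
        "length": [1.0, np.nan, 3.0],
        "color": ["red", None, "blue"],
        "size": pd.Categorical(["S", "M", "S"]),
    }
)

imputed = random_impute(df)
# By default get_dummies encodes object, string, and category columns
# and leaves numeric columns as they are.
encoded = pd.get_dummies(imputed)
print(encoded)
```

After both steps every column is numeric or boolean and free of missing values, which is the form most scikit-learn estimators expect.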
File details
Details for the file skplumber-0.6.0.dev0.tar.gz.
File metadata
- Download URL: skplumber-0.6.0.dev0.tar.gz
- Upload date:
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | e104dcec5d21f04956a80432dc252891f40f660edb266b103e7e5b37b185ddca
MD5 | 0ace9771a24525ff2748564b93b974b5
BLAKE2b-256 | 6e96b8a2a5a135cf3c45e3021c229a0aebefe8b65a2e3e09ef41a0febdc8a13d
File details
Details for the file skplumber-0.6.0.dev0-py3-none-any.whl.
File metadata
- Download URL: skplumber-0.6.0.dev0-py3-none-any.whl
- Upload date:
- Size: 29.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | d73a23e42405c6102291ce2f01a176a48ae253de9f8d4689b8d7430ae3911e4a
MD5 | e577d860f739279537aacd385cb235ba
BLAKE2b-256 | 4223333efd99d2a8846c76933fe0b3534868265055fcd2cd701a79f58cfce0c5