skplumber

A scikit-learn based AutoML tool

Project description

       ______         ______                 ______
__________  /____________  /___  ________ ______  /______________
__  ___/_  //_/__  __ \_  /_  / / /_  __ `__ \_  __ \  _ \_  ___/
_(__  )_  ,<  __  /_/ /  / / /_/ /_  / / / / /  /_/ /  __/  /
/____/ /_/|_| _  .___//_/  \__,_/ /_/ /_/ /_//_.___/\___//_/
              /_/

skplumber is a Machine Learning (ML) package with two core things to offer:

An Automated Machine Learning (AutoML) system for automatically sampling, training, scoring, and tuning machine learning pipelines on classification or regression problems. This is available as the skplumber.skplumber.SKPlumber class.
A lightweight ML framework for composing ML primitives into pipelines (skplumber.pipeline.Pipeline) of arbitrary shape, and for training and evaluating those pipelines using various evaluation techniques (e.g. train/test split, k-fold cross validation, and down-sampling). All primitive hyperparameters come pre-annotated with type and range information, so they are easier to inspect and tune programmatically. A hyperparameter tuning routine is also provided by skplumber.tuners.ga.ga_tune.
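
To make the k-fold cross validation mentioned above concrete, here is a minimal sketch that uses only the Python standard library. It illustrates the general technique and is not skplumber's evaluator code; `k_fold_indices`, `cross_validate`, and the toy majority-class model are hypothetical names introduced for this example.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(fit, predict, X, y, k=5):
    """Return the mean accuracy of a model over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return mean(scores)


# Hypothetical toy "model": always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model for _ in X_test]


X = [[i] for i in range(10)]
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
score = cross_validate(fit_majority, predict_majority, X, y, k=5)
```

Each sample is used for testing exactly once, so the averaged score is less sensitive to one lucky or unlucky split than a single train/test split would be.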

The base pipeline and primitive constructs borrow heavily from the corresponding constructs in the Data Driven Discovery of Models (D3M) core package.

API documentation for the project is located here.

Installation

pip install skplumber

Usage

The `SKPlumber` AutoML System

The top-level API of the package is the skplumber.skplumber.SKPlumber class. You instantiate the class, then use its fit method to search for an optimal machine learning (ML) pipeline given your input data X and y (a pandas.DataFrame and pandas.Series, respectively). Here is an example using the classic iris dataset:

from skplumber import SKPlumber
import pandas as pd
from sklearn.datasets import load_iris

dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])

# Ask plumber to find the best machine learning pipeline it
# can for the problem in 60 seconds.
plumber = SKPlumber(problem="classification", budget=60)
plumber.fit(X, y)

# To use the best found machine learning pipeline on unseen data
# (unseen_X stands for a pandas.DataFrame with the same columns as X):
predictions = plumber.predict(unseen_X)
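
The idea of searching under a time budget, as in the 60 second budget above, can be sketched with the standard library alone. This is only an illustration of a budgeted random search and makes no claim about how SKPlumber samples or scores pipelines internally; `budgeted_search` and `toy_score` are hypothetical names, and the "candidates" here are plain integers standing in for pipelines.

```python
import random
import time


def budgeted_search(candidates, score_fn, budget_seconds, seed=0):
    """Randomly sample candidates until the time budget runs out.

    Returns the best (candidate, score) pair seen. At least one candidate
    is always evaluated, even with a zero budget.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_seconds
    best, best_score = None, float("-inf")
    while True:
        candidate = rng.choice(candidates)
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if time.monotonic() >= deadline:
            break
    return best, best_score


# Hypothetical stand-in for "train and score a pipeline":
# the score peaks when the candidate equals 4.
def toy_score(depth):
    return -abs(depth - 4)


best_depth, best_score = budgeted_search(
    list(range(1, 11)), toy_score, budget_seconds=0.05
)
```

Using `time.monotonic` rather than wall-clock time keeps the deadline unaffected by system clock adjustments. A larger budget simply allows more candidates to be tried before the best one is returned.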

`Pipeline`

The skplumber.pipeline.Pipeline class is a slightly lower-level API for the package that can be used to build, fit, and predict with arbitrarily shaped machine learning pipelines. For example, we can create a basic single-level stacking pipeline, where the outputs of several predictors are fed into another predictor that learns how to ensemble them:

from skplumber import Pipeline
from skplumber.primitives import transformers, classifiers
import pandas as pd
from sklearn.datasets import load_iris

dataset = load_iris()
X = pd.DataFrame(data=dataset["data"], columns=dataset["feature_names"])
y = pd.Series(dataset["target"])

# A random imputation of missing values step and one hot encoding of
# non-numeric features step are automatically added.
pipeline = Pipeline()
# Preprocess the inputs
pipeline.add_step(transformers["StandardScalerPrimitive"])
# Save the pipeline step index of the preprocessor's outputs
stack_input = pipeline.curr_step_i
# Add three classifiers to the pipeline that all take the
# preprocessor's outputs as inputs
stack_outputs = []
for clf_name in [
    "LinearDiscriminantAnalysisPrimitive",
    "DecisionTreeClassifierPrimitive",
    "KNeighborsClassifierPrimitive"
]:
    pipeline.add_step(classifiers[clf_name], [stack_input])
    stack_outputs.append(pipeline.curr_step_i)
# Add a final classifier that takes the outputs of all the previous
# three classifiers as inputs
pipeline.add_step(classifiers["RandomForestClassifierPrimitive"], stack_outputs)

# Train the pipeline
pipeline.fit(X, y)

# Have fitted pipeline make predictions
pipeline.predict(X)
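
To show how step indices let a pipeline take an arbitrary shape, here is a toy sketch of the data flow in the stacking example above. It is not skplumber's Pipeline implementation: `MiniPipeline`, its `run` method, and the convention that index -1 means the raw input are assumptions made for illustration, and plain functions stand in for primitives.

```python
class MiniPipeline:
    """Toy pipeline whose steps can read the outputs of any earlier steps."""

    def __init__(self):
        self.steps = []  # list of (function, input_step_indices)

    @property
    def curr_step_i(self):
        """Index of the most recently added step (-1 means the raw input)."""
        return len(self.steps) - 1

    def add_step(self, fn, inputs=None):
        # By default a step consumes the output of the previous step.
        if inputs is None:
            inputs = [self.curr_step_i]
        self.steps.append((fn, list(inputs)))

    def run(self, data):
        outputs = []
        for fn, inputs in self.steps:
            args = [data if i == -1 else outputs[i] for i in inputs]
            outputs.append(fn(*args))
        # The final step produces the pipeline's final output.
        return outputs[-1]


pipe = MiniPipeline()
# "Preprocess" the input: scale values relative to the maximum.
pipe.add_step(lambda xs: [x / max(xs) for x in xs])
stack_input = pipe.curr_step_i
stack_outputs = []
# Three parallel steps that all read the preprocessor's output.
for fn in (min, max, lambda xs: sum(xs) / len(xs)):
    pipe.add_step(fn, [stack_input])
    stack_outputs.append(pipe.curr_step_i)
# A final step that combines the outputs of the three parallel steps.
pipe.add_step(lambda lo, hi, avg: {"min": lo, "max": hi, "mean": avg}, stack_outputs)
result = pipe.run([2, 4, 8])
```

Because every step records which earlier outputs it reads, the same structure can express linear chains, fan-out, and fan-in without special cases.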

Package Opinions

A pipeline's final step must be the step that produces the pipeline's final output.
All missing values are imputed.
All columns of type object and category are one-hot encoded.
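
For readers unfamiliar with the two preprocessing opinions above, the following standard-library sketch shows what random imputation and one-hot encoding do to a single column. It is an illustration only and not skplumber's code: `impute_random` and `one_hot` are hypothetical helpers, and drawing replacements from the observed values is an assumption about what "random imputation" means here.

```python
import random


def impute_random(column, seed=0):
    """Replace None entries with values drawn at random from the observed ones."""
    rng = random.Random(seed)
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    return [v if v is not None else rng.choice(observed) for v in column]


def one_hot(column):
    """Return one 0/1 indicator column per distinct category, keyed by category."""
    categories = sorted(set(column))
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}


colors = ["red", None, "blue", "red"]
filled = impute_random(colors)   # the None entry becomes "red" or "blue"
encoded = one_hot(filled)        # one indicator column per color
```

After encoding, each row has exactly one indicator set to 1, which lets estimators that expect numeric inputs consume categorical columns.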

Project details

Release history

Version	Status	Released
0.6.5.dev0 (this version)	pre-release	Apr 27, 2020
0.6.4.dev0	pre-release	Apr 22, 2020
0.6.3.dev0	pre-release	Apr 14, 2020
0.6.2.dev0	pre-release	Apr 14, 2020
0.6.1.dev0	pre-release	Apr 14, 2020
0.6.0.dev0	pre-release	Apr 12, 2020
0.5.0.dev0	pre-release	Apr 7, 2020
0.4.7.dev0	pre-release	Apr 1, 2020
0.4.6.dev0	pre-release	Jan 3, 2020
0.4.5.dev0	pre-release	Jan 1, 2020
0.4.4.dev0	pre-release	Jan 1, 2020
0.4.3.dev0	pre-release	Jan 1, 2020
0.4.2.dev0	pre-release	Dec 25, 2019
0.4.0.dev0	pre-release	Dec 12, 2019
0.3.6.dev0	pre-release	Dec 10, 2019
0.3.5.dev0	pre-release	Dec 9, 2019
0.3.4.dev0	pre-release	Dec 5, 2019
0.3.3.dev0	pre-release	Dec 4, 2019
0.3.2.dev0	pre-release	Dec 3, 2019
0.3.1.dev0	pre-release	Dec 3, 2019
0.3.0.dev0	pre-release	Dec 2, 2019
0.2.1.dev0	pre-release	Nov 26, 2019
0.2.dev0	pre-release	Nov 26, 2019
0.1.dev0	pre-release	Nov 13, 2019

Download files

Source Distribution

skplumber-0.6.5.dev0.tar.gz (24.2 kB)

Uploaded Apr 27, 2020 Source

Built Distribution

skplumber-0.6.5.dev0-py3-none-any.whl (31.2 kB)

Uploaded Apr 27, 2020 Python 3

File details

Details for the file skplumber-0.6.5.dev0.tar.gz.

File metadata

Download URL: skplumber-0.6.5.dev0.tar.gz
Upload date: Apr 27, 2020
Size: 24.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.9

File hashes

Hashes for skplumber-0.6.5.dev0.tar.gz
Algorithm	Hash digest
SHA256	`bc2678ad26eebb93fda647e0bde5d2a5e17cdca8f0a1b2bed674628c7b269f07`
MD5	`21c953ceb89466cfa04699bec36326a7`
BLAKE2b-256	`fe7bbadce40f232bde8fa66897c27f4eae879773ca8d8d5d6c0242bbbb293afb`

File details

Details for the file skplumber-0.6.5.dev0-py3-none-any.whl.

File metadata

Download URL: skplumber-0.6.5.dev0-py3-none-any.whl
Upload date: Apr 27, 2020
Size: 31.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.9

File hashes

Hashes for skplumber-0.6.5.dev0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b6197739cfa43f25fb8e9901e1159277dfa30a8a80e97655bcbdcf9aced02e8`
MD5	`6332563135581194d0c5b9fbc3a4c483`
BLAKE2b-256	`9d089a1881b101cf3b0c3b57d9a3f46b081db0a63166b222103b9a7d14ab212f`
