Machine Learning Stacking Util
Project description
stacked_generalization
Implements the machine learning *stacking technique [1]* as a handy library in Python. Feature-weighted linear stacking is also available. (See https://github.com/fukatani/stacked_generalization/tree/master/stacked_generalization/example.)
Features
1) Any scikit-learn model is available for the Stage 0 and Stage 1 models, and the stacked model itself has the same interface as a scikit-learn estimator.
ex.
from stacked_generalization.lib.stacking import StackedClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn import datasets, metrics
iris = datasets.load_iris()
# Stage 1 model
bclf = LogisticRegression(random_state=1)
# Stage 0 models
clfs = [RandomForestClassifier(n_estimators=40, criterion='gini', random_state=1),
GradientBoostingClassifier(n_estimators=25, random_state=1),
RidgeClassifier(random_state=1)]
# same interface as scikit-learn
sl = StackedClassifier(bclf, clfs)
sl.fit(iris.data, iris.target)
score = metrics.accuracy_score(iris.target, sl.predict(iris.data))
print("Accuracy: %f" % score)
A more detailed example is here: https://github.com/fukatani/stacked_generalization/blob/master/stacked_generalization/example/cross_validation_for_iris.py
The stacked model behaves as a scikit-learn estimator, so you can easily swap a model such as RandomForestClassifier for a stacked model in your scripts.
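For comparison, the same two-stage setup can be sketched with scikit-learn's built-in StackingClassifier (available since scikit-learn 0.22). This is not part of this library; it is only an illustration of the same idea on the iris data:

```python
# Sketch: the same stacking idea using scikit-learn's built-in
# StackingClassifier (illustrative only, not this library's API).
from sklearn.datasets import load_iris
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
# Stage 0 models
estimators = [
    ('rf', RandomForestClassifier(n_estimators=40, random_state=1)),
    ('gb', GradientBoostingClassifier(n_estimators=25, random_state=1)),
    ('ridge', RidgeClassifier(random_state=1)),
]
# Stage 1 model combines the stage 0 predictions
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression(random_state=1))
stack.fit(iris.data, iris.target)
score = accuracy_score(iris.target, stack.predict(iris.data))
print("Accuracy: %f" % score)
```

Like this library's StackedClassifier, the resulting object is a plain scikit-learn estimator with fit/predict.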
2) Evaluation of the model by out-of-bag score.
The stacking technique itself uses CV for stage 0, so if you apply CV to the entire stacked model, *each stage 0 model is fitted n_folds squared times.* This computational cost can be significant, so we implemented CV only for stage 1 [2].
For example, when we get 3 blends (stage 0 predictions), 2 blends are used for stage 1 fitting and the remaining blend is used for the model test. Repeating this cycle over all 3 blends and averaging the scores gives an OOB (out-of-bag) score *with only n_folds stage 0 fits.*
ex.
sl = StackedClassifier(bclf, clfs, oob_score_flag=True)
sl.fit(iris.data, iris.target)
print("Accuracy: %f" % sl.oob_score_)
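The evaluation scheme above can be sketched with plain scikit-learn: build each model's blend (out-of-fold predictions) with cross_val_predict, which fits each stage 0 model only n_folds times, then cross-validate the stage 1 model on the blend features. This is only an illustration of the idea, not this library's internal code:

```python
# Sketch of the out-of-bag style evaluation described above,
# using plain scikit-learn (illustrative only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

iris = load_iris()
X, y = iris.data, iris.target
clfs = [RandomForestClassifier(n_estimators=40, random_state=1),
        RidgeClassifier(random_state=1)]

# Each stage 0 model is fitted only n_folds times to produce its blend
# (out-of-fold predictions over the whole training set).
blend = np.column_stack([cross_val_predict(clf, X, y, cv=3) for clf in clfs])

# Cross-validate the stage 1 model on the blend features; this loop is
# cheap because the stage 0 models are never refitted inside it.
oob_scores = cross_val_score(LogisticRegression(random_state=1), blend, y, cv=3)
print("oob score: %f" % oob_scores.mean())
```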
3) Caching blend data and trained models (optional):
sl = StackedClassifier(bclf, clfs, save_stage0=True, save_dir='stack_temp')
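The motivation for the save_stage0/save_dir options can be sketched by hand: cache the expensive stage 0 output to disk and reload it on later runs. Here compute_blend() is a hypothetical placeholder, not a function of this library:

```python
# Sketch: manually caching stage 0 blend data so repeated stage 1
# experiments skip the stage 0 refits. compute_blend() is a
# hypothetical stand-in for the expensive stage 0 step.
import os
import numpy as np

def compute_blend():
    # placeholder for expensive stage 0 out-of-fold predictions
    return np.zeros((150, 3))

save_dir = 'stack_temp'            # mirrors the save_dir option above
os.makedirs(save_dir, exist_ok=True)
blend_path = os.path.join(save_dir, 'blend.npy')

if os.path.exists(blend_path):
    blend = np.load(blend_path)    # reuse cached stage 0 predictions
else:
    blend = compute_blend()        # fit stage 0 models only once
    np.save(blend_path, blend)
```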
Software Requirements
Python (2.7 or later)
scikit-learn
Installation
git clone https://github.com/fukatani/stacked_generalization.git
cd stacked_generalization
python setup.py install
License
MIT License. (http://opensource.org/licenses/mit-license.php)
Copyright
Copyright (C) 2016, Ryosuke Fukatani
Much of the implementation is based on the following. Thanks! https://github.com/log0/vertebral/blob/master/stacked_generalization.py
Other
Any contributions (implement, documentation, test or idea…) are welcome.
References
[1] L. Breiman, "Stacked Regressions", Machine Learning, 24, 49-64 (1996).
[2] J. Sill et al., "Feature-Weighted Linear Stacking", https://arxiv.org/abs/0911.0460, 2009.
Project details
Download files
Source Distribution
Hashes for stacked_generalization-0.0.3.zip
Algorithm | Hash digest
---|---
SHA256 | 3f7a7f5f031ec271105a6b87c7bbd5357d8c90fbf369d25dedb4d4c1ea61722f
MD5 | 1160d97c3efa2ff05c2155154ef2e6d3
BLAKE2b-256 | a621748a6470b0b8f7c2b8ac1df7a28a84d71ba60e1ed3a247b0a02b26cf5f8a