Machine Learning Stacking Util
Project description
stacked_generalization
Implements the machine learning *stacking technique [1]* as a handy library in Python. Feature-weighted linear stacking is also available. (See https://github.com/fukatani/stacked_generalization/tree/master/stacked_generalization/example.)
Features
1) Any scikit-learn model is available for the Stage 0 and Stage 1 models, and the stacked model itself has the same interface as a scikit-learn estimator.
ex.
from stacked_generalization.lib.stacking import StackedClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn import datasets, metrics
iris = datasets.load_iris()
# Stage 1 model
bclf = LogisticRegression(random_state=1)
# Stage 0 models
clfs = [RandomForestClassifier(n_estimators=40, criterion='gini', random_state=1),
GradientBoostingClassifier(n_estimators=25, random_state=1),
RidgeClassifier(random_state=1)]
# same interface as scikit-learn
sl = StackedClassifier(bclf, clfs)
sl.fit(iris.data, iris.target)
score = metrics.accuracy_score(iris.target, sl.predict(iris.data))
print("Accuracy: %f" % score)
A more detailed example is here: https://github.com/fukatani/stacked_generalization/blob/master/stacked_generalization/example/cross_validation_for_iris.py
The stacked model behaves as a scikit-learn estimator, so you can easily swap a model such as RandomForestClassifier for a stacked model in your scripts.
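For comparison, the same two-stage setup can be sketched with scikit-learn's built-in StackingClassifier (available since scikit-learn 0.22). This is not part of this library; it is only an illustration of the same idea on the iris data:

```python
# Sketch: the same stacking idea using scikit-learn's built-in
# StackingClassifier (illustrative only, not this library's API).
from sklearn.datasets import load_iris
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
# Stage 0 models
estimators = [
    ('rf', RandomForestClassifier(n_estimators=40, random_state=1)),
    ('gb', GradientBoostingClassifier(n_estimators=25, random_state=1)),
    ('ridge', RidgeClassifier(random_state=1)),
]
# Stage 1 model combines the stage 0 predictions
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression(random_state=1))
stack.fit(iris.data, iris.target)
score = accuracy_score(iris.target, stack.predict(iris.data))
print("Accuracy: %f" % score)
```

Like this library's StackedClassifier, the resulting object is a plain scikit-learn estimator with fit/predict.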
2) Evaluation of the model by out-of-bag score.
The stacking technique itself uses CV for stage 0, so if you apply CV to the entire stacked model, *each stage 0 model is fitted n_folds squared times.* This computational cost can be significant, so we implemented CV only for stage 1 [2].
For example, when we get 3 blends (stage 0 predictions), 2 blends are used for stage 1 fitting and the remaining blend is used for the model test. Repeating this cycle over all 3 blends and averaging the scores gives an OOB (out-of-bag) score *with only n_folds stage 0 fits.*
ex.
sl = StackedClassifier(bclf, clfs, oob_score_flag=True)
sl.fit(iris.data, iris.target)
print("Accuracy: %f" % sl.oob_score_)
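The evaluation scheme above can be sketched with plain scikit-learn: build each model's blend (out-of-fold predictions) with cross_val_predict, which fits each stage 0 model only n_folds times, then cross-validate the stage 1 model on the blend features. This is only an illustration of the idea, not this library's internal code:

```python
# Sketch of the out-of-bag style evaluation described above,
# using plain scikit-learn (illustrative only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

iris = load_iris()
X, y = iris.data, iris.target
clfs = [RandomForestClassifier(n_estimators=40, random_state=1),
        RidgeClassifier(random_state=1)]

# Each stage 0 model is fitted only n_folds times to produce its blend
# (out-of-fold predictions over the whole training set).
blend = np.column_stack([cross_val_predict(clf, X, y, cv=3) for clf in clfs])

# Cross-validate the stage 1 model on the blend features; this loop is
# cheap because the stage 0 models are never refitted inside it.
oob_scores = cross_val_score(LogisticRegression(random_state=1), blend, y, cv=3)
print("oob score: %f" % oob_scores.mean())
```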
3) Caching blend data and trained models (optional):
sl = StackedClassifier(bclf, clfs, save_stage0=True, save_dir='stack_temp')
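The motivation for the save_stage0/save_dir options can be sketched by hand: cache the expensive stage 0 output to disk and reload it on later runs. Here compute_blend() is a hypothetical placeholder, not a function of this library:

```python
# Sketch: manually caching stage 0 blend data so repeated stage 1
# experiments skip the stage 0 refits. compute_blend() is a
# hypothetical stand-in for the expensive stage 0 step.
import os
import numpy as np

def compute_blend():
    # placeholder for expensive stage 0 out-of-fold predictions
    return np.zeros((150, 3))

save_dir = 'stack_temp'            # mirrors the save_dir option above
os.makedirs(save_dir, exist_ok=True)
blend_path = os.path.join(save_dir, 'blend.npy')

if os.path.exists(blend_path):
    blend = np.load(blend_path)    # reuse cached stage 0 predictions
else:
    blend = compute_blend()        # fit stage 0 models only once
    np.save(blend_path, blend)
```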
Software Requirements
Python (2.7 or later)
scikit-learn
Installation
git clone https://github.com/fukatani/stacked_generalization.git
cd stacked_generalization
python setup.py install
License
MIT License. (http://opensource.org/licenses/mit-license.php)
Copyright
Copyright (C) 2016, Ryosuke Fukatani
Much of the implementation is based on the following. Thanks! https://github.com/log0/vertebral/blob/master/stacked_generalization.py
Other
Any contributions (implement, documentation, test or idea…) are welcome.
References
[1] L. Breiman, "Stacked Regressions", Machine Learning, 24, 49-64 (1996).
[2] J. Sill et al., "Feature-Weighted Linear Stacking", https://arxiv.org/abs/0911.0460, 2009.
Project details
Download files
Source Distribution
Hashes for stacked_generalization-0.0.3.zip
Algorithm | Hash digest
---|---
SHA256 | 3f7a7f5f031ec271105a6b87c7bbd5357d8c90fbf369d25dedb4d4c1ea61722f
MD5 | 1160d97c3efa2ff05c2155154ef2e6d3
BLAKE2b-256 | a621748a6470b0b8f7c2b8ac1df7a28a84d71ba60e1ed3a247b0a02b26cf5f8a