Debug machine learning classifiers and explain their predictions

Project description

ELI5 is a Python package that helps debug machine learning classifiers and explain their predictions.

[Image: explain_prediction for text data]

It provides support for the following machine learning frameworks and packages:

  • scikit-learn. Currently ELI5 can explain weights and predictions of scikit-learn linear classifiers and regressors, print decision trees as text or as SVG, show feature importances, and explain predictions of decision trees and tree-based ensembles. ELI5 understands text processing utilities from scikit-learn and can highlight text data accordingly. It can also debug scikit-learn pipelines that contain HashingVectorizer by undoing the hashing.

  • xgboost - show feature importances and explain predictions of XGBClassifier and XGBRegressor.

  • lightning - explain weights and predictions of lightning classifiers and regressors.

  • sklearn-crfsuite. ELI5 can check weights of sklearn_crfsuite.CRF models.
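For linear classifiers and regressors, explaining a single prediction boils down to arithmetic: each feature contributes weight × feature value, and the intercept is reported as a separate bias term, so the contributions sum to the model's score. The sketch below illustrates that arithmetic with plain Python lists; the `linear_contributions` helper, the feature names, and the numbers are hypothetical, and this is not ELI5's own code.

```python
from typing import List, Sequence, Tuple


def linear_contributions(weights: Sequence[float], bias: float,
                         x: Sequence[float],
                         feature_names: Sequence[str]) -> Tuple[float, List[Tuple[str, float]]]:
    """Split a linear model's score into per-feature contributions.

    Hypothetical helper: contribution_i = weights[i] * x[i]; the intercept
    is reported separately under the name "<BIAS>".
    """
    contribs = [(name, w * v) for name, w, v in zip(feature_names, weights, x)]
    contribs.append(("<BIAS>", bias))
    # Largest absolute contribution first, as an explanation table would show it
    contribs.sort(key=lambda item: abs(item[1]), reverse=True)
    score = sum(c for _, c in contribs)
    return score, contribs


if __name__ == "__main__":
    names = ["great", "boring", "plot"]
    score, contribs = linear_contributions([1.5, -2.0, 0.1], -0.3, [1, 0, 2], names)
    for name, c in contribs:
        print(f"{name:>8} {c:+.2f}")
    print(f"   score {score:+.2f}")
```

A feature that is absent from the example (value 0) contributes nothing, however large its weight; that is why per-prediction contributions can rank features very differently from global weights.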

ELI5 also provides TextExplainer, which can explain predictions of any text classifier using the LIME algorithm (Ribeiro et al., 2016). There are also utilities for using LIME with non-text data and arbitrary black-box classifiers, but this feature is currently experimental.

Explanation and formatting are separated: you can get a text-based explanation to display in the console, an HTML version embeddable in IPython notebooks or web dashboards, or a JSON version that lets you implement custom rendering and formatting on a client.
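Separating explanation from formatting means one explanation object can be rendered in several ways. A minimal sketch of the pattern, assuming a hypothetical `Explanation` dataclass and hypothetical `render_text` and `render_json` functions; ELI5's real data structures and formatters differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class Explanation:
    # Hypothetical container: the target being explained and its feature weights
    target: str
    feature_weights: List[Tuple[str, float]] = field(default_factory=list)


def render_text(expl: Explanation) -> str:
    """Console-friendly rendering: a header line, then one 'weight feature' pair per line."""
    lines = [f"y={expl.target}"]
    lines += [f"{w:+.3f} {name}" for name, w in expl.feature_weights]
    return "\n".join(lines)


def render_json(expl: Explanation) -> str:
    """Machine-readable rendering for custom client-side formatting."""
    return json.dumps(asdict(expl))


if __name__ == "__main__":
    expl = Explanation("pos", [("great", 1.5), ("boring", -2.0)])
    print(render_text(expl))
    print(render_json(expl))
```

Because the renderers only read the data object, adding an HTML renderer would not require touching the code that computes the explanation.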

License is MIT.

Check the docs for more.

Changelog

0.3 (2017-01-13)

  • eli5.explain_prediction works for XGBClassifier and XGBRegressor from XGBoost and for ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, GradientBoostingRegressor, RandomForestClassifier, RandomForestRegressor, DecisionTreeClassifier and DecisionTreeRegressor from scikit-learn. The explanation method is based on http://blog.datadive.net/interpreting-random-forests/.

  • eli5.explain_weights now supports tree-based regressors from scikit-learn: DecisionTreeRegressor, AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor and ExtraTreesRegressor.

  • eli5.explain_weights works for XGBRegressor;

  • new TextExplainer class allows explaining predictions of black-box text classification pipelines using the LIME algorithm; many improvements in eli5.lime.

  • better sklearn.pipeline.FeatureUnion support in eli5.explain_prediction;

  • rendering performance is improved;

  • the number of remaining feature importances is shown when the feature importance table is truncated;

  • styling of feature importances tables is fixed;

  • eli5.explain_weights and eli5.explain_prediction support more linear estimators from scikit-learn: HuberRegressor, LarsCV, LassoCV, LassoLars, LassoLarsCV, LassoLarsIC, OrthogonalMatchingPursuit, OrthogonalMatchingPursuitCV, PassiveAggressiveRegressor, RidgeClassifier, RidgeClassifierCV, TheilSenRegressor.

  • text-based formatting of decision trees is changed: for binary classification trees, only the probability of the “true” class is printed, not both probabilities as before.

  • eli5.explain_weights supports feature_filter in addition to feature_re for filtering features, and eli5.explain_prediction now also supports both of these arguments;

  • ‘Weight’ column is renamed to ‘Contribution’ in the output of eli5.explain_prediction;

  • new show_feature_values=True formatter argument displays input feature values;

  • fixed an issue with analyzer='char_wb' highlighting at the start of the text.
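The tree explanation method referenced in the first 0.3 entry decomposes a prediction into the root node's value (a bias term) plus, for each split along the decision path, the change in node value credited to the split feature. A stdlib sketch for a single hand-built regression tree; the `Node` structure, the `decompose` helper, and the example tree are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    value: float                      # mean target value of training samples at this node
    feature: Optional[str] = None     # split feature (None for leaves)
    threshold: float = 0.0
    left: Optional["Node"] = None     # followed when x[feature] <= threshold
    right: Optional["Node"] = None


def decompose(root: Node, x: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (bias, contributions) such that bias + sum(contributions) == leaf value."""
    contributions: Dict[str, float] = defaultdict(float)
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Credit the change in node value to the feature used for this split
        contributions[node.feature] += child.value - node.value
        node = child
    return root.value, dict(contributions)


if __name__ == "__main__":
    tree = Node(10.0, "rooms", 4,
                left=Node(6.0, "age", 30, left=Node(8.0), right=Node(5.0)),
                right=Node(15.0))
    bias, contribs = decompose(tree, {"rooms": 3, "age": 40})
    print(bias, contribs, bias + sum(contribs.values()))
```

For a forest, the same per-tree decomposition is averaged over all trees, so the averaged bias plus averaged contributions still adds up to the ensemble's prediction.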

0.2 (2016-12-03)

  • XGBClassifier support (from XGBoost package);

  • eli5.explain_weights support for sklearn OneVsRestClassifier;

  • std deviation of feature importances is no longer printed as zero if it is not available.

0.1.1 (2016-11-25)

  • packaging fixes: require attrs > 16.0.0, fixed README rendering

0.1 (2016-11-24)

  • HTML output;

  • IPython integration;

  • JSON output;

  • visualization of scikit-learn text vectorizers;

  • sklearn-crfsuite support;

  • lightning support;

  • eli5.show_weights and eli5.show_prediction functions;

  • eli5.explain_weights and eli5.explain_prediction functions;

  • eli5.lime improvements: samplers for non-text data, bug fixes, docs;

  • HashingVectorizer is supported for regression tasks;

  • performance improvements - feature names are lazy;

  • sklearn ElasticNetCV and RidgeCV support;

  • it is now possible to customize the formatted output: show/hide sections, change layout;

  • sklearn OneVsRestClassifier support;

  • sklearn DecisionTreeClassifier visualization (text-based or svg-based);

  • dropped support for scikit-learn < 0.18;

  • basic mypy type annotations;

  • feature_re argument allows showing only a subset of features;

  • target_names argument allows changing the display names of targets/classes;

  • targets argument allows showing a subset of targets/classes and changing their display order;

  • documentation, more examples.
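The feature_re, targets and target_names arguments amount to filtering and relabelling an explanation before it is displayed. A stdlib sketch of those semantics on a plain dict of per-class weights, assuming feature_re is matched with `re.search`; the `select` helper and the data are hypothetical, and ELI5 applies these options inside its explain and show functions rather than through a separate helper.

```python
import re
from typing import Dict, List, Optional


def select(weights: Dict[str, Dict[str, float]],
           feature_re: Optional[str] = None,
           targets: Optional[List[str]] = None,
           target_names: Optional[Dict[str, str]] = None) -> Dict[str, Dict[str, float]]:
    """Filter features by regex, pick and reorder targets, then rename them for display."""
    order = targets if targets is not None else list(weights)
    names = target_names or {}
    result = {}
    for target in order:
        feats = weights[target]
        if feature_re is not None:
            feats = {f: w for f, w in feats.items() if re.search(feature_re, f)}
        # Fall back to the raw target label when no display name is given
        result[names.get(target, target)] = feats
    return result


if __name__ == "__main__":
    w = {"0": {"w_good": 1.0, "w_bad": -1.0, "len": 0.2},
         "1": {"w_good": -1.0, "w_bad": 1.0, "len": -0.2}}
    print(select(w, feature_re="^w_", targets=["1", "0"],
                 target_names={"0": "neg", "1": "pos"}))
```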

0.0.6 (2016-10-12)

  • Candidate features in eli5.sklearn.InvertableHashingVectorizer are ordered by their frequency; the first candidate is always positive.
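Undoing hashing works by remembering which terms landed in each hash bucket while looking at sample documents; the candidates for a bucket are then ordered by frequency, and each candidate's sign records whether the hash flipped it. A stdlib sketch of that idea; `SimpleHasher`, its use of `zlib.crc32`, and the sign rule are assumptions for illustration (scikit-learn's HashingVectorizer uses MurmurHash3), and normalizing signs so the first candidate is positive mirrors the changelog entry rather than ELI5's exact code.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class SimpleHasher:
    """Hypothetical: hash terms into buckets and remember which terms hit each bucket."""

    def __init__(self, n_features: int = 16):
        self.n_features = n_features
        self.seen: Dict[int, Counter] = defaultdict(Counter)

    def _hash(self, term: str) -> Tuple[int, int]:
        h = zlib.crc32(term.encode("utf-8"))
        index = h % self.n_features
        sign = -1 if (h >> 31) & 1 else 1  # top bit picks the sign (assumption)
        return index, sign

    def transform(self, doc: str) -> Dict[int, float]:
        """Vectorize a document while recording (term, sign) counts per bucket."""
        vec: Dict[int, float] = defaultdict(float)
        for term in doc.split():
            index, sign = self._hash(term)
            vec[index] += sign
            self.seen[index][(term, sign)] += 1
        return dict(vec)

    def candidates(self, index: int) -> List[Tuple[str, int]]:
        """Terms seen in a bucket, most frequent first, signs normalized so the first is +1."""
        ranked = [key for key, _ in self.seen[index].most_common()]
        if not ranked:
            return []
        flip = ranked[0][1]
        return [(term, sign * flip) for term, sign in ranked]


if __name__ == "__main__":
    hasher = SimpleHasher(n_features=4)
    hasher.transform("spam spam eggs ham spam")
    for i in range(4):
        print(i, hasher.candidates(i))
```

Given a model weight on bucket i, `candidates(i)` lists the terms that most likely produced it, which is what makes a hashed pipeline's weights readable again.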

0.0.5 (2016-09-27)

  • HashingVectorizer support in explain_prediction;

  • added an option to pass a coefficient scaling array; it is useful for comparing coefficients of features whose scale or sign differs in the input;

  • bug fix: classifier weights are no longer changed by eli5 functions.
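Raw coefficients are only comparable when features share a scale. One common choice of scaling array, assumed here rather than prescribed by ELI5, is each feature's standard deviation, so every scaled weight reads as the effect of a one-standard-deviation change; a negative entry would flip a feature's sign. A stdlib sketch with a hypothetical `scale_coefficients` helper:

```python
import statistics
from typing import List, Sequence


def scale_coefficients(coef: Sequence[float],
                       columns: Sequence[Sequence[float]]) -> List[float]:
    """Multiply each coefficient by the population standard deviation of its feature column."""
    return [c * statistics.pstdev(col) for c, col in zip(coef, columns)]


if __name__ == "__main__":
    # Income in currency units gets a tiny raw weight; a 0/1 flag gets a large one
    coef = [0.001, 2.0]
    columns = [[20000, 40000, 60000], [0, 1, 0]]
    print(scale_coefficients(coef, columns))
```

After scaling, the income feature turns out to matter more than the binary flag, even though its raw coefficient is 2000 times smaller.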

0.0.4 (2016-09-24)

  • eli5.sklearn.InvertableHashingVectorizer and eli5.sklearn.FeatureUnhasher allow recovering feature names for pipelines that use HashingVectorizer or FeatureHasher;

  • added support for scikit-learn linear regression models (ElasticNet, Lars, Lasso, LinearRegression, LinearSVR, Ridge, SGDRegressor);

  • doc and vec arguments are swapped in explain_prediction function; vec can now be omitted if an example is already vectorized;

  • fixed an issue with dense feature vectors;

  • all class_names arguments are renamed to target_names;

  • feature name guessing is fixed for scikit-learn ensemble estimators;

  • testing improvements.

0.0.3 (2016-09-21)

  • support for any black-box classifier using the LIME algorithm (http://arxiv.org/abs/1602.04938); text data support is built in;

  • “vectorized” argument for sklearn.explain_prediction; it allows passing an example that is already vectorized;

  • allow passing feature_names explicitly;

  • support classifiers without get_feature_names method using auto-generated feature names.

0.0.2 (2016-09-19)

  • ‘top’ argument of explain_prediction can be a tuple (num_positive, num_negative);

  • classifier name is no longer printed by default;

  • added eli5.sklearn.explain_prediction to explain individual examples;

  • fixed numpy warning.
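When top is a (num_positive, num_negative) tuple, the strongest positive and the strongest negative features are kept separately instead of the overall largest by magnitude. A stdlib sketch of that selection; `top_features` is a hypothetical helper, and listing negatives from weakest to strongest at the bottom is a display choice made for this example.

```python
from typing import List, Tuple, Union


def top_features(weights: List[Tuple[str, float]],
                 top: Union[int, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Select features by overall magnitude (int) or per sign (tuple)."""
    if isinstance(top, int):
        return sorted(weights, key=lambda fw: abs(fw[1]), reverse=True)[:top]
    num_pos, num_neg = top
    pos = sorted((fw for fw in weights if fw[1] > 0), key=lambda fw: fw[1], reverse=True)
    neg = sorted((fw for fw in weights if fw[1] < 0), key=lambda fw: fw[1])
    # Strongest positives first; the chosen negatives go last, most negative at the bottom
    return pos[:num_pos] + neg[:num_neg][::-1]


if __name__ == "__main__":
    w = [("a", 3.0), ("b", -5.0), ("c", 1.0), ("d", -0.5), ("e", 2.0)]
    print(top_features(w, 2))
    print(top_features(w, (1, 2)))
```

With a plain integer, a model dominated by one sign can hide the other sign entirely; the tuple form guarantees both sides of the explanation are visible.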

0.0.1 (2016-09-15)

Pre-release.


Download files

Source Distribution

eli5-0.3.tar.gz (164.6 kB)

Built Distribution

eli5-0.3-py2.py3-none-any.whl (75.7 kB), for Python 2 and Python 3

File details

Details for the file eli5-0.3.tar.gz.

File metadata

  • Download URL: eli5-0.3.tar.gz
  • Size: 164.6 kB
  • Tags: Source

File hashes

Hashes for eli5-0.3.tar.gz
Algorithm    Hash digest
SHA256       8a66b6630f684557855407d74f4992636f8beb1fdcaf5040f6bc3c3502750538
MD5          9b15fcac75c8ddf7fef58148e00ebaf1
BLAKE2b-256  bdf148610adc671d7e92bf043e57a9cbf950125066a6cd094b028e2d41082b86

File details

Details for the file eli5-0.3-py2.py3-none-any.whl.

File hashes

Hashes for eli5-0.3-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       2a514d7403b7204ed0109a386a57f4179423d1f74620517b830361b531d7aead
MD5          f9c510d011941442120a3ab24376325d
BLAKE2b-256  42ac3e5469dcaf79fb903f4a49ec5bf149563cd693c3bb9d4a463dbb616a3f3c
