Pandas confusion matrix with plot features (matplotlib, seaborn, ...)
pandas_confusion
A Python Pandas implementation of a confusion matrix.
WORK IN PROGRESS - Use it at your own risk
Usage
Confusion matrix
Import ConfusionMatrix
from pandas_confusion import ConfusionMatrix
Define actual values (y_actu) and predicted values (y_pred)
y_actu = ['rabbit', 'cat', 'rabbit', 'rabbit', 'cat', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'dog', 'rabbit']
y_pred = ['cat', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'dog', 'cat', 'rabbit', 'cat', 'rabbit', 'rabbit']
Let’s define a (non binary) confusion matrix
confusion_matrix = ConfusionMatrix(y_actu, y_pred)
print("Confusion matrix:\n%s" % confusion_matrix)
You can see it:
Predicted  cat  dog  rabbit  __all__
Actual
cat          3    0       0        3
dog          0    1       2        3
rabbit       2    1       3        6
__all__      5    2       5       12
Matplotlib plot of a confusion matrix
import matplotlib.pyplot as plt

confusion_matrix.plot()
plt.show()
Matplotlib plot of a normalized confusion matrix
confusion_matrix.plot(normalized=True)
plt.show()
Binary confusion matrix
Import BinaryConfusionMatrix and Backend
from pandas_confusion import BinaryConfusionMatrix, Backend
Define actual values (y_actu) and predicted values (y_pred)
y_actu = [ True, True, False, False, False, True, False, True, True, False, True, False, False, False, False, False, True, False, True, True, True, True, False, False, False, True, False, True, False, False, False, False, True, True, False, False, False, True, True, True, True, False, False, False, False, True, False, False, False, False, False, False, False, False, False, True, True, False, True, False, True, True, True, False, False, True, False, True, False, False, True, False, False, False, False, False, False, False, False, True, False, True, True, True, True, False, False, True, False, True, True, False, True, False, True, False, False, True, True, False, False, True, True, False, False, False, False, False, False, True, True, False]
y_pred = [False, False, False, False, False, True, False, False, True, False, True, False, False, False, False, False, False, False, True, True, True, True, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, True, False, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, True, False, False, False, False, True, False, True, True, False, False, False, True, False, False, True, True, False, False, True, True, False, False, False, False, False, False, True, False, False]
Let’s define a binary confusion matrix
binary_confusion_matrix = BinaryConfusionMatrix(y_actu, y_pred)
print("Binary confusion matrix:\n%s" % binary_confusion_matrix)
It displays as a nicely labeled Pandas DataFrame:
Binary confusion matrix:
Predicted  False  True  __all__
Actual
False         67     0       67
True          21    24       45
__all__       88    24      112
You can get useful attributes such as True Positive (TP), True Negative (TN) …
print(binary_confusion_matrix.TP)
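For instance (TP and TN are named above; the matching FP and FN attributes below are assumed to follow the same naming pattern):

print(binary_confusion_matrix.TP)  # 24 (actual True, predicted True)
print(binary_confusion_matrix.TN)  # 67 (actual False, predicted False)
print(binary_confusion_matrix.FP)  # 0  (assumed attribute: actual False, predicted True)
print(binary_confusion_matrix.FN)  # 21 (assumed attribute: actual True, predicted False)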
Matplotlib plot of a binary confusion matrix
binary_confusion_matrix.plot()
plt.show()
Matplotlib plot of a normalized binary confusion matrix
binary_confusion_matrix.plot(normalized=True)
plt.show()
Seaborn plot of a binary confusion matrix (ToDo)
from pandas_confusion import Backend

binary_confusion_matrix.plot(backend=Backend.Seaborn)
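Until that backend lands, a heatmap can be drawn directly with seaborn. This sketch assumes binary_confusion_matrix.to_dataframe() (the conversion method mentioned further down for ConfusionMatrix) also exists on the binary class:

import matplotlib.pyplot as plt
import seaborn as sns

# Annotated heatmap of the confusion matrix counts
sns.heatmap(binary_confusion_matrix.to_dataframe(), annot=True, fmt='d')
plt.show()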
Confusion matrix and class statistics
Overall statistics and class statistics of confusion matrix can be easily displayed.
y_true = [600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200]
y_pred = [100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200]
cm = ConfusionMatrix(y_true, y_pred)
cm.print_stats()
You should get:
Confusion Matrix:

Classes  100  200  500  600  __all__
Actual
100        0    0    0    0        0
200        9    6    1    0       16
500        1    1    1    0        3
600        1    0    0    0        1
__all__   11    7    2    0       20

Overall Statistics:

Accuracy: 0.35
95% CI: (0.1539092047845412, 0.59218853453282805)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.978585644357
Kappa: 0.0780141843972
Mcnemar's Test P-Value: ToDo

Class Statistics:

Classes                                 100         200         500    600
Population                               20          20          20     20
Condition positive                        0          16           3      1
Condition negative                       20           4          17     19
Test outcome positive                    11           7           2      0
Test outcome negative                     9          13          18     20
TP: True Positive                         0           6           1      0
TN: True Negative                         9           3          16     19
FP: False Positive                       11           1           1      0
FN: False Negative                        0          10           2      1
TPR: Sensitivity                        NaN       0.375   0.3333333      0
TNR=SPC: Specificity                   0.45        0.75   0.9411765      1
PPV: Pos Pred Value = Precision           0   0.8571429         0.5    NaN
NPV: Neg Pred Value                       1   0.2307692   0.8888889   0.95
FPR: Fall-out                          0.55        0.25  0.05882353      0
FDR: False Discovery Rate                 1   0.1428571         0.5    NaN
FNR: Miss Rate                          NaN       0.625   0.6666667      1
ACC: Accuracy                          0.45        0.45        0.85   0.95
F1 score                                  0   0.5217391         0.4      0
MCC: Matthews correlation coefficient   NaN   0.1048285    0.326732    NaN
Informedness                            NaN       0.125   0.2745098      0
Markedness                                0  0.08791209   0.3888889    NaN
Prevalence                                0         0.8        0.15   0.05
LR+: Positive likelihood ratio          NaN         1.5    5.666667    NaN
LR-: Negative likelihood ratio          NaN   0.8333333   0.7083333      1
DOR: Diagnostic odds ratio              NaN         1.8           8    NaN
FOR: False omission rate                  0   0.7692308   0.1111111   0.05
Statistics are also available as an OrderedDict using:
cm.stats()
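For example, the returned OrderedDict can be iterated like any dict (the exact key names may vary between versions):

stats = cm.stats()  # OrderedDict of overall and per-class statistics
for name, value in stats.items():
    print("%s: %s" % (name, value))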
ToDo list
Better documentation
Doctest
Matplotlib discrete colorbar (not for normalized plot)
see ColorbarBase
http://stackoverflow.com/questions/14777066/matplotlib-discrete-colorbar
Display numbers inside cells like http://stackoverflow.com/questions/5821125/how-to-plot-confusion-matrix-with-string-axis-rather-than-integer-in-python
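A minimal sketch of those two plot ideas with plain matplotlib, using the cat/dog/rabbit counts from above (BoundaryNorm gives the discrete colorbar, ax.text writes the counts inside the cells):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm

data = np.array([[3, 0, 0], [0, 1, 2], [2, 1, 3]])  # rows: actual, columns: predicted
labels = ['cat', 'dog', 'rabbit']

# Discrete colorbar: one color bin per integer count
bounds = np.arange(data.min(), data.max() + 2) - 0.5
norm = BoundaryNorm(bounds, ncolors=plt.get_cmap('Blues').N)

fig, ax = plt.subplots()
im = ax.imshow(data, cmap='Blues', norm=norm, interpolation='nearest')
fig.colorbar(im, ticks=np.arange(data.min(), data.max() + 1))

# String axis labels and the count displayed inside each cell
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        ax.text(j, i, str(data[i, j]), ha='center', va='center')
plt.show()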
Compare with results from Sklearn
Example:
from sklearn.metrics import f1_score, classification_report

print(f1_score(y_actu, y_pred))
print(classification_report(y_actu, y_pred))
Compare with R “caret” package
http://stackoverflow.com/questions/26631814/create-a-confusion-matrix-from-a-dataframe
R
Actual <- c(600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200)
Predicted <- c(100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200)
df <- data.frame(Actual, Predicted)
#table(df)
col <- sort(union(df$Actual, df$Predicted))
df_conf <- table(lapply(df, factor, levels=col))
#table(lapply(df, factor, levels=seq(100, 600, 100)))
#table(lapply(df, factor, levels=c(100, 200, 500, 600)))
Python
>>> from pandas_confusion import ConfusionMatrix
>>> y_true = [600, 200, 200, 200, 200, 200, 200, 200, 500, 500, 500, 200, 200, 200, 200, 200, 200, 200, 200, 200]
>>> y_pred = [100, 200, 200, 100, 100, 200, 200, 200, 100, 200, 500, 100, 100, 100, 100, 100, 100, 100, 500, 200]
>>> cm = ConfusionMatrix(y_true, y_pred)
>>> cm
Predicted  100  200  500  600  __all__
Actual
100          0    0    0    0        0
200          9    6    1    0       16
500          1    1    1    0        3
600          1    0    0    0        1
__all__     11    7    2    0       20
cm(i, j) in Python is df_conf(j, i) in R
You can use cm.to_dataframe().transpose()
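For example:

# Transpose the Python confusion matrix to get the R orientation
print(cm.to_dataframe().transpose())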
Overall statistics: No Information Rate, Mcnemar’s Test P-Value
see confusionMatrix.R and print.confusionMatrix.R (caret) and e1071 package
Class statistics
see Caret code for Detection Rate, Detection Prevalence, Balanced Accuracy
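For reference, caret defines those three statistics as sketched below (caret_class_stats is a hypothetical helper working from per-class TP/FP/FN/TN counts):

def caret_class_stats(TP, FP, FN, TN):
    # Detection Rate, Detection Prevalence and Balanced Accuracy, as defined by caret
    N = float(TP + FP + FN + TN)
    detection_rate = TP / N                # true positives over the whole population
    detection_prevalence = (TP + FP) / N   # predicted positives over the whole population
    sensitivity = TP / float(TP + FN)
    specificity = TN / float(TN + FP)
    balanced_accuracy = (sensitivity + specificity) / 2.0
    return detection_rate, detection_prevalence, balanced_accuracy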
Code metrics (landscape.io)
Create fake truth/prediction vectors from a confusion matrix (can be useful for unit tests); see the sketch below
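A minimal sketch of the idea (fake_vectors_from_matrix is a hypothetical helper, not part of the package):

def fake_vectors_from_matrix(matrix, labels):
    # matrix[i][j] counts samples of actual class labels[i] predicted as labels[j]
    y_true, y_pred = [], []
    for i, actual in enumerate(labels):
        for j, predicted in enumerate(labels):
            y_true.extend([actual] * matrix[i][j])
            y_pred.extend([predicted] * matrix[i][j])
    return y_true, y_pred

# ConfusionMatrix(*fake_vectors_from_matrix([[3, 0], [1, 2]], ['cat', 'dog']))
# should reproduce the 2x2 matrix [[3, 0], [1, 2]]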
Order confusion matrix easily
Create empty class easily
cm = ConfusionMatrix(y_true, y_pred, labels=range(100, 600+1, 100))
Classes 300 and 400 should be created
R-like method? conf_mat_tab <- table(lapply(df, factor, levels = seq(100, 600, 100)))
http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html
import pandas as pd

idx_new_cls = pd.Index([300, 400])
new_idx = df.index | idx_new_cls
new_idx.name = 'Actual'
new_col = df.index | idx_new_cls
new_col.name = 'Predicted'
df = df.loc[new_idx, new_col].fillna(0)
see cm.enlarge(...)
Calculate Mcnemar’s Test P-Value with binary confusion matrix
R code
Actual <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
Predicted <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
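A Python sketch (the package does not compute this yet; the formula is the textbook chi-squared approximation with continuity correction, and b, c are the discordant counts FN = 21 and FP = 0 from the binary matrix above):

from scipy.stats import chi2

b, c = 21, 0  # discordant pairs: actual True/predicted False, actual False/predicted True

# McNemar's chi-squared statistic with continuity correction
stat = (abs(b - c) - 1) ** 2 / float(b + c)
p_value = chi2.sf(stat, df=1)
print(p_value)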
Install
$ conda install pandas scikit-learn scipy
$ pip install pandas_confusion
Done
Continuous integration (Travis)
Convert a confusion matrix to a binary confusion matrix
Python package
Unit tests (nose)
Fix missing column and missing row
Overall statistics: Accuracy, 95% CI, P-Value [Acc > NIR], Kappa