Machine Learning model performance metrics & charts with confidence intervals, optimized with numba to be fast
fronni
A Python library for quickly calculating & displaying machine learning model performance metrics with confidence intervals.
Data scientists spend a lot of time evaluating the performance of their machine learning models. A common means of doing so is a classification report, such as the one built into the scikit-learn library (referenced over 440,000 times on GitHub).
The problem with depending on this is that when your test sample is imbalanced and/or small, the volume of data behind each point estimate can be very small. As a result, whatever estimates you have for the precision and recall values will likely be unrepresentative of what happens in the real world. Instead, it is far better to present the reader with a range of values derived from confidence intervals.
We can easily create confidence intervals for any metric of interest by using the bootstrap technique to create hundreds of samples from the original dataset (with replacement), and then throw away the most extreme 5% of values to get a 95% confidence interval, for example.
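As a rough illustration of the idea (this is not fronni's internal implementation, and the helper name is hypothetical), a plain-NumPy bootstrap for a single metric might look like this:

```python
import numpy as np
from sklearn.metrics import recall_score

def bootstrap_recall_ci(label, predicted, n=1000, confidence_level=95, seed=0):
    """Illustrative bootstrap: resample (label, predicted) pairs with replacement,
    recompute recall each time, and keep the central confidence_level% of values."""
    label = np.asarray(label)
    predicted = np.asarray(predicted)
    rng = np.random.default_rng(seed)
    size = len(label)
    scores = np.empty(n)
    for i in range(n):
        idx = rng.integers(0, size, size)   # sample row indices with replacement
        scores[i] = recall_score(label[idx], predicted[idx])
    tail = (100 - confidence_level) / 2     # e.g. 2.5% in each tail for a 95% CI
    return np.percentile(scores, [tail, 100 - tail])
```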
Examples
With a pandas dataframe named 'df', you can generate your metrics like this:
```python
from fronni.classification import classification_report

report = classification_report(df['label'], df['predicted'], n=1000)
print(report)
```
Requirements
- Python >= 3.6
- numba
- numpy
- scikit-learn
- plotly
Installing fronni
```
pip install fronni
```
How fronni works
When the datasets are large, running the bootstrap as described above becomes computationally expensive. Even a relatively small sample of a million rows, when resampled 1,000 times, turns into a billion-row dataset and pushes the calculation time to close to a minute. This isn’t a problem when running these calculations offline, but engineers are often working interactively in a Jupyter notebook and want answers quickly. For this reason we created fast implementations of these calculations for common machine learning metrics, such as precision & recall, using the numba library, which provides a speedup of approximately 23X over regular parallelized Python code.
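The actual kernels live in the fronni source; as a minimal sketch of the approach (hypothetical function, binary labels assumed), a numba-compiled bootstrap loop looks roughly like this:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def bootstrap_recall(label, predicted, n=1000):
    """Sketch of a numba kernel: each bootstrap iteration runs as compiled,
    parallel machine code instead of interpreted Python."""
    size = label.shape[0]
    scores = np.empty(n)
    for i in prange(n):
        tp = 0
        fn = 0
        for _ in range(size):
            j = np.random.randint(0, size)   # resample one row with replacement
            if label[j] == 1:
                if predicted[j] == 1:
                    tp += 1
                else:
                    fn += 1
        scores[i] = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return scores
```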
Full documentation
Functions from the classification module:
classification_report
Generates confidence intervals for precision, recall, & F1 metrics for a binary or multi-class classification model, given arrays of predicted & label values.
Parameter | Type | Default |
---|---|---|
label | NumPy array or Pandas series | None |
predicted | NumPy array or Pandas series | None |
n | integer, number of bootstrap iterations | 1,000 |
confidence_level | integer value between 1 & 100 | 95 |
as_dict | Boolean, return a nested dictionary if True, otherwise a Pandas dataframe | False |
sort_by_sample_size | Boolean, sort the returned Pandas dataframe in descending order of class sample size | False |
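For example, a call that returns a nested dictionary at a 90% confidence level might look like this (the column names are placeholders):

```python
from fronni.classification import classification_report

report = classification_report(
    df['label'],           # true labels
    df['predicted'],       # model predictions
    n=1000,                # bootstrap iterations
    confidence_level=90,   # width of the reported interval
    as_dict=True,          # nested dict instead of a Pandas dataframe
)
```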
plot_classification_report
Plots confidence intervals for precision, recall, & F1 metrics for a binary or multi-class classification model, given a classification report as input.
Parameter | Type | Default |
---|---|---|
report | output from classification_report | None |
save_to_filename | string, path of the image file to save, e.g. "image.png" | None |
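Putting the two functions together, a typical call (the filename is just an example) might be:

```python
from fronni.classification import classification_report, plot_classification_report

report = classification_report(df['label'], df['predicted'], n=1000)
plot_classification_report(report, save_to_filename="classification_report.png")
```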
From the regression module:
regression_report
Generates confidence intervals for RMSE, MAE, and R^2 metrics for a regression model, given arrays of predicted & label values.
Parameter | Type | Default |
---|---|---|
label | NumPy array or Pandas series | None |
predicted | NumPy array or Pandas series | None |
n | integer, number of bootstrap iterations | 1,000 |
as_dict | Boolean, return a nested dictionary if True, otherwise a Pandas dataframe | False |
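Usage mirrors the classification module; for example (column names are placeholders, and the import path is assumed to be fronni.regression):

```python
from fronni.regression import regression_report

report = regression_report(df['label'], df['predicted'], n=1000)
print(report)
```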
See the CONTRIBUTING file for how to help out.
License
fronni is Apache 2.0 licensed, as found in the LICENSE file.