

scikit-stats: statistics tools and utilities


Getting Started

The scikit-stats module includes two submodules: modeling and hypotests (hypothesis tests). This is a quick user guide to each submodule. The Binder examples are also a good way to get started.

modeling

The modeling submodule includes the Bayesian Blocks algorithm, which can be used to improve the binning of histograms. The visual improvement can be dramatic; more importantly, the algorithm produces histograms that accurately represent the underlying distribution while being robust to statistical fluctuations. Here is a small example of the algorithm applied to data sampled from a Laplace distribution, compared with a finely binned histogram of the same sample.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from skstats.modeling import bayesian_blocks

>>> data = np.random.laplace(size=10000)
>>> blocks = bayesian_blocks(data)

>>> plt.hist(data, bins=1000, label='Fine Binning', density=True, alpha=0.6)
>>> plt.hist(data, bins=blocks, label='Bayesian Blocks', histtype='step', density=True, linewidth=2)
>>> plt.legend(loc=2)

[Figure: Bayesian Blocks example]
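To illustrate how the algorithm chooses its edges, here is a minimal NumPy sketch of the dynamic-programming formulation of Bayesian Blocks for event data described by Scargle et al. (2013). It is not the skstats implementation: the function name `bayesian_blocks_sketch`, the false-positive parameter `p0=0.05` and the block-penalty formula (the empirical calibration from that paper) are our assumptions, and the sketch assumes all data values are distinct. Each candidate block is scored by the log-likelihood of a constant event rate, and a fixed penalty per block discourages splitting on statistical fluctuations.

```python
import numpy as np


def bayesian_blocks_sketch(data, p0=0.05):
    """Hypothetical helper: Bayesian Blocks edges for 1D event data.

    Minimal O(N^2) dynamic-programming sketch; assumes all values are distinct.
    """
    t = np.sort(np.asarray(data, dtype=float))
    n = t.size
    # Candidate edges: both ends of the data plus midpoints between neighbours.
    edges = np.concatenate([t[:1], 0.5 * (t[1:] + t[:-1]), t[-1:]])
    # Distance from each candidate edge to the right end of the data.
    block_length = t[-1] - edges
    # Penalty per extra block (empirical calibration, false-positive rate p0).
    ncp_prior = 4 - np.log(73.53 * p0 * n ** -0.478)

    best = np.zeros(n)             # best[k]: optimal total fitness of cells 0..k
    last = np.zeros(n, dtype=int)  # last[k]: first cell of the final block in that optimum
    for k in range(n):
        # Candidate final blocks end at cell k and start at cell r, r = 0..k.
        width = block_length[: k + 1] - block_length[k + 1]
        count = np.arange(k + 1, 0, -1)  # number of events in cells r..k
        # Log-likelihood of a constant-rate block: N * (log N - log T), minus the penalty.
        fitness = count * (np.log(count) - np.log(width)) - ncp_prior
        fitness[1:] += best[:k]          # add the best partition of cells 0..r-1
        last[k] = np.argmax(fitness)
        best[k] = fitness[last[k]]

    # Walk back through the stored change points to recover the block edges.
    change_points = [n]
    ind = n
    while ind > 0:
        ind = last[ind - 1]
        change_points.append(ind)
    return edges[change_points[::-1]]


rng = np.random.default_rng(0)
data = rng.laplace(size=2000)
edges = bayesian_blocks_sketch(data)
counts, _ = np.histogram(data, bins=edges)
print(len(edges) - 1, "blocks")
```

Because the loop is quadratic in the number of points, this sketch is only practical for samples of a few thousand events.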

hypotests

This submodule provides tools to perform hypothesis tests, such as discovery tests, and to compute upper limits or confidence intervals. scikit-stats needs a fitting backend, such as zfit, to perform these computations. Any fitting library can be used as long as its API is compatible with scikit-stats (see api checks).
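To show what the fitting backend is needed for, here is a rough sketch of an asymptotic discovery test written with NumPy and SciPy only, on the same toy model as the zfit example that follows. It is not how scikit-stats works internally: `scipy.optimize.minimize` stands in for the fitting backend, and the helper names (`extended_nll`, `discovery_q0`, etc.) are hypothetical. The test statistic is q0 = 2 (NLL of the fit with Ns = 0 minus NLL of the free fit), which is zero when the best-fit signal yield is not positive; in the asymptotic limit the significance is Z = sqrt(q0) (Cowan et al., 2011).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

LOW, HIGH = 0.1, 3.0
MU, SIGMA = 1.2, 0.1  # signal shape assumed known, as in the zfit example


def signal_pdf(x):
    # Gaussian truncated and renormalised to [LOW, HIGH].
    area = norm.cdf(HIGH, MU, SIGMA) - norm.cdf(LOW, MU, SIGMA)
    return norm.pdf(x, MU, SIGMA) / area


def background_pdf(x, lam):
    # exp(lam * x) normalised to [LOW, HIGH]; lam is kept negative below.
    return lam * np.exp(lam * x) / (np.exp(lam * HIGH) - np.exp(lam * LOW))


def extended_nll(params, x):
    ns, nb, lam = params
    density = ns * signal_pdf(x) + nb * background_pdf(x, lam)
    # Extended unbinned NLL up to a constant: (Ns + Nb) - sum(log(Ns*fs + Nb*fb)).
    return (ns + nb) - np.sum(np.log(density))


def discovery_q0(x):
    n = x.size
    # Free fit; bounding Ns >= 0 gives q0 = 0 when no excess is fitted (one-sided test).
    free = minimize(extended_nll, x0=[20.0, n, -2.0], args=(x,), method="L-BFGS-B",
                    bounds=[(0.0, n), (1e-6, 1.1 * n), (-4.0, -1.0)])
    # Background-only fit: signal yield fixed at zero, Nb and lambda free.
    null = minimize(lambda p: extended_nll([0.0, p[0], p[1]], x), x0=[n, -2.0],
                    method="L-BFGS-B", bounds=[(1e-6, 1.1 * n), (-4.0, -1.0)])
    q0 = max(2.0 * (null.fun - free.fun), 0.0)  # clip tiny negative optimiser noise
    return q0, free.x[0]


rng = np.random.default_rng(42)
data = np.concatenate([rng.exponential(0.5, 300), rng.normal(MU, SIGMA, 25)])
data = data[(data > LOW) & (data < HIGH)]

q0, ns_hat = discovery_q0(data)
z = np.sqrt(q0)   # asymptotic significance for rejecting Ns = 0
p = norm.sf(z)    # corresponding one-sided p-value
print(f"Ns_hat = {ns_hat:.1f}, q0 = {q0:.2f}, Z = {z:.2f}, p = {p:.2e}")
```

A real backend such as zfit adds what this sketch lacks: analytic gradients, uncertainty estimates and a reusable model and loss description.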

Here is a simple example of a discovery test, using zfit as the backend, for a Gaussian signal with known mean and sigma over an exponential background.

>>> import zfit
>>> from zfit.core.loss import ExtendedUnbinnedNLL
>>> from zfit.minimize import Minuit

>>> bounds = (0.1, 3.0)
>>> obs = zfit.Space('x', limits=bounds)

>>> bkg = np.random.exponential(0.5, 300)
>>> peak = np.random.normal(1.2, 0.1, 25)
>>> data = np.concatenate((bkg, peak))
>>> data = data[(data > bounds[0]) & (data < bounds[1])]
>>> N = data.size
>>> data = zfit.data.Data.from_numpy(obs=obs, array=data)

>>> lambda_ = zfit.Parameter("lambda", -2.0, -4.0, -1.0)
>>> Nsig = zfit.Parameter("Ns", 20., -20., N)
>>> Nbkg = zfit.Parameter("Nbkg", N, 0., N*1.1)
>>> signal = Nsig * zfit.pdf.Gauss(obs=obs, mu=1.2, sigma=0.1)
>>> background = Nbkg * zfit.pdf.Exponential(obs=obs, lambda_=lambda_)
>>> loss = ExtendedUnbinnedNLL(model=[signal + background], data=[data], fit_range=[obs])

>>> from skstats.hypotests.calculators import AsymptoticCalculator
>>> from skstats.hypotests import Discovery
>>> from skstats.hypotests.parameters import POI

>>> calculator = AsymptoticCalculator(loss, Minuit())
>>> poinull = POI(Nsig, 0)
>>> discovery_test = Discovery(calculator, [poinull])
>>> discovery_test.result()

p_value for the Null hypothesis = 0.0007571045424956679
Significance (in units of sigma) = 3.1719464825102244

The discovery test prints the p-value of the null (background-only) hypothesis and the significance, in units of sigma, with which the null hypothesis is rejected.
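The two printed numbers are related through the upper tail of the standard normal distribution, p = 1 - Φ(Z). The sketch below assumes SciPy, uses hypothetical helper names, and assumes this one-sided convention is the one scikit-stats uses (the printed values are consistent with it, but we have not checked the source). It converts the printed p-value back into a significance.

```python
from scipy.stats import norm


def significance_from_pvalue(p):
    # One-sided: Z such that P(X > Z) = p for X ~ N(0, 1).
    return norm.isf(p)


def pvalue_from_significance(z):
    # Upper-tail probability of the standard normal beyond z.
    return norm.sf(z)


p = 0.0007571045424956679  # p-value printed by discovery_test.result() above
z = significance_from_pvalue(p)
print(f"Z = {z:.4f}")
```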

Download files


Source Distribution

scikit-stats-0.1.0.tar.gz (14.4 kB)

Uploaded Source

Built Distribution

scikit_stats-0.1.0-py2.py3-none-any.whl (28.6 kB)

Uploaded Python 2 Python 3

File details

Details for the file scikit-stats-0.1.0.tar.gz.

File metadata

  • Download URL: scikit-stats-0.1.0.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.24.0 CPython/3.7.1

File hashes

Hashes for scikit-stats-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a82e8f6eadfe6f592a0777f90b8b9988745dc2749a0d7f08d66653e657091d37
MD5 40912d88392476c30c36220ef5af012d
BLAKE2b-256 bd370c6cdd5d130edcde22985563a599f7e45589ecc32238a9db017a14c829ce


File details

Details for the file scikit_stats-0.1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: scikit_stats-0.1.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 28.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.24.0 CPython/3.7.1

File hashes

Hashes for scikit_stats-0.1.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 e88f4d7454fc089fd7804541c24c05fcfb3b16b8b1189f820d2ac85f209e09f2
MD5 ecb27829fa1d148a60461d807a8843ed
BLAKE2b-256 1575217d177bfcca61f0e0736c6f1345de5f93102cf29032b8dca64342abad22

