Skip to main content

statistics tools and utilities

Project description

Scikit-HEP project hepstats package: statistics tools and utilities

PyPI - Python Version DOI Build Status Azure DevOps coverage Azure DevOps tests Binder

Installation

Install hepstats like any other Python package:

pip install hepstats

or similar (use e.g. virtualenv if you wish).

Getting Started

The hepstats module includes modeling and hypothesis tests submodules. This a quick user guide to each submodule. The binder examples are also a good way to get started.

modeling

The modeling submodule includes the Bayesian Block algorithm that can be used to improve the binning of histograms. The visual improvement can be dramatic, and more importantly, this algorithm produces histograms that accurately represent the underlying distribution while being robust to statistical fluctuations. Here is a small example of the algorithm applied on Laplacian sampled data, compared to a histogram of this sample with a fine binning.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from hepstats.modeling import bayesian_blocks

>>> data = np.random.laplace(size=10000)
>>> blocks = bayesian_blocks(data)

>>> plt.hist(data, bins=1000, label='Fine Binning', density=True, alpha=0.6)
>>> plt.hist(data, bins=blocks, label='Bayesian Blocks', histtype='step', density=True, linewidth=2)
>>> plt.legend(loc=2)

bayesian blocks example

hypotests

This submodule provides tools to do hypothesis tests such as discovery test and computations of upper limits or confidence intervals. hepstats needs a fitting backend to perform computations such as zfit. Any fitting library can be used if their API is compatible with hepstats (see api checks).

We give here a simple example of a discovery test, using the zfit fitting package as backend, of a Gaussian signal with known mean and sigma over an exponential background.

>>> import zfit
>>> from zfit.loss import ExtendedUnbinnedNLL
>>> from zfit.minimize import Minuit

>>> bounds = (0.1, 3.0)
>>> obs = zfit.Space('x', limits=bounds)

>>> bkg = np.random.exponential(0.5, 300)
>>> peak = np.random.normal(1.2, 0.1, 25)
>>> data = np.concatenate((bkg, peak))
>>> data = data[(data > bounds[0]) & (data < bounds[1])]
>>> N = data.size
>>> data = zfit.Data.from_numpy(obs=obs, array=data)

>>> lambda_ = zfit.Parameter("lambda", -2.0, -4.0, -1.0)
>>> Nsig = zfit.Parameter("Ns", 20., -20., N)
>>> Nbkg = zfit.Parameter("Nbkg", N, 0., N*1.1)
>>> signal = Nsig * zfit.pdf.Gauss(obs=obs, mu=1.2, sigma=0.1)
>>> background = Nbkg * zfit.pdf.Exponential(obs=obs, lambda_=lambda_)
>>> loss = ExtendedUnbinnedNLL(model=signal + background, data=data)

>>> from hepstats.hypotests.calculators import AsymptoticCalculator
>>> from hepstats.hypotests import Discovery
>>> from hepstats.hypotests.parameters import POI

>>> calculator = AsymptoticCalculator(loss, Minuit())
>>> poinull = POI(Nsig, 0)
>>> discovery_test = Discovery(calculator, [poinull])
>>> discovery_test.result()

p_value for the Null hypothesis = 0.0007571045424956679
Significance (in units of sigma) = 3.1719464825102244

The discovery test prints out the p-value and the significance of the null hypothesis to be rejected.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hepstats-0.2.0.tar.gz (18.3 kB view details)

Uploaded Source

Built Distribution

hepstats-0.2.0-py2.py3-none-any.whl (24.6 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file hepstats-0.2.0.tar.gz.

File metadata

  • Download URL: hepstats-0.2.0.tar.gz
  • Upload date:
  • Size: 18.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.8

File hashes

Hashes for hepstats-0.2.0.tar.gz
Algorithm Hash digest
SHA256 242c94e971a8c1638ad2f8dca786ebb237476abd05485a7c90061fc94591c5a6
MD5 73c6880135258cfe819b373f3903e33a
BLAKE2b-256 9c94ca9e822379636f8390c20cc7a74f30265b7beff68e199f391b128c3e2d6d

See more details on using hashes here.

File details

Details for the file hepstats-0.2.0-py2.py3-none-any.whl.

File metadata

  • Download URL: hepstats-0.2.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 24.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.8

File hashes

Hashes for hepstats-0.2.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 139c695e81b6371f985ca1e43874649e25ac31215958b3cc356cc5246a041f00
MD5 9161dbaab19545881ada83ae79fef3a6
BLAKE2b-256 faff5969be7b5462b65f038538ef9d593f413c321fcbb7cbc1b74a71d628f8b5

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page