Numba-accelerated implementations of common probability distributions

Project description

numba-stats

We provide numba-accelerated implementations of statistical functions for common probability distributions:

  • Uniform
  • (Truncated) Normal
  • Log-normal
  • Poisson
  • (Truncated) Exponential
  • Student's t
  • Voigtian
  • Crystal Ball
  • Generalised double-sided Crystal Ball
  • Tsallis-Hagedorn, a model for the minimum bias pT distribution
  • Q-Gaussian
  • Bernstein density (not normalized to unity, use this in extended likelihood fits)
  • Cruijff density (not normalized to unity, use this in extended likelihood fits)

with more to come. The speed gains are huge, up to a factor of 100 compared to scipy. Benchmarks are included in the repository and are run by pytest.

Usage

Each distribution is implemented in a submodule. Import the submodule that you need.

from numba_stats import norm
import numpy as np

x = np.linspace(-10, 10)
mu = 2
sigma = 3

p = norm.pdf(x, mu, sigma)
c = norm.cdf(x, mu, sigma)

The functions are vectorized on the variate x, but not on the shape parameters of the distribution. Ideally, the following functions are implemented for each distribution:

  • logpdf
  • pdf
  • cdf
  • ppf

cdf and ppf are missing for some distributions (e.g. voigt) for which no fast implementation is currently available. logpdf is only implemented if it is more efficient and accurate than computing log(dist.pdf(...)).

The distributions in numba_stats can be used in other numba-JIT'ed functions. The functions in numba_stats use a single thread, but the implementations were written so that they profit from auto-parallelization. To enable this, call them from a JIT'ed function with the arguments parallel=True, fastmath=True. You should always combine parallel=True with fastmath=True, since the latter enhances the gain from auto-parallelization.

from numba_stats import norm
import numba as nb
import numpy as np

@nb.njit(parallel=True, fastmath=True)
def norm_pdf(x, mu, sigma):
    return norm.pdf(x, mu, sigma)

x = np.linspace(-10, 10)
mu = 2
sigma = 3

# uses all your CPU cores
p = norm_pdf(x, mu, sigma)

Note that this is only faster if x has sufficient length (about 1000 elements or more). Otherwise, the parallelization overhead will make the call slower; see the benchmarks below.

Benchmarks

The following benchmarks were produced on an Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz against SciPy-1.10.1. The dotted line on the right-hand figure shows the expected speedup (4x) from parallelization on a CPU with four physical cores.

We see large speed-ups with respect to scipy for almost all distributions. Calls with short arrays also profit from numba_stats, due to the reduced call overhead. The functions voigt.pdf and t.ppf do not run faster than the scipy versions, because we call the respective scipy implementation written in FORTRAN. The advantage provided by numba_stats here is that you can call these functions from other numba-JIT'ed functions, which is not possible with the scipy implementations, and voigt.pdf still profits from auto-parallelization.

bernstein.density does not profit from auto-parallelization. On the contrary, it becomes much slower, so this should be avoided. This is a known issue: the internal implementation cannot be easily auto-parallelized.

Documentation

To get documentation, please use help() in the Python interpreter.

Functions with equivalents in scipy.stats follow the scipy calling conventions exactly, except for distributions starting with trunc..., which follow a different convention, since the scipy behavior is very impractical. Even so, note that the scipy conventions are sometimes a bit unusual, particularly in the case of the exponential, the log-normal, and the uniform distribution. See the scipy docs for details.
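To make the unusual conventions concrete, here is a short sketch that uses only scipy.stats. The numerical values are arbitrary. Since the non-truncated distributions are stated to follow the scipy conventions, the same positional arguments would be expected to apply to their numba_stats counterparts, but check help() for each function to be sure.

```python
import numpy as np
from scipy import stats

# uniform: loc is the lower edge and scale is the WIDTH, so the support
# is [loc, loc + scale], not [loc, scale]
a, b = 2.0, 5.0
p_uniform = stats.uniform.pdf(3.0, a, b - a)  # equals 1 / (b - a)

# exponential: scale is the mean 1 / lam, not the rate lam
lam = 0.5
x = 1.5
p_expon = stats.expon.pdf(x, 0.0, 1.0 / lam)  # equals lam * exp(-lam * x)

# log-normal: s is the sigma of log(x), and exp(mu) enters through scale
mu, sigma = 0.3, 0.8
c_lognorm = stats.lognorm.cdf(x, sigma, 0.0, np.exp(mu))
c_reference = stats.norm.cdf((np.log(x) - mu) / sigma)  # same probability
```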

Contributions

You can help by adding more distributions; patches are very welcome. Implementing a probability distribution is easy. You need to write it in simple Python that numba can understand. Special functions from scipy.special can be used after some wrapping; see the submodule numba_stats._special.py for how it is done.

numba-stats and numba-scipy

numba-scipy is the official package and repository for fast numba-accelerated scipy functions. Are we reinventing the wheel?

Ideally, the functionality in this package should be in numba-scipy and we hope that eventually this will be the case. In this package, we don't offer overloads for scipy functions and classes like numba-scipy does. This simplifies the implementation dramatically. numba-stats is intended as a temporary solution until fast statistical functions are included in numba-scipy. numba-stats currently does not depend on numba-scipy, only on numba and scipy.
