
Probability distributions mimicking Distributions.jl


rvlib

Anyone who has used Distributions.jl will tell you how nice the interface is relative to the "exotic" (the most polite word we can think of) interface to distributions exposed by scipy.stats. Distributions.jl also brings better performance, particularly when its methods are used inside loops.

For these reasons we've put together rvlib, which mimics the interface of Distributions.jl while attaining similar performance by exploiting numba.

This package was inspired by Joshua Adelman's (@synapticarbors) blog post describing how to call the Rmath C library from numba using CFFI, and utilizes his build script to set up the CFFI interface.

Objectives

  • Follow the API of the Distributions.jl package as closely as possible

  • Create a Python package that has better performance than scipy.stats

Methodology

All the classes are marked for optimization using numba's @jitclass decorator. As a result, instances of the different distributions can be used within user-specific routines or passed as arguments to functions compiled in nopython mode.

The evaluation and sampling methods are built on the Rmath C library -- also used by the Distributions.jl package.

Distributions currently implemented

Univariate continuous:

  • Normal
  • Chisq
  • Uniform
  • T
  • Log-normal
  • F
  • Beta
  • Gamma
  • Exponential
  • Cauchy
  • Logistic
  • Weibull

Univariate discrete:

  • Poisson
  • Geometric
  • Hypergeometric
  • Binomial
  • Negative Binomial

Multivariate continuous:

  • Multivariate normal: in progress; check the multivariate branch for updates

Functionality

The following properties are shared by all the univariate distributions:

  • params: tuple of the distribution's parameters
  • location: the location of the distribution (if it exists)
  • scale: the scale of the distribution (if it exists)
  • shape: the shape of the distribution (if it exists)
  • mean: the mean of the distribution
  • median: the median of the distribution
  • mode: the mode of the distribution
  • var: the variance of the distribution
  • std: the standard deviation of the distribution
  • skewness: the skewness of the distribution
  • kurtosis: the kurtosis of the distribution
  • isplatykurtic: boolean indicating if kurtosis is greater than zero
  • isleptokurtic: boolean indicating if kurtosis is less than zero
  • ismesokurtic: boolean indicating if kurtosis is equal to zero
  • entropy: the entropy of the distribution

The following methods can be called for all univariate distributions:

  • mgf: evaluate the moment generating function (if it exists)
  • cf: evaluate the characteristic function (if it exists)
  • pdf: evaluate the probability density function
  • logpdf: evaluate the logarithm of the probability density function
  • loglikelihood: evaluate the log-likelihood of the distribution with respect to all samples contained in array x
  • cdf: evaluate the cumulative distribution function
  • ccdf: evaluate the complementary cdf, i.e. (1 - cdf)
  • logcdf: evaluate the logarithm of the cdf
  • logccdf: evaluate the logarithm of the complementary cdf
  • quantile: evaluate the quantile function at a critical value
  • cquantile: evaluate the complementary quantile function
  • invlogcdf: evaluate the inverse function of the logcdf
  • invlogccdf: evaluate the inverse function of the logccdf
  • rand: generate array of independent random draws
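
The relationships between several of these methods can be sketched with the Python standard library alone. The block below does not use rvlib: it relies on statistics.NormalDist for a standard normal, and the helper functions are hypothetical stand-ins that only borrow the method names from the list above to show what each quantity means. It assumes the usual definitions, namely that cquantile(q) equals quantile(1 - q) and that invlogcdf and invlogccdf invert logcdf and logccdf.

```python
import math
from statistics import NormalDist

# Standard normal from the standard library (a stand-in, not an rvlib object).
d = NormalDist(mu=0.0, sigma=1.0)

def ccdf(x):
    """Complementary cdf: 1 - cdf(x)."""
    return 1.0 - d.cdf(x)

def logcdf(x):
    """Logarithm of the cdf."""
    return math.log(d.cdf(x))

def logccdf(x):
    """Logarithm of the complementary cdf."""
    return math.log(ccdf(x))

def quantile(q):
    """Quantile function: inverse of the cdf."""
    return d.inv_cdf(q)

def cquantile(q):
    """Complementary quantile: inverse of the ccdf."""
    return d.inv_cdf(1.0 - q)

def invlogcdf(lp):
    """Inverse of logcdf."""
    return quantile(math.exp(lp))

def invlogccdf(lp):
    """Inverse of logccdf."""
    return cquantile(math.exp(lp))

def loglikelihood(xs):
    """Sum of log-densities over all samples in xs."""
    return sum(math.log(d.pdf(x)) for x in xs)

x = 1.25
# Each of these round trips should recover x up to floating-point error.
print(quantile(d.cdf(x)))
print(cquantile(ccdf(x)))
print(invlogcdf(logcdf(x)))
print(invlogccdf(logccdf(x)))
print(loglikelihood([0.0, 0.5, -0.5]))
```

This naive version computes the logarithms after the fact, so it loses precision far in the tails; a dedicated log-scale implementation is what makes methods such as logcdf and logccdf useful there.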

Seed setting

Because the package is built around the Rmath library, the seed for the random number generator has to be set using the Rmath set_seed(x, y) function. For example:

import rvlib as rl

rl.set_seed(123, 456) # note that it requires two arguments

Use and Performance

Preliminary comparison with the scipy.stats package.

from rvlib import Normal
from scipy.stats import norm
import numpy as np
import timeit

N_dist = Normal(0,1) # rvlib version
N_scipy = norm(0,1) # scipy.stats version

x = np.linspace(0,100,100)

In [1]: %timeit N_dist.pdf(x)
Out[1]: The slowest run took 8.85 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 9.69 µs per loop

In [2]: %timeit N_scipy.pdf(x)
Out[2]: 10000 loops, best of 3: 150 µs per loop

In [3]: %timeit N_dist.cdf(x)
Out[3]: The slowest run took 20325.82 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 8.08 µs per loop

In [4]: %timeit N_scipy.cdf(x)
Out[4]: The slowest run took 190.64 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 126 µs per loop

In [5]: %timeit N_dist.rand(1000)
Out[5]: The slowest run took 2166.80 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 85.8 µs per loop

In [6]: %timeit N_scipy.rvs(1000)
Out[6]: 10000 loops, best of 3: 119 µs per loop
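
The %timeit magic is only available inside IPython. In a plain Python script, a similar best-of-3 measurement can be sketched with the standard timeit module. The helper name best_time_us and the stand-in workload below are hypothetical and not part of rvlib; with rvlib installed, one would pass something like lambda: N_dist.pdf(x) instead of the stand-in.

```python
import math
import timeit

def best_time_us(func, number=1000, repeat=3):
    """Return the best average time per call of func(), in microseconds.

    Runs `repeat` batches of `number` calls each and keeps the fastest
    batch, similar in spirit to the "best of 3" figures reported above.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e6

def workload():
    """Stand-in workload (hypothetical); replace with the call to be timed."""
    return [math.exp(-0.5 * v * v) for v in range(100)]

t = best_time_us(workload)
print(f"{t:.2f} µs per call")
```

Because numba compiles a function on its first call, that first call is much slower than later ones, which is the likely reason for the "slowest run took ... times longer" messages above. Calling the function once before timing it keeps compilation out of the measurement.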

Contributors


This is a fork of the Rmath-julia library, with Python support added.

The original readme of the Rmath-julia repository is included below.


Rmath-julia

This is the Rmath library from R, which is used mainly by Julia's Distributions.jl package.

The main difference here is that this library has been patched to use the DSFMT RNG in src/runif.c.

The Julia RNG is in sync with the one used by the Distributions.jl package:

julia> srand(1);

julia> [rand(), rand()]
2-element Array{Float64,1}:
 0.236033
 0.346517

julia> srand(1);

julia> using Distributions

julia> [rand(Uniform()), rand(Uniform())]
2-element Array{Float64,1}:
 0.236033
 0.346517

Build instructions

Rmath-julia requires GNU Make (https://www.gnu.org/software/make). Just run make to compile the library.


