
Probability distributions mimicking Distributions.jl


rvlib

Anyone who has used Distributions.jl will tell you how nice its interface is relative to the "exotic" (the most polite word we can think of) interface to distributions exposed by scipy.stats. Distributions.jl also brings better performance, particularly when its methods are used inside loops.

For these reasons, we've put together rvlib, which mimics the interface of Distributions.jl while attaining similar performance by exploiting numba.

This package was inspired by Joshua Adelman's (@synapticarbors) blog post describing how to call the Rmath C library from numba using CFFI, and utilizes his build script to set up the CFFI interface.

Objectives

  • Follow the API of the Distributions.jl package as closely as possible

  • Create a python package that has better performance than scipy.stats.

Methodology

All the classes are marked for optimization using numba's @jitclass decorator. As a result, instances of the different distributions can be used within user-defined routines, or passed to them as arguments, in numba's nopython mode.

The evaluation and sampling methods are built on the Rmath C library -- also used by the Distributions.jl package.

Distributions currently implemented

Univariate continuous:

  • Normal
  • Chisq
  • Uniform
  • T
  • Log-normal
  • F
  • Beta
  • Gamma
  • Exponential
  • Cauchy
  • Logistic
  • Weibull

Univariate discrete:

  • Poisson
  • Geometric
  • Hypergeometric
  • Binomial
  • Negative Binomial

Multivariate continuous:

  • Multivariate normal is in progress; check the multivariate branch for updates

Functionality

The following properties are shared by all the univariate distributions:

  • params: tuple of the distribution's parameters
  • location: the location of the distribution (if it exists)
  • scale: the scale of the distribution (if it exists)
  • shape: the shape of the distribution (if it exists)
  • mean: the mean of the distribution
  • median: the median of the distribution
  • mode: the mode of the distribution
  • var: the variance of the distribution
  • std: the standard deviation of the distribution
  • skewness: the skewness of the distribution
  • kurtosis: the kurtosis of the distribution
  • isplatykurtic: boolean indicating if kurtosis is greater than zero
  • isleptokurtic: boolean indicating if kurtosis is less than zero
  • ismesokurtic: boolean indicating if kurtosis is equal to zero
  • entropy: the entropy of the distribution
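Because these are properties rather than methods, they are read without parentheses. The following is a minimal pure-Python sketch of that access pattern for a normal distribution. The class name NormalSketch is hypothetical and is not part of rvlib; it only mirrors the property names listed above, and it assumes that kurtosis means excess kurtosis (so that a normal distribution has kurtosis zero).

```python
import math


class NormalSketch:
    """Hypothetical stand-in (NOT rvlib) exposing a few of the listed properties."""

    def __init__(self, mu, sigma):
        self.mu = float(mu)
        self.sigma = float(sigma)

    @property
    def params(self):
        # Tuple of the distribution's parameters
        return (self.mu, self.sigma)

    @property
    def mean(self):
        return self.mu

    @property
    def var(self):
        return self.sigma ** 2

    @property
    def std(self):
        return self.sigma

    @property
    def kurtosis(self):
        # Assumption: excess kurtosis, which is zero for a normal distribution
        return 0.0

    @property
    def ismesokurtic(self):
        return self.kurtosis == 0.0

    @property
    def entropy(self):
        # Differential entropy of a normal: 0.5 * ln(2 * pi * e * sigma^2)
        return 0.5 * math.log(2.0 * math.pi * math.e * self.var)


d = NormalSketch(1.0, 2.0)
print(d.params, d.mean, d.var, d.std, d.ismesokurtic)
```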

The following methods can be called for all univariate distributions:

  • mgf: evaluate the moment generating function (if it exists)
  • cf: evaluate the characteristic function (if it exists)
  • pdf: evaluate the probability density function
  • logpdf: evaluate the logarithm of the probability density function
  • loglikelihood: evaluate the log-likelihood of the distribution with respect to all samples contained in array x
  • cdf: evaluate the cumulative distribution function
  • ccdf: evaluate the complementary cdf, i.e. (1 - cdf)
  • logcdf: evaluate the logarithm of the cdf
  • logccdf: evaluate the logarithm of the complementary cdf
  • quantile: evaluate the quantile function at a critical value
  • cquantile: evaluate the complementary quantile function
  • invlogcdf: evaluate the inverse function of the logcdf
  • invlogccdf: evaluate the inverse function of the logccdf
  • rand: generate array of independent random draws
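Several of the methods above are defined in terms of one another. The sketch below spells out those relationships for a standard normal, using NormalDist from Python's standard library (3.8+) as a stand-in. It does not call rvlib, and the free functions are illustrative only; rvlib exposes these as methods on its distribution objects and evaluates them through Rmath rather than by composition.

```python
import math
from statistics import NormalDist

# Stand-in distribution object from the standard library (NOT rvlib)
d = NormalDist(mu=0.0, sigma=1.0)


def pdf(x):
    return d.pdf(x)


def logpdf(x):
    return math.log(pdf(x))


def loglikelihood(xs):
    # Sum of log-densities over all samples in xs
    return sum(logpdf(x) for x in xs)


def cdf(x):
    return d.cdf(x)


def ccdf(x):
    # Complementary cdf: 1 - cdf
    return 1.0 - cdf(x)


def logcdf(x):
    return math.log(cdf(x))


def logccdf(x):
    return math.log(ccdf(x))


def quantile(p):
    # Inverse of the cdf
    return d.inv_cdf(p)


def cquantile(p):
    # Inverse of the ccdf: quantile(1 - p)
    return d.inv_cdf(1.0 - p)


def invlogcdf(lp):
    # Inverse of logcdf: quantile(exp(lp))
    return quantile(math.exp(lp))


def invlogccdf(lp):
    # Inverse of logccdf: cquantile(exp(lp))
    return cquantile(math.exp(lp))


x = 0.7
print(quantile(cdf(x)), cquantile(ccdf(x)))
print(invlogcdf(logcdf(x)), invlogccdf(logccdf(x)))
```

Each printed value should recover x up to floating-point error, since every pair of calls composes a function with its inverse.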

Seed setting

As the package is built around the Rmath library, the seed for the random number generator has to be set using the Rmath set_seed(x, y) function. For example:

import rvlib as rl

rl.set_seed(123, 456) # note that it requires two arguments

Use and Performance

Preliminary comparison with the scipy.stats package.

from rvlib import Normal
from scipy.stats import norm
import numpy as np
import timeit

N_dist = Normal(0,1) # rvlib version
N_scipy = norm(0,1) # scipy.stats version

x = np.linspace(0,100,100)

In [1]: %timeit N_dist.pdf(x)
Out[1]: The slowest run took 8.85 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 9.69 µs per loop

In [2]: %timeit N_scipy.pdf(x)
Out[2]: 10000 loops, best of 3: 150 µs per loop

In [3]: %timeit N_dist.cdf(x)
Out[3]: The slowest run took 20325.82 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 8.08 µs per loop

In [4]: %timeit N_scipy.cdf(x)
Out[4]: The slowest run took 190.64 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 126 µs per loop

In [5]: %timeit N_dist.rand(1000)
Out[5]: The slowest run took 2166.80 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 85.8 µs per loop

In [6]: %timeit N_scipy.rvs(1000)
Out[6]: 10000 loops, best of 3: 119 µs per loop
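
The %timeit magic used above is only available in IPython. A rough equivalent for a plain script can be sketched with the standard-library timeit module. The helper best_time_us is a hypothetical name, and the timed function uses NormalDist from the standard library as a stand-in so that the sketch runs without rvlib or scipy installed; substitute a call such as N_dist.pdf(x) to time the real thing. The warm-up call is there because the first call to a jitted method is likely to include compilation time, which would be one plausible explanation for the "slowest run" messages above.

```python
import timeit
from statistics import NormalDist


def best_time_us(func, number=1000, repeat=3):
    """Return the best average time per call, in microseconds."""
    func()  # warm-up call so one-off costs (e.g. JIT compilation) are not timed
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number * 1e6


# Stand-in workload (NOT rvlib): standard normal pdf on a 100-point grid
d = NormalDist(0.0, 1.0)
xs = [i * 100 / 99 for i in range(100)]  # same grid as np.linspace(0, 100, 100)

t = best_time_us(lambda: [d.pdf(x) for x in xs])
print(f"{t:.2f} us per call")
```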

Contributors


This is a fork of the Rmath-julia library, with Python support added.

The original readme of the Rmath-julia repository is included below.


Rmath-julia

This is the Rmath library from R, which is used mainly by Julia's Distributions.jl package.

The main difference here is that this library has been patched to use the DSFMT RNG in src/runif.c.

The Julia RNG is in sync with the one used by the Distributions.jl package:

julia> srand(1);

julia> [rand(), rand()]
2-element Array{Float64,1}:
 0.236033
 0.346517

julia> srand(1);

julia> using Distributions

julia> [rand(Uniform()), rand(Uniform())]
2-element Array{Float64,1}:
 0.236033
 0.346517

Build instructions

Rmath-julia requires GNU Make (https://www.gnu.org/software/make). Just run make to compile the library.
