Probability distributions mimicking Distributions.jl
rvlib
Anyone who has used Distributions.jl will tell you how nice the interface is relative to the "exotic" (the most polite word we can think of) interface to distributions exposed by scipy.stats. Distributions.jl also brings better performance, particularly when its methods are used inside loops.
For these reasons we've put together rvlib, which mimics the interface of Distributions.jl while attaining similar performance by exploiting numba.
This package was inspired by Joshua Adelman's (@synapticarbors) blog post describing how to call the Rmath C library from numba using CFFI, and utilizes his build script to set up the CFFI interface.
Objectives
- Follow the API of the Distributions.jl package as closely as possible
- Create a Python package that has better performance than scipy.stats
Methodology
All the classes are marked for optimization using the @jitclass decorator. As a result, instances of different distributions can be called within user-specific routines or passed as arguments in nopython mode using numba.
The evaluation and sampling methods are built on the Rmath C library, which is also used by the Distributions.jl package.
Distributions currently implemented
Univariate continuous:
- Normal
- Chisq
- Uniform
- T
- Log-normal
- F
- Beta
- Gamma
- Exponential
- Cauchy
- Logistic
- Weibull
Univariate discrete:
- Poisson
- Geometric
- Hypergeometric
- Binomial
- Negative Binomial
Multivariate continuous:
- check for updates on multivariate normal in the multivariate branch
Functionality
The following properties are shared by all the univariate distributions:
- params: tuple of the distribution's parameters
- location: the location of the distribution (if exists)
- scale: the scale of the distribution (if exists)
- shape: the shape of the distribution (if exists)
- mean: the mean of the distribution
- median: the median of the distribution
- mode: the mode of the distribution
- var: the variance of the distribution
- std: the standard deviation of the distribution
- skewness: the skewness of the distribution
- kurtosis: the kurtosis of the distribution
- isplatykurtic: boolean indicating if kurtosis is greater than zero
- isleptokurtic: boolean indicating if kurtosis is less than zero
- ismesokurtic: boolean indicating if kurtosis is equal to zero
- entropy: the entropy of the distribution
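To make the property names above concrete, here is a minimal pure-Python sketch of what they would evaluate to for an exponential distribution with scale theta. The class ToyExponential is hypothetical and is not rvlib's implementation; it also assumes that kurtosis means excess kurtosis, as in Distributions.jl.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyExponential:
    """Hypothetical stand-in for an exponential distribution with scale theta."""
    theta: float

    @property
    def params(self):
        return (self.theta,)

    @property
    def scale(self):
        return self.theta

    @property
    def mean(self):
        return self.theta

    @property
    def median(self):
        return self.theta * math.log(2.0)

    @property
    def mode(self):
        return 0.0

    @property
    def var(self):
        return self.theta ** 2

    @property
    def std(self):
        return self.theta

    @property
    def skewness(self):
        return 2.0

    @property
    def kurtosis(self):
        # Excess kurtosis (assumed convention, following Distributions.jl).
        return 6.0

    @property
    def entropy(self):
        return 1.0 + math.log(self.theta)


d = ToyExponential(2.0)
print(d.params, d.mean, d.var, d.std, d.median, d.entropy)
```

Because these are properties rather than methods, they are read without parentheses, e.g. d.mean rather than d.mean().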
The following methods can be called for all univariate distributions:
- mgf: evaluate the moment generating function (if exists)
- cf: evaluate the characteristic function (if exists)
- pdf: evaluate the probability density function
- logpdf: evaluate the logarithm of the probability density function
- loglikelihood: evaluate the log-likelihood of the distribution with respect to all samples contained in array x
- cdf: evaluate the cumulative distribution function
- ccdf: evaluate the complementary cdf, i.e. (1 - cdf)
- logcdf: evaluate the logarithm of the cdf
- logccdf: evaluate the logarithm of the complementary cdf
- quantile: evaluate the quantile function at a critical value
- cquantile: evaluate the complementary quantile function
- invlogcdf: evaluate the inverse function of the logcdf
- invlogccdf: evaluate the inverse function of the logccdf
- rand: generate array of independent random draws
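The relationships between these methods can be sketched with the Python standard library alone. The snippet below is not rvlib code: it uses statistics.NormalDist as a stand-in for a standard normal and defines free functions named after the methods above, assuming the same definitions as Distributions.jl (for instance, cquantile(p) equals quantile(1 - p)).

```python
import math
from statistics import NormalDist

# Stand-in standard normal from the standard library (not an rvlib object).
d = NormalDist(mu=0.0, sigma=1.0)


def cdf(x):
    return d.cdf(x)


def ccdf(x):
    return 1.0 - d.cdf(x)


def logcdf(x):
    return math.log(d.cdf(x))


def logccdf(x):
    return math.log(1.0 - d.cdf(x))


def quantile(p):
    return d.inv_cdf(p)


def cquantile(p):
    return d.inv_cdf(1.0 - p)


def invlogcdf(lp):
    return d.inv_cdf(math.exp(lp))


def invlogccdf(lp):
    return d.inv_cdf(1.0 - math.exp(lp))


def loglikelihood(xs):
    # Sum of log-densities over all samples.
    return sum(math.log(d.pdf(x)) for x in xs)


x = 0.5
# Each round trip should recover x up to floating-point error.
print(quantile(cdf(x)), cquantile(ccdf(x)))
print(invlogcdf(logcdf(x)), invlogccdf(logccdf(x)))
print(loglikelihood([-1.0, 0.0, 1.0]))
```

These naive compositions are only meant to show the definitions. Far in the tails, expressions such as 1 - cdf(x) lose precision, which is one reason a library would offer dedicated complementary and log variants.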
Seed setting
As the package is built around the Rmath library, the seed for the random number generator has to be set using the Rmath set_seed(x,y) function. For example:
import rvlib as rl
rl.set_seed(123, 456) # note that it requires two arguments
Use and Performance
Preliminary comparison with the scipy.stats package:
from rvlib import Normal
from scipy.stats import norm
import numpy as np
import timeit
N_dist = Normal(0,1) # rvlib version
N_scipy = norm(0,1) # scipy.stats version
x = np.linspace(0,100,100)
In [1]: %timeit N_dist.pdf(x)
Out[1]: The slowest run took 8.85 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.69 µs per loop
In [2]: %timeit N_scipy.pdf(x)
Out[2]: 10000 loops, best of 3: 150 µs per loop
In [3]: %timeit N_dist.cdf(x)
Out[3]: The slowest run took 20325.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.08 µs per loop
In [4]: %timeit N_scipy.cdf(x)
Out[4]: The slowest run took 190.64 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 126 µs per loop
In [5]: %timeit N_dist.rand(1000)
Out[5]: The slowest run took 2166.80 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 85.8 µs per loop
In [6]: %timeit N_scipy.rvs(1000)
Out[6]: 10000 loops, best of 3: 119 µs per loop
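The "slowest run took ... times longer" warnings above are most likely caused by one-off costs on the first call, such as numba compiling the method. Outside IPython, a comparison like this could be reproduced with the standard timeit module, with a warm-up call before timing. The helper best_seconds_per_call and the stand-in function std_normal_pdf below are hypothetical names used so that the sketch runs without rvlib or scipy installed; in practice a bound method such as N_dist.pdf could be passed with x as its argument.

```python
import math
import timeit


def best_seconds_per_call(func, *args, number=1000, repeat=3):
    """Warm up once, then return the best average time per call in seconds."""
    func(*args)  # warm-up call: absorbs one-off costs such as JIT compilation
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def std_normal_pdf(x):
    # Stand-in workload: density of the standard normal at x.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


t = best_seconds_per_call(std_normal_pdf, 0.3)
print(f"{t * 1e6:.3f} µs per call")
```

Taking the minimum over repeats mirrors the "best of 3" figure that %timeit reports in the output above.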
Contributors
- Daniel Csaba (daniel.csaba@nyu.edu)
- Spencer Lyon (spencer.lyon@stern.nyu.edu)
This is a fork of the Rmath-julia library, with Python support added.
The original readme of the Rmath-julia repository is included below.
Rmath-julia
This is the Rmath library from R, which is used mainly by Julia's Distributions.jl package.
The main difference here is that this library has been patched to use the DSFMT RNG in src/runif.c.
The Julia RNG is in sync with the one used by the Distributions.jl package:
julia> srand(1);
julia> [rand(), rand()]
2-element Array{Float64,1}:
0.236033
0.346517
julia> srand(1);
julia> using Distributions
julia> [rand(Uniform()), rand(Uniform())]
2-element Array{Float64,1}:
0.236033
0.346517
Build instructions
Rmath-julia requires GNU Make (https://www.gnu.org/software/make). Just run
make
to compile the library.