Do likelihood-based parameter estimation using maximum likelihood and Bayesian methods

Project description

dataprob was designed to allow scientists to easily fit user-defined models to experimental data. It allows maximum likelihood, bootstrap, and Bayesian analyses with a simple and consistent interface.

Design principles

  • ease of use: Users write a Python function that describes their model, then load their experimental data as a dataframe.

  • dataframe centric: Uses a pandas dataframe to specify parameter bounds, guesses, fixedness, and priors (see the sketch after this list). Observed data can be passed in as a dataframe or numpy vector. All outputs are pandas dataframes.

  • consistent experience: Users can run maximum-likelihood, bootstrap resampling, or Bayesian MCMC analyses with an identical interface and nearly identical diagnostic outputs.

  • interpretable: Provides diagnostic plots and runs tests to validate fit results.
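
As a minimal sketch of the dataframe-centric workflow: once dataprob.setup has returned an analysis object (called f in the example below), its parameter dataframe can be edited before fitting. The param_df attribute and the "guess" and "fixed" column names used here are assumptions based on the description above; check the documentation for the exact schema.

# Sketch only: assumes f = dataprob.setup(...) as in the example below.
# "param_df", "guess", and "fixed" are assumed names; verify against the docs.
param_df = f.param_df
param_df.loc["m", "guess"] = 2.0   # start the slope search from 2.0
param_df.loc["b", "fixed"] = True  # hold the intercept at its current guess
f.param_df = param_df              # re-assign in case in-place edits are not tracked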

Simple example

The following code generates noisy linear data and uses dataprob to find the maximum likelihood estimate of its slope and intercept. The same example can be run on Google Colab.

import dataprob
import numpy as np

# Generate "experimental" linear data (slope = 5, intercept = 5.7) that has
# random noise on each point.
x_array = np.linspace(0,10,25)
noise = np.random.normal(loc=0,scale=0.5,size=x_array.shape)
y_obs = 5*x_array + 5.7 + noise

# 1. Define a linear model
def linear_model(m=1,b=1,x=[]):
    return m*x + b

# 2. Set up the analysis. 'method' can be "ml", "mcmc", or "bootstrap"
f = dataprob.setup(linear_model,
                   method="ml",
                   non_fit_kwargs={"x":x_array})

# 3. Fit the parameters of linear_model to y_obs, assuming an uncertainty
#    of 0.5 on each observed point.
f.fit(y_obs=y_obs,
      y_std=0.5)

# 4. Access results
fig = dataprob.plot_summary(f)
fig = dataprob.plot_corner(f)
print(f.fit_df)
print(f.fit_quality)
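
Because the interface is shared across methods, the same model and data can be re-analyzed with Bayesian MCMC (or bootstrap resampling) simply by changing the method argument; the plots and tables shown below come from the maximum-likelihood fit above. A minimal sketch (long MCMC runs may need sampler settings not shown here):

# Same analysis, but sampling the posterior with MCMC instead of
# maximizing the likelihood. Only the method argument changes.
f_mcmc = dataprob.setup(linear_model,
                        method="mcmc",
                        non_fit_kwargs={"x":x_array})
f_mcmc.fit(y_obs=y_obs,
           y_std=0.5)
print(f_mcmc.fit_df)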

The plots will be:

[dataprob.plot_summary result]  [dataprob.plot_corner result]

The f.fit_df dataframe will look something like:

index  name  estimate  std    low_95  high_95  prior_std
m      m     5.009     0.045  4.817   5.202    NaN
b      b     5.644     0.274  4.465   6.822    NaN
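
Because fit_df is an ordinary pandas dataframe indexed by parameter name, individual estimates and uncertainties can be pulled out directly. A minimal sketch:

# Extract point estimates and standard deviations from fit_df.
m_est = f.fit_df.loc["m", "estimate"]
m_std = f.fit_df.loc["m", "std"]
b_est = f.fit_df.loc["b", "estimate"]
b_std = f.fit_df.loc["b", "std"]
print(f"m = {m_est:.3f} +/- {m_std:.3f}; b = {b_est:.3f} +/- {b_std:.3f}")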

The f.fit_quality dataframe will look something like:

name           description                                   is_good  value
num_obs        number of observations                        True     25.000
num_param      number of fit parameters                      True     2.000
lnL            log likelihood                                True     -18.761
chi2           chi^2 goodness-of-fit                         True     0.241
reduced_chi2   reduced chi^2                                 True     1.192
mean0_resid    t-test for residual mean != 0                 True     1.000
durbin-watson  Durbin-Watson test for correlated residuals   True     2.265
ljung-box      Ljung-Box test for correlated residuals       True     0.943
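
Since fit_quality is also a pandas dataframe, its is_good column can be used to flag diagnostic tests that did not pass. A minimal sketch (assuming is_good holds boolean-like values):

# Flag any fit-quality test that did not pass.
bad = f.fit_quality[~f.fit_quality["is_good"].astype(bool)]
if len(bad) > 0:
    print("Potential problems with the fit:")
    print(bad)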

Installation

We recommend installing dataprob with pip:

pip install dataprob

To install from source and run tests:

git clone https://github.com/harmslab/dataprob.git
cd dataprob
pip install .

# to run test-suite
pytest --runslow

Examples

A good way to learn how to use the library is to work through the example notebooks in the dataprob/examples/ directory. These are self-contained demonstrations in which dataprob is used to analyze various classes of experimental data, and each notebook can also be launched in Google Colab.

Documentation

Full documentation is on readthedocs.

