Inference Gym

Overview

The Inference Gym is the place to exercise inference methods to help make them faster, leaner and more robust. The goal of the Inference Gym is to provide a set of probabilistic inference problems with a standardized interface, making it easy to test new inference techniques across a variety of challenging tasks.

Currently it provides a repository of probabilistic models that can be used to benchmark the computational and statistical performance of inference algorithms. Probabilistic models are implemented as subclasses of the Model class, which minimally provides the following facilities:

  • A description of the shapes and dtypes of the parameters of the model.
  • Event space bijectors, which map from unconstrained real space to the support of the model's associated density.
  • The ability to compute the unnormalized log density at a given parameter setting.
  • Name of the model.
  • Sample transformations, which when applied to samples from the model's density represent quantities with a useful interpretation.
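
To make the bijector and density items above concrete, here is a minimal pure-Python sketch. It is not the Inference Gym API: `ExpBijector` and `unconstrained_log_prob` are hypothetical names, and the Gamma(2, 1) target is an illustrative assumption. The point is that an inference method can work on the whole real line once the bijector's log-Jacobian is added to the model's unnormalized log density.

```python
import math


class ExpBijector:
  """Hypothetical stand-in for an event space bijector: R -> (0, inf)."""

  def forward(self, x):
    return math.exp(x)

  def forward_log_det_jacobian(self, x):
    # d/dx exp(x) = exp(x), so log|J| = x.
    return x


def unnormalized_log_prob(theta):
  """Unnormalized log density of a Gamma(shape=2, rate=1) on theta > 0."""
  return math.log(theta) - theta


def unconstrained_log_prob(x, bijector=ExpBijector()):
  """Log density of x = log(theta), defined on the whole real line."""
  theta = bijector.forward(x)
  return unnormalized_log_prob(theta) + bijector.forward_log_det_jacobian(x)
```

A sampler that only understands unconstrained reals can target `unconstrained_log_prob` and push its draws through `forward` to obtain samples in the model's support.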

Each model can additionally provide:

  • Ground truth quantities associated with each sample transformation. These can include the mean, variance and other statistics. If they are estimated via Monte Carlo methods, a standard error is also provided. The ground truth can be used to estimate the bias of an inference algorithm.
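
One way such a ground truth value might be used, sketched under assumptions: compare a sampler's estimate of the mean against the ground truth mean, measured in units of the combined standard error. The helper name `bias_z_score` and the numbers are hypothetical, and the estimator's standard error assumes independent draws (for MCMC output, an effective sample size would replace the raw sample count).

```python
import math
import statistics


def bias_z_score(samples, ground_truth_mean, ground_truth_standard_error):
  """Returns (estimate - truth) in units of the combined standard error."""
  estimate = statistics.fmean(samples)
  # Standard error of the estimate, assuming independent draws.
  estimate_se = statistics.stdev(samples) / math.sqrt(len(samples))
  combined_se = math.hypot(estimate_se, ground_truth_standard_error)
  return (estimate - ground_truth_mean) / combined_se


# Made-up numbers, for illustration only.
samples = [0.9, 1.1, 1.0, 1.2, 0.8]
z = bias_z_score(samples, ground_truth_mean=1.0,
                 ground_truth_standard_error=0.01)
```

A value of `z` far from zero (say, beyond 3 in magnitude) suggests the discrepancy is unlikely to be Monte Carlo noise alone.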

Usage

pip install tfp-nightly inference_gym
# Install at least one of the following:
pip install tf-nightly  # For the TensorFlow backend.
pip install jax jaxlib  # For the JAX backend.
# Install to support external datasets
pip install tfds-nightly

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from inference_gym import using_tensorflow as inference_gym

model = inference_gym.targets.GermanCreditNumericLogisticRegression()

samples = inference_method(
  model.unnormalized_log_prob,
  model.default_event_space_bijector,
  model.event_shape,
  model.dtype)

plt.figure()
plt.suptitle(str(model))  # 'German Credit Numeric Logistic Regression'
for i, (name, sample_transformation) in enumerate(
    model.sample_transformations.items()):
  transformed_samples = sample_transformation(samples)
  bias_sq = tf.square(
      tf.reduce_mean(transformed_samples, 0) -
      sample_transformation.ground_truth_mean)
  ess = compute_ess(  # E.g. tfp.mcmc.effective_sample_size if using MCMC.
      transformed_samples,
      tf.square(sample_transformation.ground_truth_standard_deviation))
  plt.subplot(len(model.sample_transformations), 2, 2 * i + 1)
  plt.title('{} bias^2'.format(sample_transformation))  # e.g. 'Identity bias^2'
  plt.bar(np.arange(bias_sq.shape[-1]), bias_sq)
  plt.subplot(len(model.sample_transformations), 2, 2 * i + 2)
  plt.title('{} ess'.format(sample_transformation))
  plt.bar(np.arange(ess.shape[-1]), ess)

Also, see VectorModel which can be used to simplify the interface requirements for the inference method.
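
The usage snippet leaves `inference_method` and `compute_ess` as placeholders. As a rough illustration of what the latter could compute, here is a pure-Python sketch for a single 1-D chain. It is a simple autocorrelation-based estimate that stops summing at the first non-positive autocorrelation; it is not the algorithm used by `tfp.mcmc.effective_sample_size`, and the `(chain, variance)` signature is an assumption modeled on the call above.

```python
def compute_ess(chain, variance):
  """ESS of a 1-D chain, normalizing autocovariances by a known variance."""
  n = len(chain)
  mean = sum(chain) / n
  centered = [x - mean for x in chain]
  rho_sum = 0.0
  for lag in range(1, n):
    autocov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
    rho = autocov / variance
    if rho <= 0.0:  # Truncate at the first non-positive autocorrelation.
      break
    rho_sum += rho
  return n / (1.0 + 2.0 * rho_sum)


# A chain stuck in one place for half its length has few effective samples.
sticky_chain = [0.0] * 50 + [1.0] * 50
sticky_ess = compute_ess(sticky_chain, variance=0.25)
```

The loop is quadratic in the chain length, which is fine for a sketch but too slow for long chains; production implementations typically use FFT-based autocorrelation.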

What makes for a good Inference Gym Model?

A good model should ideally do one or more of these:

  • Help build intuition (usually 1D or 2D for ease of visualization)
  • Represent a generally important application of Bayesian inference
  • Pose a challenge for inference, e.g.
    • high dimensionality
    • poor or pathological conditioning
    • mixing continuous and discrete latents
    • multimodality
    • non-identifiability
    • expensive gradients

Naturally, a model shouldn’t have all of those properties at once, so that users can more easily run experiments to tease out which complication has what effect on the inference procedure. The list above isn’t exhaustive.
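
As an illustration of a model that isolates a single complication (multimodality) while staying 1D for easy visualization, here is a minimal sketch of a two-component Gaussian mixture density. The function name `bimodal_log_prob` and the `separation` parameter are hypothetical and not part of the Inference Gym.

```python
import math


def bimodal_log_prob(x, separation=6.0):
  """Unnormalized log density of an equal mixture of N(-s/2, 1) and N(s/2, 1)."""
  a = -0.5 * (x + separation / 2.0) ** 2
  b = -0.5 * (x - separation / 2.0) ** 2
  # Log-sum-exp, shifted by the max for numerical stability.
  m = max(a, b)
  return m + math.log(math.exp(a - m) + math.exp(b - m))
```

Increasing `separation` deepens the valley between the two modes, which makes it harder for a local sampler to move from one mode to the other.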

Making changes

Adding a new model

It's easiest to mimic an existing example. Here's a small table to help you find an example. If your model isn't described well by these possibilities, feel free to ask for help.

Bayesian Model?  Real dataset?  Analytic Ground Truth?  Stan Implementation?  Multiple RVs?  Example Model
Yes              Real           No                      Yes                   Yes            GermanCreditNumericSparseLogisticRegression
Yes              Real           No                      Yes                   No             GermanCreditNumericLogisticRegression
Yes              Synthetic      No                      Yes                   Yes            SyntheticItemResponseTheory
No               None           Yes                     No                    No             IllConditionedGaussian

A Bayesian model in the table above refers to models whose density over the parameters is computed using the product of a prior and a likelihood function (i.e. using Bayes' theorem). These models should inherit from the BayesianModel class, as it provides some utilities for such models.
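
The prior-times-likelihood structure can be sketched in pure Python as follows. This is an illustration under assumptions, not the BayesianModel class: the function names, the standard normal prior and the tiny dataset are all made up for the example.

```python
import math


def log_prior(weights):
  """Standard normal prior on each weight, up to an additive constant."""
  return -0.5 * sum(w * w for w in weights)


def log_likelihood(weights, features, labels):
  """Bernoulli log-likelihood with a logistic link."""
  total = 0.0
  for x, y in zip(features, labels):
    logit = sum(w * xi for w, xi in zip(weights, x))
    # log sigmoid(logit) if y == 1, log sigmoid(-logit) if y == 0.
    signed = logit if y == 1 else -logit
    total += -math.log1p(math.exp(-signed))
  return total


def unnormalized_log_prob(weights, features, labels):
  """Log of prior times likelihood; the evidence term is dropped."""
  return log_prior(weights) + log_likelihood(weights, features, labels)


# Tiny made-up dataset: two features per row, binary labels.
features = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]]
labels = [1, 0, 1]
value = unnormalized_log_prob([0.0, 0.0], features, labels)
```

Because the normalizing constant (the evidence) does not depend on the weights, dropping it leaves the posterior's shape unchanged, which is all most inference methods need.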

We currently have some tooling that uses cmdstanpy to generate ground truth values (in the correct format) for models without analytic ground truth. Using it requires adding a model implementation inside the inference_gym/tools/stan directory.

Adding a new real dataset

We strongly encourage you to add your dataset to TensorFlow Datasets first. Then, you can follow the example of the German Credit (numeric) dataset used in the GermanCreditNumericLogisticRegression model.

Adding a new synthetic dataset

Follow the example of the SyntheticItemResponseTheory model.

Generating ground truth files

See inference_gym/tools/get_ground_truth.py.

Citing Inference Gym

To cite the Inference Gym:

@software{inferencegym2020,
  author = {Pavel Sountsov and Alexey Radul and contributors},
  title = {Inference Gym},
  url = {https://pypi-hypernode.com/project/inference_gym},
  version = {0.0.2},
  year = {2020},
}

Make sure to update the version attribute to match the actual version you're using.
