design and steer profile likelihood fits


cabinetry

Introduction
Hello world
Template fits
Scope
Code
Acknowledgements

Introduction

cabinetry is a Python package to build and steer (profile likelihood) template fits with applications in high energy physics in mind. It acts as an interface to many powerful tools to make it easier for an analyzer to run their statistical inference pipeline. An incomplete list of interesting tools to interface with:

ServiceX for data delivery,
coffea for histogram processing,
uproot for reading ROOT files,
for building likelihood functions (captured in so-called workspaces in RooFit) and inference:
- RooFit to model probability distributions,
- RooStats for statistical tools,
- HistFactory to implement a subset of binned template fits,
- pyhf for a pythonic take on HistFactory,
- zfit for a pythonic take on RooFit,
- MadMiner for likelihood-free inference techniques (see Scope).

The project is a work in progress. Configuration of cabinetry happens in a declarative manner, and is easily serializable via JSON/YAML into a configuration file.
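As an illustration of what a declarative, serializable configuration can look like, here is a minimal sketch using only the Python standard library. The keys and values (General, Regions, Samples, Systematics and everything inside them) are assumptions made for this example and are not meant as a description of the schema cabinetry actually expects.

```python
import json

# Hypothetical configuration: all keys and values are illustrative
# assumptions, not the schema used by cabinetry.
config = {
    "General": {"Measurement": "example", "InputPath": "ntuples/"},
    "Regions": [
        {
            "Name": "signal_region",
            "Variable": "jet_pt",
            "Binning": [200, 300, 400, 500],
        }
    ],
    "Samples": [
        {"Name": "signal", "Tree": "events"},
        {"Name": "background", "Tree": "events"},
    ],
    "Systematics": [
        {"Name": "jet_energy_scale", "Samples": ["signal", "background"]}
    ],
}

# The configuration only describes what should be done, so it can be
# written to a file and read back without losing information.
serialized = json.dumps(config, indent=2, sort_keys=True)
restored = json.loads(serialized)
round_trip_ok = restored == config
```

Because the configuration contains only plain data (no code), the same structure could equally be stored as YAML.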

Interesting related projects:

Hello world

To run the following example, first generate the input files via the script util/create_ntuples.py.

import cabinetry

cabinetry_config = cabinetry.configuration.load("config_example.yml")

# create template histograms
cabinetry.template_builder.create_histograms(cabinetry_config)

# perform histogram post-processing
cabinetry.template_postprocessor.run(cabinetry_config)

# visualize templates and data
cabinetry.visualize.data_MC_from_histograms(cabinetry_config, "figures/")

# build a workspace
ws = cabinetry.workspace.build(cabinetry_config)

# run a fit
cabinetry.fit.fit(ws)

The above is an abbreviated version of an example included in example.py, which shows how to use cabinetry. It requires additional libraries beyond the core dependencies of cabinetry, which can be installed via pip install cabinetry[contrib] (or pip install -e .[contrib] from the repository). Eventually the basic implementation (from cabinetry/contrib) will be replaced by calls to external modules (see also Code).

Template fits

The operations needed in a template fit workflow can be summarized as follows:

Template histogram production,
Histogram adjustments,
Workspace creation from histograms,
Inference from workspace,
Visualization.

While the first four points need to happen in this order (as each step uses as input the output of the previous step), the visualization is relevant at all stages to not only show final results, but also intermediate steps.

1. Template histogram production

The production of a template histogram requires the following information:

where to find the data (and how to read it),
what kind of selection requirements (filtering) and weights to apply to the data,
the variable to bin in, and what bins to use (for binned fits),
a unique name (key) for this histogram to be able to refer to it later on.
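The four ingredients listed above can be sketched in plain Python as follows. This is not how cabinetry builds histograms; the function fill_histogram, the event fields (jet_pt, n_jets, w) and the key layout are hypothetical names chosen for the example.

```python
from bisect import bisect_right


def fill_histogram(events, variable, bin_edges, selection, weight):
    """Return per-bin sums of weights for events passing the selection."""
    yields = [0.0] * (len(bin_edges) - 1)
    for event in events:
        if not selection(event):
            continue
        value = event[variable]
        # events outside the binning range (under-/overflow) are dropped
        if value < bin_edges[0] or value >= bin_edges[-1]:
            continue
        index = bisect_right(bin_edges, value) - 1
        yields[index] += weight(event)
    return yields


# where to find the data: here simply an in-memory list of events
events = [
    {"jet_pt": 210.0, "n_jets": 2, "w": 1.0},
    {"jet_pt": 350.0, "n_jets": 1, "w": 0.5},  # fails the selection
    {"jet_pt": 320.0, "n_jets": 3, "w": 2.0},
    {"jet_pt": 550.0, "n_jets": 2, "w": 1.0},  # overflow, dropped
]

# a unique key (region, sample, variation) to refer to the histogram later
histograms = {}
key = ("signal_region", "signal", "nominal")
histograms[key] = fill_histogram(
    events,
    variable="jet_pt",
    bin_edges=[200, 300, 400, 500],
    selection=lambda event: event["n_jets"] >= 2,
    weight=lambda event: event["w"],
)
```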

In practice, histogram information can be given by specifying lists of:

regions of phase space (or channels, independent regions obtained via different selection requirements),
samples (physics processes),
systematic uncertainties for the samples, which might vary across samples and phase space regions.

For LHC-style template profile likelihood fits, typically a few thousand histograms are needed. An analysis that considers 5 different phase space regions, with 10 different physics processes (simulated as 10 independent Monte Carlo samples), and an average of 50 systematic uncertainties for all the samples (each implemented by specifying variations from the nominal configuration in two directions, so 100 variations per sample), needs 5 x 10 x 100 = 5000 histograms.
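The counting in this example can be reproduced by enumerating the histogram keys. The sketch below assumes, for simplicity, that every sample has exactly 50 systematic uncertainties (the text only states an average); all names are placeholders.

```python
from itertools import product

regions = [f"region_{i}" for i in range(5)]
samples = [f"sample_{i}" for i in range(10)]
systematics = [f"systematic_{i}" for i in range(50)]
directions = ["up", "down"]

# one histogram per region, sample, systematic and direction
variation_keys = [
    (region, sample, f"{systematic}_{direction}")
    for region, sample, systematic, direction in product(
        regions, samples, systematics, directions
    )
]

# the nominal templates are needed in addition to the variations
nominal_keys = [
    (region, sample, "nominal") for region, sample in product(regions, samples)
]

n_variations = len(variation_keys)
n_nominal = len(nominal_keys)
```

Here n_variations is 5000, matching the number quoted above, and the 50 nominal templates come on top of that.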

2. Histogram adjustments

Histogram post-processing can include re-binning, smoothing, or symmetrization of systematic uncertainties. These operations should be handled by tools outside of cabinetry. Such tools might either need some additional steering via an additional configuration, or the basic configuration file has to support arbitrary settings to be passed to these tools (depending on what each tool can interpret).
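To make two of these operations concrete, the following sketch shows one common way to symmetrize a one-sided systematic variation (mirroring the "up" template around the nominal one) and a simple re-binning that merges adjacent bins. Both functions are illustrative assumptions and are not taken from cabinetry or any specific external tool.

```python
def symmetrize(nominal, up):
    """Build a "down" template by mirroring "up" around the nominal yields."""
    return [2 * n - u for n, u in zip(nominal, up)]


def rebin(yields, group_size):
    """Merge every group_size adjacent bins into one bin."""
    if len(yields) % group_size != 0:
        raise ValueError("number of bins must be a multiple of group_size")
    return [
        sum(yields[i : i + group_size])
        for i in range(0, len(yields), group_size)
    ]


nominal = [10.0, 20.0, 30.0, 40.0]
up = [11.0, 19.0, 33.0, 40.0]

down = symmetrize(nominal, up)
nominal_rebinned = rebin(nominal, 2)
```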

3. Workspace creation from histograms

Taking the example of pyhf, the workspace creation consists of plugging histograms into the right places in a JSON file. This can be relatively straightforward if the configuration file is very explicit about these assignments. In practice, it is desirable to support convenience options in the configuration file. An example is the ability to de-correlate the effect of a systematic uncertainty across different phase space regions via a simple flag. This means that instead of one nuisance parameter, many nuisance parameters need to be created automatically. The treatment can become complicated when many such convenience functions interact with each other.
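The idea of plugging histograms into the right places can be sketched as below. The dictionary layout (channels, samples, modifiers, observations, measurements, version) follows the pyhf JSON workspace format as understood here, but the result is not validated against pyhf in this example. The function build_workspace and the structure of its templates argument are hypothetical.

```python
import json


def build_workspace(region, templates, observed, poi="mu"):
    """Assemble a single-channel workspace dictionary from template yields."""
    samples = []
    for sample_name, template in templates.items():
        modifiers = []
        if template.get("is_signal", False):
            # the parameter of interest scales the signal normalization
            modifiers.append({"name": poi, "type": "normfactor", "data": None})
        for syst_name, variation in template.get("systematics", {}).items():
            modifiers.append(
                {
                    "name": syst_name,
                    "type": "histosys",
                    "data": {
                        "hi_data": variation["up"],
                        "lo_data": variation["down"],
                    },
                }
            )
        samples.append(
            {"name": sample_name, "data": template["nominal"], "modifiers": modifiers}
        )
    return {
        "channels": [{"name": region, "samples": samples}],
        "observations": [{"name": region, "data": observed}],
        "measurements": [
            {"name": "example", "config": {"poi": poi, "parameters": []}}
        ],
        "version": "1.0.0",
    }


templates = {
    "signal": {"nominal": [5.0, 10.0], "is_signal": True},
    "background": {
        "nominal": [50.0, 60.0],
        "systematics": {
            "jet_energy_scale": {"up": [52.0, 63.0], "down": [48.0, 57.0]}
        },
    },
}
workspace = build_workspace("signal_region", templates, observed=[53.0, 71.0])
workspace_json = json.dumps(workspace, indent=2)
```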

A possible approach is to define a lowest-level configuration file format that supports no convenience functions at all and requires everything to be specified in a very explicit manner. Convenience functions could be supported in small frameworks that read configuration files containing flags for convenience functions and convert them into the low-level format.

The basic task of building the workspace should have well-defined inputs (low-level configuration file) and outputs (such as HistFactory workspaces). Support of convenience functions can be factored out, with a well-defined output (low-level configuration file) and input given by an enhanced configuration file format.
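The separation between an enhanced and a low-level configuration can be sketched with the de-correlation example mentioned above. The flag DecorrelateRegions, the key names and the function expand_systematics are hypothetical and only illustrate the conversion step.

```python
def expand_systematics(systematics, regions):
    """Convert systematics with convenience flags into an explicit list."""
    low_level = []
    for systematic in systematics:
        if systematic.get("DecorrelateRegions", False):
            # one independent nuisance parameter per region
            for region in regions:
                low_level.append(
                    {"Name": f"{systematic['Name']}_{region}", "Regions": [region]}
                )
        else:
            # a single nuisance parameter correlated across all regions
            low_level.append({"Name": systematic["Name"], "Regions": list(regions)})
    return low_level


enhanced = [
    {"Name": "jet_energy_scale", "DecorrelateRegions": True},
    {"Name": "luminosity"},
]
explicit = expand_systematics(enhanced, regions=["SR", "CR"])
```

The output contains no flags anymore, so a workspace builder that only understands the explicit format does not need to know that the convenience option exists.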

4. Inference from workspace

Inference happens via fits of the workspace, to obtain best-fit results and uncertainties, limits on parameters, significances of observations and so on. External tools are called to perform inference, configured as specified by the configuration file.
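As a toy illustration of what such a fit computes, consider a single-bin counting experiment with a signal strength mu and no nuisance parameters (so there is nothing to profile). The sketch below is not the procedure used by cabinetry or the external tools it calls; it uses the analytic maximum likelihood estimate for a Poisson likelihood and a Gaussian approximation for the uncertainty, and all numbers are made up.

```python
import math


def nll(mu, n_obs, signal, background):
    """Negative log-likelihood of a Poisson count, up to a constant term."""
    expected = mu * signal + background
    return expected - n_obs * math.log(expected)


def fit_signal_strength(n_obs, signal, background):
    """Return the best-fit signal strength and its approximate uncertainty."""
    # setting the derivative of the NLL to zero gives mu*s + b = n_obs
    mu_hat = (n_obs - background) / signal
    # the second derivative of the NLL at the minimum is s^2 / n_obs
    sigma = math.sqrt(n_obs) / signal
    return mu_hat, sigma


mu_hat, sigma = fit_signal_strength(n_obs=120, signal=25.0, background=100.0)

# the NLL is larger on either side of the best-fit value
nll_at_minimum = nll(mu_hat, 120, 25.0, 100.0)
nll_below = nll(mu_hat - 0.1, 120, 25.0, 100.0)
nll_above = nll(mu_hat + 0.1, 120, 25.0, 100.0)
```

Realistic workspaces contain many bins and nuisance parameters, so the minimization is done numerically by the external inference tools.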

5. Visualization

Some information on relevant kinds of visualization is provided in as-user-facing/fit-visualization.md and the links therein.

Scope

For now, cabinetry is focused on HistFactory-style template fit models. Those traditional binned template fits are substantially easier to support than the open world of binned and unbinned models. Likelihood-free inference approaches in the style of MadMiner have a more well-defined scope than the open world of RooFit, and might be easier to integrate.

Code

Everything in cabinetry/contrib is a basic implementation of a task that should be done by other tools; interfaces to those tools should be added. The basic implementations that exist there help with API design.

Acknowledgements

This work was supported by the U.S. National Science Foundation (NSF) cooperative agreement OAC-1836650 (IRIS-HEP).


