F.A.S.T. datacard creation package

Project description

fast-datacard

Overview

fast-datacard is a Python package developed within the Faster Analysis Software Taskforce (FAST) collaboration. Its main purpose is to create datacards compatible with the HiggsCombine tool from data frames. The package takes categorical data frames, e.g. as created by the alphatwirl package, and creates the necessary ROOT and datacard outputs.

Features

  • convert categorical data frames (see examples/data/*.csv) into valid inputs for the HiggsCombine tool.

Usage

The package is invoked as::

fast_datacard <yaml_config_file>

An example yaml config file is available: examples/datacards_config.yaml. The config file lists all the input event categories, regions, physics processes, dataframes, etc. A few things should be noted:

  • The existence of the general, regions, signals, backgrounds and systematics blocks is mandatory.

  • data_names_df should be equal to the process name used for data in the dataframe (Data in the example config file). data_names_dc will be the name of the output data histogram and should be set to data_obs, as required by the HiggsCombine tool.

  • There has to be at least one signal and one background.

  • analysis_name, version, and dataset are used only for versioning, but the value of luminosity (a float, in fb-1) is used to scale the signal and background contents and errors to the expected luminosity.

  • Backgrounds (but not signals, see below) can live only in specific region(s) (see example config file).

  • The systematics listed in the systematics block can have one of three types: lnN, lnU, and shape. The first two are normalization uncertainties, and a value should be provided of the form 1 + X, where X is the relative one-sigma uncertainty (e.g. 1.05 for 5%; see example config file). For the shape type no value is required, as the shape itself encodes the size of the uncertainty. There is no need to append Up/Down to the name of the uncertainty, as these variations are derived from the input dataframe (see below).

  • A systematic can apply to only a given set of signals and/or backgrounds, in which case the name of each process (identical to the one in the dataframe) should be specified. If the systematic applies to all backgrounds, the keyword backgrounds can be used instead of listing every background process (and the same holds for signals).
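The notes above can be pulled together into a sketch of such a config file. The block names (general, regions, signals, backgrounds, systematics) and the field names data_names_df, data_names_dc, analysis_name, version, dataset, and luminosity come from the text above; the exact layout inside each block is illustrative, and examples/datacards_config.yaml remains the authoritative reference:

```yaml
general:
  analysis_name: MyAnalysis   # versioning only
  version: v1
  dataset: 2017
  luminosity: 41.5            # fb-1; used to scale yields and errors
  data_names_df: Data         # process name used for data in the dataframe
  data_names_dc: data_obs     # required by the HiggsCombine tool

regions:
  - Signal
  - ControlRegion1

signals:
  - VBF                       # signals must exist in all regions

backgrounds:
  - Ewk                       # backgrounds may be restricted to specific regions

systematics:
  lumi:                       # normalization uncertainty: value = 1 + X
    type: lnN
    value: 1.025
    apply_to: [signals, backgrounds]
  jes:                        # shape uncertainty: no value needed
    type: shape
    apply_to: [backgrounds]
```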

The configuration for running is also partly derived from the input dataframes, whose format should therefore follow a few rules:

  1. The columns should be named::

    process region category systematic variable variable_low variable_high content error

Where:

  • process is the name of the physics process, e.g. VBF, Ewk, etc.

  • region is the name of the region, e.g. Signal, ControlRegion1, etc.

  • category is the name of the event category, e.g. 2jet, highMass, etc. Each unique name will be considered as a different category.

  • systematic is the name of the systematic shape variation applied to obtain the content of this row. E.g. if a process carries two shape systematic uncertainties named syst1 and syst2, then the dataframe should contain five variations for each bin where this process exists: nominal, syst1_Up, syst1_Down, syst2_Up, and syst2_Down.

  • variable is the name of the variable that defines the x-values in the output histograms. It is not used by the code but mainly serves to keep track of the fit variables in different categories.

  • variable_low and variable_high define the binning along x in the output histograms used for the fit. Each unique set of (variable_low, variable_high) will be considered as a unique bin.

  • content is the yield for this specific (process, region, category, systematic, variable, variable_low, variable_high) bin.

  • error is the error assigned to the yield (note that this is the error itself, not its square; for a Poisson-distributed yield it should be sqrt(N)).

The use of region or category is optional in the sense that an analysis might contain only one region and one category; in that case the column must still exist, filled with the same value for all rows.

  2. The signal process(es) should be defined in all categories and regions, even if the content is 0. In other words, if you’re looking for an exotic signal named bananas, the code assumes it will find a row with the bananas content for each bin of the analysis (i.e. the code never assumes that the signal cannot also live in the control regions).

  3. The data should be defined in all categories and regions, even if the content is 0. If data is not defined somewhere, that category/region should not exist in the analysis.
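As an illustration of the expected input format, the following sketch writes a minimal CSV with the required columns using only the standard library. Process, region, and bin names are invented; real inputs would come from e.g. alphatwirl. Note that the signal row (bananas) appears for every systematic variation, and the error column holds sqrt(N), not N:

```python
import csv
import math

# Required column names for fast-datacard inputs, in the documented order.
COLUMNS = ["process", "region", "category", "systematic",
           "variable", "variable_low", "variable_high", "content", "error"]

def make_row(process, systematic, content):
    # The error column holds the error itself, not its square:
    # for a Poisson-distributed yield this is sqrt(N).
    return [process, "Signal", "2jet", systematic,
            "mass", 100.0, 150.0, content, math.sqrt(content)]

rows = []
for systematic in ["nominal", "syst1_Up", "syst1_Down"]:
    # The signal must be defined in every (region, category) bin,
    # even where its content is 0.
    rows.append(make_row("bananas", systematic, 4.0))
    rows.append(make_row("Ewk", systematic, 90.0))
# Data carries only the nominal variation in this sketch.
rows.append(make_row("Data", "nominal", 95.0))

with open("toy_input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
```

A real analysis would repeat this pattern for every region, category, and bin edge pair, since each unique (variable_low, variable_high) combination defines a distinct bin.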

The package will produce two sets of outputs:

  • Text datacards that summarize the physics processes, the yields, and meta-information about the analysis.

  • ROOT files that contain the histograms describing the shapes used in the fit.

Both serve as inputs to the HiggsCombine tool.
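For orientation, a text datacard for a single category might look roughly like the following. All names and numbers here are invented; the HiggsCombine documentation describes the full format:

```text
imax 1  number of channels
jmax 1  number of backgrounds
kmax 1  number of nuisance parameters
------------
shapes * * shapes.root $PROCESS $PROCESS_$SYSTEMATIC
------------
bin          cat_2jet
observation  95
------------
bin          cat_2jet  cat_2jet
process      bananas   Ewk
process      0         1
rate         4.0       90.0
------------
syst1  shape  1         1
lumi   lnN    1.025     1.025
```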

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.2 (2018-04-04)

  • Updated executable name and documentation

0.1.1 (2018-10-01)

  • added initial documentation

0.1.0 (2018-08-21)

  • First release on PyPI.
