
F.A.S.T. datacard creation package


fast-datacard


Overview

fast-datacard is a Python package developed within the Faster Analysis Software Taskforce (FAST) collaboration. The main purpose of this package is to create datacards compatible with the HiggsCombine tool from data frames. The package takes categorical data frames, e.g. as created by the alphatwirl package, and creates the necessary ROOT and datacard outputs.

Features

  • Convert categorical data frames (see examples/data/*.csv) into valid inputs for the HiggsCombine tool.

Usage

Usage is as follows:

fast_datacard <yaml_config_file>

An example yaml config file is available: examples/datacards_config.yaml. The config file lists all the input event categories, regions, physics processes, dataframes, etc. A few things should be noted:

  • The existence of the general, regions, signals, backgrounds and systematics blocks is mandatory.

  • analysis_name, version, and dataset are just used for versioning.

  • The value of luminosity (float, in fb⁻¹) is used to scale the content and error of the signals and backgrounds to the expected luminosity.

  • For each signal and background process named X, there should be a file in the path_to_dfs directory named X.csv (a whitespace-separated Pandas dataframe).

  • data_names_df should be equal to the process name used for data in the dataframe (Data in the example config file) and should also be the name of the .csv dataframe in path_to_dfs. data_names_dc will be the name of the output data histogram and should be equal to data_obs, as imposed by the HiggsCombine tool.

  • There has to be at least one signal and one background.

  • Backgrounds (but not signals, see below) can live only in specific region(s) (see example config file).

  • The systematics listed in the systematics block can be of three types: lnN, lnU, and shape. The first two are normalization uncertainties and require a value equal to 1 + X, where X is the one-sigma relative uncertainty (e.g. 1.10 for a 10% uncertainty; see example config file). For the shape type, no value is required, as the shape itself encodes the size of the uncertainty. There is no need to specify Up/Down in the name of the uncertainty, as this is derived from the input dataframe (see below).

  • The systematics can apply only to a given set of signals and/or backgrounds, in which case the name of the process (identical to the one in the dataframe) should be specified. If the systematic applies to all backgrounds, backgrounds can be used instead of listing all the background processes (and the same is true for signals).
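The rules in this list can be checked before running the tool. The following is a minimal sketch and not part of fast-datacard: the function name check_config, the nesting of the dictionary (signals and backgrounds as lists, each systematic carrying type and value keys) and the example values are all assumptions made for illustration. The real layout is the one in examples/datacards_config.yaml.

```python
MANDATORY_BLOCKS = ("general", "regions", "signals", "backgrounds", "systematics")
SYSTEMATIC_TYPES = ("lnN", "lnU", "shape")


def check_config(config):
    """Return a list of problems found in a config dictionary (empty if none).

    The nesting assumed here is hypothetical; adapt it to the real YAML layout.
    """
    problems = []
    for block in MANDATORY_BLOCKS:
        if block not in config:
            problems.append("missing mandatory block: " + block)
    if problems:
        return problems  # the remaining checks need every block to exist

    if not config["signals"]:
        problems.append("at least one signal is required")
    if not config["backgrounds"]:
        problems.append("at least one background is required")

    for name, syst in config["systematics"].items():
        kind = syst.get("type")
        if kind not in SYSTEMATIC_TYPES:
            problems.append("%s: unknown systematic type %r" % (name, kind))
        elif kind in ("lnN", "lnU") and "value" not in syst:
            # Normalization uncertainties need a value of the form 1 + X.
            problems.append("%s: %s needs a value" % (name, kind))
    return problems


# Hypothetical configuration, written as the dictionary a YAML loader would return.
example = {
    "general": {"analysis_name": "demo", "luminosity": 35.9},
    "regions": ["Signal"],
    "signals": ["VBF"],
    "backgrounds": ["Ewk"],
    "systematics": {
        "lumi": {"type": "lnN", "value": 1.025},
        "jes": {"type": "shape"},
    },
}
print(check_config(example))
```

An empty list means none of the checked rules is violated; it does not guarantee that the tool itself will accept the file.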

The configuration for running is also partly derived from the input dataframes, whose format should therefore follow a few rules:

  1. The columns should be named:

    process region category systematic variable variable_low variable_high content error

Where:

  • process is the name of the physics process, e.g. VBF, Ewk, etc.

  • region is the name of the region, e.g. Signal, ControlRegion1, etc.

  • category is the name of the event category, e.g. 2jet, highMass, etc. Each unique name will be considered as a different category.

  • systematic is the name of the systematic shape variation that is applied to obtain the content of this row. E.g. if a process is characterized by two shape systematic uncertainties named syst1 and syst2, then the dataframe should contain 5 variations (nominal, syst1_Up, syst1_Down, syst2_Up, syst2_Down) for each bin where this process exists.

  • variable is the name of the variable that defines the x-values in the output histograms. It is not used by the code and is there mainly to keep track of the fit variables in different categories.

  • variable_low and variable_high define the binning along x in the output histograms used for the fit. Each unique set of (variable_low, variable_high) will be considered as a unique bin.

  • content is the yield for this specific (process, region, category, systematic, variable, variable_low, variable_high) bin.

  • error is the error assigned to the yield. Note that it is the error itself, not its square; for a Poisson experiment it should therefore be sqrt(N).

The use of region or category is optional in the sense that an analysis might contain only one region and one category; in this case, the value of each column needs to be filled by the same value for all rows.

  2. The signal process(es) should be defined in all categories and regions, even if the content is 0. In other words, if you’re looking for an exotics signal named bananas, the code expects to find a row with the content of bananas for each bin of the analysis (i.e. the code never assumes that the signal cannot live in the control regions as well).

  3. The data should be defined in all categories and regions, even if the content is 0. If data is not defined somewhere, the category/region shouldn’t even exist in the analysis.
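The dataframe rules above can be illustrated with plain Python. This is a sketch under stated assumptions, not the package's own code: the helpers make_row, to_text and missing_bins, the variable name mjj and all yields are hypothetical. It builds rows with the required columns, writes them as a whitespace-separated table, and reports the bins in which a process that must be present everywhere (signal or data) has no row.

```python
import math

COLUMNS = ["process", "region", "category", "systematic", "variable",
           "variable_low", "variable_high", "content", "error"]


def make_row(process, systematic, low, high, content,
             region="Signal", category="2jet", variable="mjj"):
    """Build one row; the error is sqrt(content), valid for unweighted counts."""
    return {"process": process, "region": region, "category": category,
            "systematic": systematic, "variable": variable,
            "variable_low": low, "variable_high": high,
            "content": content, "error": math.sqrt(content)}


def to_text(rows):
    """Render rows as a whitespace-separated table with a header line."""
    lines = [" ".join(COLUMNS)]
    for row in rows:
        lines.append(" ".join(str(row[col]) for col in COLUMNS))
    return "\n".join(lines) + "\n"


def missing_bins(rows, process):
    """Return the nominal bins of the analysis in which `process` has no row."""
    def key(row):
        return (row["region"], row["category"],
                row["variable_low"], row["variable_high"])
    all_bins = {key(r) for r in rows if r["systematic"] == "nominal"}
    seen = {key(r) for r in rows
            if r["process"] == process and r["systematic"] == "nominal"}
    return sorted(all_bins - seen)


rows = []
for region in ("Signal", "ControlRegion1"):
    rows.append(make_row("Data", "nominal", 0, 100, 25.0, region=region))
    rows.append(make_row("Ewk", "nominal", 0, 100, 16.0, region=region))
# One shape systematic (syst1) gives three variations of the signal.
for systematic, content in (("nominal", 4.0), ("syst1_Up", 5.0), ("syst1_Down", 3.0)):
    rows.append(make_row("VBF", systematic, 0, 100, content))

gaps = missing_bins(rows, "VBF")  # the signal has no row in ControlRegion1
# Signal rule: add the signal there with zero content instead of leaving it out.
rows.append(make_row("VBF", "nominal", 0, 100, 0.0, region="ControlRegion1"))
gaps_after = missing_bins(rows, "VBF")
print(gaps, gaps_after)
print(to_text(rows))
```

In this sketch the first check reports the control-region bin where the signal is missing, and adding a zero-content row clears it. The same check applies to the data process.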

The package will produce two sets of outputs:

  • Text datacards that summarize the physics processes, the yields, and meta-information about the analysis.

  • ROOT datacards that contain histograms describing shapes that will be used in the fit.

Both serve as inputs to the HiggsCombine tool.
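For readers unfamiliar with the text format, the sketch below renders a one-bin counting datacard in the general layout read by the HiggsCombine tool. It is not the output of fast-datacard, which also writes shape information to ROOT files; the function counting_datacard, the bin name and all numbers are assumptions made for illustration.

```python
def counting_datacard(bin_name, observed, signal, backgrounds, lnn=None):
    """Render a one-bin counting datacard in the HiggsCombine text layout.

    signal      -- (name, expected yield)
    backgrounds -- list of (name, expected yield)
    lnn         -- {nuisance name: {process name: value of the form 1 + X}}
    """
    lnn = lnn or {}
    processes = [signal] + list(backgrounds)
    names = [name for name, _ in processes]
    sep = "-" * 40
    lines = [
        "imax 1  number of bins",
        "jmax %d  number of processes minus 1" % len(backgrounds),
        "kmax %d  number of nuisance parameters" % len(lnn),
        sep,
        "bin         " + bin_name,
        "observation " + str(observed),
        sep,
        "bin     " + " ".join([bin_name] * len(processes)),
        "process " + " ".join(names),
        # Signal processes carry an index <= 0, backgrounds an index > 0.
        "process " + " ".join(str(i) for i in range(len(processes))),
        "rate    " + " ".join(str(y) for _, y in processes),
        sep,
    ]
    for nuisance, effect in sorted(lnn.items()):
        # "-" marks a process that the nuisance does not affect.
        cells = [str(effect.get(name, "-")) for name in names]
        lines.append("%s lnN %s" % (nuisance, " ".join(cells)))
    return "\n".join(lines) + "\n"


card = counting_datacard(
    "Signal_2jet", 25, ("VBF", 4.0), [("Ewk", 16.0)],
    lnn={"lumi": {"VBF": 1.025, "Ewk": 1.025}, "ewk_norm": {"Ewk": 1.10}},
)
print(card)
```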

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.3 (2018-04-05)

  • Easier handling of dataframe files

0.1.2 (2018-04-04)

  • Updated executable name and documentation

0.1.1 (2018-10-01)

  • added initial documentation

0.1.0 (2018-08-21)

  • First release on PyPI.
