
Front-end for the ServiceX Data Server

Project description

ServiceX_frontend

Client access library for ServiceX



Introduction

Given a selection string, this library submits it to a ServiceX instance and retrieves the resulting data locally for you. The selection string is often generated by another front-end library, for example:

  • func_adl.xAOD (for ATLAS xAODs)
  • func_adl.XXX (for flat ntuples)
  • xxx (for columns)

These libraries are still under development, so this list is only an outline.

Prerequisites

Before you install this library you'll need:

  • A Python 3.7 or later environment
  • A ServiceX end-point. For example, http://localhost:5000/servicex.

Usage

The following script returns a pandas.DataFrame containing the pT of every jet (divided by 1000 in the query, converting MeV to GeV) in an ATLAS xAOD Z->ee Monte Carlo dataset:

    import sys
    import servicex

    endpoint = sys.argv[1]  # ServiceX end-point from the command line, e.g. http://localhost:5000/servicex
    # qastle query: pT / 1000 of every AntiKt4EMTopoJets jet, written to a ROOT TTree
    query = "(call ResultTTree (call Select (call SelectMany (call EventDataset (list 'localds:bogus')) (lambda (list e) (call (attr e 'Jets') 'AntiKt4EMTopoJets'))) (lambda (list j) (/ (call (attr j 'pt')) 1000.0))) (list 'JetPt') 'analysis' 'junk.root')"
    dataset = "mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00"
    r = servicex.get_data(query, dataset, servicex_endpoint=endpoint)
    print(r)

Running the script above from a terminal produces the following output (it takes about 1-2 minutes to complete):

    python scripts\run_test.py http://localhost:5000/servicex
                JetPt
    entry
    0       38.065707
    1       31.967096
    2        7.881337
    3        6.669581
    4        5.624053
    ...           ...
    710183  42.926141
    710184  30.815709
    710185   6.348002
    710186   5.472711
    710187   5.212714

    [11355980 rows x 1 columns]

If your query is badly formed, or there is another problem with the back end, an exception is raised.

If you'd like to submit multiple queries and have them run on the ServiceX back end in parallel, it may be best to use the asyncio interface, get_data_async, which has the same signature as get_data.
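A minimal sketch of running several queries concurrently with asyncio.gather follows. It assumes only what is stated above: get_data_async takes the same arguments as get_data and can be awaited. The helper name fetch_all and the fake fetcher used in the demo are hypothetical. To use it against a real end-point, pass servicex.get_data_async as fetch.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence, Tuple


# Hypothetical helper: `fetch` is any coroutine function with the
# get_data_async-style signature fetch(query, dataset, servicex_endpoint=...).
async def fetch_all(
    fetch: Callable[..., Awaitable[Any]],
    requests: Sequence[Tuple[str, str]],
    endpoint: str,
) -> List[Any]:
    # One coroutine per (query, dataset) pair; gather runs them concurrently
    # and returns the results in the same order as `requests`.
    return await asyncio.gather(
        *(fetch(q, ds, servicex_endpoint=endpoint) for q, ds in requests)
    )


if __name__ == "__main__":
    # Stand-in for servicex.get_data_async so the sketch runs without a server.
    async def fake_fetch(query, dataset, servicex_endpoint=None):
        await asyncio.sleep(0.01)
        return f"{dataset}: {query}"

    pairs = [("q1", "ds1"), ("q2", "ds2")]
    print(asyncio.run(fetch_all(fake_fetch, pairs, "http://localhost:5000/servicex")))
```

From a plain script, wrap the call in asyncio.run as shown. Inside a Jupyter notebook, where an event loop is already running, await fetch_all(...) directly instead.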

Features

Implemented:

  • Accepts a qastle-formatted query
  • Errors of all sorts from the service are reported to the user's code as exceptions.
  • Data is returned as a pandas.DataFrame or an awkward array (see the data_type parameter)
  • The complete returned data must fit in the process's memory
  • Runs in an async or a non-async environment; the non-async methods adapt automatically (including in Jupyter notebooks).
  • Supports up to 100 simultaneous queries from a laptop-like front end without overwhelming the local machine (hopefully ServiceX will be overwhelmed!)
  • Starts downloading files as soon as they are ready (before ServiceX has finished the complete transform).
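Downloading files before the whole transform finishes can be pictured as a producer/consumer pair connected by an asyncio.Queue. One task announces output files as they become available, and another downloads each one immediately. This sketch illustrates the pattern only. It is not the library's implementation, and watch_transform, download, transform_and_download and the file names are hypothetical stand-ins.

```python
import asyncio
from typing import List, Optional


async def watch_transform(files: List[str], queue: "asyncio.Queue[Optional[str]]") -> None:
    # Hypothetical producer: pretend each file finishes transforming after a
    # short delay, and announce it as soon as it does.
    for name in files:
        await asyncio.sleep(0.01)
        await queue.put(name)
    await queue.put(None)  # sentinel: the whole transform is complete


async def download(queue: "asyncio.Queue[Optional[str]]", downloaded: List[str]) -> None:
    # Hypothetical consumer: handle each file the moment it is announced,
    # rather than waiting for the full transform to finish.
    while True:
        name = await queue.get()
        if name is None:
            break
        await asyncio.sleep(0)  # stand-in for the real network download
        downloaded.append(name)


async def transform_and_download(files: List[str]) -> List[str]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    downloaded: List[str] = []
    # Producer and consumer run concurrently, so downloads overlap the transform.
    await asyncio.gather(watch_transform(files, queue), download(queue, downloaded))
    return downloaded
```

The None sentinel tells the consumer that no more files are coming, so both tasks finish cleanly once the last file has been handled.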

Coming:

  • Data is returned as a list of ROOT files located in a specified directory
  • Make it easy to submit the same query for 100 different datasets
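Until that feature exists, a user-side sketch can submit one query against many datasets while capping how many requests are in flight at once with an asyncio.Semaphore. The default limit of 10, the helper name query_many_datasets and the fake fetcher in the test are assumptions for illustration. In real use, fetch would be servicex.get_data_async. Passing return_exceptions=True means one failing dataset does not discard the results of the others, and failures come back as exception objects in the result list.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def query_many_datasets(
    fetch: Callable[..., Awaitable[Any]],
    query: str,
    datasets: Sequence[str],
    endpoint: str,
    max_in_flight: int = 10,  # assumed limit; tune for your machine
) -> List[Any]:
    sem = asyncio.Semaphore(max_in_flight)

    async def one(ds: str) -> Any:
        # At most `max_in_flight` calls to fetch run at the same time.
        async with sem:
            return await fetch(query, ds, servicex_endpoint=endpoint)

    # Results (or exception objects, for failed datasets) come back in dataset order.
    return await asyncio.gather(*(one(ds) for ds in datasets), return_exceptions=True)
```

Capping concurrency on the client side keeps a laptop from opening hundreds of simultaneous connections, while still letting the back end work on several requests at once.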

Testing

This code has been tested in several environments:

  • Windows, Linux, MacOS
  • Python 3.6, 3.7, 3.8
    • Only 3.8.0 and 3.8.1: Python 3.8.2 breaks nest_asyncio, so until that package is updated we are limited to 3.8.1.
  • Jupyter notebooks (not automated) and regular Python scripts run from the command line

Development

For any changes please feel free to submit pull requests!

To do development, please set up your environment with the following steps:

  1. Create a Python 3.7 development environment
  2. Pull down this package, XX
  3. python -m pip install -e .[test]
  4. Run the tests to make sure everything is good: pytest.

Then add tests as you develop. When you are done, submit a pull request that includes any required changes to the documentation; the online tests will then run.


Download files


Source Distribution

servicex-1.0.0b1.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

servicex-1.0.0b1-py3-none-any.whl (7.2 kB)

Uploaded Python 3

File details

Details for the file servicex-1.0.0b1.tar.gz.

File metadata

  • Download URL: servicex-1.0.0b1.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for servicex-1.0.0b1.tar.gz
Algorithm Hash digest
SHA256 ebf7adaa9d8ff382172d6d61bea69b338a20135b424058e57ad5e029e0ea8705
MD5 50f50002d7b3d70084c3faf6d47e1ae7
BLAKE2b-256 3217dcb6c211e7c6dd231ab280448dfec26cd0b94647f32542e2376ba1d5e485


File details

Details for the file servicex-1.0.0b1-py3-none-any.whl.

File metadata

  • Download URL: servicex-1.0.0b1-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for servicex-1.0.0b1-py3-none-any.whl
Algorithm Hash digest
SHA256 9e0644afe43d8bfe73c7af96676660d1db54867f140ffb4b31feedd7cc135854
MD5 64a1c9ada1681013e85d7575cd9c2a7a
BLAKE2b-256 ea2f19a1d5d8942ac05fe5320e5e54b1a2a354b14d47f917fde53d63a847efe1

