MLPerf Inference LoadGen python bindings

Overview

Introduction

  • The LoadGen is a reusable module that efficiently and fairly measures the performance of inference systems.
  • It generates traffic for scenarios as formulated by a diverse set of experts in the MLCommons working group.
  • The scenarios emulate the workloads seen in mobile devices, autonomous vehicles, robotics, and cloud-based setups.
  • Although the LoadGen is not aware of the model or dataset, its strength is that it can be reused alongside logic that is.

Integration Example and Flow

The following steps describe how the LoadGen can be integrated into an inference system, resembling how some of the MLPerf reference models are implemented.

  1. Benchmark knows the model, dataset, and preprocessing.
  2. Benchmark hands dataset sample IDs to LoadGen.
  3. LoadGen starts generating queries of sample IDs.
  4. Benchmark creates requests to backend.
  5. The result is post-processed and forwarded to the LoadGen.
  6. LoadGen outputs logs for analysis.
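To make the division of labor in these steps concrete, here is a minimal C++ sketch using hypothetical stand-in types (ToyBenchmark and ToyLoadGen are illustrative names, not part of the LoadGen API). What it shows is that the load-generator side only ever handles sample indices, while the benchmark side owns the data and the "model".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Step 1: the benchmark knows the dataset and the "model" (hypothetical).
struct ToyBenchmark {
  std::vector<std::string> dataset{"cat.jpg", "dog.jpg", "car.jpg"};

  // Steps 4 and 5: run the toy "model" (string length) and post-process.
  std::size_t RunAndPostProcess(std::size_t sample_index) const {
    return dataset.at(sample_index).size();
  }
};

// Steps 2, 3 and 6: the load generator only sees sample indices.
// Step 2 is modeled by the benchmark passing sample_count to Run().
struct ToyLoadGen {
  std::vector<std::size_t> issued;     // queries generated (step 3)
  std::vector<std::size_t> completed;  // results received (step 5), kept for logging (step 6)

  void Run(std::size_t sample_count,
           const std::function<std::size_t(std::size_t)>& issue_query) {
    for (std::size_t index = 0; index < sample_count; ++index) {
      issued.push_back(index);
      completed.push_back(issue_query(index));
    }
  }
};
```

ToyLoadGen never touches the strings in the dataset; swapping in a different dataset or model only changes ToyBenchmark, which mirrors why the real LoadGen can stay model- and dataset-agnostic.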

Scope of the LoadGen's Responsibilities

In Scope

  • Provide a reusable C++ library with python bindings.
  • Implement the traffic patterns of the MLPerf Inference scenarios and modes.
  • Record all traffic generated and received for later analysis and verification.
  • Summarize the results and whether performance constraints were met.
  • Target high-performance systems with efficient multi-thread friendly logging utilities.
  • Generate trust via a shared, well-tested, and community-hardened code base.

Out of Scope

The LoadGen is:

  • NOT aware of the ML model it is running against.
  • NOT aware of the data formats of the model's inputs and outputs.
  • NOT aware of how to score the accuracy of a model's outputs.
  • NOT aware of MLPerf rules regarding scenario-specific constraints.

Limiting the scope of the LoadGen in this way keeps it reusable across different models and datasets without modification. Using composition and dependency injection, users can define their own models, datasets, and metrics.

Additionally, not hardcoding MLPerf-specific test constraints, like test duration and performance targets, allows users to use the LoadGen unmodified for custom testing and continuous integration purposes.

Submission Considerations

Upstream all local modifications

  • As a rule, no local modifications to the LoadGen's C++ library are allowed for submission.
  • Please upstream early and often to keep the playing field level.

Choose your TestSettings carefully!

  • Since the LoadGen is oblivious to the model, it can't enforce the MLPerf requirements for submission (e.g., target percentiles and latencies).
  • For verification, the values in TestSettings are logged.
  • To help make sure your settings are spec compliant, use TestSettings::FromConfig in conjunction with the relevant config file provided with the reference models.
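TestSettings::FromConfig reads settings from the config files provided with the reference models. As an illustration only, the sketch below assumes a simplified format of "model.scenario.key = value" lines in which "*" stands for any model, and resolves a setting by preferring a model-specific entry over the wildcard entry. ParseConfig, Lookup, and the precedence rule are hypothetical; the real parser's exact rules may differ, so the provided config files and TestSettings::FromConfig remain authoritative.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Trim leading and trailing spaces and tabs.
std::string Trim(const std::string& s) {
  const auto begin = s.find_first_not_of(" \t");
  if (begin == std::string::npos) return "";
  const auto end = s.find_last_not_of(" \t");
  return s.substr(begin, end - begin + 1);
}

// Parse lines of the assumed form "model.scenario.key = value".
// Blank lines, '#' comments, and lines without '=' are skipped.
std::map<std::string, std::string> ParseConfig(const std::string& text) {
  std::map<std::string, std::string> kvs;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const std::string trimmed = Trim(line);
    const auto eq = trimmed.find('=');
    if (trimmed.empty() || trimmed[0] == '#' || eq == std::string::npos) continue;
    kvs[Trim(trimmed.substr(0, eq))] = Trim(trimmed.substr(eq + 1));
  }
  return kvs;
}

// Assumed precedence: "model.scenario.key" first, then "*.scenario.key".
std::optional<std::string> Lookup(const std::map<std::string, std::string>& kvs,
                                  const std::string& model,
                                  const std::string& scenario,
                                  const std::string& key) {
  for (const std::string& candidate :
       {model + "." + scenario + "." + key, "*." + scenario + "." + key}) {
    const auto it = kvs.find(candidate);
    if (it != kvs.end()) return it->second;
  }
  return std::nullopt;
}
```

Because the resolved values end up in TestSettings, and those values are logged, the effective settings can be checked after the run.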

Responsibilities of a LoadGen User

Implement the Interfaces

  • Implement the SystemUnderTest and QuerySampleLibrary interfaces and pass them to the StartTest function.
  • Call QuerySamplesComplete for every sample received by SystemUnderTest::IssueQuery.
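The contract in these two bullets can be sketched as follows. To stay self-contained, the code declares simplified stand-ins for QuerySample, QuerySampleResponse, the SystemUnderTest interface, and the completion call, modeled on the names used in this document; they are not the real mlperf headers, and CompletionLog and SquaringSut are hypothetical. The toy SUT "infers" by squaring each sample index and completes every sample it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified stand-ins modeled on the types named in this document.
using ResponseId = std::uintptr_t;
using QuerySampleIndex = std::size_t;
struct QuerySample { ResponseId id; QuerySampleIndex index; };
struct QuerySampleResponse { ResponseId id; std::uintptr_t data; std::size_t size; };

// Stand-in for the LoadGen side of QuerySamplesComplete: copies each payload.
struct CompletionLog {
  std::vector<ResponseId> ids;
  std::vector<std::uint64_t> values;
  void QuerySamplesComplete(QuerySampleResponse* responses, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
      std::uint64_t v = 0;
      if (responses[i].size == sizeof(v))
        std::memcpy(&v, reinterpret_cast<const void*>(responses[i].data), sizeof(v));
      ids.push_back(responses[i].id);
      values.push_back(v);
    }
  }
};

// Minimal version of the SUT interface described in this document.
class SystemUnderTest {
 public:
  virtual ~SystemUnderTest() = default;
  virtual const std::string& Name() = 0;
  virtual void IssueQuery(const std::vector<QuerySample>& samples) = 0;
  virtual void FlushQueries() = 0;
};

class SquaringSut : public SystemUnderTest {
 public:
  explicit SquaringSut(CompletionLog& log) : log_(log) {}
  const std::string& Name() override { return name_; }
  void IssueQuery(const std::vector<QuerySample>& samples) override {
    for (const QuerySample& s : samples) {
      // The buffer only has to outlive the completion call, because this
      // stand-in copies the payload before returning.
      std::uint64_t result = static_cast<std::uint64_t>(s.index) * s.index;
      QuerySampleResponse response{s.id, reinterpret_cast<std::uintptr_t>(&result),
                                   sizeof(result)};
      log_.QuerySamplesComplete(&response, 1);  // exactly one completion per sample
    }
  }
  void FlushQueries() override {}  // nothing is queued in this toy SUT

 private:
  CompletionLog& log_;
  const std::string name_ = "SquaringSut";
};
```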

Assess Accuracy

  • Process the mlperf_log_accuracy.json output by the LoadGen to determine the accuracy of your system.
  • For the official models, Python scripts will be provided by the MLPerf model owners for you to do this automatically.
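If you write your own accuracy script, the first step is usually to recover each response's raw bytes. The sketch below assumes that every entry in mlperf_log_accuracy.json stores the response payload as a hex string (for example in a "data" field) and that the SUT wrote 32-bit floats in host byte order; both are assumptions to check against your own logs. JSON parsing is left out, and HexToBytes and BytesToFloats are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a hex string such as "0000803f" into raw bytes.
std::vector<std::uint8_t> HexToBytes(const std::string& hex) {
  if (hex.size() % 2 != 0) throw std::invalid_argument("odd-length hex string");
  auto nibble = [](char c) -> std::uint8_t {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("invalid hex digit");
  };
  std::vector<std::uint8_t> bytes;
  bytes.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    bytes.push_back(static_cast<std::uint8_t>((nibble(hex[i]) << 4) | nibble(hex[i + 1])));
  }
  return bytes;
}

// Reinterpret bytes as float32 values (assumes float output in host byte order).
std::vector<float> BytesToFloats(const std::vector<std::uint8_t>& bytes) {
  if (bytes.size() % sizeof(float) != 0)
    throw std::invalid_argument("not a whole number of floats");
  std::vector<float> values(bytes.size() / sizeof(float));
  if (!bytes.empty()) std::memcpy(values.data(), bytes.data(), bytes.size());
  return values;
}
```

The decoded values can then be compared against the ground truth for the sample index recorded alongside each entry.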

For templates of how to do the above in detail, refer to code for the demos, tests, and reference models.

LoadGen over the Network

For reference, a submission at a high level is structured as follows.

The LoadGen implementation is common to all submissions, while the QSL (“Query Sample Library”) and SUT (“System Under Test”) are implemented by submitters. QSL is responsible for loading the data and includes untimed preprocessing.

A submission over the network adds a new component to the system, the QDL (“Query Dispatch Library”), which sits between the LoadGen/QSL and the SUT.

The QDL is a proxy for a load balancer that dispatches queries to the SUT over a physical network, receives the responses, and passes them back to the LoadGen. It is implemented by the submitter. The interface of the QDL is the same as the API to the SUT.

In scenarios using a QDL, the submitter may choose to compress data in the QSL to reduce network transmission time. Decompression is part of the timed processing in the SUT. A set of approved standard compression schemes will be specified for each benchmark; additional compression schemes must be approved in advance by the Working Group.
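The split described above (compress in the QSL, untimed; decompress in the SUT, timed) can be sketched with a toy run-length encoding. This codec is a placeholder chosen for brevity: it is not one of the approved compression schemes, and Compress, Decompress, and Runs are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy run-length encoding as (byte value, run length) pairs.
// Placeholder only: not one of the approved MLPerf compression schemes.
using Runs = std::vector<std::pair<std::uint8_t, std::uint32_t>>;

// QSL side: compress while loading samples, outside the timed region.
Runs Compress(const std::vector<std::uint8_t>& data) {
  Runs runs;
  for (std::uint8_t byte : data) {
    if (!runs.empty() && runs.back().first == byte) {
      ++runs.back().second;
    } else {
      runs.emplace_back(byte, 1u);
    }
  }
  return runs;
}

// SUT side: decompress after the query arrives, so this work counts as
// part of the timed processing.
std::vector<std::uint8_t> Decompress(const Runs& runs) {
  std::vector<std::uint8_t> data;
  for (const auto& [value, count] : runs) data.insert(data.end(), count, value);
  return data;
}
```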

All communication between LoadGen/QSL and SUT is via QDL, and all communication between QDL and SUT must pass over a physical network.

The QDL implements the protocol to transmit queries over the network and receive responses. It also implements decompression of any response returned by the SUT, where compression of responses is allowed. Performing any part of the timed preprocessing or inference in the QDL is explicitly disallowed. Batching in the QDL is currently not allowed either, although this may be revisited in the future.

MLPerf over the Network runs in the Server and Offline scenarios. All LoadGen modes are expected to work as is, with no significant changes. These include running the test in performance mode, accuracy mode, find-peak-performance mode, and compliance mode. The same applies to power measurements.

QDL details

The Query Dispatch Library is implemented by the submitter and interfaces with the LoadGen through the same SUT API. All MLPerf Inference SUTs implement the mlperf::SystemUnderTest class, which is defined in system_under_test.h. The QDL implements the mlperf::QueryDispatchLibrary class, which inherits from mlperf::SystemUnderTest, has the same API, and supports all existing mlperf::SystemUnderTest methods. It is declared in a separate header file, query_dispatch_library.h. Because StartTest takes its SUT argument as an mlperf::SystemUnderTest, an mlperf::QueryDispatchLibrary instance can be passed to it directly through an implicit upcast.
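The relationship can be sketched with simplified stand-in classes (not the real headers): QueryDispatchLibrary adds nothing to SystemUnderTest, so a function that accepts SystemUnderTest* (standing in for StartTest) can drive a QDL without knowing it is one. MyNetworkQdl, its name string, and FakeStartTest are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct QuerySample { std::uintptr_t id; std::size_t index; };  // simplified stand-in

// Simplified stand-in for mlperf::SystemUnderTest.
class SystemUnderTest {
 public:
  virtual ~SystemUnderTest() = default;
  virtual const std::string& Name() = 0;
  virtual void IssueQuery(const std::vector<QuerySample>& samples) = 0;
  virtual void FlushQueries() = 0;
};

// Simplified stand-in for mlperf::QueryDispatchLibrary: same API, nothing added.
class QueryDispatchLibrary : public SystemUnderTest {};

// Hypothetical submitter QDL; a real one would forward over the network.
class MyNetworkQdl : public QueryDispatchLibrary {
 public:
  const std::string& Name() override { return name_; }
  void IssueQuery(const std::vector<QuerySample>& samples) override {
    dispatched_ += samples.size();
  }
  void FlushQueries() override { ++flushes_; }
  std::size_t dispatched() const { return dispatched_; }
  int flushes() const { return flushes_; }

 private:
  const std::string name_ = "MyNetworkQdl";
  std::size_t dispatched_ = 0;
  int flushes_ = 0;
};

// Stand-in for StartTest: it only sees the SystemUnderTest base class.
std::string FakeStartTest(SystemUnderTest* sut) {
  sut->IssueQuery({{1, 0}, {2, 1}});
  sut->FlushQueries();
  return sut->Name();
}
```

Calling FakeStartTest(&qdl) with a MyNetworkQdl compiles because MyNetworkQdl* converts implicitly to SystemUnderTest*, and every call dispatches virtually to the QDL's overrides.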

QDL Query issue and response over the network

The QDL receives queries from the LoadGen through

void IssueQuery(const std::vector<QuerySample>& samples)

The QDL dispatches the queries to the SUT over the physical network. The exact method and implementation are submitter-specific and are not specified by MLCommons. The submitter's implementation includes everything required to serialize the query, load-balance it, pass it through the operating system and network interface card, and send it to the SUT.
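As one possible illustration of this submitter-specific dispatch path, the sketch below serializes each QuerySample into a fixed 16-byte message and spreads the messages across SUT nodes round-robin. The wire format, RoundRobinDispatcher, and the per-node outboxes (standing in for sockets) are all assumptions. Each sample is sent as its own message and nothing is combined, so no batching happens in the QDL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct QuerySample { std::uintptr_t id; std::size_t index; };  // simplified stand-in
using Bytes = std::vector<std::uint8_t>;

// Hypothetical wire format: id and index as two 64-bit fields, host byte order.
Bytes Serialize(const QuerySample& s) {
  const std::uint64_t fields[2] = {static_cast<std::uint64_t>(s.id),
                                   static_cast<std::uint64_t>(s.index)};
  Bytes msg(sizeof(fields));
  std::memcpy(msg.data(), fields, sizeof(fields));
  return msg;
}

QuerySample Deserialize(const Bytes& msg) {
  std::uint64_t fields[2];
  if (msg.size() != sizeof(fields)) throw std::invalid_argument("unexpected message size");
  std::memcpy(fields, msg.data(), sizeof(fields));
  return {static_cast<std::uintptr_t>(fields[0]), static_cast<std::size_t>(fields[1])};
}

// Spreads samples across SUT nodes round-robin, one message per sample.
class RoundRobinDispatcher {
 public:
  explicit RoundRobinDispatcher(std::size_t node_count) : outboxes_(node_count) {
    if (node_count == 0) throw std::invalid_argument("need at least one SUT node");
  }
  void IssueQuery(const std::vector<QuerySample>& samples) {
    for (const QuerySample& s : samples) {
      outboxes_[next_].push_back(Serialize(s));  // stand-in for a socket send
      next_ = (next_ + 1) % outboxes_.size();
    }
  }
  const std::vector<Bytes>& Outbox(std::size_t node) const { return outboxes_.at(node); }

 private:
  std::vector<std::vector<Bytes>> outboxes_;
  std::size_t next_ = 0;
};
```

A production QDL would typically use an explicit byte order on the wire if the QDL host and SUT nodes can differ; host byte order keeps this sketch short.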

The QDL receives the query responses from the SUT over the network. The exact method and implementation are submitter-specific and are not specified by MLCommons. The submitter's implementation includes everything required to receive the data from the network interface card, pass it through the operating system, deserialize the query response, and return it to the LoadGen by calling:

struct QuerySampleResponse {
  ResponseId id;
  uintptr_t data;
  size_t size;
};
void QuerySamplesComplete(QuerySampleResponse* responses, 
                          size_t response_count);
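On the return path, the QDL has to turn bytes received from the network back into QuerySampleResponse entries. The sketch assumes a hypothetical framing of an 8-byte response id, an 8-byte payload length, and then the payload; ParseResponse, ReceivedResponse, and ToLoadGenResponses are illustrative names, and QuerySampleResponse is redeclared as a simplified stand-in. The resulting vector's data() and size() are what would be handed to QuerySamplesComplete, and the received buffers must stay alive until that call returns, because the responses point into them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

using ResponseId = std::uintptr_t;
struct QuerySampleResponse { ResponseId id; std::uintptr_t data; std::size_t size; };

// One response after deserialization; owns its payload bytes.
struct ReceivedResponse {
  ResponseId id;
  std::vector<std::uint8_t> payload;
};

// Hypothetical framing: [8-byte id][8-byte payload length][payload].
ReceivedResponse ParseResponse(const std::vector<std::uint8_t>& msg) {
  std::uint64_t header[2];
  if (msg.size() < sizeof(header)) throw std::invalid_argument("truncated header");
  std::memcpy(header, msg.data(), sizeof(header));
  if (msg.size() - sizeof(header) != header[1]) throw std::invalid_argument("bad length");
  return {static_cast<ResponseId>(header[0]),
          std::vector<std::uint8_t>(msg.begin() + static_cast<std::ptrdiff_t>(sizeof(header)),
                                    msg.end())};
}

// Builds the array for the completion call; entries point into `received`.
std::vector<QuerySampleResponse> ToLoadGenResponses(
    const std::vector<ReceivedResponse>& received) {
  std::vector<QuerySampleResponse> out;
  out.reserve(received.size());
  for (const ReceivedResponse& r : received) {
    out.push_back({r.id, reinterpret_cast<std::uintptr_t>(r.payload.data()), r.payload.size()});
  }
  return out;
}
```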

QDL Additional Methods

In addition, the QDL must implement the following methods of the SUT interface used by the LoadGen:

const std::string& Name();

The Name function returns a known string that identifies the SUT as an over-the-network benchmark.

void FlushQueries();

It is not specified here how the QDL would query and configure the SUT to execute the above methods. The QDL responds to the LoadGen after receiving its own response from the SUT.
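Since the protocol is left to the submitter, the following is only one possible shape: a hypothetical QDL forwards FlushQueries to the SUT and returns to the LoadGen only after the SUT acknowledges it, matching the rule that the QDL responds after receiving its own response. A real QDL would block on a socket or RPC; here an in-process queue (FakeLink, a hypothetical name) stands in for the network and the remote SUT.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>

enum class Message { kFlush, kFlushAck };

// Hypothetical in-process stand-in for the network link and the remote SUT.
struct FakeLink {
  std::deque<Message> to_sut;
  std::deque<Message> to_qdl;
  int flushes_handled = 0;

  // Simulates the remote SUT draining its inbox and acknowledging flushes.
  void DeliverToRemoteSut() {
    while (!to_sut.empty()) {
      const Message m = to_sut.front();
      to_sut.pop_front();
      if (m == Message::kFlush) {
        ++flushes_handled;
        to_qdl.push_back(Message::kFlushAck);
      }
    }
  }
};

class FlushRelayingQdl {
 public:
  explicit FlushRelayingQdl(FakeLink& link) : link_(link) {}

  // Forwards the flush, then returns only once the SUT's ack has arrived.
  void FlushQueries() {
    link_.to_sut.push_back(Message::kFlush);
    link_.DeliverToRemoteSut();  // stands in for the network round trip
    if (link_.to_qdl.empty() || link_.to_qdl.front() != Message::kFlushAck)
      throw std::runtime_error("SUT did not acknowledge the flush");
    link_.to_qdl.pop_front();
  }

 private:
  FakeLink& link_;
};
```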

Example

Refer to the LON demo for a reference example of using the LoadGen over the network.

Download files

Source Distribution

mlcommons_loadgen-4.0.tar.gz (70.9 kB)

Tags: Source

Built Distributions

mlcommons_loadgen-4.0-pp310-pypy310_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (415.8 kB)

Tags: PyPy, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

mlcommons_loadgen-4.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (449.8 kB)

Tags: CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

mlcommons_loadgen-4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (449.8 kB)

Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

mlcommons_loadgen-4.0-cp311-cp311-win_amd64.whl (294.6 kB)

Tags: CPython 3.11, Windows x86-64

mlcommons_loadgen-4.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (446.6 kB)

Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

mlcommons_loadgen-4.0-cp311-cp311-macosx_14_0_arm64.whl (455.6 kB)

Tags: CPython 3.11, macOS 14.0+ ARM64

mlcommons_loadgen-4.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (447.0 kB)

Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

mlcommons_loadgen-4.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (447.0 kB)

Tags: CPython 3.9, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

mlcommons_loadgen-4.0-cp39-cp39-macosx_10_14_universal2.whl (923.5 kB)

Tags: CPython 3.9, macOS 10.14+ universal2 (ARM64, x86-64)

mlcommons_loadgen-4.0-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (446.0 kB)

Tags: CPython 3.8, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

mlcommons_loadgen-4.0-cp37-cp37m-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (446.2 kB)

Tags: CPython 3.7m, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

File details

Details for the file mlcommons_loadgen-4.0.tar.gz.

File metadata

  • Download URL: mlcommons_loadgen-4.0.tar.gz
  • Upload date:
  • Size: 70.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.6

File hashes

Hashes for mlcommons_loadgen-4.0.tar.gz
Algorithm Hash digest
SHA256 530b8ccb33d546ff9ab3e90f8860b61daf4de36e9af6745caec33da829d6e8b7
MD5 134be6a0d2bd2beb0cd016bb7b177bcb
BLAKE2b-256 40c7f6a8c2c873e74edd10378af25717ede6aab0307b2fde6ffda3ac2c931f68

File details

Details for the file mlcommons_loadgen-4.0-pp310-pypy310_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-pp310-pypy310_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 0986507920599608f709948134c60471dfca041ce2a1214f9f55e39bd2462ac9
MD5 e92f10aac137976168d70d036604620a
BLAKE2b-256 2b0576fe00c9c9abadd7e110d6b35d9ea611e008a413c4d6252fc1c4e5c4c7e8

File details

Details for the file mlcommons_loadgen-4.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 70ae790859a8d051cf9c77933069b08ac999f367b0757449e12625f713782822
MD5 b8dab46544ec794d278655bf2fc06621
BLAKE2b-256 543feb9a0c50c570d8cc6e25010c99c3870e93313246cf14825ae117ee8275a9

File details

Details for the file mlcommons_loadgen-4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 aaec484a8e1ea246a91391f914f475cf84f808893234b6ab168e16e765e5d5b1
MD5 fa1a3aabf90d807583ea9804bbe063a0
BLAKE2b-256 74c82d7958a0a541792477aa7cc152047ed420114235cc209376498aa8b1eb69

File details

Details for the file mlcommons_loadgen-4.0-cp311-cp311-win_amd64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 1704e5982f11cdc113c6a8740e2e7ff9ad1d006bd0ccf9be8dd9da4c7c118ffb
MD5 4c70105826344275c891c8bfcc81701d
BLAKE2b-256 cca64fa0e687f36e9d7ba35a288b35a167c1ce62a1150e68c5557343a36f16a1

File details

Details for the file mlcommons_loadgen-4.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b72df9aac69488e0f34ab9955c41fe2434c1b4aa0d25e8058cf34d3d8938c461
MD5 86c1ce81d13bce567781f79fa78add4a
BLAKE2b-256 2681936ecaa0ae930bef0688871df2076cea7ddeca71c2d1b384b0ffaa00ebf7

File details

Details for the file mlcommons_loadgen-4.0-cp311-cp311-macosx_14_0_arm64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 ce2ec1e392f611b38dc0aef70648ad66d24305112f003f3463cab14a81be5736
MD5 6572d4f1956948149683294826d3f2fc
BLAKE2b-256 438b06244cca30fc1a4097ffae333fd9ab5d523661bcaca890206537ca6b22b8

File details

Details for the file mlcommons_loadgen-4.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d3c61890c5053f136848aee6866b78148fde5e899a14ced61d8dc493e0f66766
MD5 c58f6e39c2ffb7117f5249b6e08858ce
BLAKE2b-256 46833317f65b8cffdf12dcba56ac9f33b121483d4f9d28619810b235ddd01ec1

File details

Details for the file mlcommons_loadgen-4.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4bef9b99c61d3bb228b75e20bb79508404730210dcae5eae4108c22b81f93b43
MD5 9f317ad090d198bc065f96bbfb338d58
BLAKE2b-256 ecb019d799f3ed1c30bc59c6a2f9560d15e5f2a202e66664865b16cb26271f24

File details

Details for the file mlcommons_loadgen-4.0-cp39-cp39-macosx_10_14_universal2.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp39-cp39-macosx_10_14_universal2.whl
Algorithm Hash digest
SHA256 65501cdd6f154c24b1acdbe96edf8df91c4692e398d70a273a46271ee367f792
MD5 bbb71813062aedd06e67faf419913f46
BLAKE2b-256 e703a4300789a698b7dddd8b99f1a6b4962acc4e78050df8f09ede02fc7ca220

File details

Details for the file mlcommons_loadgen-4.0-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 cccc994e06085315d0a80946c1917c03a7a6781772570af615192e1224c778fd
MD5 15ff1052a1beb05ae021a95c9929411f
BLAKE2b-256 ccc756a00cb9da6b5fb2f94697135c03899b0b40f29f57c844ffd6dd7807e111

File details

Details for the file mlcommons_loadgen-4.0-cp37-cp37m-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for mlcommons_loadgen-4.0-cp37-cp37m-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f4e6cec3a4002eab3f216e29d1659f91b7eee6f63ee7215bcfc8e98ad8c8723e
MD5 e84b6a34a27342c926912be6dcc18534
BLAKE2b-256 d1bdd0b2de57daeeaa686a9d5d72c89b9d5a251e84880e6b2b04af6ac719a6f3
