MLPerf Inference LoadGen python bindings

Overview

Introduction

  • The LoadGen is a reusable module that efficiently and fairly measures the performance of inference systems.
  • It generates traffic for scenarios as formulated by a diverse set of experts in the MLCommons working group.
  • The scenarios emulate the workloads seen in mobile devices, autonomous vehicles, robotics, and cloud-based setups.
  • Although the LoadGen is not model or dataset aware, its strength is that it can be reused with logic that is.

Integration Example and Flow

The following shows how the LoadGen can be integrated into an inference system, resembling how some of the MLPerf reference models are implemented.

  1. Benchmark knows the model, dataset, and preprocessing.
  2. Benchmark hands dataset sample IDs to LoadGen.
  3. LoadGen starts generating queries of sample IDs.
  4. Benchmark creates requests to backend.
  5. Result is post-processed and forwarded to LoadGen.
  6. LoadGen outputs logs for analysis.
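
The six steps above can be sketched in Python with a stand-in for the LoadGen. Everything in this sketch is hypothetical (`FakeLoadGen`, `Benchmark`, the lambda used as a model); it does not use the real bindings and only illustrates which side calls which.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLoadGen:
    """Hypothetical stand-in for the LoadGen: issues sample IDs, records results."""
    log: list = field(default_factory=list)

    def start_test(self, benchmark, sample_ids):
        # Step 3: generate queries of sample IDs (here: one query per ID).
        for query_id, sample_id in enumerate(sample_ids):
            benchmark.issue_query(self, query_id, sample_id)
        # Step 6: the recorded log is what would be analyzed afterwards.
        return self.log

    def query_complete(self, query_id, result):
        self.log.append((query_id, result))


class Benchmark:
    """Step 1: only the benchmark knows the model, dataset, and preprocessing."""

    def __init__(self, dataset, model):
        self.dataset = dataset
        self.model = model

    def sample_ids(self):
        # Step 2: hand dataset sample IDs (not the data itself) to the LoadGen.
        return list(range(len(self.dataset)))

    def issue_query(self, loadgen, query_id, sample_id):
        # Step 4: turn the sample ID into a backend request.
        raw = self.model(self.dataset[sample_id])
        # Step 5: post-process the result and forward it to the LoadGen.
        loadgen.query_complete(query_id, round(raw, 2))


bench = Benchmark(dataset=[1.0, 2.0, 3.0], model=lambda x: x * 0.5)
log = FakeLoadGen().start_test(bench, bench.sample_ids())
print(log)
```

Note that the stand-in LoadGen never touches the dataset or the model; it only sees IDs and opaque results, which mirrors the scope described in the next section.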

Scope of the LoadGen's Responsibilities

In Scope

  • Provide a reusable C++ library with Python bindings.
  • Implement the traffic patterns of the MLPerf Inference scenarios and modes.
  • Record all traffic generated and received for later analysis and verification.
  • Summarize the results and whether performance constraints were met.
  • Target high-performance systems with efficient multi-thread friendly logging utilities.
  • Generate trust via a shared, well-tested, and community-hardened code base.

Out of Scope

The LoadGen is:

  • NOT aware of the ML model it is running against.
  • NOT aware of the data formats of the model's inputs and outputs.
  • NOT aware of how to score the accuracy of a model's outputs.
  • NOT aware of MLPerf rules regarding scenario-specific constraints.

Limiting the scope of the LoadGen in this way keeps it reusable across different models and datasets without modification. Using composition and dependency injection, the user can define their own model, datasets, and metrics.

Additionally, not hardcoding MLPerf-specific test constraints, like test duration and performance targets, allows users to use the LoadGen unmodified for custom testing and continuous integration purposes.

Submission Considerations

Upstream all local modifications

  • As a rule, no local modifications to the LoadGen's C++ library are allowed for submission.
  • Please upstream early and often to keep the playing field level.

Choose your TestSettings carefully!

  • Since the LoadGen is oblivious to the model, it can't enforce the MLPerf requirements for submission, e.g. target percentiles and latencies.
  • For verification, the values in TestSettings are logged.
  • To help make sure your settings are spec compliant, use TestSettings::FromConfig in conjunction with the relevant config file provided with the reference models.
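
To make the idea of config-driven settings concrete, here is a minimal sketch of resolving per-model, per-scenario values from config text. It assumes lines of the form `model.scenario.key = value` with `*` as a wildcard, and it assumes that a more specific line wins over a wildcard line; both the helper `load_settings` and the precedence rule are illustrative assumptions, not the behavior of `TestSettings::FromConfig`. The values in `EXAMPLE` are made up.

```python
def load_settings(text, model, scenario):
    """Collect `key = value` pairs that apply to (model, scenario)."""
    chosen = {}  # key -> (specificity, value)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or "=" not in line:
            continue
        lhs, value = (part.strip() for part in line.split("=", 1))
        parts = lhs.split(".")
        if len(parts) != 3:
            continue  # not in the assumed model.scenario.key form
        m, s, key = parts
        if m not in (model, "*") or s not in (scenario, "*"):
            continue
        # Assumed precedence: fewer wildcards means more specific.
        specificity = (m != "*") + (s != "*")
        if key not in chosen or specificity >= chosen[key][0]:
            chosen[key] = (specificity, value)
    return {key: value for key, (_, value) in chosen.items()}


EXAMPLE = """
# hypothetical values, not an official config
*.Server.target_latency = 10
resnet50.Server.target_latency = 15
*.Offline.min_duration = 600000
"""

settings = load_settings(EXAMPLE, "resnet50", "Server")
print(settings)
```

Values are kept as strings here; converting them to the right numeric type is left to whoever consumes them.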

Responsibilities of a LoadGen User

Implement the Interfaces

  • Implement the SystemUnderTest and QuerySampleLibrary interfaces and pass them to the StartTest function.
  • Call QuerySamplesComplete for every sample received by SystemUnderTest::IssueQuery.
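
The requirement that every issued sample is eventually completed can be checked with simple bookkeeping. The sketch below is a hypothetical Python stand-in (`TrackingSUT`, the `complete` callback, and the squared-index "inference" are all assumptions); it does not call the real LoadGen API.

```python
class TrackingSUT:
    """Hypothetical SUT that records which samples still await completion."""

    def __init__(self, complete):
        self.complete = complete  # stand-in for the completion call
        self.outstanding = set()

    def issue_query(self, samples):
        # samples: list of (response_id, dataset_index) pairs
        for response_id, _ in samples:
            self.outstanding.add(response_id)
        for response_id, index in samples:
            result = index * index  # placeholder for real inference
            self.outstanding.discard(response_id)
            self.complete(response_id, result)

    def flush_queries(self):
        # Nothing may be left pending once the test asks for a flush.
        if self.outstanding:
            raise RuntimeError(f"uncompleted samples: {sorted(self.outstanding)}")


completed = {}
sut = TrackingSUT(lambda rid, result: completed.update({rid: result}))
sut.issue_query([(101, 2), (102, 3)])
sut.flush_queries()
print(completed)
```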

Assess Accuracy

  • Process the mlperf_log_accuracy.json output by the LoadGen to determine the accuracy of your system.
  • For the official models, Python scripts will be provided by the MLPerf model owners for you to do this automatically.
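
As an illustration of post-processing the accuracy log, the following sketch computes top-1 accuracy for a classification task. It assumes the log is a JSON array whose entries carry a `qsl_idx` field and a hex-encoded `data` field, and that each payload is a single little-endian 32-bit integer label; the payload encoding is chosen by the SUT, so treat these as assumptions. The function name `top1_accuracy` and the sample data are hypothetical.

```python
import json
import struct


def top1_accuracy(log_text, labels):
    """Fraction of distinct samples whose predicted label matches `labels`."""
    seen = set()
    correct = 0
    for entry in json.loads(log_text):
        idx = entry["qsl_idx"]
        if idx in seen:  # count each dataset sample only once
            continue
        seen.add(idx)
        # Assumed payload: one little-endian int32 class label.
        (predicted,) = struct.unpack("<i", bytes.fromhex(entry["data"]))
        correct += predicted == labels[idx]
    return correct / len(seen) if seen else 0.0


example_log = json.dumps([
    {"seq_id": 0, "qsl_idx": 0, "data": "03000000"},  # predicts class 3
    {"seq_id": 1, "qsl_idx": 1, "data": "07000000"},  # predicts class 7
])
accuracy = top1_accuracy(example_log, labels={0: 3, 1: 5})
print(accuracy)
```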

For templates of how to do the above in detail, refer to code for the demos, tests, and reference models.

LoadGen over the Network

For reference, at a high level a submission is structured as follows.

The LoadGen implementation is common to all submissions, while the QSL (“Query Sample Library”) and SUT (“System Under Test”) are implemented by submitters. QSL is responsible for loading the data and includes untimed preprocessing.

A submission over the network introduces a new component, the “QDL” (Query Dispatch Library), which sits between the LoadGen and the SUT.

The QDL is a proxy for a load balancer that dispatches queries to the SUT over a physical network, receives the responses, and passes them back to the LoadGen. It is implemented by the submitter. The interface of the QDL is the same as the API to the SUT.

In scenarios using QDL, data may be compressed in QSL at the choice of the submitter in order to reduce network transmission time. Decompression is part of the timed processing in SUT. A set of approved standard compression schemes will be specified for each benchmark; additional compression schemes must be approved in advance by the Working Group.
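
The split between untimed compression and timed decompression can be sketched as follows. `zlib` is used only as a stand-in; it is not claimed to be one of the approved schemes, and the function names are hypothetical.

```python
import zlib


def qsl_load_sample(raw: bytes) -> bytes:
    """Untimed: compress once when the sample is loaded into the QSL."""
    return zlib.compress(raw, level=6)


def sut_handle_sample(payload: bytes) -> bytes:
    """Timed: the SUT pays for decompression before running inference."""
    return zlib.decompress(payload)


raw = b"\x00" * 1024
payload = qsl_load_sample(raw)
restored = sut_handle_sample(payload)
print(len(raw), len(payload), restored == raw)
```

Only `payload` would cross the network; the cost of `sut_handle_sample` counts toward the measured latency.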

All communication between LoadGen/QSL and SUT is via QDL, and all communication between QDL and SUT must pass over a physical network.

QDL implements the protocol to transmit queries over the network and receive responses. It also implements decompression of any response returned by the SUT, where compression of responses is allowed. Performing any part of the timed preprocessing or inference in QDL is specifically disallowed. Currently no batching is allowed in QDL, although this may be revisited in future.
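
A minimal sketch of the proxy role described above, with the network replaced by a plain function call. All names (`RemoteSUT`, `QDL`, the JSON wire format, the doubling "inference") are hypothetical; a real QDL would use the submitter's own transport and serialization.

```python
import json


class RemoteSUT:
    """Hypothetical SUT on the far side of the network; does all timed work."""

    def handle(self, wire_bytes):
        request = json.loads(wire_bytes)
        result = request["index"] * 2  # placeholder for inference
        return json.dumps({"id": request["id"], "result": result}).encode()


class QDL:
    """Hypothetical proxy: serializes, sends, and reports completions only."""

    def __init__(self, send, complete):
        self.send = send          # assumed transport: bytes in, bytes out
        self.complete = complete  # stand-in for the completion call

    def issue_query(self, samples):
        for sample_id, index in samples:  # one message per sample, no batching
            wire = json.dumps({"id": sample_id, "index": index}).encode()
            reply = json.loads(self.send(wire))
            # No preprocessing or inference happens here, only forwarding.
            self.complete(reply["id"], reply["result"])


done = []
qdl = QDL(send=RemoteSUT().handle, complete=lambda rid, res: done.append((rid, res)))
qdl.issue_query([(1, 10), (2, 20)])
print(done)
```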

MLPerf over the Network runs in the Server and Offline scenarios. All LoadGen modes are expected to work as is, with insignificant changes. These include running the test in performance mode, accuracy mode, find peak performance mode, and compliance mode. The same applies to power measurements.

QDL details

The Query Dispatch Library is implemented by the submitter and interfaces with the LoadGen using the same SUT API. All MLPerf Inference SUTs implement the mlperf::SystemUnderTest class, which is defined in system_under_test.h. The QDL implements the mlperf::QueryDispatchLibrary class, which inherits from mlperf::SystemUnderTest, has the same API, and supports all existing mlperf::SystemUnderTest methods. It has a separate header file, query_dispatch_library.h. Because StartTest takes an mlperf::SystemUnderTest, an mlperf::QueryDispatchLibrary passed to it is implicitly upcast.

QDL Query issue and response over the network

The QDL receives queries from the LoadGen through:

void IssueQuery(const std::vector<QuerySample>& samples)

The QDL dispatches the queries to the SUT over the physical medium. The exact method and implementation are submitter specific and are not specified by MLCommons. The submitter's implementation includes everything required to serialize the query, load balance, pass it through the operating system and network interface card, and send it to the SUT.

The QDL receives the query responses over the network from the SUT. Again, the exact method and implementation are submitter specific and are not specified by MLCommons. The submitter's implementation includes everything required to receive the data from the network interface card, pass it through the operating system, deserialize the query response, and hand it back to the LoadGen through query completion:

struct QuerySampleResponse {
  ResponseId id;
  uintptr_t data;
  size_t size;
};
void QuerySamplesComplete(QuerySampleResponse* responses, 
                          size_t response_count);
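
Since the wire format is left to the submitter, the following is only one possible sketch of framing responses for transport: each response is written as an (id, size) header followed by its payload, and a buffer may hold several responses back to back. The header layout (`<QQ`, two little-endian unsigned 64-bit integers) and the helper names are assumptions for illustration.

```python
import struct

# Assumed header: response id and payload size, both little-endian uint64.
HEADER = struct.Struct("<QQ")


def pack_response(response_id: int, payload: bytes) -> bytes:
    """Frame one response as header + payload."""
    return HEADER.pack(response_id, len(payload)) + payload


def unpack_responses(buffer: bytes):
    """Split a buffer of back-to-back framed responses into (id, payload) pairs."""
    responses = []
    offset = 0
    while offset < len(buffer):
        response_id, size = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        responses.append((response_id, buffer[offset:offset + size]))
        offset += size
    return responses


wire = pack_response(7, b"cat") + pack_response(8, b"")
decoded = unpack_responses(wire)
print(decoded)
```

On the QDL side, each decoded pair would then be turned into one completion entry handed back to the LoadGen.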

QDL Additional Methods

In addition, the QDL must implement the following methods, which the SUT interface provides to the LoadGen:

const std::string& Name();

The Name function returns a known string that identifies the SUT as an over-the-network benchmark.

void FlushQueries();

It is not specified here how the QDL would query and configure the SUT to execute the above methods. The QDL responds to the LoadGen after receiving its own response from the SUT.
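
One way to read the last sentence is that the QDL acknowledges a flush only after the SUT has acknowledged it. The sketch below illustrates that ordering with a hypothetical control channel; the class name, the `"flush"`/`"flushed"` messages, and the name string are all assumptions, since the actual mechanism is left to the submitter.

```python
class NetworkQDL:
    """Hypothetical QDL exposing name and flush on top of a control channel."""

    # Placeholder only; the expected identifying string is not given here.
    NAME = "hypothetical-network-sut"

    def __init__(self, send_control):
        self.send_control = send_control  # assumed: command string -> reply string

    def name(self):
        return self.NAME

    def flush_queries(self):
        # Return to the caller only after the remote SUT confirms the flush.
        reply = self.send_control("flush")
        if reply != "flushed":
            raise RuntimeError(f"unexpected reply from SUT: {reply!r}")


commands = []


def fake_sut_control(command):
    """Stand-in for the remote SUT's control endpoint."""
    commands.append(command)
    return "flushed"


qdl_ctl = NetworkQDL(fake_sut_control)
qdl_ctl.flush_queries()
print(qdl_ctl.name(), commands)
```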

Example

Refer to the LON demo for a reference example of using the LoadGen over the network.
