PyTriton - Flask/FastAPI-like interface to simplify Triton's deployment in Python environments.

PyTriton is a Flask/FastAPI-like framework designed to streamline the use of NVIDIA’s Triton Inference Server.

For comprehensive guidance on deploying models, optimizing performance, and exploring the API, see the documentation.

Features at a Glance

The distinct capabilities of PyTriton are summarized below:

  • Native Python support: You can create any Python function and expose it as an HTTP/gRPC API.

  • Framework-agnostic: You can run any Python code with any framework of your choice, such as PyTorch, TensorFlow, or JAX.

  • Performance optimization: You can benefit from dynamic batching, response cache, model pipelining, clusters, performance tracing, and GPU/CPU inference.

  • Decorators: You can use batching decorators to handle batching and other pre-processing tasks for your inference function.

  • Easy installation and setup: You can use a simple and familiar interface based on Flask/FastAPI for easy installation and setup.

  • Model clients: You can access high-level model clients for HTTP/gRPC requests with configurable options and both synchronous and asynchronous APIs.

  • Streaming (alpha): You can stream partial responses from a model by serving it in a decoupled mode.

Learn more about PyTriton’s architecture.

Prerequisites

Before proceeding with the installation of PyTriton, ensure your system meets the following criteria:

  • Operating System: Compatible with glibc version 2.35 or higher.
      - Primarily tested on Ubuntu 22.04.
      - Other supported OS include Debian 11+, Rocky Linux 9+, and Red Hat UBI 9+.
      - Use ldd --version to verify your glibc version.

  • Python: Version 3.8 or newer.

  • pip: Version 20.3 or newer.

  • libpython: Ensure libpython3.*.so is installed, corresponding to your Python version.
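
Most of the checks above can be scripted. The following is a minimal sketch that uses only the Python standard library; the helper names parse_version and check_prerequisites are illustrative and not part of PyTriton. It does not look for libpython3.*.so, and on systems where platform.libc_ver() cannot identify glibc the glibc check is simply reported as not satisfied.

```python
import platform
import sys
from importlib import metadata


def parse_version(text):
    """Turn a dotted version string such as '2.35' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component without leading digits
        parts.append(int(digits))
    return tuple(parts)


def check_prerequisites():
    """Return a mapping from requirement label to True/False."""
    libc_name, libc_version = platform.libc_ver()
    try:
        pip_version = metadata.version("pip")
    except metadata.PackageNotFoundError:
        pip_version = ""  # treated as "too old" below
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "pip>=20.3": parse_version(pip_version) >= (20, 3),
        "glibc>=2.35": libc_name == "glibc"
        and parse_version(libc_version) >= (2, 35),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'NOT satisfied'}")
```

Running ldd --version remains the authoritative way to confirm the glibc version.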

Install

PyTriton can be installed from pypi.org by running the following command:

pip install nvidia-pytriton

Important: The Triton Inference Server binary is installed as part of the PyTriton package.

Discover more about PyTriton’s installation procedures, including Docker usage, prerequisites, and insights into building binaries from source to match your specific Triton server versions.

Quick Start

This quick start shows how to run a Python model in Triton Inference Server without changing the current working environment. The example uses a simple Linear model.

The infer_fn function takes a data tensor and returns a list containing a single output tensor. The @batch decorator from the batching decorators handles batching for the model.

import numpy as np
from pytriton.decorators import batch

@batch
def infer_fn(data):
    result = data * np.array([[-1]], dtype=np.float32)  # Process inputs and produce result
    return [result]
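
To give an intuition for what a batching decorator does, here is a toy, dependency-free sketch. It is not PyTriton's implementation: toy_batch and negate are hypothetical names, and plain nested lists stand in for NumPy arrays. The idea shown is that several requests are concatenated into one batch, the wrapped function is called once, and the outputs are sliced back per request.

```python
from functools import wraps


def toy_batch(fn):
    """Hypothetical stand-in for a batching decorator; not PyTriton's code."""

    @wraps(fn)
    def wrapper(requests):
        # Each request maps an input name to a list of rows (one row per sample).
        names = list(requests[0])
        sizes = [len(request[names[0]]) for request in requests]

        # Concatenate the rows of all requests into one batch per input name.
        merged = {
            name: [row for request in requests for row in request[name]]
            for name in names
        }

        # Call the wrapped function once for the whole batch.
        outputs = fn(**merged)

        # Slice every output back into per-request pieces.
        responses, start = [], 0
        for size in sizes:
            responses.append([rows[start:start + size] for rows in outputs])
            start += size
        return responses

    return wrapper


@toy_batch
def negate(data):
    # Same arithmetic as infer_fn above, written with plain lists.
    return [[[-value for value in row] for row in data]]


responses = negate([
    {"data": [[1.0, 2.0]]},              # request with one sample
    {"data": [[3.0, 4.0], [5.0, 6.0]]},  # request with two samples
])
print(responses)
```

The wrapped function only ever sees the merged batch, which is why infer_fn above can be written as a single array operation.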

In the next step, you can create the binding between the inference callable and Triton Inference Server using the bind method from PyTriton. This method takes the model name, the inference callable, the input and output tensors, and an optional model configuration object.

from pytriton.model_config import Tensor
from pytriton.triton import Triton

triton = Triton()
# Bind infer_fn to a model named "Linear" with one input and one output tensor.
triton.bind(
    model_name="Linear",
    infer_func=infer_fn,
    inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
    outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
)
# Start Triton Inference Server in the background (non-blocking).
triton.run()
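
In the Tensor definitions above, -1 marks a dimension whose size may vary between requests. The sketch below illustrates that convention with a hypothetical helper, matches_shape, which is not part of PyTriton. It assumes, as is usual for Triton models with batching enabled, that the declared shape describes a single sample and does not include the batch dimension.

```python
def matches_shape(spec, sample_shape):
    """Return True if sample_shape fits spec, where -1 in spec matches any size."""
    if len(spec) != len(sample_shape):
        return False
    return all(want == -1 or want == got for want, got in zip(spec, sample_shape))


spec = (-1,)  # same shape as in the Tensor definitions above

print(matches_shape(spec, (2,)))    # True: one dimension of any length
print(matches_shape(spec, (5,)))    # True: the length may differ per request
print(matches_shape(spec, (2, 3)))  # False: the number of dimensions is fixed
```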

Finally, you can send an inference query to the model using the ModelClient class. The infer_sample method takes the input data as a numpy array and returns the output data as a numpy array. You can learn more about the ModelClient class in the clients section.

from pytriton.client import ModelClient

client = ModelClient("localhost", "Linear")
data = np.array([1, 2], dtype=np.float32)
print(client.infer_sample(data=data))

After the inference is done, you can stop the Triton Inference Server and close the client:

client.close()
triton.stop()

The output of the inference should be:

{'result': array([-1., -2.], dtype=float32)}
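
Because Triton Inference Server speaks the KServe v2 inference protocol over HTTP, a similar query can in principle be sent without ModelClient. The sketch below only builds such a request with the standard library. It assumes Triton's default HTTP port 8000 and a leading batch dimension of one; build_infer_request and send are illustrative helpers, and nothing here describes how ModelClient works internally.

```python
import json
import urllib.request


def build_infer_request(model_name, input_name, values, host="localhost", port=8000):
    """Build the URL and JSON body for one single-sample FP32 inference request."""
    url = f"http://{host}:{port}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(values)],  # batch of one sample
                "datatype": "FP32",
                "data": [float(v) for v in values],
            }
        ]
    }
    return url, json.dumps(payload).encode("utf-8")


def send(url, body):
    """POST the request; needs a running server, so it is not called here."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


url, body = build_infer_request("Linear", "data", [1, 2])
print(url)
print(body.decode("utf-8"))
```

With the server from the quick start still running, send(url, body) would return the JSON response; ModelClient remains the more convenient option because it handles NumPy conversion for you.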

For the full example, including defining the model and binding it to the Triton server, check out our detailed Quick Start instructions. Get your model up and running, explore how to serve it, and learn how to invoke it from client applications.

The full example code can be found in examples/linear_random_pytorch.

Examples

The examples page showcases various use cases of serving models using PyTriton. This includes simple examples of running models in PyTorch, TensorFlow2, JAX, and plain Python. In addition, more advanced scenarios are covered, such as online learning, multi-node models, and deployment on Kubernetes using PyTriton. Each example is accompanied by instructions on how to build and run it. Discover more about utilizing PyTriton by exploring our examples.

Built Distributions

  • nvidia_pytriton-0.5.9-py3-none-manylinux_2_35_x86_64.whl (40.5 MB): Python 3, manylinux (glibc 2.35+), x86-64

  • nvidia_pytriton-0.5.9-py3-none-manylinux_2_35_aarch64.whl (39.2 MB): Python 3, manylinux (glibc 2.35+), ARM64
