Torch-TensorRT is a package which allows users to automatically compile PyTorch and TorchScript modules to TensorRT while remaining in PyTorch
Torch-TensorRT
Easily achieve the best inference performance for any PyTorch model on the NVIDIA platform.
Torch-TensorRT brings the power of TensorRT to PyTorch. Accelerate inference latency by up to 5x compared to eager execution in just one line of code.
Installation
Stable versions of Torch-TensorRT are published on PyPI
pip install torch-tensorrt
Nightly versions of Torch-TensorRT are published on the PyTorch package index
pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu124
Torch-TensorRT is also distributed in the ready-to-run NVIDIA NGC PyTorch Container which has all dependencies with the proper versions and example notebooks included.
For more advanced installation methods, please see here
Quickstart
Option 1: torch.compile
You can use Torch-TensorRT anywhere you use torch.compile:
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
x = torch.randn((1, 3, 224, 224)).cuda() # define what the inputs to the model will look like
optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x) # compiled on first run
optimized_model(x) # this will be fast!
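To measure the speedup on your own model, a small timing harness helps. The sketch below is framework-agnostic and illustrative (the `benchmark` helper and its parameters are not part of Torch-TensorRT); when timing CUDA models you would additionally call `torch.cuda.synchronize()` before reading the clock so queued kernels are included.

```python
import time

def benchmark(fn, *args, warmup=3, iters=10):
    """Return average seconds per call after a few warmup runs.

    Warmup matters here: torch.compile-style backends compile on the
    first call, so early iterations are not representative.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Usage with the models above (requires a CUDA GPU):
# eager_s = benchmark(model, x)
# trt_s = benchmark(optimized_model, x)
# print(f"speedup: {eager_s / trt_s:.2f}x")
```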
Option 2: Export
If you want to optimize your model ahead-of-time and/or deploy in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).
Step 1: Optimize + serialize
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # define a list of representative inputs here
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs) # PyTorch only supports Python runtime for an ExportedProgram. For C++ deployment, use a TorchScript file
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)
Step 2: Deploy
Deployment in PyTorch:
import torch
import torch_tensorrt
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # your inputs go here
# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module() # this also works
model(*inputs)
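After reloading the serialized module, it is worth checking numerical parity against the original model. TensorRT engines may use reduced precision, so exact equality is too strict; compare with a tolerance instead. With torch you would typically use `torch.allclose(ref_out, trt_out, rtol=1e-3, atol=1e-3)`; the helper below sketches the same check in plain Python for illustration.

```python
def allclose(a, b, rtol=1e-3, atol=1e-3):
    """Element-wise |a - b| <= atol + rtol * |b|, mirroring torch.allclose."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

# e.g. flatten both model outputs to lists of floats and compare:
# assert allclose(ref_out.flatten().tolist(), trt_out.flatten().tolist())
```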
Deployment in C++:
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"
auto trt_mod = torch::jit::load("trt.ts");
auto input_tensor = [...]; // fill this with your inputs
auto results = trt_mod.forward({input_tensor});
Further resources
- Up to 50% faster Stable Diffusion inference with one line of code
- Optimize LLMs from Hugging Face with Torch-TensorRT [coming soon]
- Run your model in FP8 with Torch-TensorRT
- Tools to resolve graph breaks and boost performance [coming soon]
- Tech Talk (GTC '23)
- Documentation
Platform Support
Platform | Support
---|---
Linux AMD64 / GPU | Supported
Windows / GPU | Supported (Dynamo only)
Linux aarch64 / GPU | Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux aarch64 / DLA | Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux ppc64le / GPU | Not supported
Note: Refer to the NVIDIA L4T PyTorch NGC container for PyTorch libraries on JetPack.
Dependencies
The following dependencies were used to verify the test cases. Torch-TensorRT can work with other versions, but the tests are not guaranteed to pass.
- Bazel 6.3.2
- Libtorch 2.5.0.dev (latest nightly) (built with CUDA 12.4)
- CUDA 12.4
- TensorRT 10.3.0.26
Deprecation Policy
Deprecation is used to inform developers that some APIs and tools are no longer recommended for use. Beginning with version 2.3, Torch-TensorRT has the following deprecation policy:
Deprecation notices are communicated in the Release Notes. Deprecated API functions will have a statement in the source documenting when they were deprecated. Deprecated methods and classes will issue deprecation warnings at runtime, if they are used. Torch-TensorRT provides a 6-month migration period after the deprecation. APIs and tools continue to work during the migration period. After the migration period ends, APIs and tools are removed in a manner consistent with semantic versioning.
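The runtime warnings mentioned above follow standard Python conventions. As a generic sketch (the function names are hypothetical, not real Torch-TensorRT APIs), a deprecated entry point typically looks like this, and users can promote the warning to an error during the migration period:

```python
import warnings

def old_api():
    """Hypothetical deprecated function kept through the migration period."""
    warnings.warn(
        "old_api() is deprecated as of v2.3 and will be removed after the "
        "6-month migration period; use new_api() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()

def new_api():
    return "ok"

# Surface deprecations while migrating:
# python -W error::DeprecationWarning your_script.py
```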
Contributing
Take a look at CONTRIBUTING.md.
License
The Torch-TensorRT license can be found in the LICENSE file. It is licensed under a BSD-style license.