Torch-TensorRT is a package which allows users to automatically compile PyTorch and TorchScript modules to TensorRT while remaining in PyTorch

Project description

torch_tensorrt

Ahead of Time (AOT) compiling for PyTorch JIT

Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation, running the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.

Example Usage

import torch_tensorrt

...

trt_ts_module = torch_tensorrt.compile(torch_script_module,
    inputs = [example_tensor, # Provide example tensor for input shape or...
        torch_tensorrt.Input( # Specify input object with shape and dtype
            min_shape=[1, 3, 224, 224],
            opt_shape=[1, 3, 512, 512],
            max_shape=[1, 3, 1024, 1024],
            # For static size shape=[1, 3, 224, 224]
            dtype=torch.half) # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
    ],
    enabled_precisions = {torch.half}) # Run with FP16

result = trt_ts_module(input_data) # run inference
torch.jit.save(trt_ts_module, "trt_torchscript_module.ts") # save the TRT embedded TorchScript
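
The saved file is a standard TorchScript module with the TensorRT engine embedded, so it can be reloaded later for deployment; importing torch_tensorrt first ensures the runtime that executes the engine is registered. A minimal sketch of reloading, assuming a hypothetical FP16 CUDA input that falls within the compiled shape range:

import torch
import torch_tensorrt  # registers the TensorRT engine execution runtime

# Reload the TensorRT-embedded TorchScript module saved above
trt_ts_module = torch.jit.load("trt_torchscript_module.ts")

# Hypothetical input; must match the compiled dtype and shape range
input_data = torch.randn(1, 3, 224, 224, dtype=torch.half, device="cuda")
result = trt_ts_module(input_data)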

Installation

ABI / Platform                             Installation command
Pre-CXX11 ABI (Linux x86_64)               python3 setup.py install
CXX11 ABI (Linux x86_64)                   python3 setup.py install --use-cxx11-abi
Pre-CXX11 ABI (Jetson platform, aarch64)   python3 setup.py install --jetpack-version 4.6
CXX11 ABI (Jetson platform, aarch64)       python3 setup.py install --jetpack-version 4.6 --use-cxx11-abi

On the Linux x86_64 platform, PyTorch libraries default to the pre-CXX11 ABI, so please use python3 setup.py install.

On Jetson platforms, NVIDIA hosts pre-built PyTorch wheel files. These wheels are built with the CXX11 ABI, so on Jetson platforms please use python3 setup.py install --jetpack-version 4.6 --use-cxx11-abi.
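
If you are unsure which ABI your local PyTorch build uses, you can query it from Python before choosing a command; torch.compiled_with_cxx11_abi() is a standard PyTorch helper:

import torch

# True for CXX11 ABI builds (use --use-cxx11-abi); False for pre-CXX11 ABI builds
print(torch.compiled_with_cxx11_abi())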

Under the Hood

When a traced module is provided to Torch-TensorRT, the compiler takes the internal representation and transforms it into one like this:

graph(%input.2 : Tensor):
    %2 : Float(84, 10) = prim::Constant[value=<Tensor>]()
    %3 : Float(120, 84) = prim::Constant[value=<Tensor>]()
    %4 : Float(576, 120) = prim::Constant[value=<Tensor>]()
    %5 : int = prim::Constant[value=-1]() # x.py:25:0
    %6 : int[] = prim::Constant[value=annotate(List[int], [])]()
    %7 : int[] = prim::Constant[value=[2, 2]]()
    %8 : int[] = prim::Constant[value=[0, 0]]()
    %9 : int[] = prim::Constant[value=[1, 1]]()
    %10 : bool = prim::Constant[value=1]() # ~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py:346:0
    %11 : int = prim::Constant[value=1]() # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %12 : bool = prim::Constant[value=0]() # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %self.classifer.fc3.bias : Float(10) = prim::Constant[value= 0.0464  0.0383  0.0678  0.0932  0.1045 -0.0805 -0.0435 -0.0818  0.0208 -0.0358 [ CUDAFloatType{10} ]]()
    %self.classifer.fc2.bias : Float(84) = prim::Constant[value=<Tensor>]()
    %self.classifer.fc1.bias : Float(120) = prim::Constant[value=<Tensor>]()
    %self.feat.conv2.weight : Float(16, 6, 3, 3) = prim::Constant[value=<Tensor>]()
    %self.feat.conv2.bias : Float(16) = prim::Constant[value=<Tensor>]()
    %self.feat.conv1.weight : Float(6, 1, 3, 3) = prim::Constant[value=<Tensor>]()
    %self.feat.conv1.bias : Float(6) = prim::Constant[value= 0.0530 -0.1691  0.2802  0.1502  0.1056 -0.1549 [ CUDAFloatType{6} ]]()
    %input0.4 : Tensor = aten::_convolution(%input.2, %self.feat.conv1.weight, %self.feat.conv1.bias, %9, %8, %9, %12, %8, %11, %12, %12, %10) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py:346:0
    %input0.5 : Tensor = aten::relu(%input0.4) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %input1.2 : Tensor = aten::max_pool2d(%input0.5, %7, %6, %8, %9, %12) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %input0.6 : Tensor = aten::_convolution(%input1.2, %self.feat.conv2.weight, %self.feat.conv2.bias, %9, %8, %9, %12, %8, %11, %12, %12, %10) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py:346:0
    %input2.1 : Tensor = aten::relu(%input0.6) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %x.1 : Tensor = aten::max_pool2d(%input2.1, %7, %6, %8, %9, %12) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:539:0
    %input.1 : Tensor = aten::flatten(%x.1, %11, %5) # x.py:25:0
    %27 : Tensor = aten::matmul(%input.1, %4)
    %28 : Tensor = trt::const(%self.classifer.fc1.bias)
    %29 : Tensor = aten::add_(%28, %27, %11)
    %input0.2 : Tensor = aten::relu(%29) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %31 : Tensor = aten::matmul(%input0.2, %3)
    %32 : Tensor = trt::const(%self.classifer.fc2.bias)
    %33 : Tensor = aten::add_(%32, %31, %11)
    %input1.1 : Tensor = aten::relu(%33) # ~/.local/lib/python3.6/site-packages/torch/nn/functional.py:1063:0
    %35 : Tensor = aten::matmul(%input1.1, %2)
    %36 : Tensor = trt::const(%self.classifer.fc3.bias)
    %37 : Tensor = aten::add_(%36, %35, %11)
    return (%37)
(CompileGraph)

The graph has now been transformed from a collection of modules (much like how your PyTorch modules are collections of modules, each managing its own parameters) into a single graph with the parameters inlined and all of the operations laid out. Torch-TensorRT has also executed a number of optimizations and mappings to make the graph easier to translate to TensorRT. From here the compiler can assemble the TensorRT engine by following the dataflow through the graph.
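
You can inspect this intermediate representation yourself; tracing a module and printing its graph is roughly how a dump like the one above is produced. A small sketch with a hypothetical two-layer model:

import torch

# Hypothetical model; tracing any nn.Module yields a graph like the one shown above
model = torch.nn.Sequential(torch.nn.Conv2d(1, 6, 3), torch.nn.ReLU()).eval()
traced = torch.jit.trace(model, torch.randn(1, 1, 28, 28))
print(traced.graph)  # the TorchScript IR that Torch-TensorRT receives as input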

When the graph construction phase is complete, Torch-TensorRT produces a serialized TensorRT engine. From here, depending on the API, this engine is either returned to the user or moves into the module construction phase, where Torch-TensorRT creates a JIT module to execute the TensorRT engine; this module is instantiated and managed by the Torch-TensorRT runtime.

Here is the graph that you get back after compilation is complete:

graph(%self.1 : __torch__.___torch_mangle_10.LeNet_trt,
    %2 : Tensor):
    %1 : int = prim::Constant[value=94106001690080]()
    %3 : Tensor = trt::execute_engine(%1, %2)
    return (%3)
(AddEngineToGraph)

You can see the call where the engine is executed: a constant holding the ID of the engine tells the JIT runtime how to find the engine, and the input tensor is fed to TensorRT. The engine represents the exact same calculations as running the normal PyTorch module, but optimized to run on your GPU.
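
Because the compiled module is a drop-in replacement for the original, a simple sanity check is to run both on the same input and compare outputs; with reduced precision enabled, small numerical differences are expected, so use loose tolerances. A sketch reusing the names from the earlier example:

import torch

# Compare the original TorchScript module against the TensorRT-embedded one
with torch.no_grad():
    ref = torch_script_module(input_data)
    out = trt_ts_module(input_data)

# Loose tolerances since the engine was built with FP16 enabled
print(torch.allclose(ref.float(), out.float(), rtol=1e-2, atol=1e-2))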

Torch-TensorRT converts from TorchScript by generating layers or subgraphs in correspondence with the instructions seen in the graph. Converters are small modules of code used to map one specific operation to a layer or subgraph in TensorRT. Not all operations are supported, but if you need to implement one, you can do so in C++.

Registering Custom Converters

Operations are mapped to TensorRT through the use of modular converters: functions that take a node from the JIT graph and produce an equivalent layer or subgraph in TensorRT. Torch-TensorRT ships with a library of these converters stored in a registry, and the appropriate converter is executed depending on the node being parsed. For instance, an aten::relu(%input0.4) instruction will trigger the ReLU converter to be run on it, producing an activation layer in the TensorRT graph. But since this library is not exhaustive, you may need to write your own converter to get Torch-TensorRT to support your module.

Shipped with the Torch-TensorRT distribution are the internal core API headers. You can therefore access the converter registry and add a converter for the op you need.

For example, if we try to compile a graph with a build of Torch-TensorRT that doesn't support the flatten operation (aten::flatten), you may see this error:

terminate called after throwing an instance of 'torch_tensorrt::Error'
what():  [enforce fail at core/conversion/conversion.cpp:109] Expected converter to be true but got false
Unable to convert node: %input.1 : Tensor = aten::flatten(%x.1, %11, %5) # x.py:25:0 (conversion.AddLayer)
Schema: aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
Converter for aten::flatten requested, but no such converter was found.
If you need a converter for this operator, you can try implementing one yourself
or request a converter: https://www.github.com/NVIDIA/Torch-TensorRT/issues

We can register a converter for this operator in our application. All of the tools required to build a converter can be imported by including Torch-TensorRT/core/conversion/converters/converters.h. We start by creating an instance of the self-registering class torch_tensorrt::core::conversion::converters::RegisterNodeConversionPatterns(), which registers converters in the global converter registry. It associates a function schema like aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor) with a lambda that takes the state of the conversion, the node/operation to convert, and all of the inputs to the node, and produces, as a side effect, a new layer in the TensorRT network. Arguments are passed as a vector of inspectable unions of TensorRT ITensors and Torch IValues, in the order the arguments are listed in the schema.

Below is an implementation of an aten::flatten converter that we can use in our application. You have full access to the Torch and TensorRT libraries in the converter implementation, so, for example, we can quickly get the output size by just running the operation in PyTorch instead of implementing the full calculation ourselves, as we do below for this flatten converter.

#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"
#include "torch_tensorrt/core/conversion/converters/converters.h"

static auto flatten_converter = torch_tensorrt::core::conversion::converters::RegisterNodeConversionPatterns()
    .pattern({
        "aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)",
        [](torch_tensorrt::core::conversion::ConversionCtx* ctx,
           const torch::jit::Node* n,
           torch_tensorrt::core::conversion::converters::args& args) -> bool {
            auto in = args[0].ITensor();
            auto start_dim = args[1].unwrapToInt();
            auto end_dim = args[2].unwrapToInt();
            auto in_shape = torch_tensorrt::core::util::toVec(in->getDimensions());
            auto out_shape = torch::flatten(torch::rand(in_shape), start_dim, end_dim).sizes();

            auto shuffle = ctx->net->addShuffle(*in);
            shuffle->setReshapeDimensions(torch_tensorrt::core::util::toDims(out_shape));
            shuffle->setName(torch_tensorrt::core::util::node_info(n).c_str());

            auto out_tensor = ctx->AssociateValueAndTensor(n->outputs()[0], shuffle->getOutput(0));
            return true;
        }
    });

To use this converter in Python, it is recommended to use PyTorch's C++/CUDA Extension template to wrap your library of converters into a .so that you can load with ctypes.CDLL() in your Python application.
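
A sketch of that loading step, assuming the converter above was built into a hypothetical libflatten_converter.so; loading the library with ctypes runs its static initializers, which is what registers the converter, so it must happen before calling compile:

import ctypes
import torch
import torch_tensorrt

# Hypothetical path; loading runs the static registration code in the .so
ctypes.CDLL("./libflatten_converter.so")

# aten::flatten can now be converted, so compilation of the earlier module succeeds
trt_ts_module = torch_tensorrt.compile(torch_script_module,
    inputs=[torch_tensorrt.Input(shape=[1, 1, 28, 28], dtype=torch.float)],  # hypothetical shape
    enabled_precisions={torch.float})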

You can find more information on all the details of writing converters in the contributor documentation (Writing Converters). If you find yourself with a large library of converter implementations, do consider upstreaming them; PRs are welcome, and it would be great for the community to benefit as well.

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

torch_tensorrt-2.4.0-cp311-cp311-win_amd64.whl (2.8 MB): CPython 3.11, Windows x86-64

torch_tensorrt-2.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl (3.2 MB): CPython 3.11, manylinux (glibc 2.17+ / 2.34+) x86-64

torch_tensorrt-2.4.0-cp310-cp310-win_amd64.whl (2.8 MB): CPython 3.10, Windows x86-64

torch_tensorrt-2.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl (3.2 MB): CPython 3.10, manylinux (glibc 2.17+ / 2.34+) x86-64

torch_tensorrt-2.4.0-cp39-cp39-win_amd64.whl (2.8 MB): CPython 3.9, Windows x86-64

torch_tensorrt-2.4.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl (3.2 MB): CPython 3.9, manylinux (glibc 2.17+ / 2.34+) x86-64

torch_tensorrt-2.4.0-cp38-cp38-win_amd64.whl (2.8 MB): CPython 3.8, Windows x86-64

torch_tensorrt-2.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl (18.5 MB): CPython 3.8, manylinux (glibc 2.17+ / 2.34+) x86-64

File hashes

torch_tensorrt-2.4.0-cp311-cp311-win_amd64.whl
    SHA256:      9f24401f8c3021c3e44008fd3c2442494a3527671ffb8d1ab19d5059bf41f8c9
    MD5:         eef6b213f8f1baa533d2d31af0c3ca2f
    BLAKE2b-256: 99920b20a30e8ac95932e011b24048a92a7b7e7e583b3e63b34c5248a156674b

torch_tensorrt-2.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl
    SHA256:      76c807ed23038c6776b50d43344354820db7345adb0da2899671f972e87bc453
    MD5:         514d97197abdead741fb44ac938df013
    BLAKE2b-256: 8eb55d6b92738ea3a1fda80495338e3840680541eae9007c1dfa8319f9c16d06

torch_tensorrt-2.4.0-cp310-cp310-win_amd64.whl
    SHA256:      31dc1988d32d2c9b0a38f24a2bd1166e164d6fa6cfc1f48375af5b63a5c5c832
    MD5:         96dedeca1120b1f9b6db5605e6809b6f
    BLAKE2b-256: 13559d7db64a40221c8925864d5ddeccacfd7147cc9bfa1347478077436bd39e

torch_tensorrt-2.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl
    SHA256:      97595fb8c51a642a53b390e718de9b02da033fc6415a71129ae759faa6dec4d8
    MD5:         6ffe5dbbbe1ee7e0e1b018c09cad6d11
    BLAKE2b-256: 9088e938b5dbc502293e1903f9c51e594d928b5a21d5ce11e2d6b5ac039ebe7e

torch_tensorrt-2.4.0-cp39-cp39-win_amd64.whl
    SHA256:      78866e516e80cf066619cd87cddb45b0a981f59d10ef50d13007b73ac33a839f
    MD5:         eb83d6e071bc90cd8f52e6adf4d7a2e5
    BLAKE2b-256: 3c007227c2e7a2c0c444e3eeef7915835608e4a41e9577dedb9bb8e3e500b8ae

torch_tensorrt-2.4.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl
    SHA256:      54e998355395a72c5aacc93062e77d46875d5648a9ff6e66b5113b895fffef90
    MD5:         e59dfc38594935c2685dc36b65239c44
    BLAKE2b-256: 342f7e90d5e7b16fc9182cfaa01df7ffb0b2030586319f862c37ce10afec4a3b

torch_tensorrt-2.4.0-cp38-cp38-win_amd64.whl
    SHA256:      a8f96d0e4efba4d7726410663a6adec799045b818768e2a4f8d0b5e299a51555
    MD5:         1bbe7b417fb6417718431cd9e1fa6a70
    BLAKE2b-256: d6c8312dcf0a4e4867fdb38d017688f1b393be23389a444dfa22f31cf8bd6ad8

torch_tensorrt-2.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_34_x86_64.whl
    SHA256:      c758b51be167c4ec471e651631ee35c757b06aa0fffec47b36d7b5c349c2ba99
    MD5:         5d77ee766124efb1c41e56a0a5b22303
    BLAKE2b-256: 5a9008fa5993992f6681f2d86bc14faaf28b754660f002fed074f0e00c7a84ee
