NVIDIA cuTENSOR
Project description
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
Extensive mixed-precision support:
FP64 inputs with FP32 compute.
FP32 inputs with FP16, BF16, or TF32 compute.
Complex-times-real operations.
Conjugate (without transpose) support.
Support for up to 64-dimensional tensors.
Arbitrary data layouts.
Trivially serializable data structures.
Main computational routines:
Direct (i.e., transpose-free) tensor contractions.
Support for just-in-time compilation of dedicated kernels.
Tensor reductions (including partial reductions).
Element-wise tensor operations:
Support for various activation functions.
Support for padding of the output tensor.
Arbitrary tensor permutations.
Conversion between different data types.
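To make the terms "tensor contraction" and "partial reduction" concrete, the following is a plain-Python reference sketch of their semantics. It does not use or imitate the cuTENSOR API; the function names `contract` and `partial_reduce` and the chosen index pattern are illustrative assumptions. The contraction reads both operands in their stored layout, without first transposing them into matrix form, which is the idea behind a "direct" contraction.

```python
from itertools import product


def contract(A, B):
    """Reference contraction C[a][b][c] = sum_k A[a][k][c] * B[k][b].

    A is a nested list of shape (a, k, c); B is a nested list of shape (k, b).
    This is a slow illustration of the math only, not a cuTENSOR call.
    """
    dim_a, dim_k, dim_c = len(A), len(A[0]), len(A[0][0])
    dim_b = len(B[0])
    C = [[[0 for _ in range(dim_c)] for _ in range(dim_b)] for _ in range(dim_a)]
    for a, b, c in product(range(dim_a), range(dim_b), range(dim_c)):
        # Sum over the shared (contracted) mode k.
        C[a][b][c] = sum(A[a][k][c] * B[k][b] for k in range(dim_k))
    return C


def partial_reduce(C):
    """Partial reduction over the last mode: R[a][b] = sum_c C[a][b][c]."""
    return [[sum(row) for row in plane] for plane in C]


# Small hypothetical inputs: A has shape (1, 2, 2), B has shape (2, 2).
A = [[[1, 2], [3, 4]]]
B = [[1, 1], [0, 2]]
C = contract(A, B)
R = partial_reduce(C)
```

Here the output modes (a, b, c) interleave modes from both inputs, so the operation is not a single matrix product on the stored layouts; a library that supports direct contractions handles such index patterns without an explicit transpose step.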
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (currently, CUDA 11 and 12 are supported). The package cutensor (without the -cuXX suffix) is deprecated. If you have cutensor installed, please remove it before installing cutensor-cuXX.
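The naming rule above can be sketched as a small helper that maps a CUDA major version to the wheel name and builds the corresponding pip commands. The helper names `cutensor_package_name` and `install_commands` are hypothetical, and the supported versions are taken from the text above, so they may differ for other releases.

```python
import sys

# Supported CUDA major versions as stated in the text; an assumption for other releases.
SUPPORTED_CUDA_MAJORS = (11, 12)


def cutensor_package_name(cuda_major):
    """Return the wheel name for a CUDA major version, e.g. 12 -> 'cutensor-cu12'."""
    if cuda_major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"unsupported CUDA major version: {cuda_major}")
    return f"cutensor-cu{cuda_major}"


def install_commands(cuda_major):
    """Build pip commands: remove the deprecated 'cutensor' package, then install."""
    package = cutensor_package_name(cuda_major)
    return [
        [sys.executable, "-m", "pip", "uninstall", "-y", "cutensor"],
        [sys.executable, "-m", "pip", "install", package],
    ]
```

Each returned command is an argument list that could be passed to `subprocess.run(cmd, check=True)`; the sketch only builds the commands and does not execute them.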
Hashes for cutensor_cu11-2.0.0-py3-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | d82d66cac2b6e9264a38789e1dda978e5071c3da448b75427592be281ab957aa
MD5 | 7933741ee61dbe1fe3f9a9fbaa04952e
BLAKE2b-256 | 15c01cb0db3d17a1ff2c5a158dab9c481c75c7e6080dfb20f2a38de72c4dd403
Hashes for cutensor_cu11-2.0.0-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | d43c6a4cc74bd167913689b16b654ef0642b4ae39adb2ee11eb4fb46e1e3fb29
MD5 | 972d84b7bebb55c0516e1d33728730ed
BLAKE2b-256 | da3b01e07fa45cc30012b560401883522960baa99c4f3f2ccdeba4a2bfa84980
Hashes for cutensor_cu11-2.0.0-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 8106b70be277759154b9a889342183b29c418bf4154c81b15b9750e99d838918
MD5 | 1bc2b5de2dcf4ff88bc2d5bb19f2c1ef
BLAKE2b-256 | 14f943888d33378e045e5ee83a0860b9201fddd15523f362535b706af7a75d87
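A downloaded wheel can be checked against the SHA256 digests listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the function names `sha256_of_file` and `matches_expected` are illustrative, and the file path in the usage comment assumes the wheel was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the wheel is in the current directory):
# ok = matches_expected(
#     "cutensor_cu11-2.0.0-py3-none-manylinux2014_x86_64.whl",
#     "d43c6a4cc74bd167913689b16b654ef0642b4ae39adb2ee11eb4fb46e1e3fb29",
# )
```

Reading in chunks keeps memory use flat for large wheels. SHA256 is used here rather than MD5 because MD5 is not collision-resistant.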