NVIDIA cuTENSOR
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
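To make the terminology in the feature list concrete, here is a small pure-Python sketch of what a direct contraction and a partial reduction compute. It does not use cuTENSOR or its API: the helper names `contract` and `reduce_k` and the nested-list tensors are illustrative assumptions. The point is only that both operands are read in their stored index order (A as [k][m], B as [n][k]) rather than being transposed into matrix form first.

```python
# Illustration only: plain Python, not the cuTENSOR API.
# The helper names are hypothetical; tensors are nested lists.

def contract(a, b):
    """Return C[m][n] = sum over k of A[k][m] * B[n][k].

    A is stored as [k][m] and B as [n][k]; neither operand is
    copied into a transposed buffer before the sum is taken.
    """
    k_extent, m_extent = len(a), len(a[0])
    n_extent = len(b)
    return [
        [sum(a[k][m] * b[n][k] for k in range(k_extent)) for n in range(n_extent)]
        for m in range(m_extent)
    ]


def reduce_k(a):
    """Partial reduction: R[m] = sum over k of A[k][m] (mode m is kept)."""
    return [sum(row[m] for row in a) for m in range(len(a[0]))]


A = [[1, 2], [3, 4], [5, 6]]   # extents: k=3, m=2
B = [[1, 0, 1], [0, 1, 0]]     # extents: n=2, k=3

print(contract(A, B))  # [[6, 3], [8, 4]]
print(reduce_k(A))     # [9, 12]
```

A library such as cuTENSOR performs the same index arithmetic on the GPU over strided memory; the nested loops here only show the semantics.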
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (CUDA 11 and 12 are currently supported). The cutensor package (without the -cuXX suffix) is deprecated. If you have cutensor installed, remove it before installing cutensor-cuXX.
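The naming rule and the deprecation check can be sketched in Python as follows. This is a minimal sketch using only the standard library; `wheel_name` and `installed_version` are hypothetical helper names, and the supported versions are hard-coded from the text above.

```python
from importlib import metadata

# Assumption: taken from the installation note above (CUDA 11 and 12).
SUPPORTED_CUDA_MAJORS = (11, 12)


def wheel_name(cuda_major):
    """Build the package name cutensor-cuXX for a supported CUDA major version."""
    if cuda_major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"unsupported CUDA major version: {cuda_major}")
    return f"cutensor-cu{cuda_major}"


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the commands to run; nothing is installed or removed here.
    if installed_version("cutensor") is not None:
        print("pip uninstall cutensor")
    print(f"pip install {wheel_name(12)}")
```

The script only prints the suggested commands, so it is safe to run in any environment before changing installed packages.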
Hashes for cutensor_cu11-1.6.2-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 383ce78ec772506c285ea24ccd235ea567dc355c194711458ace850fa0107417
MD5 | 4d76a6558500f0a3f3091cdb609e3d9a
BLAKE2b-256 | f701902299bad713fbeb7e328be91532902fc6e245fc3d4d69fb712e98a760d4
Hashes for cutensor_cu11-1.6.2-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | b1fdaac0782c845475bbb64a2e6e9b163f8b9fe31eae41077715603f449f200c
MD5 | 08cdb130ff2e97d4364768d140397755
BLAKE2b-256 | e1da04a6fd9c0964248c159ef035c04c1c4e39b3da71c11bc5c4dd796b0fb172
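
A downloaded wheel can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches` are illustrative, and the wheel is assumed to have been downloaded already to a local path of your choosing.

```python
import hashlib

# Published SHA256 digests, copied from the hash listings above.
EXPECTED_SHA256 = {
    "cutensor_cu11-1.6.2-py3-none-manylinux2014_x86_64.whl":
        "383ce78ec772506c285ea24ccd235ea567dc355c194711458ace850fa0107417",
    "cutensor_cu11-1.6.2-py3-none-manylinux2014_aarch64.whl":
        "b1fdaac0782c845475bbb64a2e6e9b163f8b9fe31eae41077715603f449f200c",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(path, EXPECTED_SHA256[name])` returns `True` only when the file at `path` is byte-identical to the published wheel called `name`. pip can also enforce digests itself through its hash-checking mode in a requirements file.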