NVIDIA cuTENSOR
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
    - Support for just-in-time compilation of dedicated kernels.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Support for padding of the output tensor.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
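To make "direct (i.e., transpose-free) contraction" and "arbitrary data layouts" concrete, here is a minimal pure-Python sketch of the idea. It is a conceptual illustration only and does not use or mirror the cuTENSOR API: the `contract` function, its parameters, and the stride convention are hypothetical names chosen for this example. Each operand is a flat buffer plus a pair of strides, so a column-major operand is read in place instead of being transposed first.

```python
from itertools import product


def contract(a, a_strides, b, b_strides, m, n, k):
    """Compute C[i][j] = sum over p of A[i, p] * B[p, j].

    `a` and `b` are flat lists. Each strides pair gives the step in the
    flat buffer for the first and second mode of that operand, so any
    layout can be read without copying or transposing.
    (Hypothetical helper for illustration; not a cuTENSOR function.)
    """
    c = [[0.0] * n for _ in range(m)]
    for i, j, p in product(range(m), range(n), range(k)):
        a_val = a[i * a_strides[0] + p * a_strides[1]]
        b_val = b[p * b_strides[0] + j * b_strides[1]]
        c[i][j] += a_val * b_val
    return c


# Logical A = [[1, 2, 3], [4, 5, 6]], stored column-major: strides (1, 2).
a_col_major = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
# Logical B = [[1, 0], [0, 1], [1, 1]], stored row-major: strides (2, 1).
b_row_major = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]

c = contract(a_col_major, (1, 2), b_row_major, (2, 1), m=2, n=2, k=3)
print(c)  # [[4.0, 5.0], [10.0, 11.0]]
```

The two operands use different layouts, yet neither is rearranged before the contraction; only the index arithmetic changes. A GPU library applies the same principle with far more elaborate kernels.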
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (CUDA 11 and 12 are currently supported). The package cutensor (without the -cuXX suffix) is deprecated. If you have cutensor installed, remove it before installing cutensor-cuXX.
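The check for the deprecated package can be scripted. The sketch below uses only the standard-library `importlib.metadata` module; the helper names `installed_version` and `install_command` are hypothetical and introduced here for illustration. It assumes the script runs in the same Python environment that pip installs into.

```python
from importlib import metadata

SUPPORTED_CUDA_MAJORS = (11, 12)  # per the installation note above


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def install_command(cuda_major):
    """Build the pip command line for the given CUDA major version."""
    if cuda_major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"unsupported CUDA major version: {cuda_major}")
    return ["pip", "install", f"cutensor-cu{cuda_major}"]


legacy = installed_version("cutensor")
if legacy is not None:
    # The unsuffixed package is deprecated and should be removed first.
    print(f"Found deprecated cutensor {legacy}; run: pip uninstall cutensor")

print("Install with:", " ".join(install_command(12)))
```

The script only reports what to do; it does not uninstall or install anything itself.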
Built Distributions
Hashes for cutensor_cu11-2.0.2-py3-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 4576723d94b81bdc733e1cdb30808551ed1ddeb7d0440df58f56b2555d639f02
MD5 | 97fdf0934f6eb00e93bfd34937040a15
BLAKE2b-256 | f0b928977c56495b4847b98fb0348c70fff0fd74c3e7535407c6eb0cbae28a26
Hashes for cutensor_cu11-2.0.2-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6d37a1164cb02d74322b35b09f018ce51aff078dedee10823820b9d878ebb8c3
MD5 | 594dd2e6bb48303b91df94281603a172
BLAKE2b-256 | 3d005eb39fbd12ecfe727f15749337ecda5585977ae9d969c2f7c69a12f55649
Hashes for cutensor_cu11-2.0.2-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | e17003e5f5cf0e83292e9e7e380b64c87a311f8096b3a287a630cbab743ef52f
MD5 | f063f5299c4da6ccba3fa60aa8f4a2fa
BLAKE2b-256 | 3193d8ee8ac22b83e004c6d1f8e16a6f50834ffb300f4db032dac257e77e8ba8
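A downloaded wheel can be compared against its published SHA256 digest with the standard-library `hashlib` module. The following is a minimal sketch: the helper names `file_digest` and `matches_published_hash` are hypothetical, and the demonstration hashes a throwaway temporary file so that it runs without downloading anything.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path, expected_sha256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return file_digest(path) == expected_sha256.strip().lower()


# Demonstration on a stand-in file (not a real wheel).
payload = b"stand-in wheel contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

demo_expected = hashlib.sha256(payload).hexdigest()
demo_ok = matches_published_hash(demo_path, demo_expected)
os.remove(demo_path)
print("digest matches:", demo_ok)
```

For a real check, pass the path of the downloaded `.whl` file and the SHA256 value listed for that exact filename. pip can also enforce hashes itself through its `--require-hashes` mode with a requirements file.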