Skip to main content

PyTorch extension for handling deeply nested sequences of variable length

Project description


FoldedTensor: PyTorch extension for handling deeply nested sequences of variable length

foldedtensor is a PyTorch extension that provides efficient handling of tensors containing deeply nested sequences variable sizes. It enables the flattening/unflattening (or unfolding/folding) of data dimensions based on a inner structure of sequence lengths. This library is particularly useful when working with data that can be split in different ways and enables you to avoid choosing a fixed representation.

Installation

The library can be installed with pip:

pip install foldedtensor

Features

  • Support for arbitrary numbers of nested dimensions
  • No computational overhead when dealing with already padded tensors
  • Dynamic re-padding (or refolding) of data based on stored inner lengths
  • Automatic mask generation and updating whenever the tensor is refolded
  • C++ optimized code for fast data loading from Python lists and refolding
  • Flexibility in data representation, making it easy to switch between different layouts when needed

Example

import torch
from foldedtensor import as_folded_tensor

# Creating a folded tensor from a nested list
# There are 2 samples, the first with 5 lines, the second with 1 line.
# Each line contain between 1 and 2 words.
ft = as_folded_tensor(
    [
        [[1], [], [], [], [2, 3]],
        [[4, 3]],
    ],
    data_dims=("samples", "words"),
    full_names=("samples", "lines", "words"),
    dtype=torch.long,
)
print(ft)
# FoldedTensor([[1, 2, 3],
#               [4, 3, 0]])

# Refold on the lines and words dims (flatten the samples dim)
print(ft.refold(("lines", "words")))
# FoldedTensor([[1, 0],
#               [0, 0],
#               [0, 0],
#               [0, 0],
#               [2, 3],
#               [4, 3]])

# Refold on the words dim only: flatten everything
print(ft.refold(("words",)))
# FoldedTensor([1, 2, 3, 4, 3])

# Working with PyTorch operations
embedder = torch.nn.Embedding(10, 16)
embedding = embedder(ft.refold(("words",)))
print(embedding.shape)
# torch.Size([5, 16]) # 5 words total, 16 dims

refolded_embedding = embedding.refold(("samples", "words"))
print(refolded_embedding.shape)
# torch.Size([2, 5, 16]) # 2 samples, 5 words max, 16 dims

Comparison with alternatives

Unlike other ragged or nested tensor implementations, a FoldedTensor does not enforce a specific structure on the nested data, and does not require padding all dimensions. This provides the user with greater flexibility when working with data that can be arranged in multiple ways depending on the data transformation. Moreover, the C++ optimization ensures high performance, making it ideal for handling deeply nested tensors efficiently.

Here is a comparison with other common implementations for handling nested sequences of variable length:

Feature NestedTensor MaskedTensor FoldedTensor
Inner data structure Flat Padded Arbitrary
Max nesting level 1 1
From nested python lists No No Yes
Layout conversion To padded No Any
Reduction ops w/o padding Yes No No

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

foldedtensor-0.2.1.post0.tar.gz (10.5 kB view details)

Uploaded Source

File details

Details for the file foldedtensor-0.2.1.post0.tar.gz.

File metadata

  • Download URL: foldedtensor-0.2.1.post0.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for foldedtensor-0.2.1.post0.tar.gz
Algorithm Hash digest
SHA256 2c60e4c716c5a698e7885ce0ec47a251606df3ea301d565846615b29d7020b1c
MD5 889c8adea955c636a317724c87238e54
BLAKE2b-256 f1b973f6e1b9d44fd0ecf01e00aa3d35747eeeb4ef8868132047642d385f1273

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page