
Merlin Dataloader

Merlin Dataloader lets you quickly train recommender models with TensorFlow and PyTorch. It removes the biggest bottleneck in training recommender models (data loading) by providing GPU-optimized dataloaders that read data directly into GPU memory and then hand it to TensorFlow or PyTorch with a zero-copy transfer using DLPack.
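Merlin performs this hand-off on GPU memory. As a CPU-only illustration of the same DLPack protocol (not Merlin's own code), the sketch below uses NumPy's `np.from_dlpack`, available in NumPy 1.23 and later, to import an array without copying it; `zero_copy_view` is a hypothetical helper name. On the framework side, TensorFlow and PyTorch expose equivalent importers (`tf.experimental.dlpack.from_dlpack` and `torch.utils.dlpack.from_dlpack`).

```python
import numpy as np


def zero_copy_view(array: np.ndarray) -> np.ndarray:
    """Import `array` through the DLPack protocol without copying its data."""
    # np.from_dlpack accepts any object exposing __dlpack__ and wraps the
    # producer's existing buffer instead of allocating a new one.
    return np.from_dlpack(array)


producer = np.arange(6, dtype=np.float32)
consumer = zero_copy_view(producer)

# Both arrays share one buffer, so a write through the producer is visible
# through the consumer.
producer[0] = 42.0
print(np.shares_memory(producer, consumer), consumer[0])
```

Because no bytes are copied, the cost of the hand-off does not grow with batch size, which is why a zero-copy transfer matters for large recommender batches.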

The benefits of the Merlin Dataloader include:

  • More than 10x speedup compared with native framework dataloaders
  • Handles larger-than-memory datasets
  • Per-epoch shuffling
  • Distributed training
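The last three benefits fit together in one loop. The sketch below is a hypothetical, pure-Python illustration of the general technique, not Merlin's implementation: data is split into partitions (stand-ins for on-disk chunks), only one partition is "loaded" at a time so memory stays bounded, the partition and row order is reshuffled with a new seed each epoch, and each distributed worker takes a disjoint slice of partitions. The function name `epoch_batches` and its `rank`/`world_size` parameters are assumptions made for illustration.

```python
import random
from typing import Iterator, List, Sequence


def epoch_batches(
    partitions: Sequence[Sequence[int]],
    batch_size: int,
    epoch: int,
    rank: int = 0,
    world_size: int = 1,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield batches of row ids for one epoch (illustrative sketch only)."""
    # A different seed per epoch gives per-epoch shuffling; every worker uses
    # the same seed, so all workers agree on the shuffled partition order.
    rng = random.Random(seed + epoch)
    order = list(range(len(partitions)))
    rng.shuffle(order)
    # Distributed training: each worker keeps every world_size-th partition,
    # so workers never see the same rows within an epoch.
    my_partitions = order[rank::world_size]
    buffer: List[int] = []
    for p in my_partitions:
        rows = list(partitions[p])  # "load" one chunk into memory
        rng.shuffle(rows)  # shuffle rows within the loaded chunk
        buffer.extend(rows)
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:
        yield buffer  # final partial batch


parts = [list(range(i * 10, i * 10 + 10)) for i in range(4)]
for batch in epoch_batches(parts, batch_size=8, epoch=0):
    print(batch)
```

Shuffling partitions plus rows inside a loaded partition is only an approximation of a full global shuffle, traded for never holding the whole dataset in memory.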

Installation

Merlin-dataloader requires Python version 3.7+. Additionally, GPU support requires CUDA 11.0+.

To install using Conda:

conda install -c nvidia -c rapidsai -c numba -c conda-forge merlin-loader python=3.7 cudatoolkit=11.2

To install from PyPI:

pip install merlin-dataloader

Docker containers with merlin-dataloader and its dependencies preinstalled are also available on NGC.
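After installing, a quick way to confirm which pieces are importable in the current environment is `importlib.util.find_spec` from the standard library. The helper name `is_importable` is hypothetical, and the module list simply names the dataloader package and the two frameworks this page mentions; a package being found does not guarantee that its GPU/CUDA setup works.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised for dotted names whose parent package is missing,
        # e.g. "merlin.loader" when no "merlin" package is installed.
        return False


# Check the dataloader and whichever framework(s) you plan to train with.
for name in ("merlin.loader", "tensorflow", "torch"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```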

Basic Usage

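# Get a Merlin dataset from a set of parquet files
import merlin.io
import tensorflow as tf
from merlin.loader.tensorflow import Loader

# Placeholder path: replace with the paths to your own parquet files
PARQUET_FILE_PATHS = ["/path/to/data/part_0.parquet"]
dataset = merlin.io.Dataset(PARQUET_FILE_PATHS, engine="parquet")

# Create a TensorFlow dataloader from the dataset, loading 65,536 rows
# per batch
loader = Loader(dataset, batch_size=65536)

# Get a single batch of data. Inputs will be a dictionary mapping column
# names to TensorFlow tensors
inputs, target = next(loader)

# Train a Keras model with the dataloader (placeholder: build your model here)
model = tf.keras.Model( ... )
# Keras requires compile() before fit(); pick an optimizer and loss that
# match your task (binary classification is assumed here)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(loader, epochs=5)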
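The `(inputs, target)` pair returned for each batch follows a simple columnar shape: a dictionary from column name to a batch-sized slice of that column, plus the target column on its own. The sketch below reproduces only that shape with NumPy arrays on the CPU; it is a hypothetical illustration, not Merlin's API, and `column_batches` and `target_column` are names invented here. The real loader returns framework tensors on the GPU.

```python
from typing import Dict, Iterator, Optional, Tuple

import numpy as np


def column_batches(
    table: Dict[str, np.ndarray],
    batch_size: int,
    target_column: Optional[str] = None,
) -> Iterator[Tuple[Dict[str, np.ndarray], Optional[np.ndarray]]]:
    """Yield (inputs, target) batches from a dict of equal-length columns."""
    if not table:
        return
    n_rows = len(next(iter(table.values())))
    for start in range(0, n_rows, batch_size):
        stop = start + batch_size
        # Basic slicing returns views, so no column data is copied here.
        inputs = {
            name: column[start:stop]
            for name, column in table.items()
            if name != target_column
        }
        target = table[target_column][start:stop] if target_column else None
        yield inputs, target


table = {
    "user_id": np.array([1, 2, 3, 4, 5]),
    "item_id": np.array([10, 20, 30, 40, 50]),
    "click": np.array([0, 1, 0, 1, 1]),
}
for inputs, target in column_batches(table, batch_size=2, target_column="click"):
    print(sorted(inputs), target)
```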

Download files

Source Distribution

merlin-dataloader-0.0.2.tar.gz (44.1 kB)

Uploaded Source

File details

Details for the file merlin-dataloader-0.0.2.tar.gz.

File metadata

  • Download URL: merlin-dataloader-0.0.2.tar.gz
  • Upload date:
  • Size: 44.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for merlin-dataloader-0.0.2.tar.gz
  • SHA256: f0a228adeb431222091e00e80b33b2c6f50366a0dbc6012b266a8b16f1bca7ae
  • MD5: cf3dadb7dcadd424fd8f69845ab8daf7
  • BLAKE2b-256: 7902e9a05a8f5019c1759e64ab7f8d2b84bc9b9772cbe49f07a87ee9b67d7b77
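To check a downloaded copy of the source distribution against its published SHA256 digest, a minimal sketch with the standard-library `hashlib` module could look like this; the helper names and the local file path in the usage note are hypothetical. pip can also enforce the check itself in hash-checking mode, via a requirements line such as `merlin-dataloader==0.0.2 --hash=sha256:<digest>` installed with `pip install --require-hashes -r requirements.txt` (every dependency then needs a pinned hash too).

```python
import hashlib

# SHA256 digest published for merlin-dataloader-0.0.2.tar.gz.
EXPECTED_SHA256 = "f0a228adeb431222091e00e80b33b2c6f50366a0dbc6012b266a8b16f1bca7ae"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files never load fully."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("merlin-dataloader-0.0.2.tar.gz")` returns `True` only if the file in the current directory is byte-for-byte the published archive.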
