Project description

NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate the terabyte-scale datasets used to train deep learning (DL) based recommender systems. It provides high-level abstractions to simplify code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.

NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems. It works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Beyond model training, NVIDIA's Triton Inference Server can automatically apply the feature engineering and preprocessing steps performed on the data during training to incoming data during inference.
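
Reusing training-time preprocessing at inference works because the statistics are learned once (fit) and then only applied (transform). The pure-Python sketch below illustrates that pattern. The class `TinyWorkflow` is hypothetical and is not NVTabular's API or implementation. It learns a category-to-integer mapping and a mean/standard deviation from training rows, then applies exactly those learned values to new rows. Reserving id 0 for categories never seen during training is an assumption made for this sketch.

```python
from statistics import mean, pstdev


class TinyWorkflow:
    """Hypothetical fit/transform pipeline for one categorical and one continuous column."""

    def __init__(self, cat_col, cont_col):
        self.cat_col = cat_col
        self.cont_col = cont_col
        self.vocab = {}
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, rows):
        # Learn a contiguous id per training category; id 0 is reserved for unknown values.
        self.vocab = {}
        for value in sorted({r[self.cat_col] for r in rows}):
            self.vocab[value] = len(self.vocab) + 1
        # Learn normalization statistics from the training data only.
        values = [r[self.cont_col] for r in rows]
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # avoid dividing by zero for a constant column
        return self

    def transform(self, rows):
        # Apply the statistics learned in fit(); nothing is recomputed from the new rows.
        return [
            {
                self.cat_col: self.vocab.get(r[self.cat_col], 0),
                self.cont_col: (r[self.cont_col] - self.mu) / self.sigma,
            }
            for r in rows
        ]


train = [{"item": "b", "price": 2.0}, {"item": "a", "price": 4.0}]
wf = TinyWorkflow("item", "price").fit(train)
result = wf.transform([{"item": "a", "price": 3.0}, {"item": "z", "price": 4.0}])
print(result)  # [{'item': 1, 'price': 0.0}, {'item': 0, 'price': 1.0}]
```

The unseen item "z" maps to 0 and its price is scaled with the training mean and standard deviation, which is the consistency a served model needs.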

Benefits

When training DL recommender systems, data scientists and machine learning (ML) engineers face the following challenges:

  • Huge Datasets: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.
  • Complex Data Feature Engineering and Preprocessing Pipelines: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.
  • Input Bottleneck: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.
  • Extensive Repeated Experimentation: The entire data engineering, training, and evaluation process can be repetitive and time-consuming, requiring significant computational resources.

NVTabular alleviates these challenges and helps data scientists and ML engineers:

  • process datasets that exceed GPU and CPU memory without having to worry about scale.
  • focus on what to do with the data and not how to do it by using abstraction at the operation level.
  • prepare datasets quickly and easily for experimentation so that more models can be trained.
  • deploy models into production by providing faster dataset transformation.
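
Processing a dataset larger than memory generally means computing statistics one partition at a time and merging the partial results. The sketch below shows that general technique for a global mean and standard deviation, using the pairwise merge of (count, mean, sum of squared deviations) described by Chan et al. It assumes every partition is non-empty and fits in memory on its own, and it is not NVTabular's actual Dask-based code.

```python
import math
from functools import reduce


def partition_stats(values):
    """Summarize one non-empty partition as (count, mean, M2), M2 = sum of squared deviations."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values)
    return n, mu, m2


def merge_stats(a, b):
    """Combine two partition summaries without revisiting the raw values."""
    n_a, mu_a, m2_a = a
    n_b, mu_b, m2_b = b
    n = n_a + n_b
    delta = mu_b - mu_a
    mu = mu_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mu, m2


def global_mean_std(partitions):
    """Population mean and standard deviation across all partitions."""
    n, mu, m2 = reduce(merge_stats, (partition_stats(p) for p in partitions))
    return mu, math.sqrt(m2 / n)


# Each inner list stands in for one partition that would be read from disk separately.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mu, std = global_mean_std(partitions)
print(mu, std)
```

Only the small per-partition summaries need to be held at once, so memory use does not grow with the total number of rows.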

Learn more in the NVTabular core features documentation.

Performance

When running NVTabular on the Criteo 1TB Click Logs Dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. When running on a DGX-1 cluster with eight V100 GPUs, feature engineering and preprocessing completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model trained in only six minutes.
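
The single-GPU and eight-GPU preprocessing times can be turned into a rough speedup and scaling-efficiency estimate. Because the eight-GPU time is reported as "within three minutes", three minutes is an upper bound, so both numbers below are lower bounds. The helper names are illustrative only.

```python
def speedup(baseline_minutes, new_minutes):
    """How many times faster the new run is than the baseline."""
    return baseline_minutes / new_minutes


def scaling_efficiency(baseline_minutes, new_minutes, n_devices):
    """Speedup divided by device count; 1.0 would be perfect linear scaling."""
    return speedup(baseline_minutes, new_minutes) / n_devices


# Figures quoted above: 13 min on one V100, at most 3 min on eight V100s.
s = speedup(13, 3)
e = scaling_efficiency(13, 3, 8)
print(f"speedup >= {s:.1f}x, efficiency >= {e:.0%}")  # speedup >= 4.3x, efficiency >= 54%
```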

The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script, written in NumPy, took over five days to complete, and combined with CPU training, the total iteration time was over one week. By optimizing the ETL code in Spark and running it on a DGX-1-equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours, and training was completed in one hour.
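
The quoted durations reduce to rough speedup factors. Since "over five days" and "over one week" are lower bounds, the ratios are lower bounds too. The sketch assumes a week of seven 24-hour days and compares the original pipeline against the three-hour optimized ETL plus the one-hour training run.

```python
HOURS_PER_DAY = 24

original_etl_hours = 5 * HOURS_PER_DAY    # "over five days": a lower bound
original_total_hours = 7 * HOURS_PER_DAY  # "over one week" including CPU training: a lower bound
optimized_etl_hours = 3
optimized_training_hours = 1

etl_speedup = original_etl_hours / optimized_etl_hours
end_to_end_speedup = original_total_hours / (optimized_etl_hours + optimized_training_hours)
print(f"ETL: >{etl_speedup:.0f}x, end to end: >{end_to_end_speedup:.0f}x")  # ETL: >40x, end to end: >42x
```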

Installation

NVTabular requires Python version 3.7+. Additionally, GPU support requires:

  • CUDA version 11.0+
  • NVIDIA Pascal GPU or later (Compute Capability >=6.0)
  • NVIDIA driver 450.80.02+
  • Linux or WSL
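
When checking a machine against these minimums, versions should be compared as integer tuples rather than strings: as strings, "9.0" sorts after "11.0". The helper below is hypothetical and takes the CUDA, compute capability, and driver versions as inputs; obtaining them from the system (for example from `nvidia-smi`) is left out, and the Linux/WSL requirement is not checked.

```python
import sys

# Minimums from the requirements list above.
MIN_PYTHON = (3, 7)
MIN_CUDA = (11, 0)
MIN_COMPUTE_CAPABILITY = (6, 0)
MIN_DRIVER = (450, 80, 2)


def parse_version(text):
    """Turn '450.80.02' into (450, 80, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_gpu_requirements(cuda, compute_capability, driver, python=sys.version_info[:2]):
    """Return a list of unmet requirements (empty when everything is satisfied)."""
    problems = []
    if tuple(python) < MIN_PYTHON:
        problems.append(f"Python {python} < 3.7")
    if parse_version(cuda) < MIN_CUDA:
        problems.append(f"CUDA {cuda} < 11.0")
    if parse_version(compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {compute_capability} < 6.0")
    if parse_version(driver) < MIN_DRIVER:
        problems.append(f"driver {driver} < 450.80.02")
    return problems


print(check_gpu_requirements("11.2", "7.0", "470.57.02"))  # [] (on Python 3.7+)
print(check_gpu_requirements("10.2", "5.2", "440.33.01"))
# ['CUDA 10.2 < 11.0', 'compute capability 5.2 < 6.0', 'driver 440.33.01 < 450.80.02']
```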

Installing NVTabular Using Conda

NVTabular can be installed with Anaconda from the nvidia channel by running the following command:

conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2

Installing NVTabular Using Pip

NVTabular can be installed with pip by running the following command:

pip install nvtabular

When installed with pip, NVTabular runs on the CPU only, and you might need to install additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
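
Because a pip installation may leave NVTabular on the CPU, it can help to check which RAPIDS libraries are importable before building a pipeline. The sketch below only asks Python's import system whether the `cudf` and `dask_cudf` modules can be found. Treating their presence as "GPU path available" is an assumption of this sketch, not a documented NVTabular check, and it does not confirm that a GPU is actually usable.

```python
import importlib.util

# RAPIDS modules assumed (for this sketch) to signal that the GPU path is available.
GPU_MODULES = ("cudf", "dask_cudf")


def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preferred_backend():
    """'gpu' if every RAPIDS module is importable, otherwise 'cpu'."""
    return "cpu" if missing_modules(GPU_MODULES) else "gpu"


print(preferred_backend())
```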

Installing NVTabular with Docker

NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:

Container Name | Container Location | Functionality
merlin-hugectr | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr | NVTabular, HugeCTR, and Triton Inference
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, TensorFlow, and Triton Inference
merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference

To use these Docker containers, you'll first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see Support Matrix.

Notebook Examples and Tutorials

We provide a collection of examples to demonstrate feature engineering with NVTabular as Jupyter notebooks:

  • Introduction to NVTabular's High-Level API
  • Advanced workflows with NVTabular
  • NVTabular on CPU
  • Scaling NVTabular to multi-GPU systems

In addition, NVTabular is used in many of our examples in other Merlin libraries:

Feedback and Support

If you'd like to contribute to the library directly, see Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share the details of your recommender system pipeline in this survey.

If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nvtabular-1.8.1.tar.gz (126.1 kB)

Uploaded Source

Built Distributions

nvtabular-1.8.1-cp311-cp311-musllinux_1_1_x86_64.whl (804.1 kB)

Uploaded CPython 3.11 musllinux: musl 1.1+ x86-64

nvtabular-1.8.1-cp311-cp311-musllinux_1_1_i686.whl (866.1 kB)

Uploaded CPython 3.11 musllinux: musl 1.1+ i686

nvtabular-1.8.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (301.5 kB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

nvtabular-1.8.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl (307.5 kB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ i686

nvtabular-1.8.1-cp310-cp310-musllinux_1_1_x86_64.whl (804.2 kB)

Uploaded CPython 3.10 musllinux: musl 1.1+ x86-64

nvtabular-1.8.1-cp310-cp310-musllinux_1_1_i686.whl (866.1 kB)

Uploaded CPython 3.10 musllinux: musl 1.1+ i686

nvtabular-1.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (301.6 kB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

nvtabular-1.8.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl (307.4 kB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ i686

nvtabular-1.8.1-cp39-cp39-musllinux_1_1_x86_64.whl (804.4 kB)

Uploaded CPython 3.9 musllinux: musl 1.1+ x86-64

nvtabular-1.8.1-cp39-cp39-musllinux_1_1_i686.whl (866.0 kB)

Uploaded CPython 3.9 musllinux: musl 1.1+ i686

nvtabular-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (301.2 kB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

nvtabular-1.8.1-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl (307.5 kB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ i686

nvtabular-1.8.1-cp38-cp38-musllinux_1_1_x86_64.whl (804.2 kB)

Uploaded CPython 3.8 musllinux: musl 1.1+ x86-64

nvtabular-1.8.1-cp38-cp38-musllinux_1_1_i686.whl (866.2 kB)

Uploaded CPython 3.8 musllinux: musl 1.1+ i686

nvtabular-1.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (301.2 kB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

nvtabular-1.8.1-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl (307.4 kB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ i686

File details

Details for the file nvtabular-1.8.1.tar.gz.

File metadata

  • Download URL: nvtabular-1.8.1.tar.gz
  • Size: 126.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for nvtabular-1.8.1.tar.gz
Algorithm Hash digest
SHA256 0e7bc026390f63f7060722037cb286db8ae552fc853477a2f5da9c3f442d1d89
MD5 1c05e549ed42dea202e3f2389973be70
BLAKE2b-256 e0fc2579eaa7715274160e3e505ca7db06993f058bca5dff70c966b6c77f6383
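
A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The sketch uses only Python's standard `hashlib`; the local file name passed to `verify` is whatever the download was saved as, and the default expected digest is the one published for nvtabular-1.8.1.tar.gz.

```python
import hashlib

# SHA256 digest published above for nvtabular-1.8.1.tar.gz.
EXPECTED_SHA256 = "0e7bc026390f63f7060722037cb286db8ae552fc853477a2f5da9c3f442d1d89"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Example: verify("nvtabular-1.8.1.tar.gz")
```

pip can enforce the same check through its hash-checking mode, where a requirements file pins the package with a `--hash=sha256:<digest>` option.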

File details

Details for the file nvtabular-1.8.1-cp311-cp311-musllinux_1_1_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp311-cp311-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 d4d0093504f2f242a8664ef94fa97830542cc8ad55383dbb2019d83e8b626cbd
MD5 4520b4b23687ad25fada39fb1ec26ddc
BLAKE2b-256 0aab53afac20112ba5c0790845b29dfdce1b663f1b289a9ccc5b4a0a02928f1f

File details

Details for the file nvtabular-1.8.1-cp311-cp311-musllinux_1_1_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp311-cp311-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 b74553b7627a25d949833226f9958c3428817116e626bfe9731668695158afc4
MD5 de193de800a4044f9817191153eb51d7
BLAKE2b-256 c62fe411e0c46cb2ee3db33bc0aa9714bcc86a758d1c5e73b9496aa84e996641

File details

Details for the file nvtabular-1.8.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f4f9cdbbeaddde241c619da4fdb88c16bce4608e7ad285fcfac76988bc920974
MD5 b349ee5140cee79fb78d008c669ba2b4
BLAKE2b-256 b2f4b7cbbd9e47e5f3a6b13a9a99109ade27059ad5e8272f67d6f7c8c193d1a6

File details

Details for the file nvtabular-1.8.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 71297bbb67d1d5cb5b08df693f3b63444758920f0930e9a5908858558dba711f
MD5 73fc156f42fa06340746c5e8c91ebd4a
BLAKE2b-256 80111a5dc28726ae70800533b7ce9f36669ca0fbe3544492949d50d7c8a6d847

File details

Details for the file nvtabular-1.8.1-cp310-cp310-musllinux_1_1_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 dc23a690048d280808e174b5d5c6fe0b95a2624f1883c5dd7c3547e523164525
MD5 8dfc702c21d6ca4f4a9d79e49ddf6855
BLAKE2b-256 891ac907dcbf80b16eeaedd6b8710d28590f2267245c4acdd3f9dbbe5c295815

File details

Details for the file nvtabular-1.8.1-cp310-cp310-musllinux_1_1_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp310-cp310-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 726bf77a217edece81e3b3cb7b3986bf54b653b68facace17ac39f22fa80c260
MD5 40fe68acd22d1b69d3e24f39e99892f5
BLAKE2b-256 cdc648872aa568f8c3c37c0760159dcf3b32a76a88058271cd22eeb440106d53

File details

Details for the file nvtabular-1.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 38e54cc317bdcf3f9c7fe833252fac6ce0d860e9c424fdeac5ea0852822b0b58
MD5 feb6d3c8e3cc6f4e7878a2963f9ee35b
BLAKE2b-256 2a1154d7cf11338a88910e841a62544e387bc23e74af7694dcee7ddda0847ab0

File details

Details for the file nvtabular-1.8.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 4a4ff42d06fb105cdc77823422c4a1a55e2b345c8fe8fa7738db3bcb128a5192
MD5 7d07716d8b6dc6ae91e49b4011143f34
BLAKE2b-256 71632a211da18190dd4f351f0121d8e665607b1ebe736af895ac1ada8df52c06

File details

Details for the file nvtabular-1.8.1-cp39-cp39-musllinux_1_1_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 121dd044d4e060dca383e0a68d7e490811b6ac1658d6d513ff736ba37919e860
MD5 1382cfbea254061fce8e54389a53d4e4
BLAKE2b-256 91e9eebb36f65e4823270c6f188abec66fb76726ce3ac002f4bb30aa1fb3a4d8

File details

Details for the file nvtabular-1.8.1-cp39-cp39-musllinux_1_1_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp39-cp39-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 f6d2f157c8266c22e37ca89f9dfb1c37c2eb88aabe0b6ea684c4bcbc58585486
MD5 30bfe4eeba329c62cfc13258c5f20d2c
BLAKE2b-256 ea16a4118932f8b08bd1a3459cdaabd1bfeba9510035c3b90b82790249710b56

File details

Details for the file nvtabular-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4a63207887adea97b9f26ed95885b434983cc861e35f74bc23d3126944672a50
MD5 d2f69a6ab282613fd6e3ef5539a7e83c
BLAKE2b-256 4ea53b4aa44c1951c16125dc17e6f63a0a32ab3fe5ee1641e551ea7e66e7e7dc

File details

Details for the file nvtabular-1.8.1-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 d43ea6202bb9e056340fd84ad03c8932221be868da77b4502bb78588dc659040
MD5 c6b6474462d13d872b8d96b7038da65b
BLAKE2b-256 5a30af5c74afb029c8c68b47c59914c5508a62ecde0119b375042488e1ce6629

File details

Details for the file nvtabular-1.8.1-cp38-cp38-musllinux_1_1_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp38-cp38-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 29a2f8ef52b276f3e8543f90f78ffcdf5c4755e37d8537d99cd31ea08bfd3739
MD5 91ab7e5df8ea9e0386ba7754fcc0a7b9
BLAKE2b-256 c9e2158bcb23e1a21fe3f997c03bced3b852bdbaddd8029076e14848ed49471a

File details

Details for the file nvtabular-1.8.1-cp38-cp38-musllinux_1_1_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp38-cp38-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 6594631cfc72ac5dad1a605eb2e1458e4279c01fd39f944468843356af81535d
MD5 29f42b5e10a6957380b77f2cc61d2f80
BLAKE2b-256 cc25e24503b846418f8d762800ac7f6161b5d726a0eaadf0eb0f921385641c84

File details

Details for the file nvtabular-1.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for nvtabular-1.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c58b4a8421e2a3ccf7baf1ddf07542bd851e0a34190dc8b1afedde7e556ce7dd
MD5 5126d040e12967cd85aed1676e951379
BLAKE2b-256 5d6c87b923892e691e90a8983c6b5a50f9d135670d3f316d30e379b265a5d9f1

File details

Details for the file nvtabular-1.8.1-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for nvtabular-1.8.1-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 e547364c8824709cfa18045497628de591ce396d6ed94d20727567964bce78e9
MD5 bf97eddd91de0fcc7f1bdc2b6c626323
BLAKE2b-256 60c5ad21c5eea9fa33ef36e9ad17c9a87149ccb1ad5055418e6510a4e565f971
