NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte scale datasets and train deep learning (DL) based recommender systems. It provides high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.

NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems. It works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Beyond model training, NVIDIA’s Triton Inference Server can automatically apply the feature engineering and preprocessing steps performed on the data during training to incoming data during inference.
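As a rough illustration of the high-level abstraction described above, the following sketch chains a few NVTabular operators into a workflow, fits it on a dataset, writes the transformed data, and saves the fitted workflow so the same steps can be reused later. The column names, paths, and the helper functions `build_workflow` and `preprocess` are hypothetical placeholders, and the choice of operators is only an example. NVTabular is imported inside the functions so that the module can be loaded on a machine where it is not installed.

```python
# Hypothetical column names; replace them with the columns of your dataset.
CATEGORICAL_COLUMNS = ["user_id", "item_id"]
CONTINUOUS_COLUMNS = ["price", "age"]


def build_workflow():
    """Describe *what* to compute; NVTabular decides how to execute it."""
    import nvtabular as nvt
    from nvtabular import ops

    # Encode categorical values as contiguous integer ids.
    cat_features = CATEGORICAL_COLUMNS >> ops.Categorify()
    # Fill missing values, then normalize continuous columns.
    cont_features = CONTINUOUS_COLUMNS >> ops.FillMissing() >> ops.Normalize()
    return nvt.Workflow(cat_features + cont_features)


def preprocess(input_path, output_path, workflow_path):
    """Fit the workflow, write the transformed data, and save the workflow."""
    import nvtabular as nvt

    workflow = build_workflow()
    dataset = nvt.Dataset(input_path, engine="parquet")
    workflow.fit(dataset)  # collect statistics such as category mappings
    workflow.transform(dataset).to_parquet(output_path)
    workflow.save(workflow_path)  # reuse the fitted steps later
    return workflow
```

A call such as `preprocess("train/", "train_out/", "workflow/")` would run the whole pipeline, assuming those directories exist and contain Parquet files.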

Benefits

When training DL recommender systems, data scientists and machine learning (ML) engineers have been faced with the following challenges:

  • Huge Datasets: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.
  • Complex Data Feature Engineering and Preprocessing Pipelines: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.
  • Input Bottleneck: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.
  • Extensive Repeated Experimentation: The entire data engineering, training, and evaluation process can be repetitious and time consuming, requiring significant computational resources.

NVTabular alleviates these challenges and helps data scientists and ML engineers:

  • process datasets that exceed GPU and CPU memory without having to worry about scale.
  • focus on what to do with the data and not how to do it by using abstraction at the operation level.
  • prepare datasets quickly and easily for experimentation so that more models can be trained.
  • deploy models into production by providing faster dataset transformation.

Learn more in the NVTabular core features documentation.
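To make the first two points more concrete, here is a plain-Python sketch of the general idea behind processing data that does not fit in memory: a two-phase operator that gathers statistics one chunk at a time and then applies them chunk by chunk. This is not NVTabular's implementation, and the names `iter_chunks` and `StreamingNormalize` are hypothetical. It only illustrates why a fit step followed by a transform step lets the caller state what to compute without holding the whole dataset at once.

```python
import math


def iter_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


class StreamingNormalize:
    """Two-phase operator: fit() gathers statistics, transform() applies them."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def fit(self, chunks):
        # Only running sums are kept, so memory use is independent of data size.
        for chunk in chunks:
            self.count += len(chunk)
            self.total += sum(chunk)
            self.total_sq += sum(x * x for x in chunk)
        return self

    @property
    def mean(self):
        return self.total / self.count

    @property
    def std(self):
        variance = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(variance, 0.0))

    def transform(self, chunks):
        mean = self.mean
        std = self.std or 1.0  # avoid dividing by zero for constant columns
        for chunk in chunks:
            yield [(x - mean) / std for x in chunk]


values = [1.0, 2.0, 3.0, 4.0, 5.0]
op = StreamingNormalize().fit(iter_chunks(values, 2))
normalized = [x for chunk in op.transform(iter_chunks(values, 2)) for x in chunk]
```

In a real out-of-core setting, the chunks would be read from disk in both passes rather than taken from an in-memory list.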

Performance

When running NVTabular on the Criteo 1TB Click Logs Dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. On a DGX-1 cluster with eight V100 GPUs, the same steps completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model can be trained in only six minutes.

The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script, written in NumPy, took over five days to complete. Combined with CPU training, the total iteration time was over one week. By optimizing the ETL code in Spark and running it on a DGX-1 equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours. Meanwhile, training was completed in one hour.
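Taking the figures quoted above at face value, the implied ratios can be worked out with a few lines of arithmetic. This is only a back-of-the-envelope sketch: "within three minutes" is treated as exactly three minutes and "over five days" as exactly five days, so the results are approximate bounds rather than measurements, and the Spark and NVTabular runs are compared without accounting for any hardware differences.

```python
# NVTabular preprocessing times on the Criteo 1TB dataset (minutes).
single_gpu_minutes = 13.0
eight_gpu_minutes = 3.0  # "within three minutes", taken as 3

# Speedup from one GPU to eight, and the resulting scaling efficiency.
multi_gpu_speedup = single_gpu_minutes / eight_gpu_minutes
scaling_efficiency = multi_gpu_speedup / 8

# ETL times for the Criteo workflow (hours).
numpy_etl_hours = 5 * 24  # "over five days", taken as 5 days
spark_etl_hours = 3.0

spark_vs_numpy = numpy_etl_hours / spark_etl_hours
nvtabular_vs_spark = (spark_etl_hours * 60) / eight_gpu_minutes

print(f"8 GPUs vs 1 GPU: {multi_gpu_speedup:.2f}x ({scaling_efficiency:.0%} efficiency)")
print(f"Spark vs NumPy ETL: at least {spark_vs_numpy:.0f}x")
print(f"NVTabular (8 GPUs) vs Spark ETL: about {nvtabular_vs_spark:.0f}x")
```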

Installation

NVTabular requires Python version 3.7+. Additionally, GPU support requires:

  • CUDA version 11.0+
  • NVIDIA Pascal GPU or later (Compute Capability >=6.0)
  • NVIDIA driver 450.80.02+
  • Linux or WSL
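The version floors above can be checked mechanically. The following is a minimal sketch and not part of NVTabular: the names `MINIMUMS`, `parse_version`, and `unmet_requirements` are hypothetical, and you would supply the detected versions yourself (for example, copied from `nvidia-smi` output).

```python
# Minimum versions taken from the requirements listed above.
MINIMUMS = {
    "python": (3, 7),
    "cuda": (11, 0),
    "compute_capability": (6, 0),
    "driver": (450, 80, 2),
}


def parse_version(text):
    """Turn a dotted version string such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in text.split("."))


def unmet_requirements(found):
    """Return the names in `found` whose version is below the minimum."""
    return [
        name
        for name, minimum in MINIMUMS.items()
        if name in found and parse_version(found[name]) < minimum
    ]


# Example with made-up versions: only CUDA 10.2 is too old.
example = {
    "python": "3.8",
    "cuda": "10.2",
    "compute_capability": "7.0",
    "driver": "450.80.02",
}
problems = unmet_requirements(example)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "10.2" would sort after "11.0" in some orderings and "450.80.02" would not equal "450.80.2".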

Installing NVTabular Using Conda

NVTabular can be installed with Anaconda from the nvidia channel by running the following command:

conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2

Installing NVTabular Using Pip

NVTabular can be installed with pip by running the following command:

pip install nvtabular

Installing NVTabular with pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
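After a pip installation, it can be useful to confirm which optional packages are present in the environment. The sketch below uses only the Python standard library. The helper name `has_module` is hypothetical, and the assumption that GPU execution depends on the RAPIDS `cudf` package being importable is ours rather than something stated above.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


# Assumption: cudf (RAPIDS) must be importable for GPU execution.
for package in ("nvtabular", "cudf"):
    status = "found" if has_module(package) else "not found"
    print(f"{package}: {status}")
```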

Installing NVTabular with Docker

NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:

Container Name | Container Location | Functionality
merlin-hugectr | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr | NVTabular, HugeCTR, and Triton Inference
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, TensorFlow, and Triton Inference
merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference

To use these Docker containers, you'll first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see Support Matrix.

Notebook Examples and Tutorials

We provide a collection of examples to demonstrate feature engineering with NVTabular as Jupyter notebooks:

  • Introduction to NVTabular's High-Level API
  • Advanced workflows with NVTabular
  • NVTabular on CPU
  • Scaling NVTabular to multi-GPU systems

In addition, NVTabular is used in many of our examples in other Merlin libraries.

Feedback and Support

If you'd like to contribute to the library directly, see the Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this survey.

If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.
