NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data, designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides a high-level abstraction that simplifies code, and it accelerates computation on the GPU using the RAPIDS Dask-cuDF library.

NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems. It works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Beyond model training, NVIDIA's Triton Inference Server can automatically apply the feature engineering and preprocessing steps performed during training to incoming data during inference.
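The idea of reusing training-time preprocessing at inference can be illustrated with a minimal pure-Python sketch. It does not use the NVTabular or Triton APIs: the function names `fit_category_ids` and `encode`, the sample values, and the use of JSON as a stand-in for a saved workflow are all assumptions made for illustration.

```python
import json


def fit_category_ids(values):
    """Learn a category -> integer id mapping; id 0 is reserved for unseen values."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def encode(values, mapping):
    """Apply a previously fitted mapping; unknown categories become 0."""
    return [mapping.get(value, 0) for value in values]


# Training time: fit the transform once on the training data.
train_items = ["shoes", "hat", "shoes", "scarf"]
mapping = fit_category_ids(train_items)
saved = json.dumps(mapping)  # hypothetical stand-in for persisting fitted state

# Inference time: restore the fitted state and apply it to incoming data.
restored = json.loads(saved)
encoded_request = encode(["hat", "gloves"], restored)
print(encoded_request)
```

The point of the sketch is that statistics are learned once and then applied unchanged, so training and serving see identically encoded features.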

Benefits

When training DL recommender systems, data scientists and machine learning (ML) engineers face the following challenges:

  • Huge Datasets: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.
  • Complex Data Feature Engineering and Preprocessing Pipelines: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.
  • Input Bottleneck: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.
  • Extensive Repeated Experimentation: The entire data engineering, training, and evaluation process can be repetitious and time consuming, requiring significant computational resources.

NVTabular alleviates these challenges and helps data scientists and ML engineers:

  • process datasets that exceed GPU and CPU memory without having to worry about scale.
  • focus on what to do with the data, not how to do it, by using abstraction at the operation level.
  • prepare datasets quickly and easily for experimentation so that more models can be trained.
  • deploy models into production by providing faster dataset transformation.

Learn more in the NVTabular core features documentation.
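As a rough illustration of how a dataset larger than memory can still be preprocessed, the sketch below gathers normalization statistics one chunk at a time and then transforms chunk by chunk. It is plain Python, not NVTabular's implementation; `iter_chunks`, `fit_mean_std`, `normalize_chunks`, and the sample prices are hypothetical, and a real pipeline would read each chunk from disk rather than from a list.

```python
import math


def iter_chunks(rows, chunk_size):
    """Yield consecutive slices; stands in for reading partitions from disk."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def fit_mean_std(chunks):
    """Accumulate count, sum, and sum of squares so only one chunk is held at a time."""
    count, total, total_sq = 0, 0.0, 0.0
    for chunk in chunks:
        count += len(chunk)
        total += sum(chunk)
        total_sq += sum(x * x for x in chunk)
    mean = total / count
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def normalize_chunks(chunks, mean, std):
    """Second pass: apply the fitted statistics chunk by chunk (assumes std > 0)."""
    for chunk in chunks:
        yield [(x - mean) / std for x in chunk]


prices = [10.0, 20.0, 30.0, 40.0]
mean, std = fit_mean_std(iter_chunks(prices, 2))
normalized = [
    x
    for chunk in normalize_chunks(iter_chunks(prices, 2), mean, std)
    for x in chunk
]
print(mean, std, normalized)
```

The two-pass structure (collect statistics, then transform) keeps peak memory proportional to the chunk size rather than to the dataset size.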

Performance

When running NVTabular on the Criteo 1TB Click Logs Dataset with a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. On a DGX-1 cluster with eight V100 GPUs, feature engineering and preprocessing completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model can be trained in only six minutes.

The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script provided in NumPy took over five days to complete. Combined with CPU training, the total iteration time was over one week. By optimizing the ETL code in Spark and running on a DGX-1 equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours. Meanwhile, training was completed in one hour.
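The figures quoted above can be turned into rough speedup ratios. The short calculation below is only arithmetic on the numbers in the text; "within three minutes" and "over five days" are bounds, so both ratios are lower bounds rather than measurements, and the helper name `speedup` is illustrative.

```python
def speedup(baseline_minutes, improved_minutes):
    """Ratio of baseline time to improved time."""
    return baseline_minutes / improved_minutes


single_gpu_minutes = 13          # one V100 32GB GPU
eight_gpu_minutes = 3            # upper bound: "within three minutes"
numpy_etl_minutes = 5 * 24 * 60  # lower bound: "over five days"
spark_etl_minutes = 3 * 60       # three hours

multi_gpu_gain = speedup(single_gpu_minutes, eight_gpu_minutes)
spark_gain = speedup(numpy_etl_minutes, spark_etl_minutes)

print(f"Eight V100s vs one V100: at least {multi_gpu_gain:.1f}x faster")
print(f"Spark ETL vs NumPy ETL: at least {spark_gain:.0f}x faster")
```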

Installation

NVTabular requires Python version 3.7+. Additionally, GPU support requires:

  • CUDA version 11.0+
  • NVIDIA Pascal GPU or later (Compute Capability >=6.0)
  • NVIDIA driver 450.80.02+
  • Linux or WSL
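A quick way to compare an environment against these minimums is sketched below. It is not part of NVTabular: `parse_version` and `meets_minimum` are hypothetical helpers, the driver strings are made-up sample values, and in practice the driver and CUDA versions would have to be read from your system (for example from `nvidia-smi` output).

```python
import platform
import sys


def parse_version(text):
    """Turn a dotted version string such as '450.80.02' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(found, minimum):
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


python_ok = sys.version_info >= (3, 7)
on_linux = platform.system() == "Linux"  # WSL also reports "Linux"

print("Python 3.7+:", python_ok)
print("Linux or WSL:", on_linux)
# Sample values only; substitute the versions reported on your machine.
print("Driver 470.57.02 ok:", meets_minimum("470.57.02", "450.80.02"))
print("Driver 440.33.01 ok:", meets_minimum("440.33.01", "450.80.02"))
```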

Installing NVTabular Using Conda

NVTabular can be installed with Anaconda from the nvidia channel by running the following command:

conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2

Installing NVTabular Using Pip

NVTabular can be installed with pip by running the following command:

pip install nvtabular

Installing NVTabular with pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
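To see which pieces are present after an install, one option is to check whether the relevant modules can be found without importing them. This is a sketch under assumptions: the module list below (`nvtabular`, `cudf`, `dask_cudf`) is a guess at what matters for GPU execution, and NVTabular's own detection logic may differ.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module names: the library itself plus the RAPIDS GPU dataframe packages.
MODULES_TO_CHECK = ("nvtabular", "cudf", "dask_cudf")

status = {name: is_installed(name) for name in MODULES_TO_CHECK}
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If `cudf` and `dask_cudf` are reported missing, that is consistent with the CPU-only behavior described for a pip install.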

Installing NVTabular with Docker

NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:

Container Name | Container Location | Functionality
merlin-hugectr | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr | NVTabular, HugeCTR, and Triton Inference
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, TensorFlow, and Triton Inference
merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference

To use these Docker containers, you'll first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. The NGC links in the table above describe how to launch and run these containers. For the software and model versions that NVTabular supports per container, see the Support Matrix.

Notebook Examples and Tutorials

We provide a collection of examples to demonstrate feature engineering with NVTabular as Jupyter notebooks:

  • Introduction to NVTabular's High-Level API
  • Advanced workflows with NVTabular
  • NVTabular on CPU
  • Scaling NVTabular to multi-GPU systems

In addition, NVTabular is used in many of the examples in the other Merlin libraries.

Feedback and Support

If you'd like to contribute to the library directly, see Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share the details of your recommender system pipeline in this survey.

If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.


