NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data, designed to manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides a high-level abstraction that simplifies code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.

NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems. It works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Beyond model training, NVIDIA’s Triton Inference Server allows the feature engineering and preprocessing steps performed on the data during training to be applied automatically to incoming data during inference.

Benefits

When training DL recommender systems, data scientists and machine learning (ML) engineers face the following challenges:

  • Huge Datasets: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.
  • Complex Data Feature Engineering and Preprocessing Pipelines: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.
  • Input Bottleneck: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.
  • Extensive Repeated Experimentation: The entire data engineering, training, and evaluation process can be repetitious and time consuming, requiring significant computational resources.

NVTabular alleviates these challenges and helps data scientists and ML engineers:

  • process datasets that exceed GPU and CPU memory without having to worry about scale.
  • focus on what to do with the data and not how to do it by using abstraction at the operation level.
  • prepare datasets quickly and easily for experimentation so that more models can be trained.
  • deploy models into production by providing faster dataset transformation.

Learn more in the NVTabular core features documentation.
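
The first benefit above, processing data that does not fit in memory, generally rests on a two-pass pattern: gather statistics one partition at a time, then apply the transformation one partition at a time. The following is a minimal pure-Python sketch of that pattern for normalization. It is not NVTabular's implementation or API; the function names and the in-memory `partitions` list (standing in for files on disk) are hypothetical and chosen only for illustration.

```python
import math
from typing import Iterable, Iterator, List, Tuple


def fit_mean_std(chunks: Iterable[List[float]]) -> Tuple[float, float]:
    """Pass 1: accumulate count, sum, and sum of squares chunk by chunk."""
    count, total, total_sq = 0, 0.0, 0.0
    for chunk in chunks:
        count += len(chunk)
        total += sum(chunk)
        total_sq += sum(x * x for x in chunk)
    mean = total / count
    # Population variance; clamp at zero to absorb rounding error.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def transform(
    chunks: Iterable[List[float]], mean: float, std: float
) -> Iterator[List[float]]:
    """Pass 2: normalize each chunk with the statistics from pass 1."""
    scale = std or 1.0  # avoid dividing by zero for a constant column
    for chunk in chunks:
        yield [(x - mean) / scale for x in chunk]


# Hypothetical data: three small partitions standing in for files on disk.
partitions = [[1.0, 2.0], [3.0, 4.0], [5.0]]
mean, std = fit_mean_std(partitions)
normalized = list(transform(partitions, mean, std))
```

Only one partition is held in memory at a time in each pass, so peak memory depends on the partition size rather than on the size of the whole dataset.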

Performance

When running NVTabular on the Criteo 1TB Click Logs Dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. On a DGX-1 cluster with eight V100 GPUs, feature engineering and preprocessing completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model can be trained in only six minutes.

The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script, written in NumPy, took over five days to complete. Combined with CPU training, the total iteration time was over one week. By optimizing the ETL code in Spark and running on a DGX-1 equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours. Meanwhile, training was completed in one hour.
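
To put the figures above side by side, the short calculation below converts each quoted time to minutes and computes the ratios. It assumes the timings describe comparable workloads, and it treats "over five days" and "within three minutes" as exactly five days and exactly three minutes, so the results are rough bounds rather than measurements.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24

# Times quoted in the text, converted to minutes.
numpy_etl = 5 * HOURS_PER_DAY * MINUTES_PER_HOUR  # "over five days" (lower bound)
spark_etl = 3 * MINUTES_PER_HOUR                  # three hours
nvt_one_gpu = 13                                  # single V100 32GB
nvt_eight_gpu = 3                                 # DGX-1, "within three minutes"


def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_minutes / new_minutes


numpy_to_spark = speedup(numpy_etl, spark_etl)         # 40.0
spark_to_one_gpu = speedup(spark_etl, nvt_one_gpu)     # about 13.8
spark_to_eight_gpu = speedup(spark_etl, nvt_eight_gpu) # 60.0
one_to_eight_gpu = speedup(nvt_one_gpu, nvt_eight_gpu) # about 4.3
```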

Installation

NVTabular requires Python version 3.7+. Additionally, GPU support requires:

  • CUDA version 11.0+
  • NVIDIA Pascal GPU or later (Compute Capability >=6.0)
  • NVIDIA driver 450.80.02+
  • Linux or WSL
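
The version requirements above can be checked with a small helper such as the one below. This is a sketch, not part of NVTabular: the helper names are hypothetical, and the CUDA, driver, and compute capability strings must be supplied by you (for example, copied from your system's diagnostic tools), because the code does not query any hardware.

```python
import sys
from typing import Tuple

# Minimums taken from the requirements list above.
MIN_PYTHON = (3, 7)
MIN_CUDA = (11, 0)
MIN_DRIVER = (450, 80, 2)
MIN_COMPUTE_CAPABILITY = (6, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets(found: str, minimum: Tuple[int, ...]) -> bool:
    """Return True when the dotted version `found` is at least `minimum`."""
    got = parse_version(found)
    width = max(len(got), len(minimum))
    # Pad with zeros so that '11' compares equal to (11, 0).
    pad = lambda v: tuple(v) + (0,) * (width - len(v))
    return pad(got) >= pad(minimum)


# The Python check can use the running interpreter directly.
python_ok = sys.version_info[:2] >= MIN_PYTHON

# Hypothetical values for illustration; replace with your system's values.
cuda_ok = meets("11.2", MIN_CUDA)
driver_ok = meets("450.80.02", MIN_DRIVER)
gpu_ok = meets("7.0", MIN_COMPUTE_CAPABILITY)
```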

Installing NVTabular Using Conda

NVTabular can be installed with Anaconda from the nvidia channel by running the following command:

conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2

Installing NVTabular Using Pip

NVTabular can be installed with pip by running the following command:

pip install nvtabular

Installing NVTabular with pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
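
After a pip install, one way to see whether the GPU libraries are present in your environment is to ask Python whether it can locate them, without importing them. The sketch below uses only the standard library. Treating the presence of the `cudf` and `dask_cudf` modules as a sign that GPU execution is possible is an assumption made for illustration, and `missing_modules` is a hypothetical helper, not an NVTabular function.

```python
import importlib.util
from typing import List, Sequence


def missing_modules(names: Sequence[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: these RAPIDS modules indicate that the GPU stack is installed.
GPU_MODULES = ("cudf", "dask_cudf")

missing = missing_modules(GPU_MODULES)
if missing:
    print("GPU libraries not found:", ", ".join(missing))
else:
    print("GPU libraries found")
```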

Installing NVTabular with Docker

NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:

Container Name | Container Location | Functionality
merlin-hugectr | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr | NVTabular, HugeCTR, and Triton Inference
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, TensorFlow, and Triton Inference
merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference

To use these Docker containers, you'll first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. The NGC links in the table above provide more information about how to launch and run these containers. For the software and model versions that NVTabular supports per container, see the Support Matrix.

Notebook Examples and Tutorials

We provide a collection of examples to demonstrate feature engineering with NVTabular as Jupyter notebooks:

  • Introduction to NVTabular's High-Level API
  • Advanced workflows with NVTabular
  • NVTabular on CPU
  • Scaling NVTabular to multi-GPU systems

In addition, NVTabular is used in many of our examples in other Merlin libraries.

Feedback and Support

If you'd like to contribute to the library directly, see Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this survey.

If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.
