NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data, designed to manipulate terabyte-scale datasets and to support training deep learning (DL) based recommender systems. It provides a high-level abstraction that simplifies code, and it accelerates computation on the GPU using the RAPIDS Dask-cuDF library.

NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems. It works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Beyond model training, NVIDIA’s Triton Inference Server can automatically apply the feature engineering and preprocessing steps performed during training to incoming data during inference.

Benefits

When training DL recommender systems, data scientists and machine learning (ML) engineers face the following challenges:

  • Huge Datasets: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.
  • Complex Data Feature Engineering and Preprocessing Pipelines: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.
  • Input Bottleneck: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.
  • Extensive Repeated Experimentation: The entire data engineering, training, and evaluation process can be repetitious and time consuming, requiring significant computational resources.

NVTabular alleviates these challenges and helps data scientists and ML engineers:

  • process datasets that exceed GPU and CPU memory without having to worry about scale.
  • focus on what to do with the data and not how to do it by using abstraction at the operation level.
  • prepare datasets quickly and easily for experimentation so that more models can be trained.
  • deploy models into production by providing faster dataset transformation.

Learn more in the NVTabular core features documentation.
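
As a rough illustration of the operation-level abstraction described above, the sketch below chains a few NVTabular operators onto column names and leaves execution to the workflow. The column names and file paths are hypothetical placeholders, and the NVTabular imports are deferred into the functions so the snippet can be loaded where the library is not installed. Treat it as a minimal sketch rather than a recommended pipeline.

```python
# Hypothetical column names; substitute the columns of your own dataset.
CATEGORICAL_COLUMNS = ["user_id", "item_id"]
CONTINUOUS_COLUMNS = ["price"]


def build_workflow():
    """Return an NVTabular Workflow that encodes and normalizes the columns above."""
    # Imported lazily so this module loads even where NVTabular is absent.
    import nvtabular as nvt
    from nvtabular import ops

    # Encode each categorical value as an integer ID.
    cat_features = CATEGORICAL_COLUMNS >> ops.Categorify()
    # Fill missing values, then standardize to zero mean and unit variance.
    cont_features = CONTINUOUS_COLUMNS >> ops.FillMissing() >> ops.Normalize()
    return nvt.Workflow(cat_features + cont_features)


def preprocess(input_glob, output_dir):
    """Fit statistics on the dataset, then write the transformed data as Parquet."""
    import nvtabular as nvt

    workflow = build_workflow()
    dataset = nvt.Dataset(input_glob, engine="parquet")
    workflow.fit(dataset)  # gathers category mappings, means, and variances
    workflow.transform(dataset).to_parquet(output_path=output_dir)
    return workflow
```

A call such as preprocess("data/train/*.parquet", "data/processed") (both paths hypothetical) would fit the statistics and write the transformed dataset. The code states which operations apply to which columns and does not say how the data is partitioned or scheduled.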

Performance

When running NVTabular on the Criteo 1TB Click Logs Dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. On a DGX-1 cluster with eight V100 GPUs, the same steps completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model trained in only six minutes.

The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script, written in NumPy, took over five days to complete; combined with CPU training, the total iteration time was over one week. Optimizing the ETL code in Spark and running it on a DGX-1 equivalent cluster reduced feature engineering and preprocessing to three hours, with training completing in one hour.
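
To put these figures side by side, the following sketch converts the quoted preprocessing times to minutes and computes the ratios between them. It assumes the numbers describe comparable Criteo preprocessing workloads. Because the NumPy time is quoted as "over five days" and the eight-GPU time as "within three minutes", the ratios involving those entries are lower bounds. The dictionary keys are labels chosen for this example.

```python
# Preprocessing times quoted in the text, converted to minutes.
TIMES_MIN = {
    "numpy": 5 * 24 * 60,   # original ETL script: over five days
    "spark": 3 * 60,        # Spark on a DGX-1 equivalent cluster: three hours
    "nvt_1gpu": 13,         # NVTabular on one V100 32GB
    "nvt_8gpu": 3,          # NVTabular on eight V100s: within three minutes
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return TIMES_MIN[baseline] / TIMES_MIN[candidate]


for base in ("numpy", "spark"):
    for cand in ("nvt_1gpu", "nvt_8gpu"):
        print(f"{base} -> {cand}: about {speedup(base, cand):.0f}x")
```

Under these assumptions, the single-GPU run is roughly 14 times faster than the Spark baseline, and the eight-GPU run is at least 60 times faster.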

Installation

NVTabular requires Python version 3.7+. Additionally, GPU support requires:

  • CUDA version 11.0+
  • NVIDIA Pascal GPU or later (Compute Capability >=6.0)
  • NVIDIA driver 450.80.02+
  • Linux or WSL

Installing NVTabular Using Conda

NVTabular can be installed with Anaconda from the nvidia channel by running the following command:

conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2

Installing NVTabular Using Pip

NVTabular can be installed with pip by running the following command:

pip install nvtabular

Installing NVTabular with pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
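
One quick way to see which situation an environment is in is to check whether the RAPIDS packages can be found by Python's import system. The sketch below assumes that the presence of cudf and dask_cudf is a reasonable proxy for GPU support. It does not inspect NVTabular itself, and the function name is hypothetical.

```python
import importlib.util


def gpu_dataframe_stack_available():
    """Return True if both cudf and dask_cudf can be located without importing them."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("cudf", "dask_cudf")
    )


if gpu_dataframe_stack_available():
    print("cuDF and Dask-cuDF found; the GPU path should be usable.")
else:
    print("cuDF or Dask-cuDF missing; expect CPU-only execution.")
```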

Installing NVTabular with Docker

NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:

Container Name | Container Location | Functionality
merlin-hugectr | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr | NVTabular, HugeCTR, and Triton Inference
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, TensorFlow, and Triton Inference
merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference

To use these Docker containers, first install the NVIDIA Container Toolkit to provide GPU support for Docker. The NGC links in the table above describe how to launch and run each container. For the software and model versions that NVTabular supports per container, see the Support Matrix.

Notebook Examples and Tutorials

We provide a collection of examples to demonstrate feature engineering with NVTabular as Jupyter notebooks:

  • Introduction to NVTabular's High-Level API
  • Advanced workflows with NVTabular
  • NVTabular on CPU
  • Scaling NVTabular to multi-GPU systems

In addition, NVTabular is used in many of our examples in other Merlin libraries.

Feedback and Support

If you'd like to contribute to the library directly, see the Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this survey.

If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.


