NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data. It is designed to quickly and easily manipulate the terabyte-scale datasets that are used to train deep learning (DL) based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.

NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems. It works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Beyond model training, NVIDIA's Triton Inference Server can automatically apply the feature engineering and preprocessing steps performed on the data during training to incoming data during inference.

Benefits

When training DL recommender systems, data scientists and machine learning (ML) engineers face the following challenges:

  • Huge Datasets: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.
  • Complex Data Feature Engineering and Preprocessing Pipelines: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.
  • Input Bottleneck: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.
  • Extensive Repeated Experimentation: The entire data engineering, training, and evaluation process can be repetitious and time consuming, requiring significant computational resources.

NVTabular alleviates these challenges and helps data scientists and ML engineers:

  • process datasets that exceed GPU and CPU memory without having to worry about scale.
  • focus on what to do with the data and not how to do it by using abstraction at the operation level.
  • prepare datasets quickly and easily for experimentation so that more models can be trained.
  • deploy models into production by providing faster dataset transformation.

Learn more in the NVTabular core features documentation.
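
As a rough illustration of the operation-level abstraction described above, the following sketch chains a few NVTabular operators onto column selections and runs them as one workflow. The column names and sample values are hypothetical and chosen only for this example. The sketch assumes that nvtabular and pandas are installed, and it imports them lazily inside the function that needs them. It is a minimal example under those assumptions, not a recommended pipeline.

```python
# Hypothetical column names, chosen only for this illustration.
CATEGORICAL_COLUMNS = ["user_id", "item_id"]
CONTINUOUS_COLUMNS = ["price"]


def make_example_rows():
    """Return a tiny column-oriented table with one missing price."""
    return {
        "user_id": ["u1", "u2", "u1", "u3"],
        "item_id": ["a", "b", "b", "c"],
        "price": [10.0, None, 30.0, 20.0],
    }


def run_workflow(rows):
    """Fit and apply a small workflow; requires nvtabular and pandas."""
    # Imported lazily so the helpers above work without these packages.
    import pandas as pd
    import nvtabular as nvt
    from nvtabular import ops

    # ">>" attaches an operator to a selection of columns.
    cat_features = CATEGORICAL_COLUMNS >> ops.Categorify()
    cont_features = CONTINUOUS_COLUMNS >> ops.FillMissing() >> ops.Normalize()

    # "+" combines both branches into the output of a single workflow.
    workflow = nvt.Workflow(cat_features + cont_features)
    dataset = nvt.Dataset(pd.DataFrame(rows))

    # Collect statistics, apply the transforms, and materialize the result.
    return workflow.fit_transform(dataset).to_ddf().compute()
```

Calling run_workflow(make_example_rows()) in an environment with NVTabular installed should return a DataFrame in which the categorical columns are integer encoded and the price column is filled and normalized.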

Performance

When NVTabular was run on the Criteo 1TB Click Logs Dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. On a DGX-1 cluster with eight V100 GPUs, feature engineering and preprocessing completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model can be trained in only six minutes.

The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script provided in NumPy took over five days to complete. Combined with CPU training, the total iteration time is over one week. By optimizing the ETL code in Spark and running on a DGX-1 equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours. Meanwhile, training was completed in one hour.
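
To put the figures above side by side, the following sketch converts them to minutes and computes the implied preprocessing speedups. It assumes that all of the quoted timings refer to comparable workloads on the same dataset. Because the text says "over five days" and "within three minutes", the resulting ratios are approximate bounds, not measurements.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR

# Preprocessing times quoted in the text, converted to minutes.
numpy_etl = 5 * MINUTES_PER_DAY    # "over five days", so a lower bound
spark_etl = 3 * MINUTES_PER_HOUR   # three hours on a DGX-1 equivalent cluster
nvt_single_gpu = 13                # single V100 32GB GPU
nvt_eight_gpus = 3                 # "within three minutes", so an upper bound


def speedup(baseline_minutes, new_minutes):
    """Return how many times faster the new time is than the baseline."""
    return baseline_minutes / new_minutes


if __name__ == "__main__":
    print(f"NumPy ETL vs. one GPU: {speedup(numpy_etl, nvt_single_gpu):.1f}x")
    print(f"Spark ETL vs. one GPU: {speedup(spark_etl, nvt_single_gpu):.1f}x")
    print(f"One GPU vs. eight GPUs: {speedup(nvt_single_gpu, nvt_eight_gpus):.1f}x")
```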

Installation

NVTabular requires Python version 3.7+. Additionally, GPU support requires:

  • CUDA version 11.0+
  • NVIDIA Pascal GPU or later (Compute Capability >=6.0)
  • NVIDIA driver 450.80.02+
  • Linux or WSL
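
The requirements above can be checked before installation with a short script. The following is a minimal sketch that uses only the Python standard library. The helper names are hypothetical, and it assumes that nvidia-smi is on the PATH when an NVIDIA driver is installed. It does not check the CUDA version or the GPU compute capability.

```python
import platform
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 7)
MIN_DRIVER = "450.80.02"


def parse_version(text):
    """Turn a dotted version such as '450.80.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum):
    """Return True when the dotted version `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version, or return None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()


def check_environment():
    """Collect a pass/fail flag for each requirement checked by this sketch."""
    driver = installed_driver_version()
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        # WSL also reports "Linux" as the system name.
        "linux_or_wsl": platform.system() == "Linux",
        "driver": driver is not None and meets_minimum(driver, MIN_DRIVER),
    }
```

Calling check_environment() returns a dictionary of booleans, so a failed requirement can be reported by name before any packages are installed.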

Installing NVTabular Using Conda

NVTabular can be installed with Anaconda from the nvidia channel by running the following command:

conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2

Installing NVTabular Using Pip

NVTabular can be installed with pip by running the following command:

pip install nvtabular

Installing NVTabular with pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
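
After a pip installation, it can be useful to confirm whether the GPU libraries are present at all. The following sketch looks for the RAPIDS packages without importing them. The package list and the helper names are assumptions made for this illustration, and the check only reports whether the packages can be found, not how NVTabular itself selects a backend.

```python
import importlib.util

# Assumption for this sketch: GPU mode needs these two RAPIDS packages.
GPU_PACKAGES = ("cudf", "dask_cudf")


def missing_gpu_packages(find_spec=importlib.util.find_spec):
    """Return the names of the GPU packages that cannot be found."""
    return [name for name in GPU_PACKAGES if find_spec(name) is None]


def expected_backend(find_spec=importlib.util.find_spec):
    """Return 'gpu' when every GPU package is found, otherwise 'cpu'."""
    return "cpu" if missing_gpu_packages(find_spec) else "gpu"


if __name__ == "__main__":
    print("Expected backend:", expected_backend())
    print("Missing GPU packages:", missing_gpu_packages())
```

The find_spec parameter exists only so that the lookup can be replaced in tests. In normal use, both functions are called without arguments.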

Installing NVTabular with Docker

NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:

Container Name | Container Location | Functionality
merlin-hugectr | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr | NVTabular, HugeCTR, and Triton Inference
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow | NVTabular, TensorFlow, and Triton Inference
merlin-pytorch | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch | NVTabular, PyTorch, and Triton Inference

To use these Docker containers, you first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. The NGC links in the table above provide more information about how to launch and run these containers. For the software and model versions that NVTabular supports per container, see the Support Matrix.

Notebook Examples and Tutorials

We provide a collection of examples to demonstrate feature engineering with NVTabular as Jupyter notebooks:

  • Introduction to NVTabular's High-Level API
  • Advanced workflows with NVTabular
  • NVTabular on CPU
  • Scaling NVTabular to multi-GPU systems

In addition, NVTabular is used in many of our examples in other Merlin libraries.

Feedback and Support

If you'd like to contribute to the library directly, see the Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this survey.

If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.
