Recommender System Utilities

Recommender Utilities

This package (reco_utils) contains functions that simplify common tasks in developing and evaluating recommender systems. A short description of each submodule is provided below. For more details about which functions are available and how to use them, please review the docstrings provided with the code.

See the online documentation.

AzureML

The AzureML submodule contains utilities to train, tune and operationalize recommendation systems at scale using AzureML.

Common

This submodule contains high-level utilities for defining constants used in most algorithms, as well as helper functions for managing aspects of different frameworks: GPU, Spark, and Jupyter notebooks.

Dataset

Dataset includes helper functions for interacting with Azure Cosmos databases, for pulling different datasets and formatting them appropriately, and for splitting data into training and test sets.

Data Loading

There are data loaders for several datasets. For example, the movielens module loads the MovieLens dataset into a pandas or Spark dataframe, in sizes of 100k, 1M, 10M, or 20M, so that you can test algorithms and run performance benchmarks.

from reco_utils.dataset import movielens

df = movielens.load_pandas_df(size="100k")

Splitting Techniques

Currently three methods are available for splitting datasets. All of them support splitting by user or by item, and filtering out entries with too few samples (for instance, users that have not rated enough items, or items that have not been rated by enough users).

  • Random: the basic approach, where entries are randomly assigned to each group according to the desired ratio.
  • Chronological: uses the provided timestamps to order the data and selects a cut-off time such that the desired ratio of the data falls before it (train) and the remainder after it (test).
  • Stratified: similar to random sampling, but the splits are stratified. For example, if the dataset is split by user, the splitting approach will attempt to maintain the same set of items used in both training and test splits. The converse is true if splitting by item.
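The three strategies can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function names, the tuple layout (user, item, rating, timestamp), the toy data, and the choice to group by user are all hypothetical. The package ships its own splitters, which should be preferred in practice.

```python
import random
from collections import defaultdict

# Toy interactions as (user, item, rating, timestamp); all values are made up.
ratings = [
    ("u1", "i1", 5.0, 10), ("u1", "i2", 3.0, 20),
    ("u1", "i3", 4.0, 30), ("u1", "i4", 2.0, 40),
    ("u2", "i1", 4.0, 15), ("u2", "i3", 1.0, 25),
    ("u2", "i4", 5.0, 35), ("u2", "i5", 3.0, 45),
    ("u3", "i2", 4.0, 50),
]


def filter_min_ratings(rows, min_rating=2):
    """Drop users that have fewer than `min_rating` interactions."""
    counts = defaultdict(int)
    for user, *_ in rows:
        counts[user] += 1
    return [row for row in rows if counts[row[0]] >= min_rating]


def _group_by_user(rows):
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return groups


def random_split(rows, ratio=0.75, seed=42):
    """Shuffle all rows and cut once at the desired ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def chrono_split(rows, ratio=0.75):
    """Per user, sort by timestamp; the earliest share goes to train."""
    train, test = [], []
    for user_rows in _group_by_user(rows).values():
        ordered = sorted(user_rows, key=lambda r: r[3])
        cut = round(len(ordered) * ratio)
        train += ordered[:cut]
        test += ordered[cut:]
    return train, test


def stratified_split(rows, ratio=0.75, seed=42):
    """Per user, split at random so every user lands in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for user_rows in _group_by_user(rows).values():
        shuffled = user_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * ratio)
        train += shuffled[:cut]
        test += shuffled[cut:]
    return train, test


kept = filter_min_ratings(ratings, min_rating=2)  # removes u3 (one rating)
train, test = chrono_split(kept)
```

In this sketch the chronological and stratified variants both operate per user, so each remaining user contributes rows to both sets; the plain random split gives no such guarantee.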

Evaluation

The evaluation submodule includes functionality for performing hyperparameter sweeps as well as calculating common recommender metrics, either directly in Python or in a Spark environment using PySpark.

Currently available metrics include:

  • Root Mean Squared Error
  • Mean Absolute Error
  • R² (coefficient of determination)
  • Explained Variance
  • Precision at K
  • Recall at K
  • Normalized Discounted Cumulative Gain at K
  • Mean Average Precision at K
  • Area Under Curve
  • Logistic Loss
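To make a few of these metrics concrete, here is a minimal sketch in plain Python covering two rating metrics (RMSE, MAE) and three ranking metrics (Precision, Recall, and NDCG at K). The function names and conventions are assumptions for illustration: relevance is treated as binary, and Precision at K divides by k. The package's own evaluation functions operate on dataframes and may use different conventions.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error over paired true/predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error over paired true/predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


# Made-up example data for a single user.
y_true, y_pred = [4.0, 3.0, 5.0], [3.0, 3.0, 4.0]
recommended = ["a", "b", "c", "d"]   # ranked list, best first
relevant = {"a", "c", "e"}           # items the user actually liked

print(rmse(y_true, y_pred), mae(y_true, y_pred))
print(precision_at_k(recommended, relevant, 3), recall_at_k(recommended, relevant, 3))
print(ndcg_at_k(recommended, relevant, 3))
```

In practice the ranking metrics are averaged over all users; the sketch shows the per-user computation only.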

Recommender

The recommender submodule contains implementations of various algorithms that can be used, in addition to external packages, to evaluate and develop new recommender system approaches. A description of all the algorithms can be found in this table. The algorithm utilities are listed below:

  • Cornac
  • DeepRec (includes xDeepFM and DKN)
  • FastAI
  • LightGBM
  • NCF
  • NewsRec (includes LSTUR, NAML, NPA and NRMS)
  • RBM
  • RLRMC
  • SAR
  • Surprise
  • Vowpal Wabbit (VW)
  • Wide&Deep

Tuning

This submodule contains utilities for performing hyperparameter tuning.
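As a rough illustration of what hyperparameter tuning involves, the sketch below runs an exhaustive grid search in plain Python. It does not use or reproduce the tuning submodule's API: grid_search, toy_objective, and the parameter names learning_rate and n_factors are hypothetical, and the objective is a stand-in for "train a model and return a validation error".

```python
import itertools


def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid; return the lowest-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for training a model and returning validation error."""
    return (params["learning_rate"] - 0.01) ** 2 + (params["n_factors"] - 50) ** 2 * 1e-6


grid = {"learning_rate": [0.001, 0.01, 0.1], "n_factors": [10, 50, 100]}
best_params, best_score = grid_search(grid, toy_objective)
print(best_params, best_score)
```

Grid search is only the simplest strategy; its cost grows multiplicatively with each added parameter, which is why dedicated tuning tools usually offer random or model-based search as well.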

Download files

Source Distribution

pre_reco_utils-2021.2.4.tar.gz (146.4 kB)

Uploaded Source

Built Distribution

pre_reco_utils-2021.2.4-py3-none-any.whl (187.7 kB)

Uploaded Python 3

File details

Details for the file pre_reco_utils-2021.2.4.tar.gz.

File metadata

  • Download URL: pre_reco_utils-2021.2.4.tar.gz
  • Upload date:
  • Size: 146.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.2

File hashes

Hashes for pre_reco_utils-2021.2.4.tar.gz

  • SHA256: 0d5e95d6465b485b6fb3b081fbdbd16676e9ff5c7c4e7cc74e774d0a69075c4e
  • MD5: 32cd3bd3fe777bca117f8f5c1e18494e
  • BLAKE2b-256: a0238ca2758c2ca6b5c9a515c2e987b1a4c031d9e23756a0815525287fac58d9

File details

Details for the file pre_reco_utils-2021.2.4-py3-none-any.whl.

File metadata

  • Download URL: pre_reco_utils-2021.2.4-py3-none-any.whl
  • Upload date:
  • Size: 187.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.2

File hashes

Hashes for pre_reco_utils-2021.2.4-py3-none-any.whl

  • SHA256: 7e6132a4bcafcadba94ba51aaf49e701ef0d0b4b6958f6d97cf43edf11564c4a
  • MD5: 093336b1fac24412450106a42934923e
  • BLAKE2b-256: b76b2f52ef317abb907a85f7d92341bb3273a8d2d6c36c243225701edf2c4d88
