
Collaborative Filtering for Implicit Datasets

Project description

Implicit


Fast Python Collaborative Filtering for Implicit Datasets.

This project provides fast Python implementations of several popular recommendation algorithms for implicit feedback datasets:

  • Alternating Least Squares, as described in the papers Collaborative Filtering for Implicit Feedback Datasets and Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering
  • Bayesian Personalized Ranking
  • Logistic Matrix Factorization
  • Item-Item Nearest Neighbour models, using Cosine, TFIDF or BM25 as a distance metric

All models have multi-threaded training routines, using Cython and OpenMP to fit the models in parallel across all available CPU cores. In addition, the ALS and BPR models both have custom CUDA kernels - enabling fitting on compatible GPUs. Approximate nearest neighbour libraries such as Annoy, NMSLIB and Faiss can also be used by Implicit to speed up making recommendations.
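
For example, a minimal sketch of enabling these options, assuming the 0.4.x API (the use_gpu constructor flag and the wrappers in implicit.approximate_als):

import implicit
from implicit.approximate_als import AnnoyAlternatingLeastSquares

# ALS using the custom CUDA kernels (requires a CUDA-enabled build)
gpu_model = implicit.als.AlternatingLeastSquares(factors=64, use_gpu=True)

# ALS whose similar-item/recommendation lookups are accelerated with Annoy
# (requires the annoy package; NMSLib and Faiss variants are also provided)
ann_model = AnnoyAlternatingLeastSquares(factors=64)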

To install:

pip install implicit

Basic usage:

import implicit

# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=50)

# train the model on a sparse matrix of item/user/confidence weights
model.fit(item_user_data)

# recommend items for a user
user_items = item_user_data.T.tocsr()
recommendations = model.recommend(userid, user_items)

# find related items
related = model.similar_items(itemid)
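
Here item_user_data is a scipy.sparse matrix with one row per item and one column per user. A minimal sketch of building one from raw (item, user, confidence) triples, using hypothetical example data:

import scipy.sparse as sparse

# hypothetical interaction triples: item ids, user ids, confidence weights
item_ids = [0, 0, 1, 2]
user_ids = [0, 1, 1, 0]
confidence = [3.0, 1.0, 5.0, 2.0]

# rows are items, columns are users - matching model.fit() above
item_user_data = sparse.csr_matrix((confidence, (item_ids, user_ids)))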

The examples folder has a program showing how to use this to compute similar artists on the last.fm dataset.

For more information see the documentation.

Articles about Implicit

A number of blog posts describe the algorithms that power this library, and several others cover using Implicit to build recommendation systems.

Requirements

This library requires SciPy version 0.16 or later. Running on OSX requires an OpenMP compiler, which can be installed with homebrew: brew install gcc. Running on Windows requires Python 3.5+.

GPU support requires at least version 8 of the NVidia CUDA Toolkit. The build will use the nvcc compiler that is found on the path, but this can be overridden by setting the CUDAHOME environment variable to point to your CUDA installation.

This library has been tested with Python 2.7, 3.5, 3.6 and 3.7 on Ubuntu and OSX, and tested with Python 3.5 and 3.6 on Windows.

Benchmarks

Simple benchmarks comparing the ALS fitting time versus Spark and QMF can be found here.

Optimal Configuration

I'd recommend configuring SciPy to use Intel's MKL matrix libraries. One easy way of doing this is by installing the Anaconda Python distribution.

For systems using OpenBLAS, I highly recommend setting 'export OPENBLAS_NUM_THREADS=1'. This disables its internal multithreading, which leads to substantial speedups for this package. Likewise, for Intel MKL, set 'export MKL_NUM_THREADS=1'.
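
These variables are read when the BLAS library is first loaded, so they must be set before NumPy or SciPy is imported. One way to do this from Python (a sketch, equivalent to the exports above):

import os

# must run before numpy/scipy are imported anywhere in the process
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import implicit  # Implicit's own OpenMP training threads are unaffected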

Released under the MIT License

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

implicit-0.4.2.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

implicit-0.4.2-py3.7-macosx-10.7-x86_64.egg (1.1 MB)

Uploaded Source

File details

Details for the file implicit-0.4.2.tar.gz.

File metadata

  • Download URL: implicit-0.4.2.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for implicit-0.4.2.tar.gz
Algorithm Hash digest
SHA256 fb66ab4db428e02caae87b312d409b53d7c40c9e8258e642f228cc3cc550ed99
MD5 ff2e2d39a3ea855cd5bced6ff9a23498
BLAKE2b-256 5ad86b4f1374ffa2647b72ac76960c71b984c6f3238090359fb419d03827d87a

See more details on using hashes here.

File details

Details for the file implicit-0.4.2-py3.7-macosx-10.7-x86_64.egg.

File metadata

  • Download URL: implicit-0.4.2-py3.7-macosx-10.7-x86_64.egg
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for implicit-0.4.2-py3.7-macosx-10.7-x86_64.egg
Algorithm Hash digest
SHA256 18b7eee38ff82acc982501b7b6e107a9380ca5703b320544164d1fa89014815f
MD5 2465b8ff82e12da3d613815eeb6d5234
BLAKE2b-256 1ad2a0a75ea01c23bfc4c1f1402f3d5a024d80958d6ae0ce6788d658305f5436

See more details on using hashes here.
