Collaborative Filtering for Implicit Feedback Datasets
Project description
Implicit
Fast Python Collaborative Filtering for Implicit Datasets.
This project provides fast Python implementations of several different popular recommendation algorithms for implicit feedback datasets:
- Alternating Least Squares as described in the papers Collaborative Filtering for Implicit Feedback Datasets and Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering.
- Bayesian Personalized Ranking as described in the paper BPR: Bayesian Personalized Ranking from Implicit Feedback.
- Item-Item Nearest Neighbour models using Cosine, TFIDF or BM25 as a distance metric (a usage sketch follows below).
All models have multi-threaded training routines, using Cython and OpenMP to fit the models in parallel across all available CPU cores. In addition, the ALS and BPR models both have custom CUDA kernels, enabling fitting on compatible GPUs. Approximate nearest neighbour libraries such as Annoy, NMSLIB and Faiss can also be used by Implicit to speed up making recommendations.
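The item-item nearest neighbour models share the same fit/recommend interface as the ALS example in the Basic Usage section. A minimal sketch, assuming the BM25Recommender class from implicit.nearest_neighbours and a users-by-items confidence matrix named user_item_data:
from implicit.nearest_neighbours import BM25Recommender
# sketch only: K is the number of neighbours kept per item
model = BM25Recommender(K=20)
# user_item_data is a sparse users x items matrix of confidence weights,
# matching the input format used in the Basic Usage example
model.fit(user_item_data)
# recommend items for a user and look up items related to a given item
recommendations = model.recommend(userid, user_item_data[userid])
related = model.similar_items(itemid)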
Installation
There are binary packages on conda-forge for Linux, Windows and OSX. These can be installed with:
conda install -c conda-forge implicit
There are also GPU enabled packages on conda-forge for x86_64 Linux systems using either CUDA 11.0, 11.1 or 11.2. The GPU packages can be installed with:
conda install -c conda-forge implicit implicit-proc=*=gpu
There is also an sdist package on PyPI. This package can be installed with:
pip install implicit
Note that installing with pip requires a C++ compiler to be installed on your system, since this method will build implicit from source.
Basic Usage
import implicit
# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=50)
# train the model on a sparse matrix of user/item/confidence weights
model.fit(user_item_data)
# recommend items for a user
recommendations = model.recommend(userid, user_item_data[userid])
# find related items
related = model.similar_items(itemid)
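For reference, user_item_data above is a scipy.sparse matrix with one row per user and one column per item, holding the confidence weights. A minimal sketch of building one from raw interaction counts (the arrays are made-up example data):
import numpy as np
from scipy.sparse import csr_matrix
# made-up example interactions: (user index, item index, play count)
user_ids = np.array([0, 0, 1, 2, 2])
item_ids = np.array([0, 2, 1, 0, 3])
counts = np.array([3.0, 1.0, 5.0, 2.0, 4.0])
# users as rows, items as columns, counts as confidence weights
user_item_data = csr_matrix((counts, (user_ids, item_ids)))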
The examples folder has a program showing how to use this to compute similar artists on the last.fm dataset.
For more information see the documentation.
Articles about Implicit
These blog posts describe the algorithms that power this library:
- Finding Similar Music with Matrix Factorization
- Faster Implicit Matrix Factorization
- Implicit Matrix Factorization on the GPU
- Approximate Nearest Neighbours for Recommender Systems
- Distance Metrics for Fun and Profit
There are also several other blog posts about using Implicit to build recommendation systems:
- Recommending GitHub Repositories with Google BigQuery and the implicit library
- Intro to Implicit Matrix Factorization: Classic ALS with Sketchfab Models
- A Gentle Introduction to Recommender Systems with Implicit Feedback.
Requirements
This library requires SciPy version 0.16 or later and Python version 3.6 or later.
Running on OSX requires an OpenMP compiler, which can be installed with Homebrew: brew install gcc.
GPU support requires at least version 11 of the NVidia CUDA Toolkit. The build will use the nvcc compiler found on the path, but this can be overridden by setting the CUDAHOME environment variable to point to your CUDA installation.
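With a GPU-enabled build installed, fitting can be moved onto the GPU from Python. A minimal sketch, assuming the use_gpu flag on the ALS constructor (it defaults to whether a CUDA build was detected):
import implicit
# sketch only: ask for the CUDA implementation explicitly;
# this is expected to fail if no GPU-enabled build of implicit is installed
model = implicit.als.AlternatingLeastSquares(factors=64, use_gpu=True)
model.fit(user_item_data)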
This library has been tested with Python 3.6, 3.7, 3.8 and 3.9 on Ubuntu, OSX and Windows.
Benchmarks
Simple benchmarks comparing the ALS fitting time versus Spark can be found here.
Optimal Configuration
I'd recommend configuring SciPy to use Intel's MKL matrix libraries. One easy way of doing this is by installing the Anaconda Python distribution.
For systems using OpenBLAS, I highly recommend setting 'export OPENBLAS_NUM_THREADS=1'. This disables OpenBLAS's internal multithreading, which leads to substantial speedups for this package. Likewise, for Intel MKL, 'export MKL_NUM_THREADS=1' should also be set.
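These variables have to be in place before the BLAS library is loaded, so either export them in the shell before starting Python or set them at the very top of the script. A minimal sketch of the latter:
import os
# limit BLAS-internal threading before numpy/scipy are imported,
# so it doesn't compete with implicit's own OpenMP threads
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
import implicit  # imported only after the environment variables are set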
Released under the MIT License
Download files
Source Distribution
File details
Details for the file implicit-0.5.0.tar.gz.
File metadata
- Download URL: implicit-0.5.0.tar.gz
- Upload date:
- Size: 71.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f7d3981e3fbcb5e4755469e7a86cbf4044b8f4fa1aacfe6ef6f9e6f06c4f684f
MD5 | 5818d7f513b57e8f1b0d0b27f5d89a6c
BLAKE2b-256 | 91c1ff6dcc22dd6071efdff4b3198849c488fe87ba79149d4b21fdaa76417644