
An efficient Python implementation of the Apriori algorithm.

Efficient-Apriori

An efficient pure Python implementation of the Apriori algorithm. Works with Python 3.6+.

The Apriori algorithm uncovers hidden structures in categorical data. The classical example is a database containing purchases from a supermarket. Every purchase has a number of items associated with it. We would like to uncover association rules such as {bread, eggs} -> {bacon} from the data. This is the goal of association rule learning, and the Apriori algorithm is arguably the most famous algorithm for this problem. This repository contains an efficient, well-tested implementation of the Apriori algorithm as described in the original paper by Agrawal et al., published in 1994.

The code is stable and in widespread use. It's cited in the book "Mastering Machine Learning Algorithms" by Bonaccorso.

The code is fast. See timings in this PR.

Example

Here's a minimal working example. Notice that in every transaction with eggs present, bacon is present too. Therefore, the rule {eggs} -> {bacon} is returned with 100 % confidence.

from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=1)
print(rules)  # [{eggs} -> {bacon}, {soup} -> {bacon}]
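
To see where the 100 % confidence comes from, the numbers can be checked by hand. The sketch below does not use the library; the helper names `support` and `confidence` are hypothetical and only illustrate the standard definitions (support is the fraction of transactions containing an itemset, confidence is the support of the whole rule divided by the support of its left hand side).

```python
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Support of lhs and rhs together, divided by the support of lhs."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


print(confidence({'eggs'}, {'bacon'}, transactions))  # 1.0
print(confidence({'bacon'}, {'eggs'}, transactions))  # below 1
```

The reverse rule {bacon} -> {eggs} only reaches a confidence of 2/3, because bacon also appears in a transaction without eggs. That is why it is not returned when min_confidence=1.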

If your data is in a pandas DataFrame, you must convert it to a list of tuples. Do you have missing values, or does the algorithm run for a long time? See this comment. More examples are included below.
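
One possible way to do the conversion is sketched below. It assumes that each row is one transaction and each cell holds one item, which may not match your data. The helper `rows_to_transactions` is hypothetical and not part of the library. It drops missing values (None and NaN) so that they are not treated as items.

```python
import math


def rows_to_transactions(rows):
    """Turn an iterable of rows into a list of tuples, dropping missing cells."""
    def is_missing(value):
        # Treat None and float NaN as missing values.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [tuple(v for v in row if not is_missing(v)) for row in rows]


# With pandas, the rows could come from df.itertuples(index=False, name=None).
# Plain tuples are used here so the sketch runs without pandas installed.
rows = [('eggs', 'bacon', 'soup'),
        ('eggs', 'bacon', float('nan')),
        ('soup', None, 'banana')]
transactions = rows_to_transactions(rows)
print(transactions)
# [('eggs', 'bacon', 'soup'), ('eggs', 'bacon'), ('soup', 'banana')]
```

The resulting list can be passed to apriori in the same way as the transactions in the example above.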

Installation

The software is available through GitHub and PyPI. You can install it using pip.

pip install efficient-apriori

Contributing

You are very welcome to scrutinize the code and make pull requests if you have suggestions and improvements. Your submitted code must be PEP8 compliant, and all tests must pass. Contributors: CRJFisher

More examples

Filtering and sorting association rules

It's possible to filter and sort the returned list of association rules.

from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.2, min_confidence=1)

# Print out every rule with 2 items on the left hand side,
# 1 item on the right hand side, sorted by lift
rules_rhs = filter(lambda rule: len(rule.lhs) == 2 and len(rule.rhs) == 1, rules)
for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
    print(rule)  # Prints the rule and its confidence, support, lift, ...
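
For intuition about the sort key, lift can be computed by hand. The sketch below uses the standard definition (confidence divided by the support of the right hand side) and assumes this matches what the library reports as rule.lift. The helpers `support` and `lift` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]


def support(itemset):
    """Exact fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return Fraction(hits, len(transactions))


def lift(lhs, rhs):
    """Confidence of lhs -> rhs divided by the support of rhs."""
    conf = support(set(lhs) | set(rhs)) / support(lhs)
    return conf / support(rhs)


print(lift({'apple', 'bacon'}, {'eggs'}))  # 3/2
print(lift({'eggs', 'soup'}, {'bacon'}))   # 1
```

A lift of 1 means the left hand side tells you nothing new about the right hand side: bacon is in every transaction anyway. A lift above 1 means the items on the right hand side occur more often together with the left hand side than they do overall.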

Transactions with IDs

If you need to know which transactions contain each frequent itemset, set the output_transaction_ids parameter to True. This changes the output to contain ItemsetCount objects for each itemset. Each object has a members property, which is the set of ids of the transactions containing the itemset, as well as an itemset_count property. The ids are the indices of the transactions in the order they appear.

from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, output_transaction_ids=True)
print(itemsets)
# {1: {('bacon',): ItemsetCount(itemset_count=3, members={0, 1, 2}), ...
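
The ids in members can be reproduced by hand for single items. The sketch below only illustrates what the ids mean and is not how the library computes them; the `members` dictionary is a hypothetical stand-in for the members property.

```python
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]

# Map every item to the set of indices of the transactions containing it.
members = {}
for transaction_id, transaction in enumerate(transactions):
    for item in transaction:
        members.setdefault(item, set()).add(transaction_id)

print(members['bacon'])  # {0, 1, 2}
print(members['eggs'])   # {0, 1}
```

The size of each set corresponds to the count for that item, for instance 3 for bacon.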
