Skip to main content

An efficient Python implementation of the Apriori algorithm.

Project description

Efficient-Apriori Build Status PyPI version Documentation Status Downloads Black

An efficient pure Python implementation of the Apriori algorithm. Works with Python 3.6+.

The apriori algorithm uncovers hidden structures in categorical data. The classical example is a database containing purchases from a supermarket. Every purchase has a number of items associated with it. We would like to uncover association rules such as {bread, eggs} -> {bacon} from the data. This is the goal of association rule learning, and the Apriori algorithm is arguably the most famous algorithm for this problem. This repository contains an efficient, well-tested implementation of the apriori algorithm as descriped in the original paper by Agrawal et al, published in 1994.

Example

Here's a minimal working example. Notice that in every transaction with eggs present, bacon is present too. Therefore, the rule {eggs} -> {bacon} is returned with 100 % confidence.

from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.5,  min_confidence=1)
print(rules)  # [{eggs} -> {bacon}, {soup} -> {bacon}]

More examples are included below.

Installation

The software is available through GitHub, and through PyPI. You may install the software using pip.

pip install efficient-apriori

Contributing

You are very welcome to scrutinize the code and make pull requests if you have suggestions and improvements. Your submitted code must be PEP8 compliant, and all tests must pass.

More examples

Filtering and sorting association rules

It's possible to filter and sort the returned list of association rules.

from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.2,  min_confidence=1)

# Print out every rule with 2 items on the left hand side,
# 1 item on the right hand side, sorted by lift
rules_rhs = filter(lambda rule: len(rule.lhs) == 2 and len(rule.rhs) == 1, rules)
for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
  print(rule) # Prints the rule and its confidence, support, lift, ...

Working with large datasets

If you have data that is too large to fit into memory, you may pass a function returning a generator instead of a list. The min_support will most likely have to be a large value, or the algorithm will take very long before it terminates. If you have massive amounts of data, this Python implementation is likely not fast enough, and you should consult more specialized implementations.

def data_generator(filename):
  """
  Data generator, needs to return a generator to be called several times.
  """
  def data_gen():
    with open(filename) as file:
      for line in file:
        yield tuple(k.strip() for k in line.split(','))      

  return data_gen

transactions = data_generator('dataset.csv')
itemsets, rules = apriori(transactions, min_support=0.9, min_confidence=0.6)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

efficient_apriori-1.0.0.tar.gz (13.7 kB view details)

Uploaded Source

Built Distribution

efficient_apriori-1.0.0-py3-none-any.whl (13.6 kB view details)

Uploaded Python 3

File details

Details for the file efficient_apriori-1.0.0.tar.gz.

File metadata

  • Download URL: efficient_apriori-1.0.0.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.1

File hashes

Hashes for efficient_apriori-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d25de60bd7c2540515da83838ec9446e97d88bb8e6d78ade9baa7db7e99b8d64
MD5 4acb4c89bb15213740d7b4398b73d3b9
BLAKE2b-256 a1fe938543de87aca9ac8f681a4fc940e604d1010cf7936f6014bdca1924a166

See more details on using hashes here.

File details

Details for the file efficient_apriori-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: efficient_apriori-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 13.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.1

File hashes

Hashes for efficient_apriori-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bc987a3fc08fd6eb749504aa2f5bffb2266a9e76b53f9a821d2bf5b4978f65e2
MD5 ada9cf17326e6fdd0e6104b0e019c0cc
BLAKE2b-256 5bcbcd06eb983e4a67d9b127df6e3ece87dd7ebea145daa4250929531315bbff

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page