An efficient Python implementation of the Apriori algorithm.

# Efficient-Apriori [![Build Status](https://travis-ci.com/tommyod/Efficient-Apriori.svg?branch=master)](https://travis-ci.com/tommyod/Efficient-Apriori)

An efficient pure Python implementation of the Apriori algorithm.

The Apriori algorithm uncovers hidden structures in categorical data.
The classical example is a database containing purchases from a supermarket.
Every purchase has a number of items associated with it.
We would like to uncover association rules such as `{bread, eggs} -> {bacon}` from the data.
This is the goal of [association rule learning](https://en.wikipedia.org/wiki/Association_rule_learning), and the [Apriori algorithm](https://en.wikipedia.org/wiki/Apriori_algorithm) is arguably the most famous algorithm for this problem.
This repository contains an efficient, well-tested implementation of the Apriori algorithm as described in the [original paper](https://www.macs.hw.ac.uk/~dwcorne/Teaching/agrawal94fast.pdf) by Agrawal et al., published in 1994.

## Example

Here's a minimal working example.
Notice that in every transaction with `eggs` present, `bacon` is present too.
Therefore, the rule `{eggs} -> {bacon}` is returned with 100 % confidence.

```python
from efficient_apriori import apriori

transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=1)
print(rules)  # [{eggs} -> {bacon}, {soup} -> {bacon}]
```
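
To make `min_support` and `min_confidence` concrete, here is a minimal sketch that computes both quantities by hand for the transactions above. It assumes the standard definitions (support is the fraction of transactions containing an itemset, confidence is the support of the whole rule divided by the support of its left-hand side). The helper names `support` and `confidence` are illustrative and not part of the library.

```python
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Support of lhs and rhs together, divided by the support of lhs."""
    both = support(set(lhs) | set(rhs), transactions)
    return both / support(lhs, transactions)


# `eggs` appears in 2 of 3 transactions, always together with `bacon`
print(support({'eggs'}, transactions))
print(confidence({'eggs'}, {'bacon'}, transactions))
```

Under these definitions, `{eggs}` clears the 0.5 support threshold and `{eggs} -> {bacon}` reaches a confidence of 1, which is why the rule is returned.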
More examples are included below.

## Installation

Here's how to install from GitHub.

```bash
git clone https://github.com/tommyod/Efficient-Apriori.git
cd Efficient-Apriori
pip install .
```

## Contributing

You are very welcome to scrutinize the code and make pull requests if you have suggestions for improvements.
Your submitted code must be PEP 8 compliant, and all tests must pass.

## More examples

### Filtering and sorting association rules

It's possible to filter and sort the returned list of association rules.

```python
from efficient_apriori import apriori

transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, min_support=0.2, min_confidence=1)

# Print out every rule with 2 items on the left hand side,
# 1 item on the right hand side, sorted by lift
rules_rhs = filter(lambda rule: len(rule.lhs) == 2 and len(rule.rhs) == 1, rules)
for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
    print(rule)  # Prints the rule and its confidence, support, lift, ...
```
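
Lift, the sort key used above, is commonly defined as the support of the whole rule divided by the product of the supports of its two sides. The sketch below works this out by hand for two rules from the same transactions. It assumes that standard definition, and the helpers `count` and `lift` are hypothetical names for illustration, not the library's API.

```python
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset.issubset(t))


def lift(lhs, rhs, transactions):
    """support(lhs and rhs) / (support(lhs) * support(rhs)), using counts."""
    n = len(transactions)
    both = count(set(lhs) | set(rhs), transactions)
    return both * n / (count(lhs, transactions) * count(rhs, transactions))


print(lift({'apple', 'bacon'}, {'eggs'}, transactions))  # 1.5
print(lift({'eggs', 'soup'}, {'bacon'}, transactions))   # 1.0
```

A lift of 1 means the right-hand side is no more likely given the left-hand side than it is overall. Here `bacon` is in every transaction, so any rule ending in `{bacon}` has a lift of 1.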

### Working with large datasets

If you have data that is too large to fit into memory, you may pass a function returning a generator instead of a list.
The `min_support` will most likely have to be a large value, or the algorithm will take a very long time to terminate.
If you have massive amounts of data, this Python implementation is likely not fast enough, and you should consult more specialized implementations.

```python
from efficient_apriori import apriori


def data_generator(filename):
    """
    Data generator, needs to return a generator to be called several times.
    """
    def data_gen():
        with open(filename) as file:
            for line in file:
                # One transaction per line, items separated by commas
                yield tuple(k.strip() for k in line.split(','))

    return data_gen


transactions = data_generator('dataset.csv')
itemsets, rules = apriori(transactions, min_support=0.9, min_confidence=0.6)
```
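
The reason for passing a function rather than a generator object is that the data is read several times, and a generator can only be consumed once. The following sketch illustrates this with an in-memory list of lines standing in for a file. The names `make_reader` and `lines` are made up for the illustration.

```python
def make_reader(lines):
    """Return a zero-argument function that yields transactions from `lines`."""
    def read():
        for line in lines:
            yield tuple(item.strip() for item in line.split(','))
    return read


lines = ['eggs,bacon,soup', 'eggs,bacon,apple']

# A generator object is exhausted after a single pass
gen = make_reader(lines)()
first_pass = list(gen)
second_pass = list(gen)  # nothing left to read

# Calling the function again creates a fresh generator for every pass
reader = make_reader(lines)
fresh_first = list(reader())
fresh_second = list(reader())

print(len(first_pass), len(second_pass))    # 2 0
print(len(fresh_first), len(fresh_second))  # 2 2
```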

