
Pythonic in-memory MapReduce.


Experimental Pythonic MapReduce inspired by Spotify’s luigi framework.


Canonical Word Count Example

There are currently two MapReduce implementations: one that sorts and one that does not. The example below gains nothing from sorted keys, so it can skip that work entirely. Both implementations share the same API, but tinymr.memory.MRSerial() sorts after partitioning and again between the reducer() and final_reducer().

import json
import re
import sys

from tinymr.memory import MRSerial


class WordCount(MRSerial):

    def __init__(self):
        self.pattern = re.compile(r'[\W_]+')

    def mapper(self, item):
        for word in item.split():
            word = self.pattern.sub('', word)
            if word:
                yield word.lower(), 1

    def reducer(self, key, values):
        yield key, sum(values)

    def final_reducer(self, pairs):
        return {k: tuple(v)[0] for k, v in pairs}


wc = WordCount()
with open('LICENSE.txt') as f:
    out = wc(f)
    print(json.dumps(out, indent=4, sort_keys=True))

Truncated output:

{
    "a": 1,
    "above": 2,
    "advised": 1,
    "all": 1,
    "and": 8,
    "andor": 1
}
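
To make the phases concrete, here is a minimal sketch of the same word count written in plain Python, without tinymr. It is not tinymr's implementation: run_word_count() and its sort flag are hypothetical names used only for illustration. It shows the map, partition, optional sort, and reduce steps, and why "and/or" in the license text shows up as "andor" in the output.

```python
import re
from collections import defaultdict

# Same pattern as the example above: strips non-word characters and underscores.
PATTERN = re.compile(r'[\W_]+')


def mapper(line):
    """Emit a (word, 1) pair for every non-empty normalized word."""
    for word in line.split():
        word = PATTERN.sub('', word)
        if word:
            yield word.lower(), 1


def reducer(key, values):
    """Collapse all counts for one word into a single pair."""
    yield key, sum(values)


def run_word_count(lines, sort=False):
    """Hypothetical driver, for illustration only."""
    # Map + partition: group every emitted value under its key.
    partitioned = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            partitioned[key].append(value)

    # Optional sort: only worth the cost when key order matters.
    keys = sorted(partitioned) if sort else list(partitioned)

    # Reduce: one reducer call per key, merged into a single dict.
    result = {}
    for key in keys:
        for out_key, out_value in reducer(key, partitioned[key]):
            result[out_key] = out_value
    return result


counts = run_word_count(["The quick and/or the lazy", "the END."])
print(counts)
```

Word counts do not depend on key order, which is why skipping the sort is safe here. A job whose reducer relies on seeing keys in order would need the sorting variant.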

Developing

$ git clone https://github.com/geowurster/tinymr.git
$ cd tinymr
$ pip install -e .\[dev\]
$ py.test tests --cov tinymr --cov-report term-missing

License

See LICENSE.txt

Changelog

See CHANGES.md
