Remedian: robust averaging of large data sets

[![Build Status](https://travis-ci.org/sappelhoff/remedian.svg?branch=master)](https://travis-ci.org/sappelhoff/remedian) [![codecov](https://codecov.io/gh/sappelhoff/remedian/branch/master/graph/badge.svg)](https://codecov.io/gh/sappelhoff/remedian) [![Documentation Status](https://readthedocs.org/projects/remedian/badge/?version=latest)](http://remedian.readthedocs.io/en/latest/?badge=latest) [![PyPI version](https://badge.fury.io/py/remedian.svg)](https://badge.fury.io/py/remedian)

# remedian
The Remedian: A Robust Averaging Method for Large Data Sets - Python implementation

This algorithm is used to approximate the median of several data chunks if
these data chunks cannot (or should not) be loaded into memory at once.

Given a data chunk of size `obs_size`, and `t` data chunks overall, the
Remedian class sets up a number `k_arrs` of arrays of length `n_obs`.
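To get a feel for how few arrays are needed, here is a minimal sketch that computes the smallest number of arrays whose combined capacity `n_obs ** k` covers all `t` chunks. The function name `n_arrays_needed` is hypothetical, and this is an assumption about how `k_arrs` could be chosen, not a confirmed description of what the `Remedian` class does internally.

```python
def n_arrays_needed(t, n_obs):
    """Return the smallest k such that n_obs ** k >= t.

    Hypothetical helper for illustration; uses integer arithmetic only,
    so there are no floating-point rounding issues from logarithms.
    """
    if t < 1 or n_obs < 2:
        raise ValueError("need t >= 1 and n_obs >= 2")
    k, capacity = 1, n_obs
    while capacity < t:
        capacity *= n_obs  # one more array multiplies the capacity by n_obs
        k += 1
    return k


print(n_arrays_needed(1000, 11))  # 3, because 11 ** 2 = 121 < 1000 <= 11 ** 3 = 1331
```

Under this assumption, 1000 chunks with `n_obs = 11` would need only 3 arrays of 11 slots each, i.e. 33 stored chunks instead of 1000.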

The median of the `t` data chunks of size `obs_size` is then approximated
as follows: One data chunk after another is fed into the `n_obs` positions
of the first array. When the first array is full, its median is calculated
and stored in the first position of the second array. After this, the first
array is re-used to fill the second position of the second array, etc.
When the second array is full, the median of its values is stored in the
first position of the third array, and so on.

The final "Remedian" is the median of the last array, after all `t` data
chunks have been fed into the object.
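The procedure described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the package's implementation: it works on scalar values rather than data chunks of size `obs_size`, the function `remedian` and its signature are hypothetical, and it assumes that exactly `n_obs ** k_arrs` values are supplied so that the last array ends up completely full.

```python
from statistics import median


def remedian(values, n_obs, k_arrs):
    """Approximate the median of exactly n_obs ** k_arrs scalar values.

    Illustrative sketch only; at most k_arrs * n_obs values are held at once.
    """
    arrays = [[] for _ in range(k_arrs)]
    count = 0
    for value in values:
        count += 1
        arrays[0].append(value)
        # Cascade upward: a full array hands its median to the next array
        # and is emptied so that it can be re-used.
        level = 0
        while level < k_arrs - 1 and len(arrays[level]) == n_obs:
            arrays[level + 1].append(median(arrays[level]))
            arrays[level].clear()
            level += 1
    if count != n_obs ** k_arrs:
        raise ValueError("this sketch expects exactly n_obs ** k_arrs values")
    # The remedian is the median of the last array.
    return median(arrays[-1])


print(remedian([9, 1, 5, 3, 7, 2, 8, 4, 6], n_obs=3, k_arrs=2))  # 5
```

In the example, the first array yields the medians 5, 3 and 6 in turn, and the median of those three values is 5, which here coincides with the exact median. An odd `n_obs` keeps every intermediate median equal to an actual data value, since `statistics.median` averages the two middle values for even lengths.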

References
----------
1. P.J. Rousseeuw, G.W. Bassett Jr., "The remedian:
A robust averaging method for large data sets", Journal
of the American Statistical Association, vol. 85 (1990),
pp. 97-104

2. M. Chao, G. Lin, "The asymptotic distributions of
the remedians", Journal of Statistical Planning and
Inference, vol. 37 (1993), pp. 1-11

3. Domenico Cantone, Micha Hofri, "Further analysis of
the remedian algorithm", Theoretical Computer Science,
vol. 495 (2013), pp. 1-16

# Installation

`pip install remedian`

# Installation of most recent version

1. Activate your Python environment.
2. `git clone https://www.github.com/sappelhoff/remedian`
3. `cd remedian`
4. `pip install -e .`
5. You should then be able to run `from remedian.remedian import Remedian`.


