Remedian: robust averaging of large data sets

remedian

The Remedian: A Robust Averaging Method for Large Data Sets - Python implementation

This algorithm is used to approximate the median of several data chunks if these data chunks cannot (or should not) be loaded into memory at once.

Given t data chunks overall, each of size obs_size, the Remedian class sets up k_arrs arrays, each with n_obs positions.
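To get a feel for the sizes involved: k arrays with n_obs positions each can absorb up to n_obs ** k chunks before the last array is full, which is the capacity argument in Rousseeuw and Bassett (1990). The sketch below finds the smallest such k for a given t. The name min_arrays is hypothetical and used only for illustration; it is not part of the package, and the Remedian class may derive k_arrs differently.

```python
def min_arrays(n_obs, t):
    """Smallest k such that n_obs ** k >= t (hypothetical helper)."""
    if n_obs < 2 or t < 1:
        raise ValueError("need n_obs >= 2 and t >= 1")
    k, capacity = 1, n_obs
    # Integer arithmetic avoids rounding issues with logarithms.
    while capacity < t:
        capacity *= n_obs
        k += 1
    return k


# 1000 chunks with arrays of length 10 need 3 arrays (10 ** 3 == 1000),
# so only 3 * 10 = 30 chunk-sized slots are held in memory at once.
print(min_arrays(10, 1000))  # prints 3
```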

The median of the t data chunks is then approximated as follows: one data chunk after another is fed into the n_obs positions of the first array. When the first array is full, its median is calculated and stored in the first position of the second array. The first array is then re-used: once it is full again, its median is stored in the second position of the second array, and so on. When the second array is full, the median of its values is stored in the first position of the third array, and the same scheme continues up through the remaining arrays.

The final "Remedian" is the median of the last array, after all t data chunks have been fed into the object.
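The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the package's implementation: the class name RemedianSketch and its methods are hypothetical, chunks are plain lists of numbers, and medians are taken position by position across chunks using only the standard library.

```python
from statistics import median


class RemedianSketch:
    """Illustrative remedian; not the package's Remedian class."""

    def __init__(self, n_obs, k_arrs):
        self.n_obs = n_obs
        # One buffer per array; each holds at most n_obs chunks.
        self.buffers = [[] for _ in range(k_arrs)]

    def add_obs(self, chunk):
        """Feed one chunk (a fixed-length sequence of numbers)."""
        self._push(0, list(chunk))

    def _push(self, level, chunk):
        buf = self.buffers[level]
        if len(buf) >= self.n_obs:
            # Only the last array can get here: it is never emptied.
            raise ValueError("capacity of n_obs ** k_arrs chunks exceeded")
        buf.append(chunk)
        # A full array below the last one is collapsed to its median,
        # emptied for re-use, and the median is passed one array up.
        if len(buf) == self.n_obs and level + 1 < len(self.buffers):
            med = self._median(buf)
            buf.clear()
            self._push(level + 1, med)

    @staticmethod
    def _median(chunks):
        # Median taken position by position across the chunks.
        return [median(values) for values in zip(*chunks)]

    def remedian(self):
        """Median of the last array, position by position."""
        return self._median(self.buffers[-1])


# Nine chunks of size 2, with n_obs = 3 and k_arrs = 2 (3 ** 2 == 9).
r = RemedianSketch(n_obs=3, k_arrs=2)
for value in [5, 1, 9, 2, 8, 3, 7, 4, 6]:
    r.add_obs([value, 10 * value])
print(r.remedian())  # prints [5, 50]
```

In this small example the result happens to equal the exact median of the values 1 to 9; in general the remedian is only an approximation of it. At most n_obs * k_arrs chunks are stored at any time, here 6 instead of 9.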

Installation

pip install remedian

The dependencies should be installed automatically by pip.

Installation of most recent version

  1. Activate your Python environment.
  2. git clone https://www.github.com/sappelhoff/remedian
  3. cd remedian
  4. pip install -e .
  5. You should then be able to run: from remedian.remedian import Remedian

Usage

See the example in the docs.

References

P.J. Rousseeuw, G.W. Bassett Jr., "The remedian: A robust averaging method for large data sets", Journal of the American Statistical Association, vol. 85 (1990), pp. 97-104

M. Chao, G. Lin, "The asymptotic distributions of the remedians", Journal of Statistical Planning and Inference, vol. 37 (1993), pp. 1-11

D. Cantone, M. Hofri, "Further analysis of the remedian algorithm", Theoretical Computer Science, vol. 495 (2013), pp. 1-16
