Remedian: robust averaging of large data sets
remedian
The Remedian: A Robust Averaging Method for Large Data Sets - Python implementation
This algorithm is used to approximate the median of several data chunks if these data chunks cannot (or should not) be loaded into memory at once.
Given a data chunk of size obs_size, and t data chunks overall, the Remedian class sets up a number k_arrs of arrays of length n_obs.
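To make the relation between t, n_obs, and k_arrs concrete, here is a minimal sketch of one plausible way to pick the number of arrays: the smallest k_arrs for which n_obs ** k_arrs is at least t. The function name min_arrays is hypothetical and this rule is an assumption for illustration; it is not confirmed to be how the package computes k_arrs.

```python
def min_arrays(t, n_obs):
    """Return the smallest k such that n_obs ** k >= t.

    Hypothetical helper for illustration only; not part of the
    remedian package. Uses integer arithmetic to avoid float logs.
    """
    if t < 1 or n_obs < 2:
        raise ValueError("need t >= 1 and n_obs >= 2")
    k, capacity = 1, n_obs
    while capacity < t:
        capacity *= n_obs  # one more array multiplies capacity by n_obs
        k += 1
    return k


# With t = 1_000_000 chunks and n_obs = 100, three arrays suffice,
# so only 3 * 100 = 300 chunk-sized slots are held in memory at once.
print(min_arrays(1_000_000, 100))  # 3
```

Under this assumption the memory footprint grows with k_arrs * n_obs chunk-sized slots rather than with t.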
The median of the t data chunks of size obs_size is then approximated as follows: One data chunk after another is fed into the n_obs positions of the first array. When the first array is full, its median is calculated and stored in the first position of the second array. After this, the first array is re-used to fill the second position of the second array, etc. When the second array is full, the median of its values is stored in the first position of the third array, and so on.
The final "Remedian" is the median of the last array, after all t data chunks have been fed into the object.
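The cascade described above can be sketched in a few lines of pure Python. This is an illustrative sketch, not the package's Remedian class: the name RemedianSketch and its methods are hypothetical, chunks are assumed to be flat sequences of numbers, medians are taken element-wise, and t is assumed to equal n_obs ** k_arrs exactly so that the last array is full at the end.

```python
from statistics import median


class RemedianSketch:
    """Illustrative remedian cascade (hypothetical, not the package API)."""

    def __init__(self, n_obs, k_arrs):
        self.n_obs = n_obs
        # One buffer per array; each holds at most n_obs chunks.
        self.buffers = [[] for _ in range(k_arrs)]

    def add_obs(self, chunk):
        last = len(self.buffers) - 1
        if len(self.buffers[last]) == self.n_obs:
            raise ValueError("capacity n_obs ** k_arrs is exhausted")
        value = list(chunk)
        for level, buf in enumerate(self.buffers):
            buf.append(value)
            if level == last or len(buf) < self.n_obs:
                break
            # Buffer is full: pass its element-wise median one level up
            # and empty it so it can be re-used.
            value = [median(col) for col in zip(*buf)]
            buf.clear()

    def remedian(self):
        # Element-wise median of whatever the last buffer holds.
        return [median(col) for col in zip(*self.buffers[-1])]


# Nine chunks of size 2, fed through two arrays of length 3.
sketch = RemedianSketch(n_obs=3, k_arrs=2)
for i in [9, 1, 5, 2, 8, 3, 7, 4, 6]:
    sketch.add_obs([i, 10 * i])
print(sketch.remedian())  # [5, 50]
```

In this small example the result coincides with the exact median of each column; in general the remedian is only an approximation of the median.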
Installation
pip install remedian
The dependencies should be installed automatically by pip.
Installation of most recent version
- Activate your Python environment, then run:
git clone https://www.github.com/sappelhoff/remedian
cd remedian
pip install -e .
- You should then be able to import the class:
from remedian.remedian import Remedian
Usage
See the example in the docs.
References
P.J. Rousseeuw, G.W. Bassett Jr., "The remedian: A robust averaging method for large data sets", Journal of the American Statistical Association, vol. 85 (1990), pp. 97-104
M. Chao, G. Lin, "The asymptotic distributions of the remedians", Journal of Statistical Planning and Inference, vol. 37 (1993), pp. 1-11
D. Cantone, M. Hofri, "Further analysis of the remedian algorithm", Theoretical Computer Science, vol. 495 (2013), pp. 1-16
Download files
File details
Details for the file remedian-0.1.2.tar.gz.
File metadata
- Download URL: remedian-0.1.2.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | da143daf593f2b0cc8a92520af2a8627cd761ed23e467b5bfe32a1db22d61da7
MD5 | 6c7d3c039a98a1288094365bee869edf
BLAKE2b-256 | 72fcc21c34e837e85d166e4f42e09fb582956649b66b1ac024e1c28b4418aeaf
File details
Details for the file remedian-0.1.2-py2.py3-none-any.whl.
File metadata
- Download URL: remedian-0.1.2-py2.py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 081ef1cfaebaf5c52144c3f2942fe933c569a4ae130fce97035a2e3c8f480e0c
MD5 | f176ef6f83ed3a45a0a3cac1f7256d0a
BLAKE2b-256 | 7f140c30fecae429b73dea8fb4e78c9c188872396f085c23d79ff159ecb9869e