T-Digest data structure
# tdigest
### Efficient percentile estimation of streaming or distributed data
[![Latest Version](https://pypip.in/v/tdigest/badge.png)](https://pypi-hypernode.com/pypi/tdigest/)
This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest is designed to compute accurate estimates of percentiles, quantiles, trimmed means, and similar statistics from streaming or distributed data. Two t-digests can be added together, which makes the data structure well suited to map-reduce settings, and a digest can be serialized into much less than 10kB (instead of storing the entire list of data).
See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)
### Installation
```
pip install tdigest
```
### Usage
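```
from tdigest import TDigest
from numpy.random import random

# Streaming: feed values one at a time.
T1 = TDigest()
for _ in range(5000):
    T1.update(random())
print(T1.percentile(0.15))  # about 0.15

# Batch: feed a whole array at once.
T2 = TDigest()
T2.batch_update(random(5000))
print(T2.percentile(0.15))  # about 0.15

# Digests can be added; the sum summarizes both data sets.
T = T1 + T2
print(T.percentile(0.3))  # about 0.3
```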
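To give some intuition for why two digests can be added, here is a minimal toy sketch of a mergeable summary. It is not this library's implementation and not Ted Dunning's algorithm: the class name `ToyDigest`, the `max_centroids` parameter, and the "merge the two closest centroids" rule are all assumptions made for illustration. The real t-digest bounds centroid sizes according to their quantile, which is what gives it better accuracy in the tails.

```python
import bisect
import random


class ToyDigest:
    """Hypothetical toy summary: a bounded, sorted list of [mean, count] centroids."""

    def __init__(self, max_centroids=100):
        self.max_centroids = max_centroids
        self.centroids = []  # kept sorted by mean; each entry is [mean, count]

    def update(self, x, count=1):
        # Insert the value as its own centroid, then shrink if over budget.
        bisect.insort(self.centroids, [float(x), count])
        self._compress()

    def _compress(self):
        # Repeatedly merge the two adjacent centroids whose means are closest.
        # (Simplification: the real t-digest uses a quantile-dependent size limit.)
        while len(self.centroids) > self.max_centroids:
            gaps = [b[0] - a[0] for a, b in zip(self.centroids, self.centroids[1:])]
            i = gaps.index(min(gaps))
            (m1, c1), (m2, c2) = self.centroids[i], self.centroids[i + 1]
            merged = [(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2]
            self.centroids[i:i + 2] = [merged]

    def __add__(self, other):
        # Adding two summaries: pool the centroids, re-sort, compress.
        out = ToyDigest(max(self.max_centroids, other.max_centroids))
        out.centroids = sorted(list(c) for c in self.centroids + other.centroids)
        out._compress()
        return out

    def quantile(self, q):
        # Walk the cumulative counts until the fraction q (0..1) is covered.
        total = sum(count for _, count in self.centroids)
        seen = 0
        for mean, count in self.centroids:
            seen += count
            if seen >= q * total:
                return mean
        return self.centroids[-1][0]


# Two summaries built independently, e.g. on two workers, then combined.
random.seed(0)
left, right = ToyDigest(), ToyDigest()
for _ in range(2000):
    left.update(random.random())
    right.update(random.random())
combined = left + right
print(len(combined.centroids), combined.quantile(0.3))
```

Because each summary is only a short list of weighted means, combining two of them never requires the raw data, and the result stays within the same size budget. That is the property that makes this kind of structure convenient in map-reduce settings.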
### Source distribution
`tdigest-0.1.1.tar.gz` (4.2 kB, not uploaded using Trusted Publishing)

File hashes:

Algorithm | Hash digest
---|---
SHA256 | 87cf1636ea31b7d015a2aeeffc1d24b31ea92a7c7db9840559f13ec0c8a789b1
MD5 | 5ee487f0c02db4be8636aef6444fddad
BLAKE2b-256 | 6dff99649b1cad439f80dbb6bfe0cfc9fa16df06d839efba26f58447d13c9404