T-Digest data structure
# tdigest
### Efficient percentile estimation of streaming or distributed data
[![Latest Version](https://pypip.in/v/tdigest/badge.png)](https://pypi-hypernode.com/pypi/tdigest/)
This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest is designed to compute accurate estimates, such as percentiles, quantiles, and trimmed means, from streaming or distributed data. Two t-digests can be added together, which makes the data structure well suited to map-reduce settings, and a digest can be serialized into much less than 10kB instead of storing the entire list of data.
See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)
### Installation
```
pip install tdigest
```
### Usage
```
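from tdigest import TDigest
from numpy.random import random

# Streaming: feed values one at a time
T1 = TDigest()
for _ in range(5000):
    T1.update(random())

print(T1.percentile(0.15))  # about 0.15

# Batch: add an array of values in one call
T2 = TDigest()
T2.batch_update(random(5000))

print(T2.percentile(0.15))  # about 0.15

# Two digests can be added to summarize the combined data
T = T1 + T2
print(T.percentile(0.3))  # about 0.3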
```
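To give a feel for why two digests can be added, here is a toy sketch of a mergeable quantile summary built from `(mean, count)` centroids. It is an illustration only: `ToyDigest`, `delta`, and the greedy merging rule are assumptions made for this example, not the `tdigest` package's code and not Ted Dunning's exact algorithm (which, among other things, keeps smaller centroids near the tails). It uses only the standard library.

```python
import random
from functools import reduce


class ToyDigest:
    """Toy mergeable summary: a list of (mean, count) centroids. Illustration only."""

    def __init__(self, delta=50):
        self.delta = delta      # hypothetical knob: larger means more centroids kept
        self.centroids = []     # (mean, count) pairs
        self.total = 0          # number of points summarized so far

    def update(self, x):
        # Each new point starts as its own centroid; compress once the list grows.
        self.centroids.append((float(x), 1))
        self.total += 1
        if len(self.centroids) > 4 * self.delta:
            self._compress()

    def _compress(self):
        # Greedily merge neighbours (sorted by mean) while the combined
        # count stays within a cap proportional to the total count.
        cap = max(1.0, self.total / self.delta)
        merged = []
        for mean, count in sorted(self.centroids):
            if merged and merged[-1][1] + count <= cap:
                prev_mean, prev_count = merged[-1]
                new_count = prev_count + count
                new_mean = (prev_mean * prev_count + mean * count) / new_count
                merged[-1] = (new_mean, new_count)
            else:
                merged.append((mean, count))
        self.centroids = merged

    def __add__(self, other):
        # Adding two summaries: pool the centroids, then compress again.
        out = ToyDigest(self.delta)
        out.centroids = self.centroids + other.centroids
        out.total = self.total + other.total
        out._compress()
        return out

    def quantile(self, q):
        # Walk the centroids in order until the cumulative count reaches q * total.
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.total
        seen = 0
        for mean, count in sorted(self.centroids):
            seen += count
            if seen >= target:
                return mean
        return max(self.centroids)[0]


# Map step: summarize each partition independently.
random.seed(0)
parts = []
for _ in range(4):
    d = ToyDigest()
    for _ in range(5000):
        d.update(random.random())
    parts.append(d)

# Reduce step: add the per-partition summaries together.
combined = reduce(lambda a, b: a + b, parts)
print(combined.quantile(0.3))  # close to 0.3 for uniform data
```

Because each summary holds a bounded number of centroids rather than the raw data, the reduce step only moves small objects around, which is the property that makes this kind of structure attractive for map-reduce.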