T-Digest data structure
# tdigest
### Efficient percentile estimation of streaming or distributed data
This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest is designed to compute accurate estimates (percentiles, quantiles, trimmed means, and so on) from streaming or distributed data. Two t-digests can be added together, which makes the data structure well suited to map-reduce settings, and a digest can be serialized into much less than 10kB instead of storing the entire list of data.
See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)
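To give a feel for the size claim, here is a rough sketch. A t-digest summarizes the data as a list of centroids, each a (mean, weight) pair, so its serialized size depends on the number of centroids rather than the number of data points. The `pack_centroids` and `unpack_centroids` helpers, the byte layout, and the figure of 200 centroids are all assumptions made for illustration; this is not the serialization format used by this package.

```python
import struct


def pack_centroids(centroids):
    """Pack (mean, weight) pairs as a 4-byte count followed by two doubles each.

    Hypothetical layout for illustration only.
    """
    header = struct.pack("<I", len(centroids))
    body = b"".join(struct.pack("<dd", mean, weight) for mean, weight in centroids)
    return header + body


def unpack_centroids(blob):
    """Inverse of pack_centroids: read the count, then each 16-byte pair."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<dd", blob, 4 + 16 * i) for i in range(count)]


# 200 made-up centroids standing in for a digest of 5000 points.
centroids = [(i / 200, 25.0) for i in range(200)]
blob = pack_centroids(centroids)
print(len(blob))  # 3204 bytes: 4-byte header + 200 * 16 bytes
```

Under these assumptions the summary takes about 3 kB no matter how many points were observed, whereas 5000 raw doubles alone would take 40 kB.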
### Usage
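```
from tdigest import TDigest
from numpy.random import random

# Stream values into a digest one at a time.
T1 = TDigest()
for _ in range(5000):
    T1.update(random())
print(T1.percentile(0.15))  # about 0.15

# Or add a whole array in one call.
T2 = TDigest()
T2.batch_update(random(5000))
print(T2.percentile(0.15))  # about 0.15

# Adding two digests gives a digest of the combined data.
T = T1 + T2
print(T.percentile(0.3))  # about 0.3
```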
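To see why two digests can be added, it may help to look at a toy version of the idea. The sketch below is a simplified, pure-Python stand-in, not Ted Dunning's algorithm and not this package's implementation: the `ToyDigest` class, its `max_centroids` parameter, and the equal-weight compression rule are all hypothetical. It keeps a sorted list of (mean, weight) centroids, fuses neighbours when the list grows too long, and merges two summaries by concatenating their centroids and compressing again.

```python
import bisect
import random


class ToyDigest:
    """Simplified mergeable quantile sketch (illustration only, not a real t-digest)."""

    def __init__(self, max_centroids=100):
        self.max_centroids = max_centroids
        self.centroids = []  # sorted list of (mean, weight) pairs

    def total_weight(self):
        return sum(weight for _, weight in self.centroids)

    def update(self, x):
        # Insert the point as a weight-1 centroid, keeping the list sorted.
        bisect.insort(self.centroids, (float(x), 1.0))
        if len(self.centroids) > 2 * self.max_centroids:
            self._compress()

    def _compress(self):
        # Fuse neighbouring centroids while the fused weight stays under the cap.
        limit = self.total_weight() / self.max_centroids
        merged = []
        for mean, weight in self.centroids:
            if merged and merged[-1][1] + weight <= limit:
                prev_mean, prev_weight = merged[-1]
                new_weight = prev_weight + weight
                new_mean = (prev_mean * prev_weight + mean * weight) / new_weight
                merged[-1] = (new_mean, new_weight)
            else:
                merged.append((mean, weight))
        self.centroids = merged

    def __add__(self, other):
        # Merging is just "pool the centroids, then compress".
        result = ToyDigest(max(self.max_centroids, other.max_centroids))
        result.centroids = sorted(self.centroids + other.centroids)
        result._compress()
        return result

    def quantile(self, q):
        """Estimate the q-quantile (q in [0, 1]) by interpolating between centroids."""
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.total_weight()
        cumulative = 0.0
        prev_mean = prev_rank = None
        for mean, weight in self.centroids:
            rank = cumulative + weight / 2  # rank at the centre of this centroid
            if target <= rank:
                if prev_mean is None:
                    return mean
                frac = (target - prev_rank) / (rank - prev_rank)
                return prev_mean + frac * (mean - prev_mean)
            prev_mean, prev_rank = mean, rank
            cumulative += weight
        return prev_mean


random.seed(0)
left, right = ToyDigest(), ToyDigest()
for _ in range(5000):
    left.update(random.random())
    right.update(random.random())

merged = left + right
print(merged.quantile(0.5))    # close to 0.5 for uniform data
print(len(merged.centroids))   # far fewer entries than the 10000 points seen
```

The real t-digest differs in an important way: it allows smaller centroids near the tails and larger ones near the median, which is what gives it good accuracy at extreme percentiles. The toy version caps every centroid at the same weight, so it only illustrates the mergeable-summary idea.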
Source distribution: `tdigest-0.1.0.tar.gz` (3.2 kB)

File metadata
- Download URL: tdigest-0.1.0.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes

Algorithm | Hash digest
---|---
SHA256 | 4167cadb797b1287104b07072447f67b33cc0de4740b002959952c1f6171cd7c
MD5 | 9bf920190be94c471be8ca9fa9863295
BLAKE2b-256 | 53661e8164b3fe37c46b445aa65dd546789cc51daf986146c4efc69bc0021153