T-Digest data structure

# tdigest
### Efficient percentile estimation of streaming or distributed data
[![Latest Version](https://pypip.in/v/tdigest/badge.png)](https://pypi-hypernode.com/pypi/tdigest/)


This is a Python implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure. The t-digest is designed to compute accurate estimates of percentiles, quantiles, trimmed means, and similar statistics from streaming or distributed data. Two t-digests can be added together, which makes the structure well suited to map-reduce settings, and a digest can be serialized into much less than 10 kB instead of storing the entire list of data.
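
To make the "digests can be added" property concrete, here is a minimal sketch of a mergeable quantile summary in a map-reduce style. It is an illustration only: `ToyHistogram` is a hypothetical fixed-bin histogram for values in [0, 1), not the t-digest algorithm and not part of this package. It only shows why a summary that supports `+` lets each worker summarize its own shard and combine results without revisiting the raw data.

```python
from functools import reduce


class ToyHistogram:
    """Toy mergeable quantile summary for values in [0, 1).

    Illustration only: fixed-width bins, not the adaptive centroids
    that a real t-digest maintains.
    """

    def __init__(self, bins=100):
        self.bins = bins
        self.counts = [0] * bins
        self.n = 0

    def update(self, x):
        # Clamp the bin index so that x == 1.0 lands in the last bin.
        i = min(int(x * self.bins), self.bins - 1)
        self.counts[i] += 1
        self.n += 1

    def __add__(self, other):
        # Merging is just element-wise addition of bin counts.
        merged = ToyHistogram(self.bins)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        merged.n = self.n + other.n
        return merged

    def quantile(self, q):
        # Walk the cumulative counts and interpolate inside the target bin.
        target = q * self.n
        seen = 0
        for i, c in enumerate(self.counts):
            if c and seen + c >= target:
                return (i + (target - seen) / c) / self.bins
            seen += c
        return 1.0


# "Map": each worker summarizes its own shard of the data.
shards = [[(k * 4 + s) / 4000 for k in range(1000)] for s in range(4)]
partials = []
for shard in shards:
    h = ToyHistogram()
    for x in shard:
        h.update(x)
    partials.append(h)

# "Reduce": combine the partial summaries without revisiting the raw data.
combined = reduce(lambda a, b: a + b, partials)
estimate = combined.quantile(0.3)  # close to 0.3 for this evenly spread data
```

The merged summary holds the same bin counts as one built over all the data at once, which is the property that matters in a distributed setting. A t-digest offers the same kind of additivity while adapting its resolution to the data rather than relying on fixed bins.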

See a blog post about it here: [Percentile and Quantile Estimation of Big Data: The t-Digest](http://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest)


### Installation
```
pip install tdigest
```

### Usage

```
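from tdigest import TDigest
from numpy.random import random

# Build a digest one observation at a time (streaming use).
T1 = TDigest()
for _ in range(5000):
    T1.update(random())

print(T1.percentile(0.15))  # about 0.15

# Build a second digest from a whole array at once.
T2 = TDigest()
T2.batch_update(random(5000))
print(T2.percentile(0.15))  # about 0.15

# Adding two digests merges them into a single summary.
T = T1 + T2
print(T.percentile(0.3))  # about 0.3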
```
