Distributed quantile sketches
Project description
sketches-py
This repo contains python implementations of the distributed quantile sketch algorithms GKArray [1] and DDSketch [2]. Both sketches are fully mergeable, meaning that multiple sketches from distributed systems can be combined in a central node.
Installation
To install this package, clone the repo and run python setup.py install
. This package depends on numpy
and protobuf
. (The protobuf dependency can be removed if it's not applicable.)
GKArray
GKArray provides a sketch with a rank error guarantee of espilon (without merge) or 2*epsilon (with merge). The default value of epsilon is 0.01. For more details, refer to [2].
Usage
from gkarray.gkarray import GKArray
sketch = GKArray()
Add some values to the sketch.
import numpy as np
values = np.random.normal(size=500)
for v in values:
sketch.add(v)
Find the quantiles of values
to within epsilon of rank.
quantiles = [sketch.quantile(q) for q in [0.5, 0.75, 0.9, 1]]
Merge another GKArray
into sketch
.
another_sketch = GKArray()
other_values = np.random.normal(size=500)
for v in other_values:
another_sketch.add(v)
sketch.merge(another_sketch)
Now the quantiles of values
concatenated with other_values
will be accurate to within 2*epsilon of rank.
DDSketch
DDSketch has a relative error guarantee for any quantile q in [0, 1] that is not too small. Concretely, the q-quantile will be accurate up to the specified relative error as long as it belongs to one of the m bins kept by the sketch. The default values for the relative accuracy and m are 0.01 and 2048, repectively. In addition, a value that is smaller than min_value in magnitude is indistinguishable from 0. The default min_value is 1.0e-9.
Usage
from ddsketch.ddsketch import DDSketch
sketch = DDSketch()
Add values to the sketch
import numpy as np
values = np.random.normal(size=500)
for v in values:
sketch.add(v)
Find the quantiles of values
to within the relative error.
quantiles = [sketch.quantile(q) for q in [0.5, 0.75, 0.9, 1]]
Merge another DDSketch
into sketch
.
another_sketch = DDSketch()
other_values = np.random.normal(size=500)
for v in other_values:
another_sketch.add(v)
sketch.merge(another_sketch)
The quantiles of values
concatenated with other_values
are still accurate to within the relative error.
References
[1] Michael B. Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. In Proc. 2001 ACM SIGMOD International Conference on Management of Data, SIGMOD ’01, pages 58–66. ACM, 2001.
[2] Charles Masson and Jee E Rim and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file ddsketch-1.0.2.tar.gz
.
File metadata
- Download URL: ddsketch-1.0.2.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 05fe785302400f03790d54f194eaa1d21bdb0f51320d2b1b5af7b5d8f7303a67 |
|
MD5 | b142451f848b09ca9b99693fa8f242d7 |
|
BLAKE2b-256 | 57b9409a4be30c1aea04e1af94c4887dfad7783a9ef9b3eadf93c240a4abf857 |
Provenance
File details
Details for the file ddsketch-1.0.2-py3-none-any.whl
.
File metadata
- Download URL: ddsketch-1.0.2-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b1520a20b7df464a3a6c707ebf5a0f4626d879c54e5ef65210fb4f557895d4ec |
|
MD5 | 9ec609edaa3b521dc4dd7e939de50a30 |
|
BLAKE2b-256 | 53eb4e270e776cf456faf8b58b018fb9894f353052fe2ef8931076708043a527 |