Skip to main content

Distributed quantile sketches

Project description

sketches-py

This repo contains python implementations of the distributed quantile sketch algorithms GKArray [1] and DDSketch [2]. Both sketches are fully mergeable, meaning that multiple sketches from distributed systems can be combined in a central node.

Installation

To install this package, clone the repo and run python setup.py install. This package depends on numpy and protobuf. (The protobuf dependency can be removed if it's not applicable.)

GKArray

GKArray provides a sketch with a rank error guarantee of espilon (without merge) or 2*epsilon (with merge). The default value of epsilon is 0.01. For more details, refer to [2].

Usage

from gkarray.gkarray import GKArray

sketch = GKArray()

Add some values to the sketch.

import numpy as np
values = np.random.normal(size=500)
for v in values:
  sketch.add(v)

Find the quantiles of values to within epsilon of rank.

quantiles = [sketch.quantile(q) for q in [0.5, 0.75, 0.9, 1]]

Merge another GKArray into sketch.

another_sketch = GKArray()
other_values = np.random.normal(size=500)
for v in other_values:
  another_sketch.add(v)
sketch.merge(another_sketch)

Now the quantiles of values concatenated with other_values will be accurate to within 2*epsilon of rank.

DDSketch

DDSketch has a relative error guarantee for any quantile q in [0, 1] that is not too small. Concretely, the q-quantile will be accurate up to the specified relative error as long as it belongs to one of the m bins kept by the sketch. The default values for the relative accuracy and m are 0.01 and 2048, repectively. In addition, a value that is smaller than min_value in magnitude is indistinguishable from 0. The default min_value is 1.0e-9.

Usage

from ddsketch.ddsketch import DDSketch

sketch = DDSketch()

Add values to the sketch

import numpy as np

values = np.random.normal(size=500)
for v in values:
  sketch.add(v)

Find the quantiles of values to within the relative error.

quantiles = [sketch.quantile(q) for q in [0.5, 0.75, 0.9, 1]]

Merge another DDSketch into sketch.

another_sketch = DDSketch()
other_values = np.random.normal(size=500)
for v in other_values:
  another_sketch.add(v)
sketch.merge(another_sketch)

The quantiles of values concatenated with other_values are still accurate to within the relative error.

References

[1] Michael B. Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. In Proc. 2001 ACM SIGMOD International Conference on Management of Data, SIGMOD ’01, pages 58–66. ACM, 2001.

[2] Charles Masson and Jee E Rim and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ddsketch-1.0.2.tar.gz (12.0 kB view details)

Uploaded Source

Built Distribution

ddsketch-1.0.2-py3-none-any.whl (14.3 kB view details)

Uploaded Python 3

File details

Details for the file ddsketch-1.0.2.tar.gz.

File metadata

  • Download URL: ddsketch-1.0.2.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.6

File hashes

Hashes for ddsketch-1.0.2.tar.gz
Algorithm Hash digest
SHA256 05fe785302400f03790d54f194eaa1d21bdb0f51320d2b1b5af7b5d8f7303a67
MD5 b142451f848b09ca9b99693fa8f242d7
BLAKE2b-256 57b9409a4be30c1aea04e1af94c4887dfad7783a9ef9b3eadf93c240a4abf857

See more details on using hashes here.

Provenance

File details

Details for the file ddsketch-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: ddsketch-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.6

File hashes

Hashes for ddsketch-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b1520a20b7df464a3a6c707ebf5a0f4626d879c54e5ef65210fb4f557895d4ec
MD5 9ec609edaa3b521dc4dd7e939de50a30
BLAKE2b-256 53eb4e270e776cf456faf8b58b018fb9894f353052fe2ef8931076708043a527

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page