################
kafka-aggregator
################

|Build| |Docker|

A Kafka aggregator based on the `Faust <https://faust.readthedocs.io/en/latest/index.html>`_ Python Stream Processing library.

kafka-aggregator development is based on the `Safir <https://safir.lsst.io>`__ application template.


Overview
========

kafka-aggregator uses `Faust's windowing feature <https://faust.readthedocs.io/en/latest/userguide/tables.html#windowing>`_ to aggregate a stream of messages from Kafka.

kafka-aggregator implements a Faust agent (a stream processor) that adds messages from a source topic to a Faust table. The table is configured as a tumbling window with two parameters: a size, which is the window duration (time interval), and an expiration time, which is how long the data allocated to each window is stored. Every time a window expires, a callback function is called to aggregate the messages allocated to that window. Because tumbling windows do not overlap, the window size sets the frequency of the aggregated stream.
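The windowing idea can be illustrated without Kafka or Faust. The following is a minimal sketch using only the Python standard library, not the kafka-aggregator implementation: it assumes messages are ``(timestamp, value)`` pairs, summarizes all windows in one batch instead of reacting to window expiration, and the names ``WINDOW_SIZE``, ``window_start`` and ``aggregate`` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, stdev

WINDOW_SIZE = 10.0  # window duration in seconds (hypothetical value)


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start time of the tumbling window containing ``timestamp``."""
    return timestamp - (timestamp % size)


def aggregate(messages, size=WINDOW_SIZE):
    """Group (timestamp, value) pairs into tumbling windows and summarize each.

    This processes a finished batch; a stream processor would instead
    summarize a window when that window expires.
    """
    windows = defaultdict(list)
    for timestamp, value in messages:
        windows[window_start(timestamp, size)].append(value)

    summaries = []
    for start in sorted(windows):
        values = windows[start]
        summaries.append(
            {
                "window_start": start,
                "min": min(values),
                "mean": mean(values),
                "median": median(values),
                # stdev() needs at least two data points
                "stdev": stdev(values) if len(values) > 1 else 0.0,
                "max": max(values),
            }
        )
    return summaries


# Five messages spread over two 10-second windows.
messages = [(0.5, 1.0), (3.0, 3.0), (9.9, 5.0), (10.0, 10.0), (12.5, 20.0)]
summaries = aggregate(messages)
for summary in summaries:
    print(summary)
```

Each source message lands in exactly one window, so the five input messages produce two aggregated records, one per 10-second interval.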

kafka-aggregator uses `faust-avro <https://github.com/masterysystems/faust-avro>`_ to add Avro serialization and Schema Registry support to Faust. faust-avro can parse Faust models into Avro Schemas.
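To give a rough idea of what deriving an Avro schema from a typed model involves, here is a standard-library sketch. It does not use the faust-avro API: the ``ExampleMessage`` dataclass, the ``AVRO_TYPES`` mapping and the ``to_avro_schema`` helper are all hypothetical, and only a few primitive types are handled.

```python
import json
from dataclasses import dataclass, fields

# Hypothetical mapping from Python types to Avro primitive type names.
AVRO_TYPES = {int: "long", float: "double", str: "string", bool: "boolean"}


@dataclass
class ExampleMessage:
    """Hypothetical source-topic record with a timestamp and two fields."""

    time: float
    value: float
    name: str


def to_avro_schema(model):
    """Build an Avro record schema (as a dict) from a dataclass definition."""
    return {
        "type": "record",
        "name": model.__name__,
        "fields": [
            {"name": field.name, "type": AVRO_TYPES[field.type]}
            for field in fields(model)
        ],
    }


schema = to_avro_schema(ExampleMessage)
print(json.dumps(schema, indent=2))
```

The resulting dictionary has the shape of an Avro ``record`` schema, which is the kind of document a Schema Registry stores for a topic.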

See `the docs <https://kafka-aggregator.lsst.io/>`_ for more information.

.. |Build| image:: https://github.com/lsst-sqre/kafka-aggregator/workflows/CI/badge.svg
   :alt: GitHub Actions
   :scale: 100%
   :target: https://github.com/lsst-sqre/kafka-aggregator/actions

.. |Docker| image:: https://img.shields.io/docker/v/lsstsqre/kafkaaggregator?sort=date
   :alt: Docker Hub repository
   :scale: 100%
   :target: https://hub.docker.com/repository/docker/lsstsqre/kafkaaggregator

##########
Change log
##########

0.2.0 (2020-08-14)
==================

* Add first and third quartiles (``q1`` and ``q3``) to the list of summary statistics computed by the aggregator.
* Add the ability to configure the list of summary statistics to be computed.
* Pin top-level requirements.
* Add Kafka Connect to the docker-compose setup.
* Use only one Schema Registry by default to simplify local execution.
* First release to PyPI.


0.1.0 (2020-07-13)
==================

Initial release of kafka-aggregator with the following features:

* Use the Faust windowing feature to aggregate a stream of messages.
* Use faust-avro to add Avro serialization and Schema Registry support to Faust.
* Support an optional internal Schema Registry to store schemas for the aggregated topics.
* Create aggregation topic schemas from the source topic schemas and from the list of summary statistics to be computed.
* Ability to create Faust records dynamically from aggregation topic schemas.
* Ability to auto-generate code for the Faust agents (stream processors).
* Compute summary statistics for numeric fields: ``min()``, ``mean()``, ``median()``, ``stdev()``, ``max()``.
* Add an example module to initialize a number of source topics in Kafka, control the number of fields in each topic, and produce messages for those topics at a given frequency.
* Use Kafdrop to inspect messages from source and aggregated topics.
* Add kafka-aggregator documentation site.

