################
kafka-aggregator
################
|Build| |Docker|
A Kafka aggregator based on the `Faust <https://faust.readthedocs.io/en/latest/index.html>`_ Python Stream Processing library.
kafka-aggregator development is based on the `Safir <https://safir.lsst.io>`__ application template.
Overview
========
kafka-aggregator uses `Faust's windowing feature <https://faust.readthedocs.io/en/latest/userguide/tables.html#windowing>`_ to aggregate a stream of messages from Kafka.
kafka-aggregator implements a Faust agent (a stream processor) that adds messages from a source topic to a Faust table. The table is configured as a tumbling window with two parameters: a size, which is the window duration (time interval), and an expiration time, which specifies how long the data allocated to each window is stored. Every time a window expires, a callback function is called to aggregate the messages allocated to that window. The window size therefore controls the frequency of the aggregated stream.
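The windowing idea described above can be illustrated without Faust. The following is a minimal pure-Python sketch, not the kafka-aggregator implementation: it assigns each message to a tumbling window by its timestamp and then summarizes every window. The names ``WINDOW_SIZE``, ``window_start``, ``aggregate``, and ``aggregate_stream`` are hypothetical, messages are assumed to be ``(timestamp, value)`` pairs, and window expiration is not modeled (all windows are aggregated at the end).

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical window duration in seconds (an assumption for this sketch).
WINDOW_SIZE = 10.0


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start time of the tumbling window containing timestamp."""
    return (timestamp // size) * size


def aggregate(values):
    """Summarize the values allocated to a single window."""
    return {
        "count": len(values),
        "min": min(values),
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
    }


def aggregate_stream(messages, size=WINDOW_SIZE):
    """Group (timestamp, value) pairs into tumbling windows and summarize each.

    Tumbling windows do not overlap, so every message lands in exactly
    one window.
    """
    windows = defaultdict(list)
    for timestamp, value in messages:
        windows[window_start(timestamp, size)].append(value)
    return {start: aggregate(vals) for start, vals in sorted(windows.items())}


# Five messages spread over two 10-second windows.
messages = [(0.5, 1.0), (3.0, 3.0), (9.9, 5.0), (10.0, 2.0), (15.0, 4.0)]
result = aggregate_stream(messages)
```

With a 10-second window, the five input messages collapse into two aggregated records, one per window. A larger window size would produce fewer aggregated messages for the same input.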
kafka-aggregator uses `faust-avro <https://github.com/masterysystems/faust-avro>`_ to add Avro serialization and Schema Registry support to Faust. faust-avro can parse Faust models into Avro Schemas.
See `the docs <https://kafka-aggregator.lsst.io/>`_ for more information.
.. |Build| image:: https://github.com/lsst-sqre/kafka-aggregator/workflows/CI/badge.svg
   :alt: GitHub Actions
   :scale: 100%
   :target: https://github.com/lsst-sqre/kafka-aggregator/actions
.. |Docker| image:: https://img.shields.io/docker/v/lsstsqre/kafkaaggregator?sort=date
   :alt: Docker Hub repository
   :scale: 100%
   :target: https://hub.docker.com/repository/docker/lsstsqre/kafkaaggregator
##########
Change log
##########
0.2.0 (2020-08-14)
==================
* Add first and third quartiles (``q1`` and ``q3``) to the list of summary statistics computed by the aggregator.
* Ability to configure the list of summary statistics to be computed.
* Pin top-level requirements.
* Add Kafka Connect to the docker-compose setup.
* Use only one Schema Registry by default to simplify local execution.
* First release to PyPI.
0.1.0 (2020-07-13)
==================
Initial release of kafka-aggregator with the following features:
* Use Faust's windowing feature to aggregate a stream of messages.
* Use faust-avro to add Avro serialization and Schema Registry support to Faust.
* Support for an internal Schema Registry to store schemas for the aggregated topics (optional).
* Create aggregation topic schemas from the source topic schemas and from the list of summary statistics to be computed.
* Ability to create Faust records dynamically from aggregation topic schemas.
* Ability to auto-generate code for the Faust agents (stream processors).
* Compute summary statistics for numeric fields: ``min()``, ``mean()``, ``median()``, ``stdev()``, ``max()``.
* Add an example module that initializes a number of source topics in Kafka, controls the number of fields in each topic, and produces messages for those topics at a given frequency.
* Use Kafdrop to inspect messages from source and aggregated topics.
* Add kafka-aggregator documentation site.