Microsoft Azure Event Hubs Client Library for Python
Project description
Azure Event Hubs client library for Python
Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them to multiple consumers. This lets you process and analyze the massive amounts of data produced by your connected devices and applications. Once Event Hubs has collected the data, you can retrieve, transform, and store it by using any real-time analytics provider or with batching/storage adapters. If you would like to know more about Azure Event Hubs, you may wish to review: What is Event Hubs?
The Azure Event Hubs client library allows for publishing and consuming of Azure Event Hubs events and may be used to:
- Emit telemetry about your application for business intelligence and diagnostic purposes.
- Publish facts about the state of your application which interested parties may observe and use as a trigger for taking action.
- Observe interesting operations and interactions happening within your business or other ecosystem, allowing loosely coupled systems to interact without the need to bind them together.
- Receive events from one or more publishers, transform them to better meet the needs of your ecosystem, then publish the transformed events to a new stream for consumers to observe.
Source code | Package (PyPi) | API reference documentation | Product documentation
Getting started
Install the package
Install the Azure Event Hubs client library for Python with pip:
$ pip install --pre azure-eventhub
Prerequisites
-
Python 2.7, 3.5 or later.
-
Microsoft Azure Subscription: To use Azure services, including Azure Event Hubs, you'll need a subscription. If you do not have an existing Azure account, you may sign up for a free trial or use your MSDN subscriber benefits when you create an account.
-
Event Hubs namespace with an Event Hub: To interact with Azure Event Hubs, you'll also need to have a namespace and Event Hub available. If you are not familiar with creating Azure resources, you may wish to follow the step-by-step guide for creating an Event Hub using the Azure portal. There, you can also find detailed instructions for using the Azure CLI, Azure PowerShell, or Azure Resource Manager (ARM) templates to create an Event Hub.
Authenticate the client
Interaction with Event Hubs starts with an instance of the EventHubClient class. You need the host name, SAS/AAD credential and event hub name to instantiate the client object.
Obtain a connection string
For the Event Hubs client library to interact with an Event Hub, it will need to understand how to connect and authorize with it. The easiest means for doing so is to use a connection string, which is created automatically when creating an Event Hubs namespace. If you aren't familiar with shared access policies in Azure, you may wish to follow the step-by-step guide to get an Event Hubs connection string.
Create client
There are several ways to instantiate the EventHubClient object and the following code snippets demonstrate two ways:
Create client from connection string:
from azure.eventhub import EventHubClient
connection_str = '<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>'
event_hub_path = '<< NAME OF THE EVENT HUB >>'
client = EventHubClient.from_connection_string(connection_str, event_hub_path)
- The
from_connection_string
method takes the connection string of the formEndpoint=sb://<yournamespace>.servicebus.windows.net/;SharedAccessKeyName=<yoursharedaccesskeyname>;SharedAccessKey=<yoursharedaccesskey>
and entity name to your Event Hub instance. You can get the connection string from the Azure portal.
Create client using the azure-identity library:
from azure.eventhub import EventHubClient
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
host = '<< HOSTNAME OF THE EVENT HUB >>'
event_hub_path = '<< NAME OF THE EVENT HUB >>'
client = EventHubClient(host, event_hub_path, credential)
- This constructor takes the host name and entity name of your Event Hub instance and credential that implements the TokenCredential interface. There are implementations of the TokenCredential interface available in the azure-identity package. The host name is of the format
<yournamespace.servicebus.windows.net>
.
Key concepts
-
An Event Hub client is the primary interface for developers interacting with the Event Hubs client library, allowing for inspection of Event Hub metadata and providing a guided experience towards specific Event Hub operations such as the creation of producers and consumers.
-
An Event Hub producer is a source of telemetry data, diagnostics information, usage logs, or other log data, as part of an embedded device solution, a mobile device application, a game title running on a console or other device, some client or server based business solution, or a web site.
-
An Event Hub consumer picks up such information from the Event Hub and processes it. Processing may involve aggregation, complex computation, and filtering. Processing may also involve distribution or storage of the information in a raw or transformed fashion. Event Hub consumers are often robust and high-scale platform infrastructure parts with built-in analytics capabilities, like Azure Stream Analytics, Apache Spark, or Apache Storm.
-
A partition is an ordered sequence of events that is held in an Event Hub. Azure Event Hubs provides message streaming through a partitioned consumer pattern in which each consumer only reads a specific subset, or partition, of the message stream. As newer events arrive, they are added to the end of this sequence. The number of partitions is specified at the time an Event Hub is created and cannot be changed.
-
A consumer group is a view of an entire Event Hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and from their own position. There can be at most 5 concurrent readers on a partition per consumer group; however it is recommended that there is only one active consumer for a given partition and consumer group pairing. Each active reader receives all of the events from its partition; if there are multiple readers on the same partition, then they will receive duplicate events.
For more concepts and deeper discussion, see: Event Hubs Features. Also, the concepts for AMQP are well documented in OASIS Advanced Messaging Queuing Protocol (AMQP) Version 1.0.
Examples
The following sections provide several code snippets covering some of the most common Event Hubs tasks, including:
- Inspect an Event Hub
- Publish events to an Event Hub
- Consume events from an Event Hub
- Async publish events to an Event Hub
- Async consume events from an Event Hub
- Consume events using an Event Processor
Inspect an Event Hub
Get the partition ids of an Event Hub.
from azure.eventhub import EventHubClient
connection_str = '<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>'
event_hub_path = '<< NAME OF THE EVENT HUB >>'
client = EventHubClient.from_connection_string(connection_str, event_hub_path)
partition_ids = client.get_partition_ids()
Publish events to an Event Hub
Publish events to an Event Hub.
from azure.eventhub import EventHubClient, EventData
connection_str = '<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>'
event_hub_path = '<< NAME OF THE EVENT HUB >>'
client = EventHubClient.from_connection_string(connection_str, event_hub_path)
producer = client.create_producer(partition_id="0")
try:
event_list = []
for i in range(10):
event_list.append(EventData(b"A single event"))
with producer:
producer.send(event_list)
except:
raise
finally:
pass
Consume events from an Event Hub
Consume events from an Event Hub.
import logging
from azure.eventhub import EventHubClient, EventData, EventPosition
connection_str = '<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>'
event_hub_path = '<< NAME OF THE EVENT HUB >>'
client = EventHubClient.from_connection_string(connection_str, event_hub_path)
consumer = client.create_consumer(consumer_group="$default", partition_id="0", event_position=EventPosition("-1"))
try:
logger = logging.getLogger("azure.eventhub")
with consumer:
received = consumer.receive(max_batch_size=100, timeout=5)
for event_data in received:
logger.info("Message received:{}".format(event_data))
except:
raise
finally:
pass
Async publish events to an Event Hub
Publish events to an Event Hub asynchronously.
from azure.eventhub.aio import EventHubClient
from azure.eventhub import EventData
connection_str = '<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>'
event_hub_path = '<< NAME OF THE EVENT HUB >>'
client = EventHubClient.from_connection_string(connection_str, event_hub_path)
producer = client.create_producer(partition_id="0")
try:
event_list = []
for i in range(10):
event_list.append(EventData(b"A single event"))
async with producer:
await producer.send(event_list)
except:
raise
finally:
pass
Async consume events from an Event Hub
Consume events asynchronously from an EventHub.
import logging
from azure.eventhub.aio import EventHubClient
from azure.eventhub import EventData, EventPosition
connection_str = '<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>'
event_hub_path = '<< NAME OF THE EVENT HUB >>'
client = EventHubClient.from_connection_string(connection_str, event_hub_path)
consumer = client.create_consumer(consumer_group="$default", partition_id="0", event_position=EventPosition("-1"))
try:
logger = logging.getLogger("azure.eventhub")
async with consumer:
received = await consumer.receive(max_batch_size=100, timeout=5)
for event_data in received:
logger.info("Message received:{}".format(event_data))
except:
raise
finally:
pass
Consume events using an Event Processor
Using an EventHubConsumer
to consume events like in the previous examples puts the responsibility of storing the checkpoints (the last processed event) on the user. Checkpoints are important for restarting the task of processing events from the right position in a partition. Ideally, you would also want to run multiple programs targeting different partitions with some load balancing. This is where an EventProcessor
can help.
The EventProcessor
will delegate the processing of events to a PartitionProcessor
that you provide, allowing you to focus on business logic while the processor holds responsibility for managing the underlying consumer operations including checkpointing and load balancing.
While load balancing is a feature we will be adding in the next update, you can see how to use the EventProcessor
in the below example, where we use an in memory PartitionManager
that does checkpointing in memory.
import asyncio
from azure.eventhub.aio import EventHubClient
from azure.eventhub.eventprocessor import EventProcessor, PartitionProcessor, Sqlite3PartitionManager
connection_str = '<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>'
async def do_operation(event):
# do some sync or async operations. If the operation is i/o intensive, async will have better performance
print(event)
class MyPartitionProcessor(PartitionProcessor):
def __init__(self, checkpoint_manager):
super(MyPartitionProcessor, self).__init__(checkpoint_manager)
async def process_events(self, events):
if events:
await asyncio.gather(*[do_operation(event) for event in events])
await self._checkpoint_manager.update_checkpoint(events[-1].offset, events[-1].sequence_number)
def partition_processor_factory(checkpoint_manager):
return MyPartitionProcessor(checkpoint_manager)
async def main():
client = EventHubClient.from_connection_string(connection_str, receive_timeout=5, retry_total=3)
partition_manager = Sqlite3PartitionManager() # in-memory PartitionManager
try:
event_processor = EventProcessor(client, "$default", MyPartitionProcessor, partition_manager)
# You can also define a callable object for creating PartitionProcessor like below:
# event_processor = EventProcessor(client, "$default", partition_processor_factory, partition_manager)
asyncio.ensure_future(event_processor.start())
await asyncio.sleep(60)
await event_processor.stop()
finally:
await partition_manager.close()
if __name__ == '__main__':
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Troubleshooting
General
The Event Hubs APIs generate the following exceptions.
- AuthenticationError: Failed to authenticate because of wrong address, SAS policy/key pair, SAS token or azure identity.
- ConnectError: Failed to connect to the EventHubs. The AuthenticationError is a type of ConnectError.
- ConnectionLostError: Lose connection after a connection has been built.
- EventDataError: The EventData to be sent fails data validation. For instance, this error is raised if you try to send an EventData that is already sent.
- EventDataSendError: The Eventhubs service responds with an error when an EventData is sent.
- EventHubError: All other Eventhubs related errors. It is also the root error class of all the above mentioned errors.
Next steps
Examples
These are the samples in our repo demonstraing the usage of the library.
- ./examples/send.py - use producer to publish events
- ./examples/recv.py - use consumer to consume events
- ./examples/async_examples/send_async.py - async/await support of a producer
- ./examples/async_examples/recv_async.py - async/await support of a consumer
- ./examples/eventprocessor/event_processor_example.py - event processor
Documentation
Reference documentation is available at https://azure.github.io/azure-sdk-for-python.
Logging
- Enable
azure.eventhub
logger to collect traces from the library. - Enable
uamqp
logger to collect traces from the underlying uAMQP library. - Enable AMQP frame level trace by setting
network_tracing=True
when creating the client.
Provide Feedback
If you encounter any bugs or have suggestions, please file an issue in the Issues section of the project.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Release History
5.0.0b2 (2019-08-06)
New features
- Added method
create_batch
on theEventHubProducer
to create anEventDataBatch
that can then be used to add events until the maximum size is reached.- This batch object can then be used in the
send()
method to send all the added events to Event Hubs. - This allows publishers to build batches without the possibility of encountering the error around the message size exceeding the supported limit when sending events.
- It also allows publishers with bandwidth concerns to control the size of each batch published.
- This batch object can then be used in the
- Added new configuration parameters for exponential delay between retry operations.
retry_total
: The total number of attempts to redo the failed operation.backoff_factor
: The delay time factor.backoff_max
: The maximum delay time in total.
- Added support for context manager on
EventHubClient
. - Added new error type
OperationTimeoutError
for send operation. - Introduced a new class
EventProcessor
which replaces the older concept of Event Processor Host. This early preview is intended to allow users to test the new design using a single instance ofEventProcessor
. The ability to checkpoints to a durable store will be added in future updates.EventProcessor
: EventProcessor creates and runs consumers for all partitions of the eventhub.PartitionManager
: PartitionManager defines the interface for getting/claiming ownerships of partitions and updating checkpoints.PartitionProcessor
: PartitionProcessor defines the interface for processing events.CheckpointManager
: CheckpointManager takes responsibility for updating checkpoints during events processing.
Breaking changes
EventProcessorHost
was replaced byEventProcessor
, please read the new features for details.- Replaced
max_retries
configuration parameter of the EventHubClient withretry_total
.
5.0.0b1 (2019-06-25)
Version 5.0.0b1 is a preview of our efforts to create a client library that is user friendly and idiomatic to the Python ecosystem. The reasons for most of the changes in this update can be found in the Azure SDK Design Guidelines for Python. For more information, please visit https://aka.ms/azure-sdk-preview1-python.
New features
- Added new configuration parameters for creating EventHubClient.
credential
: The credential object used for authentication which implementsTokenCredential
interface of getting tokens.transport_type
: The type of transport protocol that will be used for communicating with the Event Hubs service.max_retries
: The max number of attempts to redo the failed operation when an error happened.- for detailed information about the configuration parameters, please read the reference documentation.
- Added new methods
get_partition_properties
andget_partition_ids
to EventHubClient. - Added support for http proxy.
- Added support for authentication using azure-identity credential.
- Added support for transport using AMQP over WebSocket.
Breaking changes
- New error hierarchy
azure.error.EventHubError
azure.error.ConnectionLostError
azure.error.ConnectError
azure.error.AuthenticationError
azure.error.EventDataError
azure.error.EventDataSendError
- Renamed Sender/Receiver to EventHubProducer/EventHubConsumer.
- Renamed
add_sender
tocreate_producer
andadd_receiver
tocreate_consumer
in EventHubClient. - EventHubConsumer is now iterable.
- Renamed
- Rename class azure.eventhub.Offset to azure.eventhub.EventPosition.
- Rename method
get_eventhub_info
toget_properties
of EventHubClient. - Reorganized connection management, EventHubClient is no longer responsible for opening/closing EventHubProducer/EventHubConsumer.
- Each EventHubProducer/EventHubConsumer is responsible for its own connection management.
- Added support for context manager on EventHubProducer and EventHubConsumer.
- Reorganized async APIs into "azure.eventhub.aio" namespace and rename to drop the "_async" suffix.
- Updated uAMQP dependency to 1.2.
1.3.1 (2019-02-28)
BugFixes
- Fixed bug where datetime offset filter was using a local timestamp rather than UTC.
- Fixed stackoverflow error in continuous connection reconnect attempts.
1.3.0 (2019-01-29)
BugFixes
- Added support for auto reconnect on token expiration and other auth errors (issue #89).
Features
- Added ability to create ServiceBusClient from an existing SAS auth token, including providing a function to auto-renew that token on expiry.
- Added support for storing a custom EPH context value in checkpoint (PR #84, thanks @konstantinmiller)
1.2.0 (2018-11-29)
- Support for Python 2.7 in azure.eventhub module (azure.eventprocessorhost will not support Python 2.7).
- Parse EventData.enqueued_time as a UTC timestamp (issue #72, thanks @vjrantal)
1.1.1 (2018-10-03)
- Fixed bug in Azure namespace package.
1.1.0 (2018-09-21)
-
Changes to
AzureStorageCheckpointLeaseManager
parameters to support other connection options (issue #61):- The
storage_account_name
,storage_account_key
andlease_container_name
arguments are now optional keyword arguments. - Added a
sas_token
argument that must be specified withstorage_account_name
in place ofstorage_account_key
. - Added an
endpoint_suffix
argument to support storage endpoints in National Clouds. - Added a
connection_string
argument that, if specified, overrides all other endpoint arguments. - The
lease_container_name
argument now defaults to"eph-leases"
if not specified.
- The
-
Fix for clients failing to start if run called multipled times (issue #64).
-
Added convenience methods
body_as_str
andbody_as_json
to EventData object for easier processing of message data.
1.0.0 (2018-08-22)
- API stable.
- Renamed internal
_async
module toasync_ops
for docs generation. - Added optional
auth_timeout
parameter toEventHubClient
andEventHubClientAsync
to configure how long to allow for token negotiation to complete. Default is 60 seconds. - Added optional
send_timeout
parameter toEventHubClient.add_sender
andEventHubClientAsync.add_async_sender
to determine the timeout for Events to be successfully sent. Default value is 60 seconds. - Reformatted logging for performance.
0.2.0 (2018-08-06)
-
Stability improvements for EPH.
-
Updated uAMQP version.
-
Added new configuration options for Sender and Receiver;
keep_alive
andauto_reconnect
. These flags have been added to the following:EventHubClient.add_receiver
EventHubClient.add_sender
EventHubClientAsync.add_async_receiver
EventHubClientAsync.add_async_sender
EPHOptions.keey_alive_interval
EPHOptions.auto_reconnect_on_error
0.2.0rc2 (2018-07-29)
- Breaking change
EventData.offset
will now return an object of type~uamqp.common.Offset
rather than str. The original string value can be retrieved from~uamqp.common.Offset.value
. - Each sender/receiver will now run in its own independent connection.
- Updated uAMQP dependency to 0.2.0
- Fixed issue with IoTHub clients not being able to retrieve partition information.
- Added support for HTTP proxy settings to both EventHubClient and EPH.
- Added error handling policy to automatically reconnect on retryable error.
- Added keep-alive thread for maintaining an unused connection.
0.2.0rc1 (2018-07-06)
- Breaking change Restructured library to support Python 3.7. Submodule
async
has been renamed and all classes from this module can now be imported from azure.eventhub directly. - Breaking change Removed optional
callback
argument fromReceiver.receive
andAsyncReceiver.receive
. - Breaking change
EventData.properties
has been renamed toEventData.application_properties
. This removes the potential for messages to be processed via callback for not yet returned in the batch. - Updated uAMQP dependency to v0.1.0
- Added support for constructing IoTHub connections.
- Fixed memory leak in receive operations.
- Dropped Python 2.7 wheel support.
0.2.0b2 (2018-05-29)
- Added
namespace_suffix
to EventHubConfig() to support national clouds. - Added
device_id
attribute to EventData to support IoT Hub use cases. - Added message header to workaround service bug for PartitionKey support.
- Updated uAMQP dependency to vRC1.
0.2.0b1 (2018-04-20)
- Updated uAMQP to latest version.
- Further testing and minor bug fixes.
0.2.0a2 (2018-04-02)
- Updated uAQMP dependency.
0.2.0a1 (unreleased)
- Swapped out Proton dependency for uAMQP.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file azure-eventhub-5.0.0b2.zip
.
File metadata
- Download URL: azure-eventhub-5.0.0b2.zip
- Upload date:
- Size: 74.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | fe3a0259cfdf782de85665344679a5d1bce5795f8180a044b2647708f292c186 |
|
MD5 | a18a7f7a64631652b51dba2d752554a2 |
|
BLAKE2b-256 | 406f53b25234faa76eee4a3fc7596341c9c350bffca4632252a4e23d0b83f5d4 |
File details
Details for the file azure_eventhub-5.0.0b2-py2.py3-none-any.whl
.
File metadata
- Download URL: azure_eventhub-5.0.0b2-py2.py3-none-any.whl
- Upload date:
- Size: 57.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 24ee51474f54af56d121ba7cb561b615e1e2e23d1ba4ae517c1904968aaae92f |
|
MD5 | a3be7188cc86f144521bef9db2204aad |
|
BLAKE2b-256 | 0dc29f78cdbe9e1585736e1fe94e6cc03c68234573624cbfe21075f7f85f7589 |