A Fivetran async provider for Apache Airflow
Fivetran Async Provider for Apache Airflow
This package provides an async operator, sensor, and hook that integrate Fivetran into Apache Airflow.
FivetranSensor allows you to monitor a Fivetran sync job for completion before running downstream processes.
FivetranOperator submits a Fivetran sync job and polls for its status on the triggerer.
Because an async sensor or operator frees up its worker slot while polling happens on the triggerer,
it consumes fewer resources than a traditional "sync" sensor or operator.
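The following toy sketch illustrates why asynchronous polling is cheap. It is not Airflow's triggerer code: the function `wait_for_sync` and the canned status lists are hypothetical stand-ins for real Fivetran API calls. It shows two waits sharing one event loop instead of each blocking its own worker.

```python
import asyncio


async def wait_for_sync(connector_id, statuses, poll_interval=0.01):
    """Poll a fake status feed until the sync reports success.

    `statuses` is a hypothetical stand-in for successive API responses.
    """
    for status in statuses:
        if status == "succeeded":
            return connector_id
        # Yield control to the event loop so other waits can make progress.
        await asyncio.sleep(poll_interval)
    raise TimeoutError(f"{connector_id} never reported success")


async def main():
    # Both waits run concurrently on a single event loop.
    return await asyncio.gather(
        wait_for_sync("connector_a", ["syncing", "syncing", "succeeded"]),
        wait_for_sync("connector_b", ["syncing", "succeeded"]),
    )


finished = asyncio.run(main())
print(finished)
```

While a coroutine is sleeping it occupies no thread of its own, which is the property that lets a single triggerer process watch many deferred tasks.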
Fivetran automates your data pipeline, and Airflow automates your data processing.
Installation
Prerequisites: An environment running apache-airflow.
pip install airflow-provider-fivetran-async
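To confirm that the install succeeded, one option is to ask Python's standard `importlib.metadata` for the installed version. The helper name `installed_version` is hypothetical and only serves this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


provider_version = installed_version("airflow-provider-fivetran-async")
print(provider_version or "airflow-provider-fivetran-async is not installed")
```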
Configuration
In the Airflow user interface, configure a Connection for Fivetran. Most of the Connection config fields will be left blank. Configure the following fields:
- Conn Id: fivetran
- Conn Type: Fivetran
- Login: Fivetran API Key
- Password: Fivetran API Secret
Find the Fivetran API Key and Secret in the Fivetran Account Settings, under the API Config section. See our documentation for more information on Fivetran API Authentication.
The sensor assumes the Conn Id is set to fivetran. However, if you are managing multiple Fivetran accounts, you can set this to anything you like. See the DAG in examples to see how to specify a custom Conn Id.
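As an alternative to the user interface, Airflow can also read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>`, and Airflow 2.3 and later accept a JSON value there. The sketch below builds such a value for the fields listed above. The credentials are placeholders, the helper name `fivetran_connection_json` is hypothetical, and the `conn_type` string is assumed to match the Conn Type shown in the UI.

```python
import json
import os

# Placeholder credentials for illustration only; load real ones from a secrets store.
FIVETRAN_API_KEY = "my-api-key"
FIVETRAN_API_SECRET = "my-api-secret"


def fivetran_connection_json(api_key, api_secret):
    """Serialize the connection fields in Airflow's JSON connection format."""
    return json.dumps(
        {
            "conn_type": "Fivetran",  # assumption: same value as the UI's Conn Type
            "login": api_key,         # Fivetran API Key
            "password": api_secret,   # Fivetran API Secret
        }
    )


# The suffix after AIRFLOW_CONN_ is the upper-cased Conn Id, here "fivetran".
os.environ["AIRFLOW_CONN_FIVETRAN"] = fivetran_connection_json(
    FIVETRAN_API_KEY, FIVETRAN_API_SECRET
)
```

Setting `os.environ` here only demonstrates the format. In a real deployment the variable has to be present in the environment of the Airflow scheduler, workers, and triggerer, so it is usually exported by the shell or the deployment tooling.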
Modules
Fivetran Operator Async
Import into your DAG via:
from fivetran_provider_async.operators import FivetranOperator
FivetranOperator submits a Fivetran sync job and monitors it on the triggerer for completion.
FivetranOperator requires that you specify the connector_id of the Fivetran connector you wish to trigger. You can find connector_id in the Settings page of the connector you configured in the Fivetran dashboard.
The FivetranOperator will wait for the sync to complete as long as wait_for_completion=True (this is the default). It is recommended that you run in deferrable mode (this is also the default). If wait_for_completion=False, the operator will return the timestamp for the last sync.
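A minimal sketch of a DAG that uses the operator might look like the following. The DAG ID, start date, and the helper name `build_dag` are hypothetical, and the connector ID is borrowed from the sensor examples in this README. Only the `connector_id` and `wait_for_completion` arguments are taken from the description above.

```python
from datetime import datetime

# Connector ID reused from the sensor examples in this README; replace with your own.
CONNECTOR_ID = "bronzing_largely"


def build_dag():
    """Build a one-task DAG that triggers a Fivetran sync and waits for it."""
    # Imported lazily so this module can be inspected without Airflow installed.
    from airflow import DAG
    from fivetran_provider_async.operators import FivetranOperator

    with DAG(
        dag_id="example_fivetran_sync",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule=None,  # Airflow 2.4+; older releases use schedule_interval
        catchup=False,
    ) as dag:
        FivetranOperator(
            task_id="fivetran_sync",
            connector_id=CONNECTOR_ID,
            wait_for_completion=True,  # the default, spelled out for clarity
        )
    return dag
```

In an actual DAG file, assign the result at module level, for example `dag = build_dag()`, so that Airflow can discover it.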
Fivetran Sensor Async
Import into your DAG via:
from fivetran_provider_async.sensors import FivetranSensor
FivetranSensor monitors a Fivetran sync job for completion.
Monitoring with FivetranSensor allows you to trigger downstream processes only when the Fivetran sync jobs have completed, ensuring data consistency.
FivetranSensor requires that you specify the connector_id of the Fivetran connector you want to wait for. You can find connector_id in the Settings page of the connector you configured in the Fivetran dashboard.
You can use multiple instances of FivetranSensor to monitor multiple Fivetran connectors.
FivetranSensor is most commonly useful in two scenarios:
- Fivetran is using a different scheduler from the Airflow scheduler.
- You set wait_for_completion=False in the FivetranOperator, and you need to await the FivetranOperator task later. (You may want to do this if you want to arrange your DAG so that some tasks depend on starting a sync and other tasks depend on completing a sync.)
If you are using the first pattern, you may find it useful to set completed_after_time to data_interval_end, or to data_interval_end with some buffer:
fivetran_sensor = FivetranSensor(
    task_id="wait_for_fivetran_externally_scheduled_sync",
    connector_id="bronzing_largely",
    poke_interval=5,
    completed_after_time="{{ data_interval_end + macros.timedelta(minutes=1) }}",
)
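To make the buffer concrete, the sketch below reproduces the arithmetic of the template in plain Python. It assumes the sensor treats a sync as complete once the connector's last completion time is strictly later than completed_after_time. That comparison, and the helper names `completion_threshold` and `sync_satisfies_sensor`, are illustrative assumptions rather than the provider's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def completion_threshold(data_interval_end, buffer=timedelta(minutes=1)):
    """Mirror the template: data_interval_end plus a small buffer."""
    return data_interval_end + buffer


def sync_satisfies_sensor(last_completed_at, completed_after_time):
    """Assumed rule: only syncs finishing after the threshold count."""
    return last_completed_at > completed_after_time


interval_end = datetime(2024, 1, 2, tzinfo=timezone.utc)
threshold = completion_threshold(interval_end)

early_sync = interval_end + timedelta(seconds=30)  # finished inside the buffer
late_sync = interval_end + timedelta(minutes=5)    # finished after the buffer

print(sync_satisfies_sensor(early_sync, threshold))
print(sync_satisfies_sensor(late_sync, threshold))
```

Under this assumption, a sync that finishes within the buffer does not release the sensor, while one that finishes after it does.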
If you are using the second pattern, you can use XComs to pass the target completed time to the sensor:
fivetran_op = FivetranOperator(
    task_id="fivetran_sync_my_db",
    connector_id="bronzing_largely",
    wait_for_completion=False,
)
fivetran_sensor = FivetranSensor(
    task_id="wait_for_fivetran_db_sync",
    connector_id="bronzing_largely",
    poke_interval=5,
    # Pull the timestamp returned by the operator task above; the task ID must match.
    completed_after_time="{{ task_instance.xcom_pull('fivetran_sync_my_db', key='return_value') }}",
)
fivetran_op >> fivetran_sensor
You may also use FivetranSensor without a completed_after_time.
In this case, the sensor notes the time of the last completed sync and waits for a newer completion time.
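The behaviour without completed_after_time can be pictured with the following sketch. It is an illustration of the idea only, not the sensor's code: the function `first_new_completion` is hypothetical, and the list of observed times stands in for successive checks against the Fivetran API.

```python
from datetime import datetime, timezone


def first_new_completion(observed_times):
    """Return the first completion time newer than the one seen initially.

    The first observation plays the role of the baseline the sensor records;
    None means no newer completion was observed.
    """
    iterator = iter(observed_times)
    baseline = next(iterator)
    for completed_at in iterator:
        if completed_at > baseline:
            return completed_at
    return None


previous_sync = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
new_sync = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# The first two checks still see the old sync; the third sees a new one.
result = first_new_completion([previous_sync, previous_sync, new_sync])
print(result)
```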
Examples
See the examples directory for an example DAG.
Issues
Please submit issues and pull requests in our official repo: https://github.com/astronomer/airflow-provider-fivetran-async
We are happy to hear from you. Please email any feedback to the authors at humans@astronomer.io.
Hashes for airflow-provider-fivetran-async-2.0.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 75123bfc303e5134fffb79ec1a0d807e01041415b8c518b1d0f8d038c7c4e5d3
MD5 | f72088d0e2c69adb145169ef14a84b8d
BLAKE2b-256 | b51fe19599e6723eb84b0ccf3e67837ac44830d4e32dc67e37aa8f26caa965b8
Hashes for airflow_provider_fivetran_async-2.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9968db72be321a7774ad029c73b625a1139721dc3df40a4afb7435134869a0b1
MD5 | ccf643610cc313d5221acaa0d9e29eb6
BLAKE2b-256 | 09d23a4908ef1a90be108243c09c31b440a9f77042dd62ff0c5e21e60206fdad