Kedro-Airflow makes it easy to deploy Kedro projects to Airflow

Kedro-Airflow

Apache Airflow is a tool for orchestrating complex workflows and data processing pipelines. The Kedro-Airflow plugin can be used for:

  • Rapid pipeline creation in the prototyping phase. You can write Python functions in Kedro without worrying about schedulers, daemons, or services, and without having to hand-write the Airflow DAG file.
  • Automatic dependency resolution in Kedro. This allows you to bypass Airflow's requirement to specify the order of your tasks explicitly.
  • Distributing Kedro tasks across many workers. You can also enable monitoring and scheduling of the tasks' runtimes.
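Kedro infers execution order from each node's declared inputs and outputs rather than from explicit upstream/downstream wiring. Conceptually this is a topological sort over the data dependencies; a toy standard-library sketch (task names hypothetical, not Kedro's implementation):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on, derived from
# which datasets it consumes. Kedro builds this graph for you.
deps = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['preprocess', 'train', 'evaluate']
```

The generated Airflow DAG reuses exactly this ordering, which is why you never declare task dependencies by hand.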

Installation

kedro-airflow is a Python plugin. To install it:

pip install kedro-airflow

Usage

You can use kedro-airflow to deploy a Kedro pipeline as an Airflow DAG by following these steps:

Step 1: Generate the DAG file

At the root directory of the Kedro project, run:

kedro airflow create

This command will generate an Airflow DAG file located in the airflow_dags/ directory in your project. You can pass a --pipeline flag to generate the DAG file for a specific Kedro pipeline and an --env flag to generate the DAG file for a specific Kedro environment. Passing --all will convert all registered Kedro pipelines to Airflow DAGs.

Step 2: Copy the DAG file to the Airflow DAGs folder

For more information about the DAGs folder, see the Airflow documentation. The Airflow DAG configuration can be customized by editing the generated file.

Step 3: Package and install the Kedro pipeline in the Airflow executor's environment

After generating and deploying the DAG file, you will then need to package and install the Kedro pipeline into the Airflow executor's environment. Please visit the guide to deploy Kedro as a Python package for more details.

FAQ

What if my DAG file is in a different directory to my project folder?

By default, the generated DAG file is configured to live in the same directory as your project as per this template. If your DAG file is located in a different directory to your project, you will need to tweak this manually after running the kedro airflow create command.
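The line to tweak typically derives the project path from the DAG file's own location; the exact variable name depends on the template version, so treat this as a hypothetical sketch:

```python
from pathlib import Path

# As generated, the template assumes the DAG file sits inside the Kedro
# project and derives the project root from the DAG file's location,
# e.g. Path(__file__).resolve().parent (the exact line varies by version).
# If the DAG lives elsewhere (for example Airflow's dags/ folder), point
# at the project explicitly instead:
project_path = Path("/opt/airflow/my-kedro-project")  # hypothetical path
```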

What if I want to use a different Jinja2 template?

You can use the additional command line argument --jinja-file (alias -j) to provide an alternative path to a Jinja2 template. Note that these files have to accept the same variables as those used in the default Jinja2 template.

kedro airflow create --jinja-file=./custom/template.j2

How can I pass arguments to the Airflow DAGs dynamically?

kedro-airflow reads its configuration from an airflow.yml file in conf/base or conf/local of your Kedro project, or from any file or folder whose name starts with airflow. The parameters are read by Kedro. Arguments can be specified globally, or per pipeline:

# Global parameters
default:
    start_date: [2023, 1, 1]
    max_active_runs: 3
    # https://airflow.apache.org/docs/stable/scheduler.html#dag-runs
    schedule_interval: "@once"
    catchup: false
    # Default settings applied to all tasks
    owner: "airflow"
    depends_on_past: false
    email_on_failure: false
    email_on_retry: false
    retries: 1
    retry_delay: 5

# Arguments specific to the pipeline (overrides the parameters above)
data_science:
    owner: "airflow-ds"
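A pipeline-specific section overrides matching keys from default while leaving the rest intact; a minimal Python sketch of that override semantics (not the plugin's actual implementation):

```python
# Minimal sketch of per-pipeline override semantics (not the plugin's
# actual code): pipeline-specific keys win over the global defaults.
default_args = {"owner": "airflow", "retries": 1, "retry_delay": 5}
per_pipeline = {"data_science": {"owner": "airflow-ds"}}

def resolve(pipeline: str) -> dict:
    merged = dict(default_args)                    # start from the globals
    merged.update(per_pipeline.get(pipeline, {}))  # pipeline section wins
    return merged

print(resolve("data_science"))
# {'owner': 'airflow-ds', 'retries': 1, 'retry_delay': 5}
```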

Arguments can also be passed via --params in the command line:

kedro airflow create --params "schedule_interval='@weekly'"

These variables are passed to the Jinja2 template that creates an Airflow DAG from your pipeline.

What if I want to use a configuration pattern other than airflow* and airflow/**?

In order to configure the config loader, update the settings.py file in your Kedro project. For instance, if you would like to use the name scheduler, then change the file as follows:

CONFIG_LOADER_ARGS = {"config_patterns": {"airflow": ["scheduler*", "scheduler/**"]}}
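To see which file names such a pattern pair would pick up, you can approximate the matching with the standard library's fnmatch (Kedro's config loader uses its own globbing, so this is only an approximation):

```python
from fnmatch import fnmatch

patterns = ["scheduler*", "scheduler/**"]
candidates = ["scheduler.yml", "scheduler_dev.yml",
              "scheduler/prod.yml", "airflow.yml"]

# A file is picked up if it matches any of the configured patterns.
matched = [name for name in candidates
           if any(fnmatch(name, pattern) for pattern in patterns)]
print(matched)  # ['scheduler.yml', 'scheduler_dev.yml', 'scheduler/prod.yml']
```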

See Kedro's official documentation for how to add templating, custom resolvers, etc.

What if I want to pass different arguments?

In order to pass arguments other than those specified in the default template, simply pass a custom template (see: "What if I want to use a different Jinja2 template?").

The syntax for arguments is:

{{ argument_name }}

In order to make arguments optional, one can use:

{{ argument_name | default("default_value") }}
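In plain Python terms, the default filter behaves like a dictionary lookup with a fallback; a rough analogy (not Jinja2 internals):

```python
# Rough Python analogy for Jinja2's "default" filter: use the supplied
# value when present, otherwise fall back to the given default.
def render_arg(context: dict, name: str, fallback=""):
    return context.get(name, fallback)

print(render_arg({"owner": "airflow-ds"}, "owner"))  # airflow-ds
print(render_arg({}, "owner", fallback="airflow"))   # airflow
```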

For examples, please have a look at the default template (airflow_dag_template.j2).

What if I want to use a configuration file other than airflow.yml?

The default configuration pattern is ["airflow*", "airflow/**"]. To change it, configure the OmegaConfigLoader in the settings.py file of your Kedro project, adjusting the patterns to match your file name:

from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    # other args
    "config_patterns": {  # configure the pattern for configuration files
        "airflow": ["airflow*", "airflow/**"]
    }
}

See Kedro's official documentation for how to add templating, custom resolvers, etc.: https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-do-templating-with-the-omegaconfigloader

How can I use Airflow runtime parameters?

It is possible to pass parameters when triggering an Airflow DAG from the user interface. In order to use this feature, create a custom template using the Params syntax. See "What if I want to use a different Jinja2 template?" for instructions on using custom templates.

What if I want to use a different Airflow Operator?

Which Airflow operator to use depends on the environment your project is running in. You can set the operator to use by providing a custom template. See "What if I want to use a different Jinja2 template?" for instructions on using custom templates. Airflow's rich offering of operators means that kedro-airflow does not ship a template for each specific operator; the default template provided by kedro-airflow uses the BaseOperator.

Can I contribute?

Yes! Want to help build Kedro-Airflow? Check out our guide to contributing.

What licence do you use?

Kedro-Airflow is licensed under the Apache 2.0 License.

Python version support policy

  • Kedro-Airflow supports all Python versions that are actively maintained by the CPython core team. When a Python version reaches end of life, support for that version is dropped from kedro-airflow. This is not considered a breaking change.
