A decorator that allows users to run SQL queries natively in Airflow.

astro

workflows made easy


astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python. It helps DAG authors to achieve more with less code. It is powered by Apache Airflow and maintained by Astronomer.

:warning: Disclaimer: This project's development status is alpha. In other words, it is not yet production-ready, and its interfaces may change. We welcome alpha users and brave souls to test it; any feedback is welcome.

Install

Astro is available at PyPI. Use the standard Python installation tools.

To install a cloud-agnostic version of Astro, run:

pip install astro-projects

If using cloud providers, install using the optional dependencies of interest:

pip install astro-projects[amazon,google,snowflake,postgres]

Quick-start

After installing Astro, copy the following example DAG, calculate_popular_movies.py, into a local directory named dags:

from datetime import datetime
from airflow import DAG
from astro import sql as aql
from astro.sql.table import Table


# The {{input_table}} placeholder is resolved to the input table at run time.
@aql.transform()
def top_five_animations(input_table: Table):
    return """
        SELECT Title, Rating
        FROM {{input_table}}
        WHERE Genre1='Animation'
        ORDER BY Rating DESC
        LIMIT 5;
    """


with DAG(
    "calculate_popular_movies",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    # Load the remote CSV into a SQLite table named imdb_movies.
    imdb_movies = aql.load_file(
        path="https://raw.githubusercontent.com/astro-projects/astro/main/tests/data/imdb.csv",
        task_id="load_csv",
        output_table=Table(
            table_name="imdb_movies", database="sqlite", conn_id="sqlite_default"
        ),
    )

    # Run the transform and store the result in the top_animation table.
    top_five_animations(
        input_table=imdb_movies,
        output_table=Table(
            table_name="top_animation", database="sqlite", conn_id="sqlite_default"
        ),
    )

Set up a local instance of Airflow by running:

export AIRFLOW_HOME=`pwd`
export AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True

airflow db init

Create the SQLite database used by the example, then run the DAG:

# The sqlite_default connection uses a different host on macOS vs. Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`

sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
airflow dags test calculate_popular_movies `date -Iseconds`

Check the top five animations calculated by your first Astro DAG by running:

sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"

You should see the following output:

$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9
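
Equivalently, the same check can be done from Python. This is a minimal sketch; it assumes the SQL_TABLE_NAME environment variable exported above still points at the SQLite database:

import os
import sqlite3

# Path to the SQLite database backing the sqlite_default connection.
db_path = os.environ["SQL_TABLE_NAME"]

with sqlite3.connect(db_path) as conn:
    for title, rating in conn.execute("SELECT * FROM top_animation;"):
        print(title, rating)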

Requirements

Because astro relies on the TaskFlow API, it depends on Apache Airflow >= 2.1.0.

Supported technologies

Databases         File types   File locations
Google BigQuery   CSV          Amazon S3
Postgres          JSON         Filesystem
Snowflake         NDJSON       Google GCS
SQLite            Parquet
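
The load_file call from the quick-start extends to these backends. The snippet below is only a sketch meant to sit inside a DAG context as in the quick-start: the S3 path, connection IDs, and database name are placeholders, and parameter names such as file_conn_id should be confirmed against the reference guide.

from astro import sql as aql
from astro.sql.table import Table

# Hypothetical example: load a CSV from Amazon S3 into a Snowflake table.
imdb_movies = aql.load_file(
    path="s3://my-bucket/imdb.csv",  # placeholder bucket and key
    file_conn_id="aws_default",  # assumed Airflow connection holding S3 credentials
    task_id="load_csv_from_s3",
    output_table=Table(
        table_name="imdb_movies",
        database="my_database",  # placeholder database name
        conn_id="snowflake_default",  # assumed Snowflake connection
    ),
)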

Available operations

Below is a summary of the operations currently available in astro; a short usage sketch follows the list. More details are available in the reference guide.

  • load_file: load a given file into a SQL table
  • transform: apply a SQL SELECT statement to a source table and save the result to a destination table
  • truncate: remove all records from a SQL table
  • run_raw_sql: run any SQL statement without handling its output
  • append: insert rows from the source SQL table into the destination SQL table, if there are no conflicts
  • merge: insert rows from the source SQL table into the destination SQL table, depending on conflicts:
    • ignore: do not add rows that already exist
    • update: replace existing rows with new ones
  • save_file: export SQL table rows into a destination file
  • dataframe: export a given SQL table into an in-memory pandas DataFrame
  • render: given a directory containing SQL statements, dynamically create transform tasks within a DAG
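
As a rough illustration of how these operations compose, the sketch below extends the quick-start with run_raw_sql and dataframe. Only load_file and transform appear verbatim in the quick-start; the decorator style and argument handling for the other operations are assumptions to be checked against the reference guide.

import pandas as pd

from astro import sql as aql
from astro.sql.table import Table


@aql.run_raw_sql
def drop_staging_table(staging_table: Table):
    # run_raw_sql executes the statement for its side effect; no output table is produced.
    return "DROP TABLE IF EXISTS {{staging_table}};"


@aql.dataframe  # assumption: dataframe is exposed as a decorator under astro.sql
def summarize(animations: pd.DataFrame):
    # The SQL table arrives as an in-memory pandas DataFrame.
    return animations.describe()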

Documentation

The documentation is a work in progress, and we aim to follow the Diátaxis system:

  • Tutorial: a hands-on introduction to astro
  • How-to guides: simple step-by-step user guides to accomplish specific tasks
  • Reference guide: commands, modules, classes and methods
  • Explanation: clarification and discussion of key decisions made when designing the project

Changelog

We follow Semantic Versioning for releases. Check the changelog for the latest changes.

Release Management

To learn more about our release philosophy and steps, check here.

Contribution Guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the Contribution Guideline for a detailed overview of how to contribute.

As contributors and maintainers of this project, you should abide by the Contributor Code of Conduct.

License

Apache License 2.0

