Astro SDK allows rapid and clean development of Extract, Load, Transform (ELT) workflows using Python and SQL, powered by Apache Airflow.

These details have been verified by PyPI

Maintainers

ashb dimberman kaxil tati_alchueyr


Project description

astro

workflows made easy

Astro Python SDK is a Python SDK for rapid development of extract, transform, and load workflows in Apache Airflow. It allows you to express your workflows as a set of data dependencies without having to worry about ordering and tasks. The Astro Python SDK is maintained by Astronomer.

Prerequisites

Apache Airflow >= 2.1.0.

Install

The Astro Python SDK is available on PyPI. Use the standard Python installation tools.

To install a cloud-agnostic version of the SDK, run:

```
pip install astro-sdk-python
```

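You can also install the dependencies for using the SDK with popular cloud providers and databases. Quote the argument so that shells such as zsh do not interpret the square brackets:

```
pip install "astro-sdk-python[amazon,google,snowflake,postgres]"
```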

Quickstart

Ensure that your Airflow environment is set up correctly by running the following commands:
```
export AIRFLOW_HOME=`pwd`
export AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True
airflow db init
```
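Airflow treats an environment variable named `AIRFLOW__{SECTION}__{KEY}` as an override for the option `key` in the `[section]` of `airflow.cfg`, so the second `export` above turns on `enable_xcom_pickling` under `[core]`. The following minimal sketch only illustrates that naming convention with the standard library. `parse_airflow_env_var` and `env_flag` are hypothetical helpers, not Airflow APIs, and the boolean parsing is a simplified assumption.

```python
import os
from typing import Tuple


def parse_airflow_env_var(name: str) -> Tuple[str, str]:
    """Hypothetical helper: map AIRFLOW__{SECTION}__{KEY} to (section, key)."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"not an Airflow config variable: {name}")
    # Split on the first double underscore after the prefix.
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        raise ValueError(f"expected AIRFLOW__SECTION__KEY, got: {name}")
    # Section and option names in airflow.cfg are lowercase.
    return section.lower(), key.lower()


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag; treating "true", "t" and "1" as enabled is an assumption."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"true", "t", "1"}


if __name__ == "__main__":
    var = "AIRFLOW__CORE__ENABLE_XCOM_PICKLING"
    print(parse_airflow_env_var(var))  # ('core', 'enable_xcom_pickling')
    print(env_flag(var))  # True in a shell where the export above was run
```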
Note: AIRFLOW__CORE__ENABLE_XCOM_PICKLING needs to be enabled for astro-sdk-python.

Without pickling, XCom values are limited to JSON-serializable data types. Because DataFrames are not JSON serializable, XCom pickling must be enabled to store them.

The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects.

Read more: enable_xcom_pickling and pickle.
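A quick way to see the difference is to serialize the same value both ways. This sketch uses only the standard library: a `set` and a `datetime` stand in for a DataFrame because `json` cannot encode them either, while `pickle` restores them exactly. It illustrates the two serialization formats only, not how Airflow's XCom backend is implemented; `try_json` and `pickle_round_trip` are illustrative names.

```python
import json
import pickle
from datetime import datetime

# Stand-in for a DataFrame: values that json cannot encode (a set and a
# datetime), so pandas is not needed to show the difference.
record = {"genres": {"Animation", "Comedy"}, "loaded_at": datetime(2022, 9, 16, 12, 0)}


def try_json(value):
    """Return JSON text, or None when the value is not JSON serializable."""
    try:
        return json.dumps(value)
    except TypeError:
        return None


def pickle_round_trip(value):
    """Serialize with pickle and load the bytes back into a Python object."""
    return pickle.loads(pickle.dumps(value))


if __name__ == "__main__":
    print(try_json({"rating": 8.3}))  # {"rating": 8.3}
    print(try_json(record))  # None
    print(pickle_round_trip(record) == record)  # True
```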

Create a SQLite database for the example to use:

```
# SQL_TABLE_NAME holds the path of the SQLite database file: the host of the
# sqlite_default connection, which differs between macOS and Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
```
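The `host` of the `sqlite_default` connection is the path of the SQLite database file, which is why the script stores it in `SQL_TABLE_NAME` and passes it to `sqlite3`. The same two steps can be sketched with Python's standard library. The helper names are hypothetical, and the YAML sample is an assumed shape, not real output of `airflow connections get`.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def host_from_connection_yaml(text: str) -> str:
    """Approximate `grep host | awk '{print $2}'`: return the second
    whitespace-separated field of the first line containing "host"."""
    for line in text.splitlines():
        if "host" in line:
            fields = line.split()
            if len(fields) >= 2:
                return fields[1]
    raise ValueError("no host line found")


def create_sqlite_database(path: str) -> Path:
    """Open (creating if needed) the SQLite file and run VACUUM, like the sqlite3 CLI call."""
    db_path = Path(path)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return db_path


if __name__ == "__main__":
    # Assumed YAML shape, for illustration only.
    sample = "conn_id: sqlite_default\nconn_type: sqlite\nhost: /tmp/sqlite_default.db\n"
    print(host_from_connection_yaml(sample))  # /tmp/sqlite_default.db
    with tempfile.TemporaryDirectory() as tmp:
        db = create_sqlite_database(os.path.join(tmp, "example.db"))
        print(db.exists())  # True
```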

Copy the following workflow into a file named calculate_popular_movies.py and add it to the dags directory of your Airflow project:

```python
from datetime import datetime

from airflow import DAG
from astro import sql as aql
from astro.files import File
from astro.sql.table import Table


@aql.transform()
def top_five_animations(input_table: Table):
    # The returned SELECT is templated: {{input_table}} refers to the table
    # passed as input_table, and the result is saved to output_table.
    return """
        SELECT title, rating
        FROM {{input_table}}
        WHERE genre1='Animation'
        ORDER BY rating desc
        LIMIT 5;
    """


with DAG(
    "calculate_popular_movies",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    # Load the IMDb sample CSV into the imdb_movies SQLite table.
    imdb_src = File("https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb_v2.csv")
    imdb_movies = Table(name="imdb_movies", conn_id="sqlite_default")
    imdb_movies = aql.load_file(imdb_src, imdb_movies)

    # Save the five best-rated animations to the top_animation table.
    top_animations = Table(name="top_animation")
    top_animations = top_five_animations(input_table=imdb_movies, output_table=top_animations)
```
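To see what `top_five_animations` computes without running Airflow, the same SELECT can be executed against an in-memory SQLite table. This is a sketch under two assumptions: `render` is a simplified, hypothetical stand-in for how the SDK fills in `{{input_table}}`, and the rows are made-up sample data rather than the IMDb file.

```python
import sqlite3

# Same query as top_five_animations in the DAG.
TOP_FIVE_SQL = """
    SELECT title, rating
    FROM {{input_table}}
    WHERE genre1 = 'Animation'
    ORDER BY rating DESC
    LIMIT 5;
"""


def render(sql: str, **tables: str) -> str:
    """Hypothetical helper: replace each {{name}} placeholder with a table name."""
    for name, table_name in tables.items():
        sql = sql.replace("{{" + name + "}}", table_name)
    return sql


def top_five_animations_locally(rows):
    """Load (title, rating, genre1) rows into a temporary table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE imdb_movies (title TEXT, rating REAL, genre1 TEXT)")
        conn.executemany("INSERT INTO imdb_movies VALUES (?, ?, ?)", rows)
        return conn.execute(render(TOP_FIVE_SQL, input_table="imdb_movies")).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [  # made-up rows for illustration
        ("Movie A", 9.0, "Animation"),
        ("Movie B", 8.5, "Drama"),
        ("Movie C", 7.0, "Animation"),
    ]
    print(top_five_animations_locally(sample))  # [('Movie A', 9.0), ('Movie C', 7.0)]
```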

Run the example DAG:

```
airflow dags test calculate_popular_movies `date -Iseconds`
```
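`date -Iseconds` supplies the run date as an ISO 8601 timestamp with seconds precision and a UTC offset. If you prefer to launch the test run from Python, a minimal sketch is to build the same argument list. `dags_test_command` is a hypothetical helper, and actually running the command assumes the `airflow` CLI is on your `PATH`.

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional


def dags_test_command(dag_id: str, when: Optional[datetime] = None) -> List[str]:
    """Build the argv for `airflow dags test <dag_id> <ISO 8601 date>`."""
    # Like `date -Iseconds`: local time, seconds precision, with UTC offset.
    when = when or datetime.now().astimezone()
    return ["airflow", "dags", "test", dag_id, when.isoformat(timespec="seconds")]


if __name__ == "__main__":
    fixed = datetime(2022, 9, 16, 10, 0, tzinfo=timezone.utc)
    print(dags_test_command("calculate_popular_movies", fixed)[-1])  # 2022-09-16T10:00:00+00:00
    cmd = dags_test_command("calculate_popular_movies")
    print(shlex.join(cmd))
    # To run it (requires the airflow CLI on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```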

Check the result of your DAG by running:

```
sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
```

You should see the following output:

```
$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9
```
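The check can also be scripted. The sketch below reads the table with Python's `sqlite3` module and joins columns with `|`, mirroring the sqlite3 shell's default list output. `table_rows_as_cli_text` is a hypothetical helper, and the usage assumes `SQL_TABLE_NAME` still holds the database path exported earlier.

```python
import os
import sqlite3
from typing import List


def table_rows_as_cli_text(db_path: str, table: str) -> List[str]:
    """Return each row as 'col1|col2|...', like the sqlite3 shell's list mode."""
    # Identifiers cannot be bound as ? parameters, so quote the table name.
    quoted = '"' + table.replace('"', '""') + '"'
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
    finally:
        conn.close()
    # The shell prints NULL as an empty string by default.
    return ["|".join("" if value is None else str(value) for value in row) for row in rows]


if __name__ == "__main__":
    db_path = os.environ.get("SQL_TABLE_NAME")
    if db_path:
        for line in table_rows_as_cli_text(db_path, "top_animation"):
            print(line)
```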

Supported technologies

Databases
Google BigQuery
Postgres
Snowflake
SQLite

File types
CSV
JSON
NDJSON
Parquet

File stores
Amazon S3
Filesystem
Google GCS

Available operations

The following are some key functions available in the SDK:

load_file: Load a given file into a SQL table
transform: Apply a SQL SELECT statement to a source table and save the result to a destination table
drop_table: Drop a SQL table
run_raw_sql: Run any SQL statement without handling its output
append: Insert rows from the source SQL table into the destination SQL table, if there are no conflicts
merge: Insert rows from the source SQL table into the destination SQL table, depending on conflicts:
- ignore: Do not add rows that already exist
- update: Replace existing rows with new ones
export_file: Export SQL table rows into a destination file
dataframe: Export a given SQL table into an in-memory pandas DataFrame

For a full list of available operators, see the SDK reference documentation.
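The difference between the two `merge` conflict modes can be illustrated with a plain SQLite upsert. This is a sketch of the semantics only, not how the SDK implements `merge`: `merge_rows` is a hypothetical helper, the conflict columns must be covered by a primary key or unique index, and `INSERT ... ON CONFLICT` requires SQLite 3.24 or later.

```python
import sqlite3
from typing import Iterable, Sequence


def merge_rows(conn: sqlite3.Connection, table: str, columns: Sequence[str],
               conflict_columns: Sequence[str], rows: Iterable[tuple],
               if_conflicts: str) -> None:
    """Insert rows into table; on a key conflict either skip or overwrite.

    Table and column names are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    if if_conflicts not in ("ignore", "update"):
        raise ValueError(f"unknown if_conflicts value: {if_conflicts!r}")
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    keys = ", ".join(conflict_columns)
    non_keys = [c for c in columns if c not in conflict_columns]
    if if_conflicts == "update" and non_keys:
        # 'excluded' refers to the incoming row that hit the conflict.
        action = "DO UPDATE SET " + ", ".join(f"{c} = excluded.{c}" for c in non_keys)
    else:
        action = "DO NOTHING"  # keep the existing row
    sql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) ON CONFLICT ({keys}) {action}"
    conn.executemany(sql, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (title TEXT PRIMARY KEY, rating REAL)")
    conn.execute("INSERT INTO movies VALUES ('Movie A', 7.0)")
    merge_rows(conn, "movies", ["title", "rating"], ["title"],
               [("Movie A", 9.0), ("Movie B", 8.0)], "ignore")
    print(conn.execute("SELECT * FROM movies ORDER BY title").fetchall())
    # [('Movie A', 7.0), ('Movie B', 8.0)]
    merge_rows(conn, "movies", ["title", "rating"], ["title"], [("Movie A", 9.0)], "update")
    print(conn.execute("SELECT * FROM movies ORDER BY title").fetchall())
    # [('Movie A', 9.0), ('Movie B', 8.0)]
```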

Documentation

The documentation is a work in progress; we aim to follow the Diátaxis system:

Getting Started: A hands-on introduction to the Astro Python SDK
How-to guides: Simple step-by-step user guides to accomplish specific tasks
Reference guide: Commands, modules, classes and methods
Explanation: Clarification and discussion of key decisions when designing the project

Changelog

The Astro Python SDK follows semantic versioning for releases. Check the changelog for the latest changes.

Release management

To learn more about our release philosophy and steps, see Managing Releases.

Contribution guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the Contribution Guideline for a detailed overview of how to contribute.

Contributors and maintainers should abide by the Contributor Code of Conduct.

License

Apache License 2.0

Release history

- 1.8.1: Jun 21, 2024
- 1.8.0: Jan 24, 2024
- 1.7.0: Aug 30, 2023
- 1.7.0a3 (pre-release): Jun 9, 2023
- 1.7.0a2 (pre-release): May 5, 2023
- 1.7.0a1 (pre-release): May 5, 2023
- 1.6.2: Aug 8, 2023
- 1.6.1: May 25, 2023
- 1.6.0: May 3, 2023
- 1.6.0a1 (pre-release): Apr 21, 2023
- 1.5.4: May 25, 2023
- 1.5.3: Mar 8, 2023
- 1.5.2: Feb 23, 2023
- 1.5.1: Feb 21, 2023
- 1.5.0: Feb 8, 2023
- 1.5.0a2 (pre-release): Jan 30, 2023
- 1.5.0a1 (pre-release): Jan 23, 2023
- 1.4.1: Jan 23, 2023
- 1.4.0: Jan 12, 2023
- 1.3.3: Dec 19, 2022
- 1.3.3b1 (pre-release): Dec 18, 2022
- 1.3.2: Dec 15, 2022
- 1.3.1: Dec 14, 2022
- 1.3.0: Dec 1, 2022
- 1.2.3: Nov 23, 2022
- 1.2.2: Nov 16, 2022
- 1.2.1: Nov 8, 2022
- 1.2.0: Oct 20, 2022
- 1.2.0b1 (pre-release): Oct 18, 2022
- 1.1.1: Oct 4, 2022
- 1.1.0: Sep 19, 2022
- 1.1.0b2 (pre-release, this version): Sep 16, 2022
- 1.1.0b1 (pre-release): Sep 10, 2022
- 1.0.2: Aug 24, 2022
- 1.0.1: Aug 23, 2022
- 1.0.0: Aug 18, 2022
- 1.0.0b1 (pre-release): Jul 27, 2022
- 0.11.1: Aug 17, 2022
- 0.11.0: Jul 5, 2022
- 0.10.0: Jun 21, 2022
- 0.9.2: Jun 13, 2022
- 0.9.1: May 30, 2022
- 0.9.0: May 26, 2022
- 0.9.0b1 (pre-release): May 24, 2022
- 0.8.5: May 25, 2022
- 0.8.5b1 (pre-release): May 25, 2022
- 0.8.4: May 24, 2022
- 0.8.4b2 (pre-release): May 24, 2022
- 0.8.4b1 (pre-release): May 19, 2022
- 0.8.3: May 19, 2022
- 0.8.3a1 (pre-release): May 19, 2022

Download files


Source Distribution

astro-sdk-python-1.1.0b2.tar.gz (61.7 kB), uploaded Sep 16, 2022

Built Distribution

astro_sdk_python-1.1.0b2-py3-none-any.whl (85.1 kB, Python 3), uploaded Sep 16, 2022

Hashes for astro-sdk-python-1.1.0b2.tar.gz

Algorithm	Hash digest
SHA256	`0ffc9dcd2272d3610d4db6b98db3527380e72990cc8c3c6a5f612fd6672fc998`
MD5	`8eab832afae31c2f168328345b4bd76f`
BLAKE2b-256	`902d42617028a0b91ced66f27e3eae9fedb79828baacac8a1aeaa155d63e25df`

Hashes for astro_sdk_python-1.1.0b2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`599ca57182f7b3d5ac8ad1dbfb7f198d1fcd02aca0c35af745675c1df313c686`
MD5	`4af37a1fce5e86a0794732bfc6b51cd5`
BLAKE2b-256	`c03ed661a02b400f26a9e68fa9e973d2958a2193411b270769ee55e4aee4d3cd`
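To check a downloaded file against the published SHA256 digests, hash it locally and compare the hex strings. A minimal standard-library sketch; `sha256_of_file` and `verify_download` are illustrative names, not part of any tool.

```python
import hashlib
import sys


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare the file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_download.py <downloaded file> <expected sha256>
    print("OK" if verify_download(sys.argv[1], sys.argv[2]) else "MISMATCH")
```

pip can also perform this check during installation in its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).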