ETL tool based on SQLAlchemy for building robust ETL pipelines with a strong emphasis on data quality

SqlTask

SqlTask is an extensible ETL library based on SQLAlchemy that helps build robust ETL pipelines with a strong emphasis on data quality.

Main features of SqlTask:

  • Create well-documented data models that support iterative development of both schema and data transformation logic.
  • Couple data quality checks tightly with the transformation logic, and automatically create data quality tables that are actionable and easy to visualize.
  • Make use of SQL where practical, especially for expensive data filtering and aggregation during data extraction.
  • Transform data row by row in Python where SQL falls short, e.g. when calling third-party libraries or storing state from previous rows.
  • Encourage the use of modern version control tools and processes, especially Git.
  • Performant data uploading/insertion where supported.
  • Easy integration with modern ETL orchestration tools, especially Apache Airflow.
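The division of labour described in the list above (SQL for filtering, Python for row-level logic, data quality checks next to the transformation) can be sketched with the standard library alone. This is not SqlTask's API: it uses `sqlite3`, and the tables `events`, `fact_events` and `dq_issues` as well as the function `run_pipeline` are hypothetical names chosen for illustration.

```python
import sqlite3


def run_pipeline(conn):
    """Hypothetical ETL step: SQL for filtering, Python for row-level logic."""
    cur = conn.cursor()
    # Extract: let the database do the filtering and ordering.
    rows = cur.execute(
        "SELECT id, customer, amount FROM events "
        "WHERE amount IS NOT NULL ORDER BY customer, id"
    ).fetchall()

    output, issues = [], []
    previous_customer, running_total = None, 0.0
    for event_id, customer, amount in rows:
        # Transform: keep state from previous rows (per-customer running total).
        if customer != previous_customer:
            previous_customer, running_total = customer, 0.0
        # Data quality check sits next to the transformation it guards.
        if amount < 0:
            issues.append((event_id, "amount", "negative value"))
            continue
        running_total += amount
        output.append((event_id, customer, amount, running_total))

    # Load: write both the target rows and the data quality rows.
    cur.executemany("INSERT INTO fact_events VALUES (?, ?, ?, ?)", output)
    cur.executemany("INSERT INTO dq_issues VALUES (?, ?, ?)", issues)
    conn.commit()
    return len(output), len(issues)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (id INTEGER, customer TEXT, amount REAL);
    CREATE TABLE fact_events (id INTEGER, customer TEXT, amount REAL, running_total REAL);
    CREATE TABLE dq_issues (id INTEGER, column_name TEXT, issue TEXT);
    INSERT INTO events VALUES (1, 'a', 10.0), (2, 'a', -5.0), (3, 'a', 2.5),
                              (4, 'b', NULL), (5, 'b', 7.0);
    """
)
loaded, flagged = run_pipeline(conn)
```

Rejected rows land in their own table instead of being silently dropped, so they can be queried or charted later, which is the idea behind the data quality tables mentioned above.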

Word of caution: SqlTask is currently under heavy development, and the API is expected to change frequently.

Supported databases

SqlTask supports all databases that have a SQLAlchemy dialect, with dedicated support for the following engines:

  • Google BigQuery
  • MS SQL Server (experimental)
  • PostgreSQL
  • SQLite
  • Snowflake

Engines not listed above will fall back to using regular inserts.
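As a rough illustration of such a fallback, the sketch below picks a dedicated loader when one is registered for a dialect and otherwise uses plain parameterized inserts. It does not reflect how SqlTask is implemented: `regular_insert`, `pick_loader` and the `dedicated_loaders` registry are hypothetical names, and `sqlite3` stands in for the target database.

```python
import sqlite3


def regular_insert(conn, table, rows):
    """Fallback: plain parameterized INSERT statements, one per row."""
    placeholders = ", ".join("?" for _ in rows[0])
    # The table name is trusted here; a real tool would quote identifiers.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return "regular"


def pick_loader(dialect_name, dedicated_loaders):
    """Return a dedicated loader if one is registered, else the fallback."""
    return dedicated_loaders.get(dialect_name, regular_insert)


# Hypothetical registry: no dedicated loaders are registered in this sketch.
dedicated_loaders = {}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
loader = pick_loader("mysql", dedicated_loaders)
used = loader(conn, "customers", [(1, "Ada"), (2, "Grace")])
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Row-wise inserts work on any engine but tend to be slower than engine-specific bulk loading for large tables.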

Installation instructions

To install SqlTask without any optional dependencies, run

pip install sqltask

To automatically pull in the dependencies needed by Snowflake, run the following (the quotes keep shells such as zsh from interpreting the square brackets)

pip install "sqltask[snowflake]"

Please refer to the documentation on Read The Docs for further information.
