ETL tool based on SQLAlchemy for building robust ETL pipelines with a strong emphasis on data quality

Sqltask

Sqltask is an extensible ETL library based on SQLAlchemy, intended for building robust ETL pipelines with a strong emphasis on data quality.

Main features of Sqltask:

  • Create well documented data models that support iterative development of both schema and data transformation logic.
  • Combine data quality checks with transformation logic, and automatically create visualization-friendly data quality tables.
  • Make use of SQL where practical, especially for expensive or complex data filtering and aggregation during data extraction.
  • Row-by-row data transformation using Python where SQL isn't feasible, e.g. calling third-party libraries or storing state from previous rows.
  • Encourage use of modern version control tools and processes, especially Git.
  • Performant data loading using bulk-loading where supported.
  • Easy integration with modern ETL orchestration tools, especially Apache Airflow.
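
The division of labor described above (SQL for extraction and aggregation, Python for row-by-row transformation, and a separate table for data quality issues) can be sketched roughly as follows. This is not the Sqltask API: it is a minimal illustration using only the standard library's sqlite3 module, and the table names (orders, daily_totals, dq_issues) and the run_pipeline function are hypothetical.

```python
import sqlite3


def run_pipeline(conn):
    """Extract with SQL, transform row by row in Python, record quality issues.

    Hypothetical illustration only; table and column names are assumptions.
    """
    cur = conn.cursor()
    cur.executescript(
        """
        CREATE TABLE orders (customer TEXT, day TEXT, amount REAL);
        CREATE TABLE daily_totals (customer TEXT, day TEXT, total REAL, change REAL);
        CREATE TABLE dq_issues (customer TEXT, day TEXT, issue TEXT);
        """
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("acme", "2020-01-01", 10.0),
            ("acme", "2020-01-01", 5.0),
            ("acme", "2020-01-02", 20.0),
            ("acme", "2020-01-03", None),  # deliberately missing amount
        ],
    )

    # Extraction: let the database do the expensive aggregation.
    rows = cur.execute(
        "SELECT customer, day, SUM(amount) FROM orders "
        "GROUP BY customer, day ORDER BY customer, day"
    ).fetchall()

    # Transformation: row by row in Python, keeping state from previous rows.
    output, issues = [], []
    previous_total = {}
    for customer, day, total in rows:
        if total is None:
            # Data quality check: log the problem instead of loading the row.
            issues.append((customer, day, "missing amount"))
            continue
        prev = previous_total.get(customer)
        change = None if prev is None else total - prev
        output.append((customer, day, total, change))
        previous_total[customer] = total

    # Loading: insert all rows in one batch per table.
    cur.executemany("INSERT INTO daily_totals VALUES (?, ?, ?, ?)", output)
    cur.executemany("INSERT INTO dq_issues VALUES (?, ?, ?)", issues)
    conn.commit()
    return output, issues
```

Calling run_pipeline(sqlite3.connect(":memory:")) loads two rows into daily_totals and one row into dq_issues, which could then be queried or charted like any other table.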

Supported databases

Sqltask supports all databases that have a SQLAlchemy dialect, with performant bulk-loading for the following engines:

  • Google BigQuery (experimental)
  • MS SQL Server (experimental)
  • Postgres
  • Sqlite
  • Snowflake

Engines not listed above will fall back to using regular inserts.
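
One way to picture this fallback is a lookup keyed on the dialect name, with plain inserts as the default. The sketch below is an assumption about the general pattern, not Sqltask's internals: the function names and the BULK_LOADERS registry are hypothetical, and it uses the standard library's sqlite3 module to stay dependency-free. With SQLAlchemy, the dialect name would typically come from engine.dialect.name.

```python
import sqlite3


def bulk_insert_sqlite(conn, table, rows):
    """Batch insert: hand all rows to the driver in a single call."""
    placeholders = ", ".join("?" for _ in rows[0])
    # The table name is interpolated directly, so it must come from trusted code.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def regular_insert(conn, table, rows):
    """Fallback: one INSERT statement per row."""
    placeholders = ", ".join("?" for _ in rows[0])
    for row in rows:
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)


# Hypothetical registry: dialect name -> specialised loader.
BULK_LOADERS = {"sqlite": bulk_insert_sqlite}


def load_rows(conn, dialect_name, table, rows):
    """Use a bulk loader if one is registered, else fall back to regular inserts."""
    loader = BULK_LOADERS.get(dialect_name, regular_insert)
    if rows:
        loader(conn, table, rows)
    return loader.__name__
```

Here load_rows(conn, "sqlite", ...) picks the batch loader, while an unregistered name such as "mysql" takes the row-by-row path; both leave the same data in the table.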

Installation instructions

To install Sqltask without any optional database dependencies, run

pip install sqltask

To also install all supported third-party modules, run

pip install sqltask[bigquery,mssql,snowflake,postgres]

Built Distribution

sqltask-0.2.2-py3-none-any.whl (24.9 kB)

Uploaded Python 3

File details

Details for the file sqltask-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: sqltask-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for sqltask-0.2.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        11bc06a9da078ef8ba7e5fd4733956bef3a5dee5cacad1c8f938b38bc1fc20dd
MD5           7cbeb97df51a5cc3a6d4e0b8bedc7340
BLAKE2b-256   87487fe6fe061180056737d250269fc4210a1cc8194ed95b43be8d5fe41661a3