Some tools for working with data

datools

License: Apache 2.0

Introduction

datools is a collection of Python-based tools for working with data in relational databases. While it contains several utilities for smoothing the rough edges of SQL, its most mature component is datools.diff, an algorithm that's best explained in a blog post and Jupyter Notebook.

To learn more, read the docs or reach out.

Database support

datools generates SQL for its operations, but each database has its own nuances. datools may well run on your database today. To give you some certainty about where it is known to work, we run every test in the test suite against the following databases:

Database             Evaluated by test suite
SQLite               Since v0.1.2
DuckDB               Since v0.1.4
PostgreSQL           Since v0.1.5
Redshift, Snowflake  You provide an instance, I'll make the tests pass

History

0.1.5 (2022-04-13)

  • Support for PostgreSQL! The test suite now runs against PostgreSQL, and datools.explanations.diff now allows you to ask "why" about data stored in Postgres. Get excited!
  • datools.sqlalchemy_utils.grouping_sets_query will now generate a GROUPING SETS query for databases that support grouping sets (e.g., Postgres, DuckDB) or the equivalent UNION ALL version for databases without grouping sets support (e.g., SQLite). For more, check out the example in the docs.
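
To illustrate the UNION ALL rewrite mentioned above, here is a minimal sketch using only Python's built-in sqlite3 module. It does not call datools: the helper union_all_grouping_sets, the sales table, and its columns are all hypothetical names made up for this example, and the SQL that datools actually generates may look different.

```python
import sqlite3


def union_all_grouping_sets(table, grouping_sets, aggregate):
    """Hypothetical helper (not part of datools): emulate GROUPING SETS
    by emitting one GROUP BY query per grouping set, joined with UNION ALL."""
    # Collect every column mentioned in any grouping set, in first-seen order.
    all_columns = []
    for grouping_set in grouping_sets:
        for column in grouping_set:
            if column not in all_columns:
                all_columns.append(column)

    selects = []
    for grouping_set in grouping_sets:
        # Columns outside the current grouping set are reported as NULL,
        # which mirrors what GROUPING SETS returns.
        projected = [
            column if column in grouping_set else f"NULL AS {column}"
            for column in all_columns
        ]
        select = f"SELECT {', '.join(projected)}, {aggregate} AS agg FROM {table}"
        if grouping_set:
            select += f" GROUP BY {', '.join(grouping_set)}"
        selects.append(select)
    return " UNION ALL ".join(selects)


# Toy data in an in-memory SQLite database (which lacks GROUPING SETS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "a", 10), ("east", "b", 20), ("west", "a", 5)],
)

# Roughly: SELECT ... GROUP BY GROUPING SETS ((region), (product))
sql = union_all_grouping_sets("sales", [("region",), ("product",)], "SUM(amount)")
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

On a database with native support, the same result would come from a single GROUP BY GROUPING SETS ((region), (product)) clause. The sketch builds SQL by string formatting for brevity, so it should only ever be given trusted identifiers.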

0.1.4 (2022-02-27)

  • Python 3.10 support.
  • Updated test suite to run tests against multiple databases, in particular expanding from SQLite only to DuckDB and SQLite.
  • As a result of the previous bullet, ensured code runs against DuckDB in addition to SQLite.
  • First stab at documentation (https://datools.readthedocs.io/en/latest/).

0.1.3 (2021-12-31)

  • Introduced mypy to linting and CI to ensure code that makes it to main has proper types.
  • Created first working example of DIFF working on a real-world dataset as a Jupyter notebook. This example partially replicates the Scorpion paper when only moteid/sensorids are considered.
  • Separated the on_columns argument of diff into on_column_values (columns for which you want to generate equality predicates as explanations) and on_column_ranges (columns for which you want to generate range predicates as explanations after bucketing the ranges into 15 equi-sized buckets).
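
As a rough illustration of the range bucketing described in the last bullet, the sketch below splits a numeric range into 15 buckets. It assumes "equi-sized" means equal-width, which may not match what datools does internally, and equal_width_bucket and bucket_bounds are hypothetical helpers, not datools functions.

```python
def equal_width_bucket(value, low, high, num_buckets=15):
    """Hypothetical helper: index of the equal-width bucket containing value."""
    if high == low:
        # Degenerate range: everything falls in the first bucket.
        return 0
    width = (high - low) / num_buckets
    index = int((value - low) / width)
    # Clamp so that value == high lands in the last bucket, not past it.
    return min(max(index, 0), num_buckets - 1)


def bucket_bounds(index, low, high, num_buckets=15):
    """Hypothetical helper: (lower, upper) bounds of a bucket, usable as a
    range predicate such as lower <= column AND column < upper."""
    width = (high - low) / num_buckets
    return (low + index * width, low + (index + 1) * width)


# A reading of 75 in the range [0, 150] falls into bucket 7, i.e. [70, 80).
bucket = equal_width_bucket(75, 0, 150)
lower, upper = bucket_bounds(bucket, 0, 150)
print(f"bucket {bucket}: {lower} <= value < {upper}")
```

Each bucket then corresponds to one candidate range predicate, in the same way that each distinct value of an on_column_values column corresponds to one candidate equality predicate.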

0.1.2 (2021-11-07)

  • First release of DIFF algorithm implementation.

0.1.0 (2021-05-09)

  • First release on PyPI.
