Some tools for working with data
datools
Introduction
datools is a collection of Python-based tools for working with data in relational databases. While it contains several utilities for smoothing the rough edges of SQL, its most baked component is datools.diff, an algorithm that's best explained in a blog post and Jupyter Notebook.
To learn more, read the docs or reach out.
Database support
While datools generates SQL for its operations, different databases have their nuances. datools may already run on your database today, but to give you some certainty about which databases it is known to work on, we run the full test suite against the following databases:
Database | Evaluated by test suite |
---|---|
SQLite | Since v0.1.2 |
DuckDB | Since v0.1.4 |
PostgreSQL | Since v0.1.5 |
Redshift, Snowflake | You provide an instance, I'll make the tests pass |
History
0.1.5 (2022-04-13)
- Support for PostgreSQL! The test suite now runs against PostgreSQL, and datools.explanations.diff now allows you to ask "why" about data stored in Postgres. Get excited! datools.sqlalchemy_utils.grouping_sets_query will now generate a GROUPING SETS query for databases that support grouping sets (e.g., Postgres, DuckDB) or the equivalent UNION ALL version for databases without grouping sets support (e.g., SQLite). For more, check out the example in the docs.
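To make the GROUPING SETS versus UNION ALL equivalence concrete, here is a minimal sketch in plain Python and SQLite. It does not call datools; the helper union_all_for_grouping_sets, the readings table, and its columns are hypothetical names chosen for illustration, and the SQL that datools actually generates may differ (for instance in how it handles NULLs already present in the data).

```python
import sqlite3


def union_all_for_grouping_sets(table, grouping_sets, aggregate):
    """Build a UNION ALL query that emulates GROUP BY GROUPING SETS.

    Hypothetical helper for illustration only; not part of datools.
    Each grouping set becomes one SELECT branch, and columns that are
    not part of that set are emitted as NULL.
    """
    # Collect every column mentioned in any grouping set, in first-seen order.
    all_columns = []
    for cols in grouping_sets:
        for col in cols:
            if col not in all_columns:
                all_columns.append(col)

    branches = []
    for cols in grouping_sets:
        select_list = [c if c in cols else f"NULL AS {c}" for c in all_columns]
        select_list.append(aggregate)
        sql = f"SELECT {', '.join(select_list)} FROM {table}"
        if cols:  # an empty grouping set aggregates over the whole table
            sql += f" GROUP BY {', '.join(cols)}"
        branches.append(sql)
    return "\nUNION ALL\n".join(branches)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE readings (moteid INTEGER, sensor TEXT, temp REAL);
    INSERT INTO readings VALUES (1, 'a', 20.0), (1, 'b', 22.0), (2, 'a', 30.0);
    """
)

# On a database with grouping sets support, the same result would come from:
#   SELECT moteid, sensor, COUNT(*) AS n FROM readings
#   GROUP BY GROUPING SETS ((moteid), (sensor))
query = union_all_for_grouping_sets(
    "readings", [("moteid",), ("sensor",)], "COUNT(*) AS n"
)
rows = conn.execute(query).fetchall()
```

Each branch groups by one set of columns and pads the rest with NULL, which is the shape a GROUPING SETS query returns on databases that support it.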
0.1.4 (2022-02-27)
- Python 3.10 support.
- Updated test suite to run tests against multiple databases, in particular expanding from SQLite only to DuckDB and SQLite.
- As a result of the last bullet, ensured code runs against DuckDB in addition to SQLite.
- First stab at documentation (https://datools.readthedocs.io/en/latest/).
0.1.3 (2021-12-31)
- Introduced mypy to linting and CI to ensure code that makes it to main has proper types.
- Created first working example of DIFF working on a real-world dataset as a Jupyter notebook. This example partially replicates the Scorpion paper when only moteid/sensorids are considered.
- Separated the on_columns argument of diff into on_column_values (columns for which you want to generate equality predicates as explanations) and on_column_ranges (columns for which you want to generate range predicates as explanations after bucketing the ranges into 15 equi-sized buckets).
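As a rough illustration of the range bucketing mentioned above, the sketch below splits a numeric range into 15 buckets and turns one bucket into a range predicate. It assumes "equi-sized" means equal-width buckets; datools may bucket differently, and the function names (bucket_edges, bucket_index, range_predicate) and the temp column are hypothetical, not part of the datools API.

```python
def bucket_edges(lo, hi, n_buckets=15):
    """Return n_buckets + 1 edges that split [lo, hi] into equal-width buckets."""
    width = (hi - lo) / n_buckets
    return [lo + i * width for i in range(n_buckets + 1)]


def bucket_index(value, lo, hi, n_buckets=15):
    """Map a value to its bucket number, clamping to the first/last bucket."""
    if hi == lo:
        return 0  # degenerate range: everything lands in one bucket
    idx = int((value - lo) / (hi - lo) * n_buckets)
    return min(max(idx, 0), n_buckets - 1)


def range_predicate(column, index, edges):
    """Render a half-open range predicate for bucket number `index`."""
    return f"{column} >= {edges[index]} AND {column} < {edges[index + 1]}"


edges = bucket_edges(0, 150)
middle = bucket_index(75, 0, 150)
predicate = range_predicate("temp", middle, edges)
```

A predicate of this shape is what a range-based explanation looks like, in contrast to the equality predicates (column = value) produced for on_column_values.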
0.1.2 (2021-11-07)
- First release of DIFF algorithm implementation.
0.1.0 (2021-05-09)
- First release on PyPI.
Hashes for datools-0.1.5-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6c915ddd216225b2b0b5d1c5fcbc70ea02c84b25cfa7b82bb822b8125e5bd68e |
MD5 | 1ee38bfc17b629228c535bb0d46855c9 |
BLAKE2b-256 | e1789c0010d2202905536c7f38c1b515e473c34c2998e1d2fb66291e1f2f7fd5 |