
Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam.

Project description

Introduction

This repository contains functions that simplify working with Great Expectations. Users supply data and data quality rules, and get validation results in return.

DISCLAIMER: The package is in the MVP phase.

Getting started

Install the dq suite on your compute, for example by running the following in your workspace. The first line is a shell command (in a notebook cell, prefix it with %); the second is Python:

pip install dq-suite-amsterdam
import dq_suite
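
Since the features listed under Updates depend on the release, it can help to confirm which version ended up on the cluster. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of dq_suite.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "dq-suite-amsterdam"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "dq-suite-amsterdam is not installed")
```

Returning None instead of raising keeps the check usable at the top of a notebook, where a missing package should produce a readable message rather than a stack trace.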

Load your data in dataframes, give them a table_name, and create a list of all dataframes:

df = spark.read.csv(csv_path + file_name, header=True, inferSchema=True)  # example using csv
df.table_name = "showcase_table"
dfs = [df]

Then run the check with three arguments:

  • 'dfs': the list of dataframes that require a dq check
  • 'dq_rules': the rules as JSON, as shown in dq_rules_example.json in this repo
  • a name for your dq check, in this case "showcase"

results, brontabel_df, bronattribute_df, dqRegel_df = dq_suite.df_check(dfs, dq_rules, "showcase")
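
The preparation steps above can be sketched as two small helpers. Both `tag_tables` and `read_dq_rules` are hypothetical names, not part of dq_suite. The sketch also assumes that the rules are passed on as JSON text; if your version of df_check expects a parsed object instead, apply `json.loads` to the returned string.

```python
import json
from pathlib import Path


def tag_tables(named_frames: dict) -> list:
    """Set .table_name on each dataframe and return the dataframes as a list."""
    dfs = []
    for table_name, df in named_frames.items():
        # table_name is an ordinary Python attribute, as in the example above.
        df.table_name = table_name
        dfs.append(df)
    return dfs


def read_dq_rules(path: str) -> str:
    """Read a rules file and fail early if it does not contain valid JSON."""
    text = Path(path).read_text(encoding="utf-8")
    json.loads(text)  # raises json.JSONDecodeError on malformed input
    return text
```

With these, `dfs = tag_tables({"showcase_table": df})` and `dq_rules = read_dq_rules("dq_rules_example.json")` produce the first two arguments of the call. Because table_name is a plain attribute on the Python object, a dataframe returned by a later transformation (for example a filter) is a new object that does not carry it, so set the name as the last step before the check.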

Known exceptions

The functions can run on Databricks using a Personal Compute Cluster or a Job Cluster. Using a Shared Compute Cluster will result in an error, as it does not have the permissions that Great Expectations requires.

Updates

Version 0.1: Run a DQ check for a dataframe

Version 0.2: Run a DQ check for multiple dataframes

Version 0.3: Refactored I/O

Version 0.4: Added schema validation with Amsterdam Schema per table

Version 0.5: Export schema from Unity Catalog
