
Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam.

Introduction

This repository contains functions that will ease the use of Great Expectations. Users can input data and data quality rules and get results in return.

DISCLAIMER: This package is in the MVP phase.

Getting started

Install the dq suite on your compute, for example by running the following command in your workspace:

pip install dq-suite-amsterdam

Then import the package in Python:

import dq_suite

Load your data into dataframes, give each one a table_name, and collect all dataframes in a list:

df = spark.read.csv(csv_path + file_name, header=True, inferSchema=True)  # example using csv
df.table_name = "showcase_table"
dfs = [df]

Then run the check with the following inputs:

  • Define 'dfs' as a list of the dataframes that require a dq check
  • Define 'dq_rules' as JSON, as shown in dq_rules_example.json in this repo
  • Define a name for your dq check, in this case "showcase"

results, brontabel_df, bronattribute_df, dqRegel_df = dq_suite.df_check(dfs, dq_rules, "showcase")
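
When several dataframes need a check, the preparation steps above can be wrapped in two small helpers. This is a minimal sketch, not part of dq_suite: the names name_dataframes and read_dq_rules are hypothetical. It also assumes that the rules can be passed as raw JSON text. This page does not state whether df_check expects text or a parsed object, so apply json.loads to the returned text if a parsed object turns out to be required.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def name_dataframes(named_dfs: Dict[str, Any]) -> List[Any]:
    """Set table_name on each dataframe and return them as a list.

    Hypothetical helper: it only repeats the `df.table_name = ...`
    step from the single-dataframe example for every entry.
    """
    dfs = []
    for table_name, df in named_dfs.items():
        df.table_name = table_name
        dfs.append(df)
    return dfs


def read_dq_rules(path: str) -> str:
    """Read a rules file and fail early if it is not valid JSON.

    Hypothetical helper: returns the raw text (an assumption about
    what df_check accepts); no rule schema is checked here.
    """
    text = Path(path).read_text(encoding="utf-8")
    json.loads(text)  # raises json.JSONDecodeError on malformed input
    return text
```

With these helpers, dfs = name_dataframes({"showcase_table": df}) and dq_rules = read_dq_rules("dq_rules_example.json") produce the first two arguments of the df_check call shown above.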

Known exceptions

The functions can run on Databricks using a Personal Compute Cluster or a Job Cluster. Using a Shared Compute Cluster will result in an error, as it does not have the permissions that Great Expectations requires.

Updates

Version 0.1: Run a DQ check for a dataframe

Version 0.2: Run a DQ check for multiple dataframes
