Wrapper for Great Expectations to fit the requirements of the Gemeente Amsterdam.

Introduction

This repository contains functions that simplify the use of Great Expectations. Users supply data and data quality rules and get the validation results in return.

DISCLAIMER: The package is in the MVP phase.

Getting started

Install the dq suite on your compute, for example by running the following command in your workspace:

pip install dq-suite-amsterdam

Then import the package in your Python code:

import dq_suite
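
To confirm that the installation succeeded before importing, a minimal sketch like the one below can report the installed version using only the Python standard library. The helper name `installed_version` is illustrative and is not part of dq_suite.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "dq-suite-amsterdam"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string when the package is installed, otherwise None.
print(installed_version())
```

If this prints None, the install step did not reach the Python environment the code is running in.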

Load your data in dataframes, give them a table_name, and create a list of all dataframes:

df = spark.read.csv(csv_path + file_name, header=True, inferSchema=True)  # example using CSV
df.table_name = "showcase_table"
dfs = [df]

Then prepare the inputs and run the check:

  • Define 'dfs' as a list of dataframes that require a dq check
  • Define 'dq_rules' as a JSON as shown in dq_rules_example.json in this repo
  • Define a name for your dq check, in this case "showcase"

results, brontabel_df, bronattribute_df, dqRegel_df = dq_suite.df_check(dfs, dq_rules, "showcase")
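
When several dataframes are checked at once, preparing the inputs can be wrapped in two small helpers. The sketch below is illustrative only: `name_dataframes` and `load_dq_rules` are hypothetical names, not part of dq_suite. This README does not state whether df_check expects the rules as JSON text or as a parsed object, so the loader returns the text after validating it.

```python
import json
from pathlib import Path


def name_dataframes(named_dfs):
    """Set a table_name attribute on each dataframe and return them as a list."""
    dfs = []
    for table_name, df in named_dfs.items():
        # Same pattern as: df.table_name = "showcase_table"
        df.table_name = table_name
        dfs.append(df)
    return dfs


def load_dq_rules(path):
    """Read a rules file and fail early if it is not valid JSON.

    Returns the raw JSON text (assumption: call json.loads on it
    if a parsed object is needed instead).
    """
    text = Path(path).read_text(encoding="utf-8")
    json.loads(text)  # raises json.JSONDecodeError on malformed input
    return text


# Example usage (requires Spark dataframes and the rules file from this repo):
# dfs = name_dataframes({"showcase_table": df})
# dq_rules = load_dq_rules("dq_rules_example.json")
# results, brontabel_df, bronattribute_df, dqRegel_df = dq_suite.df_check(dfs, dq_rules, "showcase")
```

Validating the JSON up front means a malformed rules file is reported before any check starts.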

Known exceptions

The functions can run on Databricks using a Personal Compute Cluster or a Job Cluster. Using a Shared Compute Cluster will result in an error, because it does not have the permissions that Great Expectations requires.
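
Because the error raised on a Shared Compute Cluster may not make the cause obvious, one option is to wrap the call so that a failure carries a hint about the cluster type. This is a generic sketch: `run_with_cluster_hint` is a hypothetical helper, and since the exact exception type is not documented here, it catches broadly and chains the original error.

```python
def run_with_cluster_hint(check, *args, **kwargs):
    """Call check(*args, **kwargs); on failure, re-raise with a cluster-type hint."""
    try:
        return check(*args, **kwargs)
    except Exception as exc:  # exact exception type is unknown, so catch broadly
        raise RuntimeError(
            "DQ check failed. If this ran on a Shared Compute Cluster, retry on "
            "a Personal Compute Cluster or a Job Cluster."
        ) from exc


# Example usage (assumes dq_suite, dfs and dq_rules are already defined):
# results, brontabel_df, bronattribute_df, dqRegel_df = run_with_cluster_hint(
#     dq_suite.df_check, dfs, dq_rules, "showcase"
# )
```

The original exception remains available as the `__cause__` of the raised error, so no diagnostic detail is lost.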

Updates

Version 0.1: Run a DQ check for a dataframe

Version 0.2: Run a DQ check for multiple dataframes

Download files

Source Distribution

dq_suite_amsterdam-0.4.0.tar.gz (6.5 kB view details)

Uploaded Source

Built Distribution

dq_suite_amsterdam-0.4.0-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file dq_suite_amsterdam-0.4.0.tar.gz.

File metadata

  • Download URL: dq_suite_amsterdam-0.4.0.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for dq_suite_amsterdam-0.4.0.tar.gz
Algorithm Hash digest
SHA256 beaa982115f3c63b6e8af5c175adbe6dc9ba5d0b0b1ed9c34f40f2ef48554ff7
MD5 b2fdb96788fe86b352e7e91b150e861c
BLAKE2b-256 ddc31efcdd8d45ee7e7fc02bdd193ad84032ad0e3b52a44a6def853dc1996922

File details

Details for the file dq_suite_amsterdam-0.4.0-py3-none-any.whl.

File hashes

Hashes for dq_suite_amsterdam-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9af2a5f60918a5ddfe61d6e94bf25b04d7eaaf099db347a8433ddcf622bcfd48
MD5 8d0f77dc265e7f22e4356a3dc3410211
BLAKE2b-256 195d6028647029ce2ab74d11d82795295c4076598295a4f802e3242d24ffaf34
