great-expectations

Always know what to expect from your data.

Great Expectations

Quick Start

Getting Started will teach you how to get up and running in minutes.

For full documentation, visit Great Expectations on readthedocs.io.

Down with Pipeline Debt! explains the core philosophy behind Great Expectations. Please give it a read, and clap, follow, and share while you’re at it.

What is great_expectations?

Great Expectations helps teams save time and promote analytic integrity by offering a unique approach to automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality.

Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.

Why would I use Great Expectations?

To get more done with data, faster. Teams use great_expectations to:

Save time during data cleaning and munging.
Accelerate ETL and data normalization.
Streamline analyst-to-engineer handoffs.
Streamline knowledge capture and requirements gathering from subject-matter experts.
Monitor data quality in production data pipelines and data products.
Automate verification of new data deliveries from vendors and other teams.
Simplify debugging data pipelines if (when) they break.
Codify assumptions used to build models when sharing with other teams or analysts.
Develop rich, shared data documentation in the course of normal work.
Make implicit knowledge explicit.
etc., etc., etc.

Key features

Expectations

Expectations are the workhorse abstraction in Great Expectations. Like assertions in traditional Python unit tests, Expectations provide a flexible, declarative language for describing expected behavior. Unlike traditional unit tests, Great Expectations applies Expectations to data instead of code.

Expectations include:

expect_table_row_count_to_equal
expect_column_values_to_be_unique
expect_column_values_to_be_in_set
expect_column_mean_to_be_between
…and many more
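As a rough illustration of the idea (not the library's implementation), the sketch below applies two expectation-style checks to a small batch of rows using only the Python standard library. The function names mirror two of the Expectations named above, but the signatures, the `rows` structure, and the shape of the result dictionary are assumptions made for this example.

```python
# Illustrative sketch only: plain-Python stand-ins, not the great_expectations API.

def expect_column_values_to_be_unique(rows, column):
    """Check that no value in `column` appears more than once."""
    seen = set()
    duplicates = []
    for row in rows:
        value = row[column]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return {"success": not duplicates, "unexpected_values": duplicates}


def expect_column_values_to_be_in_set(rows, column, value_set):
    """Check that every value in `column` belongs to `value_set`."""
    allowed = set(value_set)
    unexpected = [row[column] for row in rows if row[column] not in allowed]
    return {"success": not unexpected, "unexpected_values": unexpected}


# A hypothetical batch of data: each row is a dict keyed by column name.
batch = [
    {"user_id": 1, "status": "active"},
    {"user_id": 2, "status": "inactive"},
    {"user_id": 2, "status": "deleted"},
]

unique_result = expect_column_values_to_be_unique(batch, "user_id")
status_result = expect_column_values_to_be_in_set(
    batch, "status", ["active", "inactive"]
)
```

Each check describes what the data should look like and reports the offending values, rather than raising on the first problem, which is what makes the result usable for monitoring as well as for testing.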

Great Expectations currently supports native execution of Expectations in three environments: pandas, SQL (through the SQLAlchemy core), and Spark. This approach follows the philosophy of “take the compute to the data.” Future releases of Great Expectations will extend this functionality to other frameworks, such as dask and BigQuery.
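To make "take the compute to the data" concrete, here is a minimal sketch in which a uniqueness check is computed inside the database, so only a single count travels back to Python. It uses the standard-library `sqlite3` module purely for illustration; Great Expectations itself goes through SQLAlchemy, and the function name, table, and result shape shown here are assumptions.

```python
import sqlite3


def sql_expect_column_values_to_be_unique(conn, table, column):
    """Count duplicated values in the database instead of fetching every row.

    `table` and `column` are interpolated as identifiers, so they must come
    from trusted configuration, never from user input.
    """
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {table} GROUP BY {column} HAVING COUNT(*) > 1"
        f")"
    )
    (duplicated_values,) = conn.execute(query).fetchone()
    return {
        "success": duplicated_values == 0,
        "duplicated_value_count": duplicated_values,
    }


# Hypothetical in-memory table standing in for a real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

id_result = sql_expect_column_values_to_be_unique(conn, "users", "id")
email_result = sql_expect_column_values_to_be_unique(conn, "users", "email")
```

The same check written for pandas or Spark would use that engine's own grouping primitives; the point of the sketch is only that the aggregation runs where the data lives.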

Automated data profiling

Writing pipeline tests from scratch can be tedious and counterintuitive. Great Expectations jump-starts the process by providing powerful tools for automated data profiling. This provides the double benefit of helping you explore data faster and capturing knowledge for future documentation and testing.
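One way to picture profiling is as a function that looks at a sample of data and proposes candidate Expectations for a person to review. The sketch below is a deliberately simple, hypothetical version of that idea in plain Python; the `profile_batch` function, the `max_set_size` threshold, and the dictionary layout are assumptions for illustration, not the profiler that ships with the library.

```python
def profile_batch(rows, max_set_size=5):
    """Propose candidate expectations from the values observed in `rows`."""
    candidates = [
        {
            "expectation_type": "expect_table_row_count_to_equal",
            "kwargs": {"value": len(rows)},
        }
    ]
    columns = sorted(rows[0]) if rows else []
    for column in columns:
        values = [row[column] for row in rows]
        distinct = set(values)
        if len(distinct) == len(values):
            # Every value is different: the column looks like an identifier.
            candidates.append(
                {
                    "expectation_type": "expect_column_values_to_be_unique",
                    "kwargs": {"column": column},
                }
            )
        elif len(distinct) <= max_set_size:
            # Few distinct values: the column looks categorical.
            candidates.append(
                {
                    "expectation_type": "expect_column_values_to_be_in_set",
                    "kwargs": {"column": column, "value_set": sorted(distinct)},
                }
            )
    return candidates


# Hypothetical sample of an orders table.
sample = [
    {"order_id": 101, "currency": "USD"},
    {"order_id": 102, "currency": "EUR"},
    {"order_id": 103, "currency": "USD"},
]
suggestions = profile_batch(sample)
```

Candidates produced this way describe what the sample happened to contain, so they are a starting point to edit, not a finished test suite.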

DataContexts and DataSources

…allow you to configure connections to your data stores, using names that point to concepts you’re already familiar with: “the ml_training_results bucket in S3,” “the Users table in Redshift.” Great Expectations provides convenience libraries to introspect most common data stores (e.g., SQL databases, data directories, and S3 buckets). We are also working to integrate with pipeline execution frameworks (e.g., Airflow, dbt, Dagster, prefect.io). The Great Expectations framework lets you fetch, validate, profile, and document your data in a way that’s meaningful within your existing infrastructure and work environment.
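The naming idea can be sketched as a small registry that maps a friendly name to whatever code knows how to load that data. Everything in the block below (the `SketchDataContext` class, its methods, and the `load_users_csv` loader) is hypothetical and exists only to illustrate the concept; it is not the DataContext API.

```python
import csv
import io


class SketchDataContext:
    """Hypothetical registry mapping friendly names to data-loading functions."""

    def __init__(self):
        self._datasources = {}

    def add_datasource(self, name, loader):
        """Register `loader`, a callable that returns a batch of rows."""
        self._datasources[name] = loader

    def get_batch(self, name, **batch_kwargs):
        """Fetch a batch by name, passing any extra options to the loader."""
        if name not in self._datasources:
            raise KeyError(f"Unknown datasource: {name!r}")
        return self._datasources[name](**batch_kwargs)


def load_users_csv(limit=None):
    """Stand-in loader; a real one might query Redshift or read from S3."""
    raw = "id,name\n1,Ada\n2,Grace\n3,Edsger\n"
    rows = list(csv.DictReader(io.StringIO(raw)))
    return rows if limit is None else rows[:limit]


context = SketchDataContext()
context.add_datasource("users_table", load_users_csv)
users_batch = context.get_batch("users_table", limit=2)
```

Callers ask for "users_table" and never see connection details, which is what lets the same validation code run against different storage backends.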

Tooling for validation

Evaluating Expectations against data is just one step in a typical validation workflow. Great Expectations makes the follow-up steps simple, too: storing validation results to a shared bucket, summarizing results and posting notifications to Slack, handling differences between warnings and errors, etc.

Great Expectations also provides robust concepts of Batches and Runs. Although we sometimes talk informally about validating “dataframes” or “tables,” it’s much more common to validate batches of new data—subsets of tables, rather than whole tables. DataContexts provide simple, universal syntax to generate, fetch, and validate Batches of data from any of your DataSources.
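The follow-up steps described in this section can be sketched roughly as follows: run a set of checks against one batch, separate warning-level failures from error-level ones, and serialize the outcome so it can be stored or posted somewhere. The `validate_batch` function, the `(name, severity, predicate)` tuple format, and the report layout are assumptions for illustration only.

```python
import json


def validate_batch(rows, checks):
    """Run each (name, severity, predicate) check against a batch of rows."""
    results = []
    for name, severity, predicate in checks:
        results.append(
            {"check": name, "severity": severity, "success": bool(predicate(rows))}
        )
    failed = [r for r in results if not r["success"]]
    return {
        "results": results,
        "errors": [r["check"] for r in failed if r["severity"] == "error"],
        "warnings": [r["check"] for r in failed if r["severity"] == "warning"],
    }


# Hypothetical checks: one that should block the pipeline, one that should not.
checks = [
    ("row_count_is_3", "error", lambda rows: len(rows) == 3),
    ("amounts_are_positive", "warning",
     lambda rows: all(r["amount"] > 0 for r in rows)),
]

new_batch = [{"amount": 12.5}, {"amount": -3.0}, {"amount": 7.0}]
report = validate_batch(new_batch, checks)

# Serialized form that could be written to a shared bucket or sent in a notification.
payload = json.dumps(report, sort_keys=True)

# Only error-level failures stop the pipeline; warnings are just reported.
should_halt = bool(report["errors"])
```

Keeping severity alongside each result lets one report drive both behaviors: a hard stop for broken data and a notification for data that merely looks unusual.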

Compile to Docs

As of v0.7.0, Great Expectations includes new classes and methods to render Expectations to clean, human-readable documentation. Since docs are compiled from tests and you are running tests against new data as it arrives, your documentation is guaranteed to never go stale.
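A minimal sketch of the "compile to docs" idea is a lookup from expectation type to a sentence template. The templates, the `render_expectation` function, and the keyword names (`value_set`, `min_value`, `max_value`) below are assumptions for illustration and are not the rendering classes introduced in v0.7.0.

```python
# Hypothetical sentence templates, keyed by expectation type.
TEMPLATES = {
    "expect_table_row_count_to_equal": "The table must have exactly {value} rows.",
    "expect_column_values_to_be_unique": "Values in {column} must be unique.",
    "expect_column_values_to_be_in_set": (
        "Values in {column} must belong to this set: {value_set}."
    ),
    "expect_column_mean_to_be_between": (
        "The mean of {column} must be between {min_value} and {max_value}."
    ),
}


def render_expectation(expectation):
    """Turn one expectation config into a human-readable sentence."""
    template = TEMPLATES.get(expectation["expectation_type"])
    if template is None:
        # Fall back to a generic description for types without a template.
        return f"{expectation['expectation_type']} with {expectation['kwargs']}"
    kwargs = dict(expectation["kwargs"])
    if "value_set" in kwargs:
        kwargs["value_set"] = ", ".join(str(v) for v in kwargs["value_set"])
    return template.format(**kwargs)


suite = [
    {
        "expectation_type": "expect_column_values_to_be_unique",
        "kwargs": {"column": "user_id"},
    },
    {
        "expectation_type": "expect_column_mean_to_be_between",
        "kwargs": {"column": "age", "min_value": 18, "max_value": 65},
    },
]
docs = [render_expectation(e) for e in suite]
```

Because the sentences are generated from the same configuration that drives validation, editing an expectation changes the documentation in the same step.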

What does Great Expectations NOT do?

Great Expectations is NOT a pipeline execution framework.

We aim to integrate seamlessly with DAG execution tools like Spark, Airflow, dbt, prefect, dagster, Kedro, etc. We DON’T execute your pipelines for you.

Great Expectations is NOT a data versioning tool.

Great Expectations does not store data itself. Instead, it deals in metadata about data: Expectations, validation results, etc. If you want to bring your data itself under version control, check out tools like DVC and Quilt.

Great Expectations currently works best in a Python/bash environment.

Great Expectations is Python-based. You can invoke it from the command line without using a Python programming environment, but if you’re working in another ecosystem, other tools might be a better choice. If you’re running in a pure R environment, you might consider assertR as an alternative. Within the TensorFlow ecosystem, TFDV fulfills a function similar to that of Great Expectations.

How do I learn more?

For full documentation, visit Great Expectations on readthedocs.io.

Down with Pipeline Debt! explains the core philosophy behind Great Expectations. Please give it a read, and clap, follow, and share while you’re at it.

For quick, hands-on introductions to Great Expectations’ key features, check out our walkthrough videos.

Who maintains Great Expectations?

Great Expectations is under active development by James Campbell, Abe Gong, Eugene Mandel, Rob Lim, and Taylor Miller, with help from many others.

What’s the best way to get in touch with the Great Expectations team?

If you have questions, comments, or just want to have a good old-fashioned chat about data pipelines, please hop on our public Slack channel.

If you’d like hands-on assistance setting up Great Expectations, establishing a healthy practice of data testing, or adding functionality to Great Expectations, please see options for consulting help here.

Can I contribute to the library?

Absolutely. Yes, please. Start here and please don’t be shy with questions.

Release history

1.2.2

Nov 7, 2024

1.2.1

Oct 31, 2024

1.2.0

Oct 24, 2024

1.1.3

Oct 15, 2024

1.1.2

Oct 10, 2024

1.1.1

Oct 8, 2024

1.1.0

Oct 3, 2024

1.0.6

Oct 1, 2024

1.0.5

Sep 19, 2024

1.0.4

Sep 16, 2024

1.0.3 yanked

Sep 12, 2024

1.0.2

Sep 5, 2024

1.0.1

Aug 29, 2024

1.0.0

Aug 22, 2024

1.0.0a6 pre-release

Aug 21, 2024

1.0.0a5 pre-release

Aug 6, 2024

1.0.0a4 pre-release

May 16, 2024

1.0.0a3 pre-release

Apr 29, 2024

1.0.0a2 pre-release

Apr 15, 2024

1.0.0a1 pre-release

Feb 15, 2024

0.18.22

Oct 25, 2024

0.18.21

Sep 18, 2024

0.18.20

Sep 10, 2024

0.18.19

Jul 15, 2024

0.18.18

Jul 3, 2024

0.18.17

Jun 28, 2024

0.18.16

Jun 18, 2024

0.18.15

May 28, 2024

0.18.14

May 22, 2024

0.18.13

Apr 29, 2024

0.18.12

Mar 20, 2024

0.18.11

Mar 14, 2024

0.18.10

Feb 26, 2024

0.18.9

Feb 16, 2024

0.18.8

Jan 11, 2024

0.18.7

Dec 22, 2023

0.18.6 yanked

Dec 20, 2023

0.18.5

Dec 14, 2023

0.18.4

Dec 8, 2023

0.18.3

Nov 16, 2023

0.18.2

Nov 9, 2023

0.18.1

Nov 2, 2023

0.18.0

Oct 30, 2023

0.17.23

Oct 20, 2023

0.17.22

Oct 12, 2023

0.17.21

Oct 6, 2023

0.17.20 yanked

Sep 28, 2023

Reason this release was yanked:

Breaks GX Agent

0.17.19

Sep 21, 2023

0.17.18

Sep 20, 2023

0.17.17

Sep 18, 2023

0.17.16

Sep 15, 2023

0.17.15

Sep 7, 2023

0.17.14

Sep 1, 2023

0.17.13 yanked

Aug 31, 2023

0.17.12

Aug 24, 2023

0.17.11

Aug 17, 2023

0.17.9

Aug 10, 2023

0.17.8

Aug 4, 2023

0.17.7

Jul 27, 2023

0.17.6

Jul 21, 2023

0.17.5

Jul 13, 2023

0.17.4

Jul 10, 2023

0.17.3

Jul 7, 2023

0.17.2

Jun 29, 2023

0.17.1

Jun 22, 2023

0.17.0

Jun 15, 2023

0.16.16

Jun 8, 2023

0.16.15

Jun 1, 2023

0.16.14

May 26, 2023

0.16.13

May 18, 2023

0.16.12

May 11, 2023

0.16.11

May 4, 2023

0.16.10

Apr 28, 2023

0.16.9 yanked

Apr 28, 2023

0.16.8

Apr 20, 2023

0.16.7

Apr 13, 2023

0.16.6

Apr 6, 2023

0.16.5

Apr 2, 2023

0.16.4 yanked

Mar 31, 2023

0.16.3

Mar 24, 2023

0.16.2 yanked

Mar 23, 2023

0.16.1

Mar 16, 2023

0.16.0

Mar 10, 2023

0.15.50

Feb 23, 2023

0.15.49

Feb 17, 2023

0.15.48

Feb 9, 2023

0.15.47

Feb 2, 2023

0.15.46

Jan 26, 2023

0.15.45

Jan 26, 2023

0.15.44

Jan 19, 2023

0.15.43

Jan 12, 2023

0.15.42

Jan 5, 2023

0.15.41

Dec 15, 2022

0.15.40

Dec 13, 2022

0.15.39 yanked

Dec 10, 2022

0.15.38 yanked

Dec 9, 2022

0.15.37 yanked

Dec 8, 2022

0.15.36

Dec 1, 2022

0.15.35

Dec 1, 2022

0.15.34

Nov 18, 2022

0.15.33

Nov 17, 2022

0.15.32

Nov 10, 2022

0.15.31

Nov 4, 2022

0.15.30

Nov 3, 2022

0.15.29

Oct 28, 2022

0.15.28

Oct 20, 2022

0.15.27

Oct 13, 2022

0.15.26

Sep 29, 2022

0.15.25

Sep 23, 2022

0.15.24

Sep 19, 2022

0.15.23

Sep 16, 2022

0.15.22

Sep 8, 2022

0.15.21

Sep 1, 2022

0.15.20

Aug 25, 2022

0.15.19

Aug 18, 2022

0.15.18

Aug 11, 2022

0.15.17

Aug 4, 2022

0.15.16

Jul 29, 2022

0.15.15

Jul 21, 2022

0.15.14

Jul 14, 2022

0.15.13

Jul 7, 2022

0.15.12

Jun 30, 2022

0.15.11

Jun 22, 2022

0.15.10

Jun 15, 2022

0.15.9

Jun 9, 2022

0.15.8

Jun 2, 2022

0.15.7

May 26, 2022

0.15.6

May 19, 2022

0.15.5

May 12, 2022

0.15.4

May 5, 2022

0.15.3

Apr 28, 2022

0.15.2

Apr 21, 2022

0.15.1

Apr 14, 2022

0.15.0

Apr 8, 2022

0.14.13

Mar 31, 2022

0.14.12

Mar 24, 2022

0.14.11

Mar 17, 2022

0.14.10

Mar 10, 2022

0.14.9

Mar 4, 2022

0.14.8

Feb 24, 2022

0.14.7

Feb 17, 2022

0.14.6

Feb 10, 2022

0.14.5

Feb 3, 2022

0.14.4

Jan 28, 2022

0.14.3

Jan 27, 2022

0.14.2

Jan 20, 2022

0.14.1

Jan 13, 2022

0.14.0

Jan 6, 2022

0.13.49

Dec 24, 2021

0.13.48

Dec 23, 2021

0.13.47

Dec 18, 2021

0.13.46

Dec 9, 2021

0.13.45

Dec 2, 2021

0.13.44

Nov 24, 2021

0.13.43

Nov 18, 2021

0.13.42

Nov 12, 2021

0.13.41

Nov 4, 2021

0.13.40

Oct 27, 2021

0.13.39

Oct 21, 2021

0.13.38

Oct 14, 2021

0.13.37

Oct 7, 2021

0.13.36

Sep 30, 2021

0.13.35

Sep 23, 2021

0.13.34

Sep 16, 2021

0.13.33

Sep 9, 2021

0.13.32

Sep 2, 2021

0.13.31

Aug 26, 2021

0.13.30

Aug 23, 2021

0.13.29

Aug 19, 2021

0.13.28

Aug 13, 2021

0.13.27

Aug 12, 2021

0.13.26

Aug 5, 2021

0.13.25

Jul 30, 2021

0.13.24

Jul 22, 2021

0.13.23

Jul 15, 2021

0.13.22

Jul 9, 2021

0.13.21

Jun 30, 2021

0.13.20

Jun 23, 2021

0.13.19

Apr 23, 2021

0.13.18

Apr 22, 2021

0.13.17

Apr 2, 2021

0.13.16

Apr 1, 2021

0.13.15

Mar 26, 2021

0.13.14

Mar 17, 2021

0.13.13

Mar 12, 2021

0.13.12

Mar 5, 2021

0.13.11

Feb 25, 2021

0.13.10

Feb 13, 2021

0.13.9

Feb 8, 2021

0.13.8

Jan 28, 2021

0.13.7

Jan 23, 2021

0.13.6

Jan 21, 2021

0.13.5

Jan 19, 2021

0.13.4

Dec 23, 2020

0.13.3

Dec 15, 2020

0.13.2

Dec 8, 2020

0.13.1

Dec 3, 2020

0.13.0

Dec 1, 2020

0.12.10

Nov 25, 2020

0.12.9

Nov 17, 2020

0.12.8

Nov 16, 2020

0.12.7

Oct 29, 2020

0.12.6

Oct 20, 2020

0.12.5

Oct 19, 2020

0.12.4

Oct 7, 2020

0.12.3

Sep 28, 2020

0.12.2

Sep 22, 2020

0.12.1

Sep 2, 2020

0.12.0

Aug 13, 2020

0.11.9

Jul 30, 2020

0.11.8

Jul 16, 2020

0.11.7

Jul 2, 2020

0.11.6

Jun 27, 2020

0.11.5

Jun 19, 2020

0.11.4

Jun 13, 2020

0.11.3

Jun 13, 2020

0.11.2

Jun 5, 2020

0.11.1

May 29, 2020

0.11.0

May 23, 2020

0.11.0b0 pre-release

May 10, 2020

0.10.12

May 20, 2020

0.10.11

May 15, 2020

0.10.10

May 14, 2020

0.10.9

May 8, 2020

0.10.8

May 4, 2020

0.10.7

May 1, 2020

0.10.6

May 1, 2020

0.10.5

Apr 29, 2020

0.10.4

Apr 24, 2020

0.10.3

Apr 22, 2020

0.10.2

Apr 21, 2020

0.10.1

Apr 16, 2020

0.10.0

Apr 15, 2020

0.9.11

Apr 10, 2020

0.9.10

Apr 8, 2020

0.9.9

Apr 7, 2020

0.9.8

Apr 3, 2020

0.9.7

Mar 19, 2020

0.9.6

Mar 18, 2020

0.9.5

Mar 13, 2020

0.9.4

Mar 11, 2020

0.9.3

Mar 6, 2020

0.9.2

Feb 22, 2020

0.9.1

Feb 21, 2020

0.9.0

Feb 19, 2020

0.9.0b2 pre-release

Feb 11, 2020

0.9.0b1 pre-release

Dec 12, 2019

0.9.0b0 pre-release

Nov 26, 2019

0.8.8

Feb 7, 2020

0.8.7

Jan 15, 2020

0.8.6

Dec 4, 2019

0.8.5

Nov 19, 2019

0.8.4.post0

Nov 7, 2019

0.8.4

Nov 6, 2019

0.8.3

Oct 29, 2019

0.8.2.post0

Oct 24, 2019

0.8.2

Oct 23, 2019

0.8.1

Oct 16, 2019

This version

0.8.0

Oct 16, 2019

0.8.0a4 pre-release

Oct 11, 2019

0.8.0a3 pre-release

Oct 7, 2019

0.8.0a2 pre-release

Oct 3, 2019

0.8.0a1 pre-release

Sep 30, 2019

0.7.11

Oct 4, 2019

0.7.10

Sep 19, 2019

0.7.9

Sep 18, 2019

0.7.8

Sep 4, 2019

0.7.7

Aug 19, 2019

0.7.6

Aug 12, 2019

0.7.5

Aug 3, 2019

0.7.4

Aug 3, 2019

0.7.3

Jul 29, 2019

0.7.2

Jul 22, 2019

0.7.1

Jul 13, 2019

0.7.0

Jul 4, 2019

0.6.1

Jun 3, 2019

0.6.0

May 24, 2019

0.5.1

Apr 30, 2019

0.5.0

Apr 25, 2019

0.4.5

Dec 19, 2018

0.4.4

Aug 29, 2018

0.4.3

Jul 12, 2018

0.4.2

May 17, 2018

0.4.1

Mar 24, 2018

0.4.0

Mar 23, 2018

0.3.2

Feb 8, 2018

0.3.1

Feb 8, 2018

0.3.0

Dec 22, 2017

0.0.1111.post0.dev17 pre-release

Aug 4, 2023

Download files

Source Distribution

great_expectations-0.8.0.tar.gz (1.1 MB)

Uploaded Oct 16, 2019 Source

Built Distribution

great_expectations-0.8.0-py2.py3-none-any.whl (625.9 kB)

Uploaded Oct 16, 2019 Python 2 Python 3

File details

Details for the file great_expectations-0.8.0.tar.gz.

File metadata

Download URL: great_expectations-0.8.0.tar.gz
Upload date: Oct 16, 2019
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for great_expectations-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`f5b6214bfbaf2a2ac9a0ba5873a59da0d7437922647406a2ba58e116f656d097`
MD5	`13ab7c0e86325f6492d7fa0f0b2389fc`
BLAKE2b-256	`decc5d20504215d197c20972c0a96249767c6c127e4054cc0b4c3cdcb03b96e8`

File details

Details for the file great_expectations-0.8.0-py2.py3-none-any.whl.

File metadata

Download URL: great_expectations-0.8.0-py2.py3-none-any.whl
Upload date: Oct 16, 2019
Size: 625.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for great_expectations-0.8.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`148fde8a42bc791148e52bb0a4a3ad451b013f4dc906d6bd49133eec7e3e424f`
MD5	`ae4cdf24847c7c2abf7980c83532918d`
BLAKE2b-256	`a52326cc1983cf8df5ebb2e0a61bad008431b320f9a4439aef5f540587b12bc3`