
Expectation Suite


Overview

Definition

An Expectation Suite is a collection of verifiable assertions about data.

Features and promises

Expectation Suites combine multiple Expectations into an overall description of data. For example, a team can group all the Expectations about a given table in a given database into an Expectation Suite and call it my_database.my_table. These names are completely flexible; the only constraint is that a suite's name must be unique within a given project.
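As a rough illustration, a suite can be pictured as a named collection of Expectation configurations. The sketch below models this with plain Python dictionaries. The field names expectation_suite_name, expectations, expectation_type, and kwargs mirror the JSON that Great Expectations writes for a suite, while make_suite and the SuiteRegistry class are hypothetical and only demonstrate the name-uniqueness constraint.

```python
from typing import Any, Dict, List


def make_suite(name: str, expectations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a plain-dict stand-in for an Expectation Suite (hypothetical helper)."""
    return {"expectation_suite_name": name, "expectations": expectations, "meta": {}}


class SuiteRegistry:
    """Hypothetical registry that allows only one suite per name within a project."""

    def __init__(self) -> None:
        self._suites: Dict[str, Dict[str, Any]] = {}

    def add(self, suite: Dict[str, Any]) -> None:
        name = suite["expectation_suite_name"]
        if name in self._suites:
            raise ValueError(f"An Expectation Suite named {name!r} already exists")
        self._suites[name] = suite

    def names(self) -> List[str]:
        return sorted(self._suites)


# All Expectations about one table, grouped under a single suite name.
suite = make_suite(
    "my_database.my_table",
    [
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "id"}},
        {"expectation_type": "expect_column_values_to_be_unique", "kwargs": {"column": "id"}},
    ],
)

registry = SuiteRegistry()
registry.add(suite)
```

Adding a second suite with the same name raises a ValueError, which is the only naming rule the sketch enforces.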

Relationship to other objects

Expectation Suites are stored in an Expectation Store. They are generated interactively using a Validator or automatically using Profilers, and are used by Checkpoints to Validate data.

Use cases


Create Expectations

The lifecycle of an Expectation Suite starts with creating it. The suite then goes through an iterative loop of review and editing as the team's understanding of the data it describes evolves.

Expectation Suites are largely managed automatically in the workflows for creating Expectations. When Expectations are created, an Expectation Suite is created to contain them. In the Profiling workflow, this Expectation Suite contains all the Expectations generated by the Profiler. In the interactive workflow, the Expectation Suite is updated to include each Expectation as it is defined, but it is not saved to an Expectation Store until you explicitly save it.

For more information on these processes, please see:


Validate Data

Expectation Suites are used during the Validation of data. In this step, you will need to provide one or more Expectation Suites to a Checkpoint. This can either be done by configuring the Checkpoint to use a preset list of one or more Expectation Suites, or by configuring the Checkpoint to accept a list of one or more Expectation Suites at runtime.

Features

CRUD operations

A Great Expectations Expectation Suite enables you to perform Create, Read, Update, and Delete (CRUD) operations on the Suite's Expectations without needing to re-run them.
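Because a suite is only a collection of configurations, these operations edit that collection and never touch the data. The following sketch uses hypothetical helper functions (read, upsert, delete) over a plain-dict representation of a suite. It identifies an Expectation by its expectation_type and column, which is a simplification of the matching rules described under API basics.

```python
from typing import Any, Dict, Optional, Tuple

Suite = Dict[str, Any]


def _key(exp: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    # Simplified identity: expectation type plus target column.
    return (exp["expectation_type"], exp["kwargs"].get("column"))


def read(suite: Suite, expectation_type: str, column: str) -> Optional[Dict[str, Any]]:
    """Return the first Expectation with this type and column, or None."""
    for exp in suite["expectations"]:
        if _key(exp) == (expectation_type, column):
            return exp
    return None


def upsert(suite: Suite, exp: Dict[str, Any]) -> None:
    """Create the Expectation, or replace an existing one with the same key (update)."""
    for i, existing in enumerate(suite["expectations"]):
        if _key(existing) == _key(exp):
            suite["expectations"][i] = exp
            return
    suite["expectations"].append(exp)


def delete(suite: Suite, expectation_type: str, column: str) -> bool:
    """Remove matching Expectations; return True if anything was removed."""
    before = len(suite["expectations"])
    suite["expectations"] = [
        e for e in suite["expectations"] if _key(e) != (expectation_type, column)
    ]
    return len(suite["expectations"]) < before


suite: Suite = {"expectation_suite_name": "my_database.my_table", "expectations": []}
# Create: no data is read or validated here, only the configuration list changes.
upsert(suite, {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "id"}})
# Update the same Expectation, here by adding a `mostly` threshold.
upsert(suite, {"expectation_type": "expect_column_values_to_not_be_null",
               "kwargs": {"column": "id", "mostly": 0.99}})
```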

Reusability

Expectation Suites are primarily used by Checkpoints, which can accept a list of one or more Expectation Suite and Batch Request pairs. Because Expectation Suites are stored independently of the Checkpoints that use them, the same Expectation Suite can be included in the lists of multiple Checkpoints, provided its Expectations describe the data each of those Checkpoints will Validate. You can even use the same Expectation Suite multiple times within a single Checkpoint by pairing it with different Batch Requests.
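For illustration, the sketch below writes a Checkpoint's list of validations as Python data, reusing one suite with two different Batch Requests. The keys batch_request, expectation_suite_name, datasource_name, data_connector_name, and data_asset_name follow the general shape of Great Expectations Checkpoint configurations, but the datasource, connector, asset, and second suite names are hypothetical, and the exact configuration schema depends on your Great Expectations version.

```python
from collections import Counter
from typing import Any, Dict, List


def batch_request(asset: str) -> Dict[str, str]:
    # Hypothetical datasource and data connector names.
    return {
        "datasource_name": "my_datasource",
        "data_connector_name": "default_inferred_data_connector_name",
        "data_asset_name": asset,
    }


validations: List[Dict[str, Any]] = [
    # The same suite paired with two different batches of data.
    {"batch_request": batch_request("my_table_2023_01"), "expectation_suite_name": "my_database.my_table"},
    {"batch_request": batch_request("my_table_2023_02"), "expectation_suite_name": "my_database.my_table"},
    # A different (hypothetical) suite in the same Checkpoint.
    {"batch_request": batch_request("orders"), "expectation_suite_name": "my_database.orders"},
]

# Count how often each suite is used across the Checkpoint's validations.
suite_usage = Counter(v["expectation_suite_name"] for v in validations)
```

Since each entry references the suite only by name, editing my_database.my_table once changes what both of its validations check.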

API basics

CRUD operations

Each of the Expectation Suite methods that supports a Create, Read, Update, or Delete (CRUD) operation relies on two main parameters: expectation_configuration and match_type.

  • expectation_configuration - an ExpectationConfiguration object that is used to determine whether and where this Expectation already exists within the Suite. It can be a complete or a partial ExpectationConfiguration.
  • match_type - a string with the value of domain, success, or runtime which determines the criteria used for matching:
    • domain checks whether two Expectation Configurations apply to the same data. It results in the loosest match, and can use the least complete ExpectationConfiguration object. For example, for a column map Expectation, a domain match_type will check that the expectation_type matches, and that the column and any row_conditions that affect which rows are evaluated by the Expectation match.
    • success criteria are more exacting: in addition to the domain kwargs, they include the kwargs used when evaluating the success of an Expectation, such as mostly, max, or value_set.
    • runtime criteria are the most specific: in addition to the domain kwargs and success kwargs, they include kwargs used for runtime configuration. Currently, these are result_format, include_config, and catch_exceptions.
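The three match types can be pictured as progressively larger sets of kwargs that must agree. The following sketch is a simplified model, not the library's implementation: the kwarg groupings cover only a column map Expectation and are assumptions, and the matches function is hypothetical. In Great Expectations itself, this comparison is performed by the library when you pass match_type to a suite's CRUD methods.

```python
from typing import Any, Dict

# Simplified kwarg groups for a column map Expectation (assumed, not exhaustive).
DOMAIN_KWARGS = {"column", "row_condition", "condition_parser"}
SUCCESS_KWARGS = DOMAIN_KWARGS | {"mostly", "value_set", "min_value", "max_value"}
RUNTIME_KWARGS = SUCCESS_KWARGS | {"result_format", "include_config", "catch_exceptions"}

KWARGS_BY_MATCH_TYPE = {
    "domain": DOMAIN_KWARGS,
    "success": SUCCESS_KWARGS,
    "runtime": RUNTIME_KWARGS,
}


def matches(a: Dict[str, Any], b: Dict[str, Any], match_type: str = "domain") -> bool:
    """Return True if two configurations agree on every kwarg relevant to match_type."""
    if a["expectation_type"] != b["expectation_type"]:
        return False
    keys = KWARGS_BY_MATCH_TYPE[match_type]
    # A kwarg missing from both configurations counts as agreeing (both None).
    return all(a["kwargs"].get(k) == b["kwargs"].get(k) for k in keys)


stored = {
    "expectation_type": "expect_column_values_to_be_in_set",
    "kwargs": {"column": "status", "value_set": ["open", "closed"], "mostly": 0.95},
}
# A partial configuration: only the type and the column are given.
partial = {"expectation_type": "expect_column_values_to_be_in_set", "kwargs": {"column": "status"}}
```

Here partial matches stored under domain (same type and column) but not under success, because value_set and mostly differ; this is why a domain match can work with the least complete configuration.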

How to access

You will rarely need to directly access an Expectation Suite. If you do need to edit one, the simplest way is through the CLI. To do so, run the command:

Terminal command
great_expectations suite edit NAME_OF_YOUR_SUITE_HERE

This opens a Jupyter Notebook in which each Expectation in the Expectation Suite is loaded as an individual cell. You can edit, remove, and add Expectations in this list. Running the cells creates the Expectations in a new Expectation Suite, which you can then save over the old Expectation Suite or under a new name. However, neither the Expectation Suite nor any changes to it are stored until you explicitly save it.

In almost all other circumstances you will simply pass the name of any relevant Expectation Suites to an object such as a Checkpoint that will manage accessing and using it for you.

Saving Expectation Suites

Each Expectation Suite is saved in an Expectation Store as a JSON file in the great_expectations/expectations subdirectory of the Data Context. Best practice is to check these files into version control each time they are updated, in the same way you treat source files. This discipline makes data quality an integral part of versioned pipeline releases.
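The version-control advice is easier to follow when suite files serialize deterministically, so that a diff shows only real changes. The sketch below is not the Expectation Store's own code: save_suite_json is a hypothetical helper that writes a plain-dict suite under a great_expectations/expectations directory (here inside a temporary directory) with sorted keys and fixed indentation. The file name derived from the suite name is an assumption; the actual Store may lay files out differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_suite_json(suite: Dict[str, Any], project_root: Path) -> Path:
    """Write a suite as stable, diff-friendly JSON and return the file path (hypothetical)."""
    expectations_dir = project_root / "great_expectations" / "expectations"
    expectations_dir.mkdir(parents=True, exist_ok=True)
    # Assumed file naming: suite name plus a .json extension.
    path = expectations_dir / f"{suite['expectation_suite_name']}.json"
    path.write_text(json.dumps(suite, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path


suite = {
    "expectation_suite_name": "my_database.my_table",
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "id"}}
    ],
    "meta": {},
}

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_suite_json(suite, Path(tmp))
    reloaded = json.loads(saved_path.read_text(encoding="utf-8"))
    first_write = saved_path.read_text(encoding="utf-8")
    save_suite_json(suite, Path(tmp))
    # Re-saving an unchanged suite produces identical text, so version control sees no diff.
    unchanged = saved_path.read_text(encoding="utf-8") == first_write
```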

You can save an Expectation Suite by using a Validator's save_expectation_suite() method. This method is included in the last cell of any Jupyter Notebook launched from the CLI to create or edit Expectations.