
How to create a new Checkpoint

This guide will help you create a new Checkpoint (the primary means for validating data in a production deployment of Great Expectations), which allows you to couple an Expectation Suite (a collection of verifiable assertions about data) with a data set to Validate (that is, to apply an Expectation Suite to a Batch).

note

As of Great Expectations version 0.13.7, we have updated and improved the Checkpoints feature. You can continue to use your existing legacy Checkpoint workflows if you’re working with concepts from the Batch Kwargs (v2) API. If you’re using concepts from the BatchRequest (v3) API, please refer to the new Checkpoints guides.

Steps (for Checkpoints in Great Expectations version >=0.13.12)

1. Use the CLI to open a Jupyter Notebook for creating a new Checkpoint

To assist you with creating Checkpoints, the Great Expectations CLI (Command Line Interface) has a convenience method that opens a Jupyter Notebook with all the scaffolding you need to configure and save your Checkpoint. Run the following CLI command from your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components):

Terminal input
> great_expectations checkpoint new my_checkpoint
tip

You can replace my_checkpoint in the above example with whatever name you would like to associate with the Checkpoint you will be creating.

Executing this command will open a Jupyter Notebook that guides you through the steps of creating a Checkpoint. This Jupyter Notebook will include a default configuration that you can edit to suit your use case.

2. Configure your SimpleCheckpoint (Example)

2.1. Edit the configuration

The sample Checkpoint configuration in your Jupyter Notebook will utilize the SimpleCheckpoint class, which takes care of some defaults.

To update this configuration to suit your environment, you will need to replace the names my_datasource, my_data_connector, MyDataAsset and my_suite with the respective Datasource, Data Connector, Data Asset, and Expectation Suite names you have configured in your great_expectations.yml. Here, a Datasource provides a standard API for accessing and interacting with data from a wide variety of source systems; a Data Connector provides the configuration details, based on the source data system, that a Datasource needs to define Data Assets; and a Data Asset is a collection of records within a Datasource, usually named after the underlying data system and sliced to correspond to a desired specification.

Example YAML configuration, as a Python string
config = """
name: my_checkpoint # This is populated by the CLI.
config_version: 1
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: my_datasource # Update this value.
      data_connector_name: my_data_connector # Update this value.
      data_asset_name: MyDataAsset # Update this value.
      data_connector_query:
        index: -1
    expectation_suite_name: my_suite # Update this value.
"""

This is the minimum required to configure a Checkpoint that will run the Expectation Suite my_suite against the Data Asset MyDataAsset.
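If the same notebook is reused across environments, it may be convenient to fill in the four placeholder names from variables instead of editing the string by hand. The following is a minimal sketch using only the Python standard library; the helper name `build_checkpoint_config` and the template constant are hypothetical and not part of Great Expectations.

```python
from string import Template

# Same keys as the sample configuration; $-placeholders mark the values
# that need to be updated for your environment.
CHECKPOINT_TEMPLATE = Template("""\
name: $name
config_version: 1
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: $datasource
      data_connector_name: $data_connector
      data_asset_name: $data_asset
      data_connector_query:
        index: -1
    expectation_suite_name: $suite
""")


def build_checkpoint_config(name, datasource, data_connector, data_asset, suite):
    """Return the YAML configuration string with the given names filled in.

    Hypothetical helper for illustration; not a Great Expectations API.
    """
    return CHECKPOINT_TEMPLATE.substitute(
        name=name,
        datasource=datasource,
        data_connector=data_connector,
        data_asset=data_asset,
        suite=suite,
    )


config = build_checkpoint_config(
    name="my_checkpoint",
    datasource="my_datasource",
    data_connector="my_data_connector",
    data_asset="MyDataAsset",
    suite="my_suite",
)
print(config)
```

`Template.substitute` raises a `KeyError` if any placeholder is left unfilled, which guards against saving a configuration that still contains a sample name by accident.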

See How to configure a new Checkpoint using test_yaml_config for advanced configuration options.

2.2. Validate and test your configuration

You can use the following command to validate the contents of your config YAML string:

Python code
context.test_yaml_config(yaml_config=config)

When executed, test_yaml_config(...) will instantiate the component and run through a self-check procedure to verify that the component works as expected.

In the case of a Checkpoint, this means:

  1. Validating the YAML configuration
  2. Verifying that the Checkpoint class with the given configuration, if valid, can be instantiated
  3. Printing warnings in case certain parts of the configuration, while valid, may be incomplete and need to be better specified for a successful Checkpoint operation

The output will look something like this:

Terminal output
Attempting to instantiate class from config...
Instantiating as a SimpleCheckpoint, since class_name is SimpleCheckpoint
Successfully instantiated SimpleCheckpoint


Checkpoint class name: SimpleCheckpoint

If something about your configuration was not set up correctly, test_yaml_config(...) will raise an error.
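Because a failed check surfaces as an exception, a notebook cell can wrap the call so that a bad configuration is reported instead of halting execution. This is a rough sketch, not the library's own pattern: the helper `config_passes_self_check` is hypothetical, it catches a broad `Exception` because the exact exception type is not specified here, and `StubContext` is an invented stand-in (with an invented validation rule) so that the example runs without a Great Expectations project.

```python
def config_passes_self_check(context, config):
    """Return True if context.test_yaml_config(...) completes without raising.

    `context` is assumed to be a Data Context exposing
    test_yaml_config(yaml_config=...), as used in the notebook.
    Hypothetical helper; not part of Great Expectations.
    """
    try:
        context.test_yaml_config(yaml_config=config)
    except Exception as exc:  # exact exception type depends on the failure
        print(f"Configuration check failed: {exc}")
        return False
    return True


class StubContext:
    """Stand-in for a real Data Context, used only so this sketch runs alone."""

    def test_yaml_config(self, yaml_config):
        # Invented rule for the demo: reject configs without a class_name.
        if "class_name:" not in yaml_config:
            raise ValueError("class_name is missing")


ok = config_passes_self_check(
    StubContext(), "name: my_checkpoint\nclass_name: SimpleCheckpoint\n"
)
bad = config_passes_self_check(StubContext(), "name: my_checkpoint\n")
```

In the notebook, the real `context` object would be passed in place of `StubContext()`.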

3. Store your Checkpoint configuration

After you are satisfied with your configuration, save it by running the appropriate cells in the Jupyter Notebook.

4. (Optional) Check your stored Checkpoint config

If the Store backend of your Checkpoint Store (a connector to store and retrieve information about Checkpoints) is on the local filesystem, you can navigate to the checkpoints store directory that is configured in great_expectations.yml and find the configuration files corresponding to the Checkpoints you created.
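For a filesystem-backed store, the same check can be done from Python. The sketch below lists YAML files in a directory using only the standard library; the helper `list_checkpoint_files` is hypothetical, and the path `great_expectations/checkpoints` is an assumption, so substitute the directory actually configured for your Checkpoint Store in great_expectations.yml.

```python
from pathlib import Path


def list_checkpoint_files(store_dir):
    """Return the names of YAML files found directly in store_dir, sorted.

    Returns an empty list if the directory does not exist.
    Hypothetical helper; not part of Great Expectations.
    """
    store = Path(store_dir)
    if not store.is_dir():
        return []
    return sorted(p.name for p in store.iterdir() if p.suffix in {".yml", ".yaml"})


# Assumed location; use the directory configured for your Checkpoint Store.
for name in list_checkpoint_files("great_expectations/checkpoints"):
    print(name)
```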

5. (Optional) Test run the new Checkpoint and open Data Docs

Now that you have stored your Checkpoint configuration in the Store backend configured for the Checkpoint Configuration store of your Data Context, you can also test context.run_checkpoint(...) right within your Jupyter Notebook by running the appropriate cells.

caution

Before running a Checkpoint, make sure that all classes and Expectation Suites referred to in the configuration exist.

When run_checkpoint(...) returns, the checkpoint_run_result can then be checked for the value of the success field (all validations passed) and other information associated with running the specified Actions (Python classes with a run method that take a Validation Result and do something with it).
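One way to act on the success field is to map it to an exit code, which is useful when the same logic later moves from the notebook into a script or pipeline. This is a minimal sketch: the helper `exit_code_for` is hypothetical, the result is assumed to expose a "success" entry as described above, and a plain dict stands in for the real return value so the example runs on its own.

```python
def exit_code_for(result):
    """Map a Checkpoint run result to a process exit code (0 means passed).

    `result` is assumed to expose a "success" entry, as described for the
    value returned by context.run_checkpoint(...).
    Hypothetical helper; not part of Great Expectations.
    """
    return 0 if result["success"] else 1


# In the notebook this would be something like:
#     result = context.run_checkpoint(checkpoint_name="my_checkpoint")
# A plain dict stands in for that result here.
result = {"success": False}

code = exit_code_for(result)
if code != 0:
    print("Validation failed!")
```

In a standalone script, the returned code could be passed to `sys.exit(...)` so that a scheduler or CI job sees the failure.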

For more advanced configurations of Checkpoints, please see How to configure a new Checkpoint using test_yaml_config.

Additional Resources