How to Validate data with an in-memory Checkpoint

This guide demonstrates how to Validate data using a Checkpoint that is configured and run entirely in memory. This approach is appropriate when you cannot, or do not want to, use a Checkpoint Store, e.g. in a hosted environment.

Prerequisites: This how-to guide assumes you have:
  • Completed the Getting Started Tutorial
  • A working installation of Great Expectations
  • A Data Context
  • An Expectation Suite
  • A Datasource
  • A basic understanding of Checkpoints
Note: We recommend reading our guide on Deploying Great Expectations in a hosted environment without file system or CLI for guidance on the setup, connecting to data, and creating Expectations steps that take place prior to this process.

Steps

1. Import the necessary modules

The recommended method for creating a Checkpoint is to use the CLI to open a Jupyter Notebook which contains code scaffolding to assist you with the process. Since that option is not available (this guide assumes that you need an in-memory Checkpoint because you cannot use the CLI or access a filesystem), you will have to provide that scaffolding yourself.

In the script in which you define and run your Checkpoint, enter the following code:

import great_expectations as gx
from great_expectations.checkpoint import Checkpoint

Importing great_expectations will give you access to your Data Context, while we will configure an instance of the Checkpoint class as our in-memory Checkpoint.

If you are planning to use a YAML string to configure your in-memory Checkpoint you will also need to import yaml from ruamel:

from ruamel import yaml

You will also need to initialize yaml.YAML(...):

yaml = yaml.YAML(typ="safe")

2. Initialize your Data Context

In the previous section you imported great_expectations in order to get access to your Data Context. The line of code that does this is:

context = gx.get_context()

Checkpoints require a Data Context in order to access necessary Stores from which to retrieve Expectation Suites and store Validation Results and Metrics, so you will pass context in as a parameter when you initialize your Checkpoint class later.

3. Define your Checkpoint configuration

In addition to a Data Context, you will need a configuration with which to initialize your Checkpoint. This configuration can be in the form of a YAML string or a Python dictionary. The following example shows a Python dictionary configuration that is equivalent to the one used by the Getting Started Tutorial.

Normally, a Checkpoint configuration will include the keys class_name and module_name. These are used by Great Expectations to identify the class of Checkpoint that should be initialized with a given configuration. Since we are initializing an instance of the Checkpoint class directly we don't need the configuration to indicate the class of Checkpoint to be initialized. Therefore, these two keys will be left out of our configuration.

my_checkpoint_name = "in_memory_checkpoint"
python_config = {
    "name": my_checkpoint_name,
    "config_version": 1,
    "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
    "action_list": [
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            "name": "store_evaluation_params",
            "action": {"class_name": "StoreEvaluationParametersAction"},
        },
        {
            "name": "update_data_docs",
            "action": {"class_name": "UpdateDataDocsAction", "site_names": []},
        },
    ],
    "validations": [
        {
            "batch_request": {
                "datasource_name": "taxi_datasource",
                "data_connector_name": "default_inferred_data_connector_name",
                "data_asset_name": "yellow_tripdata_sample_2019-01",
                "data_connector_query": {"index": -1},
            },
            "expectation_suite_name": "my_expectation_suite",
        }
    ],
}
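The run_name_template value in the configuration above uses strftime-style directives. As a rough illustration of how such a template expands, the sketch below applies Python's own datetime.strftime to the same string. It assumes that Great Expectations expands the template with standard strftime rules; the fixed run time is made up for the example.

```python
from datetime import datetime

# Same template string as in the Checkpoint configuration.
run_name_template = "%Y%m%d-%H%M%S-my-run-name-template"

# A fixed, made-up run time so the result is reproducible.
example_run_time = datetime(2019, 1, 15, 9, 30, 0)

# Text without a % directive is copied through unchanged.
run_name = example_run_time.strftime(run_name_template)
print(run_name)  # 20190115-093000-my-run-name-template
```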

When you tailor the configuration for your own purposes, replace the Batch Request and Expectation Suite under the validations key with your own values. You can also add further Batch Request and Expectation Suite entries under the validations key, or replace this configuration entirely and build one from scratch. If you build a configuration from scratch, or further modify the example provided above, you may wish to reference our documentation on Checkpoint configurations as you do.
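Adding a second entry under the validations key is ordinary dictionary manipulation. The following is a minimal sketch that uses only the standard library. The helper name with_extra_validation and the data asset name yellow_tripdata_sample_2019-02 are hypothetical and are not part of Great Expectations or of the tutorial data.

```python
import copy


def with_extra_validation(config, batch_request, expectation_suite_name):
    """Return a copy of config with one more entry under "validations".

    Hypothetical helper for illustration; the input config is not modified.
    """
    new_config = copy.deepcopy(config)
    new_config.setdefault("validations", []).append(
        {
            "batch_request": batch_request,
            "expectation_suite_name": expectation_suite_name,
        }
    )
    return new_config


# A trimmed-down version of the configuration shown earlier.
python_config = {
    "name": "in_memory_checkpoint",
    "config_version": 1,
    "validations": [
        {
            "batch_request": {
                "datasource_name": "taxi_datasource",
                "data_connector_name": "default_inferred_data_connector_name",
                "data_asset_name": "yellow_tripdata_sample_2019-01",
                "data_connector_query": {"index": -1},
            },
            "expectation_suite_name": "my_expectation_suite",
        }
    ],
}

extended_config = with_extra_validation(
    python_config,
    batch_request={
        "datasource_name": "taxi_datasource",
        "data_connector_name": "default_inferred_data_connector_name",
        # Hypothetical asset name, used only for this example.
        "data_asset_name": "yellow_tripdata_sample_2019-02",
        "data_connector_query": {"index": -1},
    },
    expectation_suite_name="my_expectation_suite",
)

print(len(extended_config["validations"]))  # 2
```

Working on a deep copy keeps the original dictionary reusable, which can help when one script builds several Checkpoint variants from a shared base.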

4. Initialize your Checkpoint

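Once you have your Data Context and Checkpoint configuration, you can initialize a Checkpoint instance in memory. How you do so varies slightly depending on whether your configuration is a Python dictionary or a YAML string. A YAML string must first be parsed into a dictionary, for example with the load(...) method of the yaml object initialized in step 1; the resulting dictionary is then used in the same way as a Python dictionary configuration.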

If you are using a Python dictionary as your configuration, you will need to unpack it as parameters for the Checkpoint object's initialization. This can be done with the code:

my_checkpoint = Checkpoint(data_context=context, **python_config)
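If your dictionary was adapted from a configuration that still carries the top-level class_name and module_name keys, unpacking it would pass those keys to the Checkpoint initializer as keyword arguments. The sketch below shows one way to drop them first. The helper name without_class_keys and the sample values are hypothetical, and the sketch assumes that only the top-level keys need removing.

```python
def without_class_keys(config):
    """Return a shallow copy of config without the top-level class keys.

    Hypothetical helper for illustration. Nested "class_name" entries,
    such as those inside "action_list", are deliberately left in place.
    """
    return {
        key: value
        for key, value in config.items()
        if key not in ("class_name", "module_name")
    }


# Made-up example of a configuration that still names its class.
stored_style_config = {
    "name": "in_memory_checkpoint",
    "config_version": 1,
    "class_name": "Checkpoint",
    "module_name": "great_expectations.checkpoint",
    "action_list": [
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        }
    ],
}

python_config = without_class_keys(stored_style_config)
print(sorted(python_config))  # ['action_list', 'config_version', 'name']
```

The cleaned dictionary can then be unpacked as shown above, with Checkpoint(data_context=context, **python_config).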

5. Run your Checkpoint

Congratulations! You now have an initialized Checkpoint object in memory. You can use its run(...) method to Validate your data as specified in the configuration.

This will be done with the line:

results = my_checkpoint.run()

Congratulations! Your script is now ready to be run. Each time you run it, it will initialize and run a Checkpoint in memory, rather than retrieving a Checkpoint configuration from a Checkpoint Store.
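A script that runs unattended may also need to report whether Validation passed. The sketch below maps the result to a process exit code. It assumes that the object returned by run() exposes a boolean success attribute; the helper name exit_code_for is hypothetical, and a stand-in object is used so the sketch runs without Great Expectations installed.

```python
from types import SimpleNamespace


def exit_code_for(results):
    """Map a Checkpoint result to an exit code: 0 if it passed, 1 otherwise.

    Assumes results has a boolean "success" attribute.
    """
    return 0 if results.success else 1


# Stand-in objects that mimic only the attribute this sketch relies on.
passing = SimpleNamespace(success=True)
failing = SimpleNamespace(success=False)

print(exit_code_for(passing), exit_code_for(failing))  # 0 1
```

In a real script you would pass the value returned by my_checkpoint.run() instead of a stand-in, and hand the returned code to sys.exit(...).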

6. Check your Data Docs

Once you have run your script you can verify that it has worked by checking your Data Docs for new results.
