
Customize your deployment

You can customize your deployment by upgrading its components one at a time. Data Contexts make this modular, so you can add or swap out a single component without touching the others. Most of these changes are quick, incremental steps, so you can move from a basic demo deployment to a full production deployment at your own pace, confident that your Data Context will keep working at every step along the way.

This reference guide is designed to present you with clear options for upgrading your deployment. For specific implementation steps, please check out the linked How-to guides.

Components

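A typical Great Expectations deployment has several components, each covered in the sections below: the Great Expectations configuration, stores for Expectations and Validation Results, generated notebooks, connections to data, Data Docs, and Checkpoints with their Actions.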

Options for storing Great Expectations configuration

The simplest way to manage your Great Expectations configuration is usually by committing great_expectations/great_expectations.yml to Git. However, it’s not usually a good idea to commit credentials to source control. In some situations, you might need to deploy without access to source control (or maybe even a file system).

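Credentials do not need to live in the committed file: great_expectations.yml can reference them as ${VARIABLE} placeholders, which are filled in from environment variables or from an uncommitted config_variables.yml file. When you deploy without source control or a file system, you can instead instantiate a Data Context in code, without a great_expectations.yml file.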
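To keep secrets out of Git, the committed YAML holds only ${...} placeholders while the real values come from somewhere uncommitted. The sketch below imitates that substitution with Python's string.Template so the idea can be run in isolation. It is an illustration, not Great Expectations' own implementation; the fragment, the variable names, and the rule that environment variables override file values are assumptions of this sketch.

```python
import os
from string import Template
from typing import Mapping, Optional

# Hypothetical fragment that is safe to commit: secrets are ${...} placeholders.
COMMITTED_FRAGMENT = """\
my_postgres_db_yaml_creds:
  username: ${PG_USER}
  password: ${PG_PASSWORD}
"""


def substitute_config_variables(
    text: str,
    config_variables: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Fill ${VAR} placeholders from uncommitted sources (illustrative sketch).

    config_variables stands in for values parsed from an uncommitted file.
    In this sketch, environment variables override them (an assumption).
    """
    values = dict(config_variables)
    values.update(os.environ if environ is None else environ)
    # substitute() raises KeyError for any placeholder without a value,
    # so a missing secret fails loudly instead of rendering as blank.
    return Template(text).substitute(values)


if __name__ == "__main__":
    resolved = substitute_config_variables(
        COMMITTED_FRAGMENT,
        config_variables={"PG_USER": "analyst", "PG_PASSWORD": "from-file"},
        environ={"PG_PASSWORD": "from-env"},
    )
    print(resolved)
```

Raising on a missing value, rather than substituting an empty string, is a deliberate choice: a deployment with an unset password should fail at load time, not at the first database connection.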

Options for storing Expectations

Many teams find it convenient to store Expectations in Git. Essentially, this approach treats Expectations like test fixtures: they live adjacent to code and are stored within version control. Git acts as a collaboration tool and source of record.

Alternatively, you can treat Expectations like configuration and store them in a cloud blob store such as S3, GCS, or Azure Blob Storage. Finally, you can store them in a database.
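Whichever option you choose, the change is confined to the stores section of great_expectations.yml plus the expectations_store_name key that selects the active store. The sketch below writes three alternatives as Python dicts with the same shape as that YAML: a filesystem store (the Git option), an S3 blob store, and a database store. The store names, bucket, prefix, and the ${my_postgres_db_yaml_creds} variable are hypothetical placeholders, and the active_backend helper is illustrative only. Check the backend options against the documentation for your Great Expectations version.

```python
from typing import Any, Dict

# Each entry mirrors one store under `stores:` in great_expectations.yml.
# Store names, bucket, prefix, and the credentials variable are placeholders.
EXPECTATIONS_STORES: Dict[str, Dict[str, Any]] = {
    # Option 1: JSON files under expectations/, committed to Git.
    "expectations_store": {
        "class_name": "ExpectationsStore",
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "expectations/",
        },
    },
    # Option 2: treat Expectations like configs in a blob store (here S3).
    "expectations_S3_store": {
        "class_name": "ExpectationsStore",
        "store_backend": {
            "class_name": "TupleS3StoreBackend",
            "bucket": "my-ge-bucket",
            "prefix": "expectations",
        },
    },
    # Option 3: a database; credentials come from a ${...} config variable.
    "expectations_postgres_store": {
        "class_name": "ExpectationsStore",
        "store_backend": {
            "class_name": "DatabaseStoreBackend",
            "credentials": "${my_postgres_db_yaml_creds}",
        },
    },
}


def active_backend(stores: Dict[str, Dict[str, Any]], store_name: str) -> str:
    """Return the backend class of the store that store_name selects (hypothetical helper)."""
    if store_name not in stores:
        raise KeyError(f"{store_name!r} is not defined under stores")
    return stores[store_name]["store_backend"]["class_name"]


# Switching options means changing only this one value.
expectations_store_name = "expectations_S3_store"
print(active_backend(EXPECTATIONS_STORES, expectations_store_name))  # prints TupleS3StoreBackend
```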

Options for storing Validation Results

By default, Validation Results are stored locally, in an uncommitted directory. This works well for individual work but not for collaboration. The most common pattern is to use a cloud-based blob store such as S3, GCS, or Azure Blob Storage. You can also store Validation Results in a database.
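Because each store wraps a pluggable store_backend, moving Validation Results off the local disk amounts to swapping that one sub-section. The sketch below starts from a local ValidationsStore under uncommitted/validations/ and derives a GCS-backed copy. The with_backend helper and the project, bucket, and prefix values are hypothetical, and the option names for TupleGCSStoreBackend should be confirmed against your version's documentation.

```python
import copy
from typing import Any, Dict

# Default: Validation Results as local files in an uncommitted directory.
LOCAL_VALIDATIONS_STORE: Dict[str, Any] = {
    "class_name": "ValidationsStore",
    "store_backend": {
        "class_name": "TupleFilesystemStoreBackend",
        "base_directory": "uncommitted/validations/",
    },
}

# Hypothetical shared location for the team.
GCS_BACKEND: Dict[str, Any] = {
    "class_name": "TupleGCSStoreBackend",
    "project": "my-gcp-project",
    "bucket": "my-ge-validations",
    "prefix": "validations",
}


def with_backend(store_config: Dict[str, Any], backend: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of store_config that writes to a different store_backend.

    The store class is unchanged; only where results are written changes.
    Deep copies keep the original config untouched.
    """
    new_config = copy.deepcopy(store_config)
    new_config["store_backend"] = copy.deepcopy(backend)
    return new_config


stores = {
    "validations_store": LOCAL_VALIDATIONS_STORE,
    "validations_GCS_store": with_backend(LOCAL_VALIDATIONS_STORE, GCS_BACKEND),
}
# The validations_store_name key selects which store is active.
validations_store_name = "validations_GCS_store"
```

Keeping the local store defined alongside the shared one makes it easy to switch back during development by changing only validations_store_name.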

Options for customizing generated notebooks

Great Expectations generates notebooks that serve as interactive development environments for Expectation Suites. You might want to customize parts of these notebooks to add company-specific documentation, or change the code sections to suit your use cases.
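Generated notebooks are ordinary Jupyter .ipynb files, which are JSON documents with a top-level cells list. Great Expectations documents its own configuration for customizing these notebooks. As a version-independent alternative, the sketch below post-processes a generated notebook with the standard library to prepend a company-specific Markdown header cell. This post-processing approach is swapped in for illustration and is not the built-in mechanism. The function names and header text are hypothetical.

```python
import json
from typing import Any, Dict


def prepend_markdown_cell(notebook: Dict[str, Any], markdown: str) -> Dict[str, Any]:
    """Return a copy of an nbformat-4 notebook with a Markdown cell at the top."""
    header_cell = {
        "cell_type": "markdown",
        "metadata": {},
        # nbformat stores source as a list of lines that keep their newlines.
        "source": markdown.splitlines(keepends=True),
    }
    updated = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    updated["cells"].insert(0, header_cell)
    return updated


def customize_notebook_file(path: str, markdown: str) -> None:
    """Rewrite a generated .ipynb file in place with the extra header cell."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prepend_markdown_cell(notebook, markdown), f, indent=1)
        f.write("\n")
```

Running customize_notebook_file right after a notebook is generated keeps the customization outside Great Expectations, so it survives upgrades. The trade-off is that it can only add or edit cells after the fact.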

Reference Architectures

Connecting to Data

Great Expectations allows you to connect to data in a wide variety of sources, and the list is constantly getting longer. If you have an idea for a source not listed here, please speak up in the public discussion forum.
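As one example, in the V3 (Batch Request) API a Datasource pairs an execution engine with one or more Data Connectors. The sketch below shows a filesystem Datasource as a Python dict; in those releases, a dict like this can be registered with context.add_datasource(**datasource_config). The Datasource name, base directory, and regex are hypothetical. The preview_asset_names helper is not part of Great Expectations. It is a simplified illustration of how a default_regex with a data_asset_name group turns file names into Data Asset names, and the real connector's matching rules may differ.

```python
import re
from typing import Any, Dict, List

# Mirrors a `datasources:` entry in great_expectations.yml (names are placeholders).
datasource_config: Dict[str, Any] = {
    "name": "my_filesystem_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    "data_connectors": {
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetFilesystemDataConnector",
            "base_directory": "data/",
            "default_regex": {
                "group_names": ["data_asset_name"],
                "pattern": r"(.*)\.csv",
            },
        },
    },
}


def preview_asset_names(filenames: List[str], default_regex: Dict[str, Any]) -> Dict[str, str]:
    """Map each matching file name to the value captured for data_asset_name.

    Illustration only: this sketch requires the pattern to match the whole name.
    """
    pattern = re.compile(default_regex["pattern"])
    # Regex groups are numbered from 1, group_names from 0.
    index = default_regex["group_names"].index("data_asset_name") + 1
    preview = {}
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            preview[name] = match.group(index)
    return preview


connector = datasource_config["data_connectors"]["default_inferred_data_connector_name"]
print(preview_asset_names(["orders.csv", "customers.csv", "notes.txt"], connector["default_regex"]))
# {'orders.csv': 'orders', 'customers.csv': 'customers'}
```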

Options for hosting Data Docs

By default, Data Docs are stored locally, in an uncommitted directory. This works well for individual work but not for collaboration. A better pattern is usually to deploy them to a cloud-based blob store (S3, GCS, or Azure Blob Storage) configured to serve a static website.
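Hosting Data Docs elsewhere means adding a site under data_docs_sites whose store_backend points at the shared bucket. The sketch below keeps a local site and adds an S3 site. Both are written as Python dicts with the same shape as the YAML. The site name and bucket are hypothetical. Serving the bucket as a static website is configured on the cloud provider's side, not in this config.

```python
from typing import Any, Dict

data_docs_sites: Dict[str, Dict[str, Any]] = {
    # Default: HTML rendered into an uncommitted local directory.
    "local_site": {
        "class_name": "SiteBuilder",
        "show_how_to_buttons": True,
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "uncommitted/data_docs/local_site/",
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    },
    # Shared: the same site builder, writing into an S3 bucket that is
    # set up separately (in AWS) for static website hosting.
    "s3_site": {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleS3StoreBackend",
            "bucket": "data-docs.my-org",  # hypothetical bucket name
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    },
}
```

Once such a site is registered, context.build_data_docs(site_names=["s3_site"]) should rebuild only that site. Confirm the parameter against your version's API reference.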

Additional Checkpoints and Actions

Most teams will want to configure several Checkpoints and Validation Actions as part of their deployment. There are two primary patterns for deploying Checkpoints, and Great Expectations supports both. In the first, Checkpoints run during data processing (for example, as a task within Airflow), where their results can control program flow. In the second, Checkpoints run against data that has already been materialized. In some rare cases, you may want to validate data without using a Checkpoint at all.
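When a Checkpoint runs inside a pipeline, a common way to let it control program flow is to inspect its result and stop the run on failure. The sketch below shows that pattern, injecting the Checkpoint runner as a callable so the sketch runs without Great Expectations installed. FakeCheckpointResult, validate_or_halt, and DataValidationError are hypothetical names. With a real Data Context, you would pass something like context.run_checkpoint, whose result exposes a success flag in the V3 API.

```python
from dataclasses import dataclass
from typing import Any, Callable


class DataValidationError(RuntimeError):
    """Raised to stop a pipeline task when a Checkpoint fails."""


@dataclass
class FakeCheckpointResult:
    # Stand-in for the result of context.run_checkpoint(); only `success` is modeled.
    success: bool


def validate_or_halt(run_checkpoint: Callable[..., Any], checkpoint_name: str) -> Any:
    """Run a Checkpoint and raise if validation fails.

    In an orchestrator such as Airflow, an exception raised inside a task
    marks that task as failed, so downstream tasks that depend on it do not run.
    """
    result = run_checkpoint(checkpoint_name=checkpoint_name)
    if not result.success:
        raise DataValidationError(f"Checkpoint {checkpoint_name!r} failed")
    return result


# Stub runners in place of a real Data Context.
def passing_runner(checkpoint_name: str) -> FakeCheckpointResult:
    return FakeCheckpointResult(success=True)


def failing_runner(checkpoint_name: str) -> FakeCheckpointResult:
    return FakeCheckpointResult(success=False)


validate_or_halt(passing_runner, "orders_checkpoint")  # returns normally
```

For Checkpoints run against already-materialized data, the same result can instead feed alerts or Data Docs without raising. Halting is only appropriate when the Checkpoint sits in the processing path.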

Not interested in managing your own configuration or infrastructure?

Learn more about Great Expectations Cloud, our fully managed SaaS offering. Sign up for our weekly cloud workshop to see our newest features and apply for our private Alpha program.