FileDataContext

class great_expectations.data_context.FileDataContext(project_config: Optional[DataContextConfig] = None, context_root_dir: Optional[PathStr] = None, runtime_environment: Optional[dict] = None)

Subclass of AbstractDataContext that contains functionality necessary to work in a filesystem-backed environment.

add_checkpoint(name: str, config_version: Optional[Union[int, float]] = None, template_name: Optional[str] = None, module_name: Optional[str] = None, class_name: Optional[str] = None, run_name_template: Optional[str] = None, expectation_suite_name: Optional[str] = None, batch_request: Optional[dict] = None, action_list: Optional[List[dict]] = None, evaluation_parameters: Optional[dict] = None, runtime_configuration: Optional[dict] = None, validations: Optional[List[dict]] = None, profilers: Optional[List[dict]] = None, validation_operator_name: Optional[str] = None, batches: Optional[List[dict]] = None, site_names: Optional[Union[str, List[str]]] = None, slack_webhook: Optional[str] = None, notify_on: Optional[str] = None, notify_with: Optional[Union[str, List[str]]] = None, ge_cloud_id: Optional[str] = None, expectation_suite_ge_cloud_id: Optional[str] = None, default_validation_id: Optional[str] = None) -> Checkpoint

Add a Checkpoint to the DataContext.

Parameters:
  • name – The name to give the checkpoint.

  • config_version – The config version of this checkpoint.

  • template_name – The template to use in generating this checkpoint.

  • module_name – The module name to use in generating this checkpoint.

  • class_name – The class name to use in generating this checkpoint.

  • run_name_template – The run name template to use in generating this checkpoint.

  • expectation_suite_name – The expectation suite name to use in generating this checkpoint.

  • batch_request – The batch request to use in generating this checkpoint.

  • action_list – The action list to use in generating this checkpoint.

  • evaluation_parameters – The evaluation parameters to use in generating this checkpoint.

  • runtime_configuration – The runtime configuration to use in generating this checkpoint.

  • validations – The validations to use in generating this checkpoint.

  • profilers – The profilers to use in generating this checkpoint.

  • validation_operator_name

    The validation operator name to use in generating this checkpoint. This is only used for LegacyCheckpoint configuration.

    Deprecated since version 0.14.0.

  • batches

    The batches to use in generating this checkpoint. This is only used for LegacyCheckpoint configuration.

    Deprecated since version 0.14.0.

  • site_names – The site names to use in generating this checkpoint. This is only used for SimpleCheckpoint configuration.

  • slack_webhook – The slack webhook to use in generating this checkpoint. This is only used for SimpleCheckpoint configuration.

  • notify_on – The notify on setting to use in generating this checkpoint. This is only used for SimpleCheckpoint configuration.

  • notify_with – The notify with setting to use in generating this checkpoint. This is only used for SimpleCheckpoint configuration.

  • ge_cloud_id – The GE Cloud ID to use in generating this checkpoint.

  • expectation_suite_ge_cloud_id – The expectation suite GE Cloud ID to use in generating this checkpoint.

  • default_validation_id – The default validation ID to use in generating this checkpoint.

Returns:

The Checkpoint object created.
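
As a minimal sketch of how these parameters fit together, the helper below assembles keyword arguments for add_checkpoint and passes them to an existing context. The datasource, data connector, asset, and suite names are hypothetical placeholders, and the layout of each validations entry is an assumption based on common GX checkpoint configurations rather than something this page specifies.

```python
from typing import Any, Dict


def build_checkpoint_kwargs(name: str) -> Dict[str, Any]:
    """Assemble keyword arguments for add_checkpoint().

    All datasource, asset, and suite names are hypothetical placeholders.
    """
    return {
        "name": name,
        "config_version": 1.0,
        "class_name": "Checkpoint",
        "run_name_template": "%Y%m%d-%H%M%S-my-run",
        "validations": [
            {
                # Assumed layout of a validation entry; not taken from this page.
                "batch_request": {
                    "datasource_name": "my_datasource",
                    "data_connector_name": "whole_table",
                    "data_asset_name": "my_table",
                },
                "expectation_suite_name": "my_suite",
            }
        ],
    }


def register_checkpoint(context: Any, name: str = "my_checkpoint") -> Any:
    """Add the checkpoint to an existing context and return what the context returns."""
    return context.add_checkpoint(**build_checkpoint_kwargs(name))
```

Keeping the configuration in a plain dictionary makes it easy to inspect or log before it is handed to the context.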

add_datasource(name: str, initialize: bool = True, save_changes: Optional[bool] = None, **kwargs) -> LegacyDatasource | BaseDatasource | None

Add a new Datasource to the data context, with configuration provided as kwargs.

Parameters:
  • name – the name of the new Datasource to add

  • initialize – if False, add the Datasource to the config, but do not initialize it, for example if a user needs to debug database connectivity.

  • save_changes

    should GX save the Datasource config?

    Deprecated since version 0.15.32.

  • kwargs – the configuration for the new Datasource

Returns:

Datasource instance added.
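
The sketch below shows one way the kwargs might be supplied: a configuration dictionary is unpacked into add_datasource, so everything except name travels through **kwargs. The nested keys (execution_engine, data_connectors) and the connector name are assumptions about a typical pandas-backed Datasource, not values documented on this page.

```python
from typing import Any, Dict

# Hypothetical configuration for a pandas-backed Datasource that receives
# data at runtime. The nested keys are an assumption, not from this page.
DATASOURCE_CONFIG: Dict[str, Any] = {
    "class_name": "Datasource",
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    "data_connectors": {
        "runtime_connector": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["run_id"],
        }
    },
}


def add_pandas_datasource(
    context: Any, name: str = "my_datasource", initialize: bool = True
) -> Any:
    """Register the Datasource; pass initialize=False to add it without connecting."""
    return context.add_datasource(name, initialize=initialize, **DATASOURCE_CONFIG)
```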

add_store(store_name: str, store_config: dict) -> great_expectations.data_context.store.store.Store

Add a new Store to the DataContext.

Parameters:
  • store_name – the name to associate with the created store.

  • store_config – the config to use to construct the store.

Returns:

The instantiated Store.
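
For illustration, a store_config might look like the dictionary below. The store name, the class names, and the base_directory value are assumptions about a filesystem-backed expectations store; check them against your installed GX version before relying on them.

```python
from typing import Any, Dict

# Hypothetical store configuration; class names and directory are assumptions.
STORE_CONFIG: Dict[str, Any] = {
    "class_name": "ExpectationsStore",
    "store_backend": {
        "class_name": "TupleFilesystemStoreBackend",
        "base_directory": "expectations/",
    },
}


def add_filesystem_expectations_store(
    context: Any, store_name: str = "my_expectations_store"
) -> Any:
    """Register the store on an existing context and return the result."""
    return context.add_store(store_name=store_name, store_config=STORE_CONFIG)
```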

build_data_docs(site_names=None, resource_identifiers=None, dry_run=False, build_index: bool = True)

Build Data Docs for your project.

Parameters:
  • site_names – if specified, build data docs only for these sites, otherwise, build all the sites specified in the context's config

  • resource_identifiers – a list of resource identifiers (ExpectationSuiteIdentifier, ValidationResultIdentifier). If specified, rebuild HTML (or other views the data docs sites are rendering) only for the resources in this list. This supports incremental build of data docs sites (e.g., when a new validation result is created) and avoids full rebuild.

  • dry_run – if True, return a structure containing the URLs of the sites that would be built, without building them.

  • build_index – if False, skip building the index page.

Returns:

A dictionary with the names of the updated data documentation sites as keys and the location info of their index.html files as values.

Raises:

ClassInstantiationError – Site config in your Data Context config is not valid.
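
A small sketch of the dry_run flag in use: the first call only reports which sites would be built, and the second performs the build. The helper name is hypothetical, and the exact shape of the returned structure is whatever build_data_docs produces.

```python
from typing import Any, List, Optional, Tuple


def preview_then_build(
    context: Any, site_names: Optional[List[str]] = None
) -> Tuple[Any, Any]:
    """Report the planned sites first, then build them."""
    # dry_run=True returns the URLs that would be built without building anything.
    planned = context.build_data_docs(site_names=site_names, dry_run=True)
    # The default dry_run=False performs the actual build.
    built = context.build_data_docs(site_names=site_names)
    return planned, built
```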

classmethod create(project_root_dir: Optional[PathStr] = None, usage_statistics_enabled: bool = True, runtime_environment: Optional[dict] = None) -> SerializableDataContext

Build a new great_expectations directory and DataContext object in the provided project_root_dir.

create will create a new "great_expectations" directory in the provided folder, provided one does not already exist. Then, it will initialize a new DataContext in that folder and write the resulting config.

Relevant Documentation Links

Data Context

Parameters:
  • project_root_dir – path to the root directory in which to create a new great_expectations directory

  • usage_statistics_enabled – boolean directive specifying whether or not to gather usage statistics

  • runtime_environment – a dictionary of config variables that override both those set in config_variables.yml and the environment

Returns:

DataContext

create_expectation_suite(expectation_suite_name: str, overwrite_existing: bool = False, **kwargs: Optional[dict]) -> great_expectations.core.expectation_suite.ExpectationSuite

Build a new ExpectationSuite and save it utilizing the context's underlying ExpectationsStore.

Note that this method can be called by itself or run within the get_validator workflow.

When run with create_expectation_suite():

expectation_suite_name = "genres_movies.fkey"
context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)
batch = context.get_batch(
    expectation_suite_name=expectation_suite_name
)

When run as part of get_validator():

validator = context.get_validator(
    datasource_name="my_datasource",
    data_connector_name="whole_table",
    data_asset_name="my_table",
    create_expectation_suite_with_name="my_expectation_suite",
)
validator.expect_column_values_to_be_in_set("c1", [4, 5, 6])

Parameters:
  • expectation_suite_name – The name of the suite to create.

  • overwrite_existing – Whether to overwrite if a suite with the given name already exists.

  • **kwargs – Any key-value arguments to pass to the store when persisting.

Returns:

A new (empty) ExpectationSuite.

Raises:
  • ValueError – The input overwrite_existing is of the wrong type.

  • DataContextError – A suite with the same name already exists (and overwrite_existing is not enabled).

delete_checkpoint(name: Optional[str] = None, ge_cloud_id: Optional[str] = None) -> None

Deletes a given Checkpoint by either name or id.

Parameters:
  • name – The name of the target Checkpoint.

  • ge_cloud_id – The id associated with the target Checkpoint.

Raises:

CheckpointNotFoundError – The requested Checkpoint does not exist.

delete_datasource(datasource_name: Optional[str], save_changes: Optional[bool] = None) -> None

Delete a given Datasource by name.

Note that this method causes deletion from the underlying DatasourceStore. This can be overridden to only impact the Datasource cache through the deprecated save_changes argument.

Parameters:
  • datasource_name – The name of the target datasource.

  • save_changes

    Should this change be persisted by the DatasourceStore?

    Deprecated since version 0.15.32.

Raises:

ValueError – The datasource_name isn't provided or cannot be found.

get_available_data_asset_names(datasource_names: str | list[str] | None = None, batch_kwargs_generator_names: str | list[str] | None = None)

Inspect datasource and batch kwargs generators to provide available data_asset objects.

Parameters:
  • datasource_names – List of datasources for which to provide available data asset name objects. If None, return available data assets for all datasources.

  • batch_kwargs_generator_names – List of batch kwargs generators for which to provide available data_asset_name objects.

Returns:

Dictionary describing available data assets

Return type:

data_asset_names

Raises:

ValueError – datasource_names is not None, a string, or list of strings.

get_batch_list(datasource_name: Optional[str] = None, data_connector_name: Optional[str] = None, data_asset_name: Optional[str] = None, batch_request: Optional[great_expectations.core.batch.BatchRequestBase] = None, batch_data: Optional[Any] = None, data_connector_query: Optional[dict] = None, batch_identifiers: Optional[dict] = None, limit: Optional[int] = None, index: Optional[Union[int, list, tuple, slice, str]] = None, custom_filter_function: Optional[Callable] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, splitter_method: Optional[str] = None, splitter_kwargs: Optional[dict] = None, runtime_parameters: Optional[dict] = None, query: Optional[str] = None, path: Optional[str] = None, batch_filter_parameters: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, **kwargs: Optional[dict]) -> List[great_expectations.core.batch.Batch]

Get the list of zero or more batches, based on a variety of flexible input types.

get_batch_list is the main user-facing API for getting batches. In contrast to virtually all other methods in the class, it does not require typed or nested inputs. Instead, this method is intended to help the user pick the right parameters.

This method attempts to return any number of batches, including an empty list.

Parameters:
  • datasource_name – The name of the Datasource that defines the Data Asset to retrieve the batch for

  • data_connector_name – The Data Connector within the datasource for the Data Asset

  • data_asset_name – The name of the Data Asset within the Data Connector

  • batch_request – Encapsulates all the parameters used here to retrieve a BatchList. Use either batch_request or the other params (but not both)

  • batch_data – Provides runtime data for the batch; is added as the key batch_data to the runtime_parameters dictionary of a BatchRequest

  • query – Provides runtime data for the batch; is added as the key query to the runtime_parameters dictionary of a BatchRequest

  • path – Provides runtime data for the batch; is added as the key path to the runtime_parameters dictionary of a BatchRequest

  • runtime_parameters – Specifies runtime parameters for the BatchRequest; can include the keys batch_data, query, and path

  • data_connector_query – Used to specify connector query parameters; specifically batch_filter_parameters, limit, index, and custom_filter_function

  • batch_identifiers – Any identifiers of batches for the BatchRequest

  • batch_filter_parameters – Filter parameters used in the data connector query

  • limit – Part of the data_connector_query, limits the number of batches in the batch list

  • index – Part of the data_connector_query, used to specify the index of which batch to return. Negative numbers retrieve from the end of the list (ex: -1 retrieves the last or latest batch)

  • custom_filter_function – A Callable function that accepts batch_identifiers and returns a bool

  • sampling_method – The method used to sample Batch data (see: Splitting and Sampling)

  • sampling_kwargs – Arguments for the sampling method

  • splitter_method – The method used to split the Data Asset into Batches

  • splitter_kwargs – Arguments for the splitting method

  • batch_spec_passthrough – Arguments specific to the ExecutionEngine that aid in Batch retrieval

  • **kwargs – Used to specify either batch_identifiers or batch_filter_parameters

Returns:

(Batch) The list of requested Batch instances

Raises:
  • DatasourceError – If the specified datasource_name does not exist in the DataContext

  • TypeError – If the specified types of the batch_request are not supported, or if the datasource_name is not a str

  • ValueError – If more than one exclusive parameter is specified (ex: specifying more than one of batch_data, query or path)
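
Three of the rules above can be illustrated in plain Python without touching a real Datasource. The sketch below is not the library's implementation: the year key, the helper names, and the sample batch names are hypothetical, and the exclusivity check only mirrors the documented ValueError condition.

```python
from typing import Any, Dict, List, Optional


def keep_2020_batches(batch_identifiers: Dict[str, Any]) -> bool:
    """Example custom_filter_function: keep batches whose 'year' is 2020.

    The 'year' key is hypothetical; real keys depend on the data connector.
    """
    return str(batch_identifiers.get("year")) == "2020"


def check_exclusive_runtime_inputs(
    batch_data: Optional[Any] = None,
    query: Optional[str] = None,
    path: Optional[str] = None,
) -> None:
    """Mirror the documented rule; this is not the library's own check."""
    candidates = (("batch_data", batch_data), ("query", query), ("path", path))
    provided = [name for name, value in candidates if value is not None]
    if len(provided) > 1:
        raise ValueError(
            f"Specify at most one of batch_data, query, path; got {provided}"
        )


# Stand-in for a batch list ordered oldest to newest.
batches: List[str] = ["batch_2019", "batch_2020", "batch_2021"]
latest = batches[-1]  # same idea as index=-1: the last (latest) batch
```

A function such as keep_2020_batches would be passed as custom_filter_function, and index=-1 plays the same role as the list indexing on the last line.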

get_datasource(datasource_name: str = 'default') -> Union[great_expectations.datasource.datasource.LegacyDatasource, great_expectations.datasource.new_datasource.BaseDatasource, great_expectations.experimental.datasources.interfaces.Datasource]

Retrieve a given Datasource by name from the context's underlying DatasourceStore.

Parameters:

datasource_name – The name of the target datasource.

Returns:

The target datasource.

Raises:

ValueError – The input datasource_name is None.

get_expectation_suite(expectation_suite_name: Optional[str] = None, include_rendered_content: Optional[bool] = None, ge_cloud_id: Optional[str] = None) -> great_expectations.core.expectation_suite.ExpectationSuite

Get an Expectation Suite by name.

Parameters:
  • expectation_suite_name (str) – The name of the Expectation Suite

  • include_rendered_content (bool) – Whether or not to re-populate rendered_content for each ExpectationConfiguration.

  • ge_cloud_id (str) –

    The GX Cloud ID for the Expectation Suite (unused)

    Deprecated since version 0.15.45.

Returns:

An existing ExpectationSuite

Raises:

DataContextError – There is no expectation suite with the name provided

get_validator(datasource_name: Optional[str] = None, data_connector_name: Optional[str] = None, data_asset_name: Optional[str] = None, batch: Optional[great_expectations.core.batch.Batch] = None, batch_list: Optional[List[great_expectations.core.batch.Batch]] = None, batch_request: Optional[great_expectations.core.batch.BatchRequestBase] = None, batch_request_list: Optional[List[great_expectations.core.batch.BatchRequestBase]] = None, batch_data: Optional[Any] = None, data_connector_query: Optional[Union[great_expectations.core.id_dict.IDDict, dict]] = None, batch_identifiers: Optional[dict] = None, limit: Optional[int] = None, index: Optional[Union[int, list, tuple, slice, str]] = None, custom_filter_function: Optional[Callable] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, splitter_method: Optional[str] = None, splitter_kwargs: Optional[dict] = None, runtime_parameters: Optional[dict] = None, query: Optional[str] = None, path: Optional[str] = None, batch_filter_parameters: Optional[dict] = None, expectation_suite_ge_cloud_id: Optional[str] = None, batch_spec_passthrough: Optional[dict] = None, expectation_suite_name: Optional[str] = None, expectation_suite: Optional[great_expectations.core.expectation_suite.ExpectationSuite] = None, create_expectation_suite_with_name: Optional[str] = None, include_rendered_content: Optional[bool] = None, **kwargs: Optional[dict]) -> great_expectations.validator.validator.Validator

Retrieve a Validator with a batch list and an ExpectationSuite.

get_validator first calls get_batch_list to retrieve a batch list, then creates or retrieves an ExpectationSuite used to validate the Batches in the list.

Parameters:
  • datasource_name – The name of the Datasource that defines the Data Asset to retrieve the batch for

  • data_connector_name – The Data Connector within the datasource for the Data Asset

  • data_asset_name – The name of the Data Asset within the Data Connector

  • batch – The Batch to use with the Validator

  • batch_list – The List of Batches to use with the Validator

  • batch_request – Encapsulates all the parameters used here to retrieve a BatchList. Use either batch_request or the other params (but not both)

  • batch_request_list – A List of BatchRequest to use with the Validator

  • batch_data – Provides runtime data for the batch; is added as the key batch_data to the runtime_parameters dictionary of a BatchRequest

  • query – Provides runtime data for the batch; is added as the key query to the runtime_parameters dictionary of a BatchRequest

  • path – Provides runtime data for the batch; is added as the key path to the runtime_parameters dictionary of a BatchRequest

  • runtime_parameters – Specifies runtime parameters for the BatchRequest; can include the keys batch_data, query, and path

  • data_connector_query – Used to specify connector query parameters; specifically batch_filter_parameters, limit, index, and custom_filter_function

  • batch_identifiers – Any identifiers of batches for the BatchRequest

  • batch_filter_parameters – Filter parameters used in the data connector query

  • limit – Part of the data_connector_query, limits the number of batches in the batch list

  • index – Part of the data_connector_query, used to specify the index of which batch to return. Negative numbers retrieve from the end of the list (ex: -1 retrieves the last or latest batch)

  • custom_filter_function – A Callable function that accepts batch_identifiers and returns a bool

  • sampling_method – The method used to sample Batch data (see: Splitting and Sampling)

  • sampling_kwargs – Arguments for the sampling method

  • splitter_method – The method used to split the Data Asset into Batches

  • splitter_kwargs – Arguments for the splitting method

  • batch_spec_passthrough – Arguments specific to the ExecutionEngine that aid in Batch retrieval

  • expectation_suite_ge_cloud_id – The identifier of the ExpectationSuite to retrieve from the DataContext (can be used in place of expectation_suite_name)

  • expectation_suite_name – The name of the ExpectationSuite to retrieve from the DataContext

  • expectation_suite – The ExpectationSuite to use with the validator

  • create_expectation_suite_with_name – Creates a Validator with a new ExpectationSuite with the provided name

  • include_rendered_content – If True the ExpectationSuite will include rendered content when saved

  • **kwargs – Used to specify either batch_identifiers or batch_filter_parameters

Returns:

A Validator with the specified Batch list and ExpectationSuite

Return type:

Validator

Raises:
  • DatasourceError – If the specified datasource_name does not exist in the DataContext

  • TypeError – If the specified types of the batch_request are not supported, or if the datasource_name is not a str

  • ValueError – If more than one exclusive parameter is specified (ex: specifying more than one of batch_data, query or path), or if the ExpectationSuite cannot be created or retrieved using either the provided name or identifier

list_checkpoints() -> Union[List[str], List[great_expectations.data_context.types.resource_identifiers.ConfigurationIdentifier]]

List existing Checkpoint identifiers on this context.

Returns:

Either a list of strings or ConfigurationIdentifiers depending on the environment and context type.

list_datasources() -> List[dict]

List the configurations of the datasources associated with this context.

Note that any sensitive values are obfuscated before being returned.

Returns:

A list of dictionaries representing datasource configurations. Each dictionary will contain a "name", "class_name", and "module_name" at a minimum.
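
Because each entry carries at least those three keys, the list is easy to condense into a quick overview. The helper and the sample entry below are hypothetical; only the three key names come from the description above.

```python
from typing import Dict, List


def summarize_datasources(datasource_configs: List[dict]) -> Dict[str, str]:
    """Map each datasource name to its fully qualified class path."""
    return {
        cfg["name"]: f"{cfg['module_name']}.{cfg['class_name']}"
        for cfg in datasource_configs
    }


# Hypothetical sample shaped like the documented minimum keys.
sample = [
    {
        "name": "my_datasource",
        "class_name": "Datasource",
        "module_name": "great_expectations.datasource",
    }
]
summary = summarize_datasources(sample)
```

With a real context, the same helper would be called as summarize_datasources(context.list_datasources()).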

list_expectation_suite_names() -> List[str]

Lists the available expectation suite names.

Returns:

A list of suite names (sorted in alphabetic order).

run_checkpoint(checkpoint_name: Optional[str] = None, ge_cloud_id: Optional[str] = None, template_name: Optional[str] = None, run_name_template: Optional[str] = None, expectation_suite_name: Optional[str] = None, batch_request: Optional[BatchRequestBase] = None, action_list: Optional[List[dict]] = None, evaluation_parameters: Optional[dict] = None, runtime_configuration: Optional[dict] = None, validations: Optional[List[dict]] = None, profilers: Optional[List[dict]] = None, run_id: Optional[Union[str, int, float]] = None, run_name: Optional[str] = None, run_time: Optional[datetime.datetime] = None, result_format: Optional[str] = None, expectation_suite_ge_cloud_id: Optional[str] = None, **kwargs) -> CheckpointResult

Validate using an existing Checkpoint.

Parameters:
  • checkpoint_name – The name of a Checkpoint defined via the CLI or by manually creating a yml file

  • template_name – The name of a Checkpoint template to retrieve from the CheckpointStore

  • run_name_template – The template to use for run_name

  • expectation_suite_name – Expectation suite to be used by Checkpoint run

  • batch_request – Batch request to be used by Checkpoint run

  • action_list – List of actions to be performed by the Checkpoint

  • evaluation_parameters – $parameter_name syntax references to be evaluated at runtime

  • runtime_configuration – Runtime configuration override parameters

  • validations – Validations to be performed by the Checkpoint run

  • profilers – Profilers to be used by the Checkpoint run

  • run_id – The run_id for the validation; if None, a default value will be used

  • run_name – The run_name for the validation; if None, a default value will be used

  • run_time – The date/time of the run

  • result_format – One of several supported formatting directives for expectation validation results

  • ge_cloud_id – Great Expectations Cloud id for the checkpoint

  • expectation_suite_ge_cloud_id – Great Expectations Cloud id for the expectation suite

  • **kwargs – Additional kwargs to pass to the validation operator

Returns:

CheckpointResult
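
A minimal sketch of running a stored Checkpoint and reducing the outcome to a pass/fail flag. It assumes the returned CheckpointResult exposes a boolean success attribute, which this page does not state; the helper name and run_name handling are hypothetical.

```python
from typing import Any, Optional


def run_and_check(
    context: Any, checkpoint_name: str, run_name: Optional[str] = None
) -> bool:
    """Run an existing Checkpoint and report whether validation passed."""
    result = context.run_checkpoint(
        checkpoint_name=checkpoint_name,
        run_name=run_name,
    )
    # Assumption: CheckpointResult exposes a boolean `success` attribute.
    return bool(result.success)
```

A wrapper like this is convenient in pipelines that need to fail a job when validation does not pass.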

save_expectation_suite(expectation_suite: great_expectations.core.expectation_suite.ExpectationSuite, expectation_suite_name: Optional[str] = None, overwrite_existing: bool = True, include_rendered_content: Optional[bool] = None, **kwargs: Optional[dict]) -> None

Save the provided ExpectationSuite into the DataContext using the configured ExpectationStore.

Parameters:
  • expectation_suite – The ExpectationSuite to save.

  • expectation_suite_name – The name of this ExpectationSuite. If no name is provided, the name will be read from the suite.

  • overwrite_existing – Whether to overwrite the suite if it already exists.

  • include_rendered_content – Whether to save the prescriptive rendered content for each expectation.

  • kwargs – Additional parameters, unused

Returns:

None

Raises:

DataContextError – If a suite with the same name exists and overwrite_existing is set to False.
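
One way to avoid the DataContextError when overwrite_existing is False is to check the existing names first. The helper below is a hypothetical sketch that combines list_expectation_suite_names with save_expectation_suite; it does not guard against another process saving the same name between the check and the save.

```python
from typing import Any


def save_if_new(context: Any, suite: Any, name: str) -> bool:
    """Save the suite only when no suite with this name exists yet.

    Returns True if the suite was saved, False if the name was already taken.
    """
    if name in context.list_expectation_suite_names():
        return False
    context.save_expectation_suite(
        suite,
        expectation_suite_name=name,
        overwrite_existing=False,
    )
    return True
```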

test_yaml_config(yaml_config: str, name: Optional[str] = None, class_name: Optional[str] = None, runtime_environment: Optional[dict] = None, pretty_print: bool = True, return_mode: Literal['instantiated_class', 'report_object'] = 'instantiated_class', shorten_tracebacks: bool = False)

Convenience method for testing yaml configs.

test_yaml_config is a convenience method for configuring the moving parts of a Great Expectations deployment. It allows you to quickly test out configs for system components, especially Datasources, Checkpoints, and Stores.

For many deployments of Great Expectations, these components (plus Expectations) are the only ones you'll need.

test_yaml_config is mainly intended for use within notebooks and tests.

Parameters:
  • yaml_config – A string containing the yaml config to be tested

  • name – Optional name of the component to instantiate

  • class_name – Optional, overridden if provided in the config

  • runtime_environment – Optional override for config items

  • pretty_print – Determines whether to print human-readable output

  • return_mode – Determines what type of object test_yaml_config will return. Valid modes are "instantiated_class" and "report_object"

  • shorten_tracebacks – If true, catch any errors during instantiation and print only the last element of the traceback stack. This can be helpful for rapid iteration on configs in a notebook, because it can remove the need to scroll up and down a lot.

Returns:

The instantiated component (e.g. a Datasource) OR a json object containing metadata from the component's self_check method. The returned object is determined by return_mode.

update_project_config(project_config: DataContextConfig | Mapping) -> None

Update the context's config with the values from another config object.

Parameters:

project_config – The config to use to update the context's internal state.