InferredAssetAWSGlueDataCatalogDataConnector

class great_expectations.datasource.data_connector.InferredAssetAWSGlueDataCatalogDataConnector(name: str, datasource_name: str, execution_engine: Optional[great_expectations.execution_engine.execution_engine.ExecutionEngine] = None, catalog_id: Optional[str] = None, data_asset_name_prefix: str = '', data_asset_name_suffix: str = '', excluded_tables: Optional[list] = None, included_tables: Optional[list] = None, glue_introspection_directives: Optional[dict] = None, boto3_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)

An Inferred Asset Data Connector used to connect to data through an AWS Glue Data Catalog.

This Data Connector operates on an AWS Glue Data Catalog and determines Data Asset names implicitly, by listing all databases, tables, and partitions in the catalog.

Parameters:
  • name – The name of the Data Connector.

  • datasource_name – The name of this Data Connector’s Datasource.

  • execution_engine – The Execution Engine object used by this Data Connector to read the data.

  • catalog_id – The catalog ID from which to retrieve data. If none is provided, the AWS account ID is used by default. Make sure you use the same catalog ID as the one configured in your Spark session.

  • data_asset_name_prefix – A prefix to prepend to all names of Data Assets inferred by this Data Connector.

  • data_asset_name_suffix – A suffix to append to all names of Data Assets inferred by this Data Connector.

  • excluded_tables – A list of tables, in the form ([database].[table]), to ignore when inferring Data Asset names.

  • included_tables – A list of tables, in the form ([database].[table]), to include when inferring Data Asset names. When provided, only Data Assets matching this list will be inferred.

  • glue_introspection_directives – Arguments passed to the introspection method. Currently, the only available directive is database, which restricts inference to assets in the given database.

  • boto3_options – Options passed to the boto3 library.

  • batch_spec_passthrough – Dictionary with keys that will be added directly to the batch spec.

  • id – The unique identifier for this Data Connector used when running in cloud mode.
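The naming and filtering parameters above can be illustrated with a small, self-contained sketch. It does not call Great Expectations or AWS: `connector_kwargs` only mirrors the constructor's keyword arguments with placeholder values, and `sketch_asset_names` is a hypothetical helper, not the library's implementation. It assumes that filters are matched against the unprefixed `[database].[table]` form, that exclusions are checked before inclusions, and that `region_name` is the boto3 option being passed.

```python
from typing import Iterable, List, Optional

# Placeholder keyword arguments mirroring the constructor signature.
# All values are hypothetical and chosen only for illustration.
connector_kwargs = {
    "name": "glue_inferred_connector",
    "datasource_name": "my_glue_datasource",
    "catalog_id": None,  # None: the AWS account ID is used by default
    "data_asset_name_prefix": "raw__",
    "data_asset_name_suffix": "",
    "excluded_tables": ["sales.tmp_orders"],
    "included_tables": None,
    "glue_introspection_directives": {"database": "sales"},
    "boto3_options": {"region_name": "us-east-1"},  # assumed boto3 option
}


def sketch_asset_names(
    tables: Iterable[str],
    prefix: str = "",
    suffix: str = "",
    included: Optional[List[str]] = None,
    excluded: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical model of how the parameters shape inferred names."""
    names = []
    for qualified in tables:  # each entry looks like "database.table"
        if excluded and qualified in excluded:
            continue  # assumption: exclusions are applied first
        if included and qualified not in included:
            continue  # when provided, only listed tables are kept
        names.append(f"{prefix}{qualified}{suffix}")
    return names


# Stand-in for the tables a catalog listing might return.
catalog_tables = ["sales.orders", "sales.tmp_orders", "sales.customers"]

names = sketch_asset_names(
    catalog_tables,
    prefix=connector_kwargs["data_asset_name_prefix"],
    suffix=connector_kwargs["data_asset_name_suffix"],
    included=connector_kwargs["included_tables"],
    excluded=connector_kwargs["excluded_tables"],
)
print(names)  # ['raw__sales.orders', 'raw__sales.customers']
```

In a real project the same keys would be supplied to the Data Connector through the Datasource configuration; check the exact filter semantics against your installed version before relying on them.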

get_available_data_asset_names() → List[str]

Return the list of asset names known by this DataConnector.

Returns:

A list of available Data Asset names.
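As a rough sketch of consuming the returned list, the snippet below groups names by database. It assumes the names follow the `[database].[table]` form with no prefix or suffix configured; `group_by_database` and the sample names are hypothetical and stand in for an actual call to `get_available_data_asset_names()`.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_database(asset_names: List[str]) -> Dict[str, List[str]]:
    """Group "database.table" names by their database part."""
    grouped = defaultdict(list)
    for asset_name in asset_names:
        # Split on the first dot only: "sales.orders" -> ("sales", "orders")
        database, _, table = asset_name.partition(".")
        grouped[database].append(table)
    return dict(grouped)


# Stand-in for the list returned by get_available_data_asset_names().
example_names = ["sales.orders", "sales.customers", "hr.employees"]
by_database = group_by_database(example_names)
print(by_database)
```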