InferredAssetS3DataConnector

class great_expectations.datasource.data_connector.InferredAssetS3DataConnector(name: str, datasource_name: str, bucket: str, execution_engine: Optional[great_expectations.execution_engine.execution_engine.ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: str = '', delimiter: str = '/', max_keys: int = 1000, boto3_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)

An Inferred Asset Data Connector used to connect to AWS Simple Storage Service (S3).

This Data Connector uses regular expressions to traverse S3 buckets and implicitly determine Data Asset names.
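As a rough illustration of the inference step, the sketch below applies a regex with capture groups to a few made-up S3 keys using only Python's `re` module. The keys, the pattern, the group names, and the `infer_assets` helper are all hypothetical, and the grouping logic is a simplification rather than the connector's actual implementation.

```python
import re
from collections import defaultdict

# Hypothetical S3 object keys; not taken from any real bucket.
keys = [
    "data/taxi/2021-01.csv",
    "data/taxi/2021-02.csv",
    "data/weather/2021-01.csv",
    "data/README.txt",
]

# Hypothetical pattern: the first capture group plays the role of the
# Data Asset name, the second identifies an individual batch.
pattern = re.compile(r"data/(.+)/(\d{4}-\d{2})\.csv")
group_names = ["data_asset_name", "month"]


def infer_assets(keys, pattern, group_names):
    """Group keys by the value captured for 'data_asset_name'."""
    assets = defaultdict(list)
    for key in keys:
        match = pattern.fullmatch(key)
        if match is None:
            continue  # keys that do not match the pattern are skipped
        captured = dict(zip(group_names, match.groups()))
        assets[captured["data_asset_name"]].append(captured["month"])
    return dict(assets)


inferred = infer_assets(keys, pattern, group_names)
print(sorted(inferred))
```

In this sketch the asset names come entirely from the keys themselves, which is the sense in which they are inferred rather than declared up front.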

This DataConnector authenticates through boto3 and supports the following methods of authentication:
  1. The standard boto3 credential chain (for example, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, the shared credentials file, or an IAM role)

  2. Explicit options passed to the S3 client through boto3_options

Much of the interaction is performed using the boto3 S3 client. Please refer to the official AWS documentation for more information.

Parameters:
  • name – The name of the Data Connector.

  • datasource_name – The name of this Data Connector’s Datasource.

  • bucket – The S3 bucket name.

  • execution_engine – The Execution Engine object used by this Data Connector to read the data.

  • default_regex – A regex configuration for filtering data references. The dict can include a regex pattern and a list of group_names for capture groups.

  • sorters – A list of sorters for sorting data references.

  • prefix – Infer as Data Assets only S3 objects whose keys begin with this prefix.

  • delimiter – When included, removes any prefix up to the delimiter from the inferred Data Asset names.

  • max_keys – The maximum number of S3 object keys to return.

  • boto3_options – Options passed to the S3 client.

  • batch_spec_passthrough – Dictionary with keys that will be added directly to the batch spec.

  • id – The unique identifier for this Data Connector used when running in cloud mode.
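To make the parameters above more concrete, here is a minimal sketch of a configuration expressed as a Python dict. Only the parameter names come from the signature above. Every value is a placeholder, and the `class_name` key is an assumption about how such a block is usually wired into a Datasource configuration. The final check simply confirms that the regex has one capture group per entry in `group_names`.

```python
import re

# Placeholder values throughout; only the parameter names mirror the list above.
# "class_name" is an assumed key, not part of the constructor signature.
connector_config = {
    "class_name": "InferredAssetS3DataConnector",
    "bucket": "my-example-bucket",      # hypothetical bucket name
    "prefix": "data/",                  # only keys under this prefix are considered
    "delimiter": "/",
    "max_keys": 1000,
    "default_regex": {
        "pattern": r"data/(.+)/(\d{4}-\d{2})\.csv",
        "group_names": ["data_asset_name", "month"],
    },
    # Forwarded to the S3 client; region_name is a standard boto3 client option.
    "boto3_options": {"region_name": "us-east-1"},
}

# Sanity check: the pattern should define one capture group per group name.
regex_cfg = connector_config["default_regex"]
n_groups = re.compile(regex_cfg["pattern"]).groups
groups_match = n_groups == len(regex_cfg["group_names"])
print(groups_match)
```

A mismatch between the number of capture groups and `group_names` is an easy mistake to make, so a check like this can be worth running before the configuration is handed to a Datasource.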

get_available_data_asset_names() → List[str]

Return the list of Data Asset names known to this DataConnector.

Returns:

A list of available Data Asset names.