InferredAssetS3DataConnector
- class great_expectations.datasource.data_connector.InferredAssetS3DataConnector(name: str, datasource_name: str, bucket: str, execution_engine: Optional[great_expectations.execution_engine.execution_engine.ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: str = '', delimiter: str = '/', max_keys: int = 1000, boto3_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)
An Inferred Asset Data Connector used to connect to AWS Simple Storage Service (S3).
This Data Connector uses regular expressions to traverse S3 buckets and implicitly determine Data Asset names.
- This DataConnector supports the following methods of authentication:
Credentials resolved by the standard boto3 credential chain (for example, AWS environment variables, the shared credentials file, or an IAM role)
Credentials and other client settings passed explicitly through boto3_options
Much of the interaction is performed using the boto3 S3 client. Please refer to the official AWS documentation for more information.
- Parameters:
name – The name of the Data Connector.
datasource_name – The name of this Data Connector’s Datasource.
bucket – The S3 bucket name.
execution_engine – The Execution Engine object used by this Data Connector to read the data.
default_regex – A regex configuration for filtering data references. The dict can include a regex pattern and a list of group_names for capture groups.
sorters – A list of sorters for sorting data references.
prefix – Infer as Data Assets only S3 objects whose keys begin with this prefix.
delimiter – When included, removes any prefix up to the delimiter from the inferred Data Asset names.
max_keys – Maximum number of S3 object keys to return.
boto3_options – Options passed to the S3 client.
batch_spec_passthrough – Dictionary with keys that will be added directly to the batch spec.
id – The unique identifier for this Data Connector used when running in cloud mode.
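The bucket, prefix, delimiter, and max_keys parameters above correspond naturally to keyword arguments of the boto3 S3 client's list_objects_v2 call. The following is a minimal sketch of that correspondence, not the connector's actual implementation: the helper build_list_objects_kwargs and the bucket and prefix values are hypothetical, and the mapping itself is an assumption made for illustration.

```python
from typing import Any, Dict


def build_list_objects_kwargs(
    bucket: str, prefix: str = "", delimiter: str = "/", max_keys: int = 1000
) -> Dict[str, Any]:
    """Hypothetical helper: map connector-style arguments to list_objects_v2 keywords."""
    return {
        "Bucket": bucket,
        "Prefix": prefix,
        "Delimiter": delimiter,
        "MaxKeys": max_keys,
    }


# Hypothetical bucket and prefix, used for illustration only.
kwargs = build_list_objects_kwargs(bucket="my-example-bucket", prefix="data/2024/")
print(kwargs)

# With boto3 installed and AWS credentials configured, one could then call:
#   import boto3
#   response = boto3.client("s3").list_objects_v2(**kwargs)
```

The defaults in the helper mirror the defaults in the class signature (prefix '', delimiter '/', max_keys 1000).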
- get_available_data_asset_names() -> List[str]
Return the list of asset names known by this DataConnector.
- Returns:
A list of available names
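To make the idea of inferring asset names from a default_regex more concrete, here is a small standalone sketch that uses only Python's re module. It is an illustration under stated assumptions rather than the library's code: the function infer_asset_names, the example keys, and the pattern are all hypothetical, and the sketch assumes that one of the group_names is data_asset_name.

```python
import re
from typing import Dict, List


def infer_asset_names(keys: List[str], default_regex: Dict) -> List[str]:
    """Hypothetical helper: collect unique asset names from keys matching the regex."""
    pattern = re.compile(default_regex["pattern"])
    group_names = default_regex["group_names"]
    names = set()
    for key in keys:
        match = pattern.match(key)
        if match is None:
            continue  # keys that do not match the pattern are filtered out
        # Pair each capture group with its configured name.
        groups = dict(zip(group_names, match.groups()))
        if "data_asset_name" in groups:
            names.add(groups["data_asset_name"])
    return sorted(names)


# Hypothetical regex configuration and S3 keys, for illustration only.
default_regex = {
    "pattern": r"(.+)_(\d{4}-\d{2})\.csv",
    "group_names": ["data_asset_name", "month"],
}
keys = ["taxi_2024-01.csv", "taxi_2024-02.csv", "weather_2024-01.csv", "README.md"]

asset_names = infer_asset_names(keys, default_regex)
print(asset_names)  # ['taxi', 'weather']
```

In this sketch the two taxi files collapse into a single asset name, and README.md is ignored because it does not match the pattern.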