
InferredAssetGCSDataConnector

class great_expectations.datasource.data_connector.InferredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, execution_engine: Optional[great_expectations.execution_engine.execution_engine.ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)

An Inferred Asset Data Connector used to connect to Google Cloud Storage (GCS).

This Data Connector uses regular expressions to traverse GCS buckets and implicitly determine Data Asset names. Note that, to stay consistent with Google’s official SDK, it uses the parameter names bucket_or_name and max_results. Because these keys are converted from YAML to Python and passed directly to the GCS connection object, they must match the SDK’s names for the connector to work properly.
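To illustrate how a regular expression can implicitly determine Data Asset names, here is a minimal sketch using only Python's re module. It does not call Great Expectations or GCS. The blob names, the pattern, and the convention that the capture group named data_asset_name supplies the asset name are assumptions made for illustration.

```python
import re

# Hypothetical blob names, as they might be listed from a bucket.
blob_names = [
    "data/taxi/yellow_2019-01.csv",
    "data/taxi/yellow_2019-02.csv",
    "data/taxi/green_2019-01.csv",
    "data/README.md",
]

# One capture group per entry in group_names (illustrative values).
pattern = re.compile(r"data/taxi/(.+)_(\d{4})-(\d{2})\.csv")
group_names = ["data_asset_name", "year", "month"]


def infer_asset_names(names):
    """Collect the distinct asset names captured from matching blob names."""
    assets = set()
    for blob_name in names:
        match = pattern.fullmatch(blob_name)
        if match is None:
            continue  # blobs that do not match the pattern are ignored
        groups = dict(zip(group_names, match.groups()))
        assets.add(groups["data_asset_name"])
    return sorted(assets)


inferred = infer_asset_names(blob_names)
print(inferred)  # ['green', 'yellow']
```

Here the three CSV blobs collapse into two inferred asset names, while the remaining capture groups (year, month) would distinguish individual batches within each asset.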

This DataConnector supports the following methods of authentication:
  1. Standard gcloud auth / GOOGLE_APPLICATION_CREDENTIALS environment variable workflow

  2. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_file

  3. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_info

Much of the interaction is performed using a GCS Storage Client. Please refer to the Official Google Documentation for more information.
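The three authentication methods above can be thought of as a choice driven by the contents of gcs_options. The sketch below is a plain-Python illustration of that choice, not the connector's actual code: the helper describe_auth is hypothetical, and the key names filename and info are assumptions about how a credentials file or credentials dictionary would be supplied.

```python
from typing import Optional


def describe_auth(gcs_options: Optional[dict]) -> str:
    """Return which documented auth method a gcs_options dict would select.

    Hypothetical helper; the "filename" and "info" keys are assumed names.
    """
    options = gcs_options or {}
    if "filename" in options:
        # Method 2: credentials built from a service account JSON file.
        return "service_account.Credentials.from_service_account_file"
    if "info" in options:
        # Method 3: credentials built from an in-memory dictionary.
        return "service_account.Credentials.from_service_account_info"
    # Method 1: fall back to the ambient gcloud / environment credentials.
    return "gcloud auth / GOOGLE_APPLICATION_CREDENTIALS"


env_default = describe_auth(None)
from_file = describe_auth({"filename": "path/to/service_account.json"})
from_info = describe_auth({"info": {"type": "service_account"}})
print(env_default, from_file, from_info, sep="\n")
```

When no explicit credentials are given, the standard environment workflow applies, so an empty or omitted gcs_options is the simplest configuration.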

Parameters:
  • name – The name of the Data Connector.

  • datasource_name – The name of this Data Connector’s Datasource.

  • bucket_or_name – Bucket name for Google Cloud Storage.

  • execution_engine – The Execution Engine object to be used by this Data Connector to read the data.

  • default_regex – A regex configuration for filtering data references. The dict can include a regex pattern and a list of group_names for capture groups.

  • sorters – A list of sorters for sorting data references.

  • prefix – Infer as Data Assets only blobs that begin with this prefix.

  • delimiter – When included, will remove any prefix up to the delimiter from the inferred Data Asset names.

  • max_results – Maximum number of blob filepaths to return.

  • gcs_options – Options passed to the GCS Storage Client.

  • batch_spec_passthrough – Dictionary with keys that will be added directly to the batch spec.

  • id – The unique identifier for this Data Connector used when running in cloud mode.
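As a rough sketch of how these parameters fit together, the dictionary below mirrors a possible connector configuration. The bucket name, prefix, and regex are hypothetical, and the class_name key is an assumption about how the configuration selects this class. The check at the end only verifies that the pattern has one capture group per entry in group_names and that every other key is a constructor parameter from the signature above.

```python
import re

# Parameter names taken from the constructor signature.
constructor_params = {
    "name", "datasource_name", "bucket_or_name", "execution_engine",
    "default_regex", "sorters", "prefix", "delimiter", "max_results",
    "gcs_options", "batch_spec_passthrough", "id",
}

# Illustrative configuration; all values are hypothetical.
connector_config = {
    "class_name": "InferredAssetGCSDataConnector",  # assumed config key
    "bucket_or_name": "my-example-bucket",
    "prefix": "data/taxi/",       # only blobs under this prefix are considered
    "delimiter": "/",
    "max_results": 1000,
    "default_regex": {
        "pattern": r"data/taxi/(.+)_(\d{4})-(\d{2})\.csv",
        "group_names": ["data_asset_name", "year", "month"],
    },
}

# Sanity checks before handing the configuration to a Datasource.
regex_config = connector_config["default_regex"]
n_groups = re.compile(regex_config["pattern"]).groups
groups_match = n_groups == len(regex_config["group_names"])
unknown_keys = set(connector_config) - constructor_params - {"class_name"}

print(groups_match, sorted(unknown_keys))
```

A mismatch between the number of capture groups and group_names is an easy mistake to make when editing the pattern, so checking it up front can save a confusing failure later.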

get_available_data_asset_names() → List[str]

Return the list of Data Asset names known to this Data Connector.

Returns:

A list of available Data Asset names.