
ConfiguredAssetGCSDataConnector

class great_expectations.datasource.data_connector.ConfiguredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, assets: dict, execution_engine: Optional[great_expectations.execution_engine.execution_engine.ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)

Extension of ConfiguredAssetFilePathDataConnector used to connect to GCS.

A ConfiguredAssetGCSDataConnector requires an explicit specification of each DataAsset you want to connect to. This allows more fine-tuning, but also requires more setup. To stay consistent with Google’s official SDK, this class uses the SDK’s own parameter names, such as “bucket_or_name” and “max_results”. These keys are converted from YAML to Python and passed directly to the GCS connection object, so they must match the SDK’s names exactly.

This DataConnector supports the following methods of authentication:
  1. Standard gcloud auth / GOOGLE_APPLICATION_CREDENTIALS environment variable workflow

  2. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_file

  3. Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_info
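As a rough illustration of how the three methods might be selected through gcs_options, the sketch below classifies a gcs_options dict. It assumes that a "filename" key triggers method 2 and an "info" key triggers method 3; those key names, the describe_auth_method helper, the file path, and the credential values are assumptions for illustration and should be checked against your installed version.

```python
from typing import Any, Dict, Optional


def describe_auth_method(gcs_options: Optional[Dict[str, Any]] = None) -> str:
    """Report which authentication method a gcs_options dict would select.

    Hypothetical helper. Assumption: "filename" and "info" are the keys
    that trigger methods 2 and 3 respectively.
    """
    options = gcs_options or {}
    if "filename" in options:
        return "from_service_account_file"
    if "info" in options:
        return "from_service_account_info"
    # No explicit credentials: fall back to the standard gcloud auth /
    # GOOGLE_APPLICATION_CREDENTIALS workflow.
    return "default"


# 1. Standard workflow: no credential keys supplied.
default_options: Dict[str, Any] = {}

# 2. Credentials read from a service account file (hypothetical path).
file_options = {"filename": "/path/to/service_account.json"}

# 3. Credentials built from an in-memory dict (placeholder values).
info_options = {"info": {"type": "service_account", "project_id": "my-project"}}

for options in (default_options, file_options, info_options):
    print(describe_auth_method(options))
```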

Parameters:
  • name (str) – required name for DataConnector

  • datasource_name (str) – required name for datasource

  • bucket_or_name (str) – bucket name for Google Cloud Storage

  • assets (dict) – dict of asset configuration (required for ConfiguredAssetDataConnector)

  • execution_engine (ExecutionEngine) – optional reference to ExecutionEngine

  • default_regex (dict) – optional regex configuration for filtering data_references

  • sorters (list) – optional list of sorters for sorting data_references

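  • prefix (str) – optional GCS prefix; only blobs whose names begin with it are listed

  • delimiter (str) – optional GCS delimiter, used together with prefix to emulate a directory hierarchy

  • max_results (int) – optional maximum number of blob filepaths to return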

  • gcs_options (dict) – wrapper object for optional GCS **kwargs

  • batch_spec_passthrough (dict) – dictionary with keys that will be added directly to batch_spec
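The parameters above might be combined into a configuration along the following lines. This is a minimal sketch: the bucket name, prefix, asset name, and regex are hypothetical, and match_blob is an illustrative helper that uses Python's re module directly to show what a default_regex with "pattern" and "group_names" expresses. It does not reproduce the connector's internal matching.

```python
import re
from typing import Dict, Optional

# Hypothetical configuration; every value here is a placeholder.
data_connector_config = {
    "class_name": "ConfiguredAssetGCSDataConnector",
    "bucket_or_name": "my-example-bucket",
    "prefix": "data/taxi/",
    "delimiter": "/",
    "max_results": 1000,
    "default_regex": {
        "pattern": r"data/taxi/yellow_tripdata_(\d{4})-(\d{2})\.csv",
        "group_names": ["year", "month"],
    },
    # Each DataAsset must be listed explicitly.
    "assets": {"yellow_tripdata": {}},
}


def match_blob(path: str, regex_config: dict) -> Optional[Dict[str, str]]:
    """Map a blob path to named groups, or return None if it does not match."""
    match = re.match(regex_config["pattern"], path)
    if match is None:
        return None
    return dict(zip(regex_config["group_names"], match.groups()))


regex_config = data_connector_config["default_regex"]
print(match_blob("data/taxi/yellow_tripdata_2019-01.csv", regex_config))
# → {'year': '2019', 'month': '01'}
print(match_blob("data/taxi/README.txt", regex_config))
# → None
```

Blob paths that do not match the pattern are filtered out, which is the role default_regex plays for data_references.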

get_available_data_asset_names() → List[str]

Return the list of asset names known by this DataConnector.

Returns:

A list of available names
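As a rough picture of what this method returns, the sketch below assumes the available names correspond to the keys of the assets dict passed to the constructor. The available_data_asset_names function is a hypothetical stand-in, not the library's implementation, and the asset names are placeholders.

```python
from typing import Dict, List


def available_data_asset_names(assets: Dict[str, dict]) -> List[str]:
    """Hypothetical stand-in: return the names of the configured assets."""
    return list(assets.keys())


# Placeholder asset configuration.
assets = {"yellow_tripdata": {}, "green_tripdata": {}}
print(available_data_asset_names(assets))
# → ['yellow_tripdata', 'green_tripdata']
```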