@rjpower
Created March 24, 2026 18:36
Iris task container GCP credential impersonation: options analysis

GCP Credential Impersonation in Iris Task Containers

Context

Iris runs user code in Docker containers on GCE worker VMs. When per-user credential isolation is enabled, each user's tasks should run as their designated GCP service account rather than the worker VM's native SA. The worker VM's SA has roles/iam.serviceAccountTokenCreator on each target user SA.

The challenge: how do we make all GCP client libraries inside the container — Python (google-cloud-storage, gcsfs, BigQuery client, etc.), gcloud CLI, and potentially Go/Java — use impersonated credentials transparently?

The Core Problem: ADC File Format Limitations

google.auth.default() is the standard entry point for Application Default Credentials (ADC). It checks, in order:

  1. GOOGLE_APPLICATION_CREDENTIALS env var → load from JSON file
  2. ~/.config/gcloud/application_default_credentials.json (user creds from gcloud auth application-default login)
  3. GCE metadata server (on Compute Engine VMs)
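As a rough illustration of the lookup order above, here is a stdlib-only model of which source google.auth.default() would pick. adc_source is a hypothetical helper, not part of google-auth, and the model is simplified: the real library also honors CLOUDSDK_CONFIG, checks App Engine, and raises an error (rather than falling through) when GOOGLE_APPLICATION_CREDENTIALS names a missing file.

```python
from pathlib import Path
from typing import Mapping, Optional


# Hypothetical helper: models the simplified ADC lookup order. It only
# reports WHICH source would be used; it never loads credentials.
def adc_source(env: Mapping[str, str], home: Path, on_gce: bool) -> Optional[str]:
    # 1. An explicit file named by GOOGLE_APPLICATION_CREDENTIALS wins.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"file:{explicit}"
    # 2. The well-known file written by `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return f"file:{well_known}"
    # 3. The metadata server, only reachable on GCE (and similar environments).
    if on_gce:
        return "gce-metadata"
    return None
```

On an Iris worker with neither file present, this returns "gce-metadata", which is exactly the source that cannot be expressed as JSON.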

We want to intercept at step 1 or 2 with a file that says "impersonate this SA." ADC does have a JSON type for this, impersonated_service_account:

{
  "type": "impersonated_service_account",
  "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/TARGET_SA:generateAccessToken",
  "source_credentials": { "type": "...", ... },
  "delegates": []
}

But source_credentials must itself be one of these JSON-serializable types (per the google-auth source):

| Type string | Required JSON fields |
|---|---|
| authorized_user | client_id, client_secret, refresh_token |
| service_account | private_key, client_email, etc. |
| external_account_authorized_user | external account federation fields |

compute_engine / compute_metadata is NOT a supported source type. On a GCE VM, the metadata server credentials exist only at runtime — there is no JSON file representation. This means we cannot write an impersonated_service_account ADC JSON file that uses the VM's native credentials as the source.

This is the fundamental constraint that shapes all options below.
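The constraint can be expressed as a small check mirroring the table: check_impersonated_adc is a hypothetical function (not the google-auth loader itself) that accepts only the three source types and therefore rejects a source_credentials entry of type compute_engine.

```python
# Source credential types accepted inside an impersonated_service_account
# ADC file, per the table above. There is no entry for GCE metadata creds.
SUPPORTED_SOURCE_TYPES = frozenset({
    "authorized_user",
    "service_account",
    "external_account_authorized_user",
})


def check_impersonated_adc(info: dict) -> None:
    """Raise ValueError if `info` could not be loaded as an
    impersonated_service_account ADC file (simplified check)."""
    if info.get("type") != "impersonated_service_account":
        raise ValueError(f"not an impersonated ADC file: {info.get('type')!r}")
    source = info.get("source_credentials") or {}
    source_type = source.get("type")
    if source_type not in SUPPORTED_SOURCE_TYPES:
        raise ValueError(f"unsupported source_credentials type: {source_type!r}")
```

For example, passing {"type": "impersonated_service_account", "source_credentials": {"type": "compute_engine"}} raises ValueError: the file we would want to write on a GCE worker is not loadable.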


Options

Option 1: sitecustomize.py monkey-patch + gcloud env var

Mechanism:

  • Set CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT=target-sa@project.iam in the container env → covers gcloud CLI
  • Set IRIS_IMPERSONATE_SERVICE_ACCOUNT to the same SA, and write a sitecustomize.py into a directory added to the container's PYTHONPATH; it monkey-patches google.auth.default():
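# sitecustomize.py: imported automatically by the site module at interpreter startup
import os

_CLOUD_PLATFORM = ["https://www.googleapis.com/auth/cloud-platform"]


def _install_impersonation(target_sa):
    try:
        import google.auth
        import google.auth.impersonated_credentials as impersonated
    except ImportError:
        return  # google-auth not installed in this interpreter; nothing to patch

    _original_default = google.auth.default

    def _impersonated_default(scopes=None, request=None, quota_project_id=None,
                              default_scopes=None):
        # Source credentials: whatever ADC resolves to (on GCE, the VM's own SA).
        source_creds, project = _original_default(scopes=_CLOUD_PLATFORM, request=request)
        creds = impersonated.Credentials(
            source_credentials=source_creds,
            target_principal=target_sa,
            target_scopes=scopes or default_scopes or _CLOUD_PLATFORM,
        )
        if quota_project_id:
            creds = creds.with_quota_project(quota_project_id)
        return creds, project

    # Patched before user code runs, so a later `from google.auth import default` also sees it.
    google.auth.default = _impersonated_default


_sa = os.environ.get("IRIS_IMPERSONATE_SERVICE_ACCOUNT", "")
if _sa:
    _install_impersonation(_sa)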

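sitecustomize.py is imported by the site module at startup of every Python process that finds it on sys.path (scripts, -c, subprocesses), unlike PYTHONSTARTUP, which only runs for interactive sessions. It is skipped when Python is started with -S (site disabled) or with -E/-I (PYTHONPATH ignored). The google.auth.impersonated_credentials.Credentials object refreshes its own token by calling the IAM Credentials API with the source credentials, so no external refresh loop is needed.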
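To see the startup behavior without touching google.auth, the sketch below drops a stand-in sitecustomize.py (it only sets a marker environment variable) into a temporary directory on PYTHONPATH and runs child interpreters with different flags. It needs only the standard library; the helper name sitecustomize_ran and the marker variable are illustrative.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in sitecustomize.py: records that it ran instead of patching google.auth.
SITECUSTOMIZE = 'import os\nos.environ["IRIS_SITECUSTOMIZE_RAN"] = "1"\n'
CHILD = 'import os; print(os.environ.get("IRIS_SITECUSTOMIZE_RAN", "0"))'


def sitecustomize_ran(*python_flags: str) -> bool:
    """Run `python <flags> -c CHILD` with PYTHONPATH pointing at a directory
    holding the stand-in sitecustomize.py; report whether it executed."""
    with tempfile.TemporaryDirectory() as site_dir:
        Path(site_dir, "sitecustomize.py").write_text(SITECUSTOMIZE)
        env = dict(os.environ, PYTHONPATH=site_dir)
        env.pop("IRIS_SITECUSTOMIZE_RAN", None)
        out = subprocess.run(
            [sys.executable, *python_flags, "-c", CHILD],
            env=env, capture_output=True, text=True, check=True,
        ).stdout.strip()
    return out == "1"


if __name__ == "__main__":
    print("plain -c:", sitecustomize_ran())                   # True
    print("-S (no site module):", sitecustomize_ran("-S"))    # False
    print("-E (PYTHONPATH ignored):", sitecustomize_ran("-E"))  # False
```

The -S and -E results are the practical caveat: a task that launches Python with either flag (or -I, which implies -E) silently runs with the VM's own credentials.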

| Pros | Cons |
|---|---|
| No token refresh loop needed (auto-refreshes) | Only covers Python + gcloud CLI |
| No GCP infrastructure setup | Monkey-patching is fragile if the google.auth.default signature changes |
| No security-sensitive key material | Does not cover Go, Java, or other non-Python GCP clients |
| Simple implementation (~30 lines) | Shadows any other sitecustomize.py later on sys.path (only the first one found is imported) |

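Coverage: Python libraries ✅ | gcloud CLI ✅ | Go/Java ❌ | gsutil ⚠️ (Python-based, but uses its own auth stack rather than google.auth.default, so it depends on honoring the gcloud impersonation setting)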


Option 2: Worker-refreshed access token file + GOOGLE_APPLICATION_CREDENTIALS

Mechanism:

  • Worker calls generateAccessToken for the target SA via the IAM Credentials API
  • Writes the token to a file mounted into the container
  • Sets GOOGLE_APPLICATION_CREDENTIALS pointing to that file
  • Worker's _monitor() loop refreshes the file every ~30 minutes
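The worker side of this option can be sketched with the standard library. generate_access_token_request builds (without sending) the IAM Credentials generateAccessToken call, with scope and lifetime in the body and the worker's own token as the bearer; write_token_atomically addresses the read-while-write race listed under cons by writing a temp file and renaming it into place. Both function names are hypothetical, and obtaining the worker's source token and scheduling the refresh are left out.

```python
import json
import os
import tempfile
import urllib.request
from pathlib import Path

IAM_CREDENTIALS = "https://iamcredentials.googleapis.com/v1"


def generate_access_token_request(target_sa: str, source_token: str,
                                  lifetime_s: int = 3600) -> urllib.request.Request:
    """Build (but do not send) the generateAccessToken call for target_sa."""
    url = f"{IAM_CREDENTIALS}/projects/-/serviceAccounts/{target_sa}:generateAccessToken"
    body = {
        "scope": ["https://www.googleapis.com/auth/cloud-platform"],
        "lifetime": f"{lifetime_s}s",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {source_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def write_token_atomically(path: Path, token: str) -> None:
    """Write to a temp file in the same directory, then rename it over `path`,
    so a reader sees either the old or the new token, never a partial one."""
    # mkstemp creates the file with mode 0600; adjust if the container
    # runs as a different UID than the worker.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".token-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(token)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```

On success the API responds with JSON containing accessToken and expireTime, e.g. json.load(urllib.request.urlopen(req))["accessToken"]. One deployment detail: bind-mount the token's directory rather than the file itself, because a single-file bind mount keeps pointing at the old inode after os.replace.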

Problem: There is no standard ADC JSON format for "here's a raw access token, use it." The supported file types (authorized_user, service_account, external_account, etc.) all describe how to obtain tokens (a refresh token, a private key, or a federation source), not a bare token. Writing a raw access token to a file that google.auth.load_credentials_from_file() can parse is not possible without a custom credential type.

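You could work around this with custom credential-loading code inside the container, or by using the token file as the credential source of a Workload Identity Federation config (Option 4).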

| Pros | Cons |
|---|---|
| Token file approach is language-agnostic in theory | No standard ADC format for bare access tokens |
| Worker controls token lifecycle | Requires custom credential loading or WIF infrastructure |
| | Refresh loop adds complexity to worker |
| | Race condition: task reads file while worker writes |

Coverage: Depends on implementation — not viable as-is without custom code or WIF.


Option 3: Temporary SA key via iam.serviceAccountKeys.create

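Mechanism:

  • Worker calls iam.serviceAccountKeys.create (REST: projects.serviceAccounts.keys.create) for the task's target SA
  • Writes the returned key file into a directory mounted into the container
  • Sets GOOGLE_APPLICATION_CREDENTIALS pointing to that file
  • Deletes the key (projects.serviceAccounts.keys.delete) when the task finishes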

This produces a standard service_account JSON key file that every GCP client library understands natively.
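A minimal sketch of the cleanup discipline this option needs, assuming hypothetical create_key/delete_key hooks that wrap projects.serviceAccounts.keys.create and .delete (the create response carries the key's resource name and the key file as base64 privateKeyData). The context manager deletes both the file and the key whether the task exits normally or raises.

```python
import base64
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterator


@contextlib.contextmanager
def temporary_sa_key(
    create_key: Callable[[str], dict],  # hypothetical wrapper around keys.create
    delete_key: Callable[[str], None],  # hypothetical wrapper around keys.delete
    target_sa: str,
    key_dir: Path,
) -> Iterator[Path]:
    """Yield the path of a fresh key file for target_sa; always revoke it."""
    key = create_key(target_sa)  # response has "name" and base64 "privateKeyData"
    try:
        fd, path = tempfile.mkstemp(dir=key_dir, suffix=".json")  # mode 0600
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(base64.b64decode(key["privateKeyData"]))
            yield Path(path)
        finally:
            os.unlink(path)  # remove the key material from disk
    finally:
        delete_key(key["name"])  # revoke the key even if the task failed
```

It cannot help if the worker itself is killed mid-task, so a periodic sweep that deletes stale keys on the target SAs is still needed to bound leaks.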

| Pros | Cons |
|---|---|
| Universal coverage: all languages, all libraries | Creates a real private key (security risk) |
| Standard GOOGLE_APPLICATION_CREDENTIALS mechanism | Worker SA needs roles/iam.serviceAccountKeyAdmin on each target SA |
| No refresh loop (by default, keys don't expire until deleted) | Key must be reliably cleaned up (a leaked key is a persistent credential) |
| No monkey-patching or custom code in container | GCP limit: max 10 keys per SA |
| | Org policies (e.g. iam.disableServiceAccountKeyCreation) may block SA key creation |

Coverage: Python libraries ✅ | gcloud CLI ✅ | Go/Java ✅ | Everything ✅


Option 4: Workload Identity Federation with file-based token source

Mechanism:

{
  "type": "external_account",
  "audience": "//iam.googleapis.com/projects/PROJECT_NUM/locations/global/workloadIdentityPools/POOL/providers/PROVIDER",
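  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": { "file": "/iris/token" },
  "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/TARGET_SA:generateAccessToken"
}
  • Worker periodically writes a fresh subject token to /iris/token; it must be a token the pool's provider validates (for an OIDC provider, a signed ID token), since WIF providers accept external identity tokens (OIDC, SAML, AWS) rather than Google access tokens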
  • google.auth.default() reads the config, exchanges the token via STS, then impersonates the target SA
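Because the config differs only in a few identifiers per task, the worker could generate it. wif_adc_config below is a hypothetical helper that fills the PROJECT_NUM, POOL, PROVIDER, and TARGET_SA placeholders; subject_token_type is left as a parameter because it must match the kind of token the pool's provider is configured to accept.

```python
import json


def wif_adc_config(project_number: str, pool: str, provider: str,
                   target_sa: str, token_path: str,
                   subject_token_type: str) -> dict:
    """Build an external_account ADC config that reads its subject token
    from token_path and then impersonates target_sa."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool}/providers/{provider}"
    )
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": subject_token_type,
        "token_url": "https://sts.googleapis.com/v1/token",
        "credential_source": {"file": token_path},
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/"
            f"serviceAccounts/{target_sa}:generateAccessToken"
        ),
    }


if __name__ == "__main__":
    cfg = wif_adc_config("123456789", "iris-pool", "iris-provider",
                         "user-sa@proj.iam.gserviceaccount.com", "/iris/token",
                         "urn:ietf:params:oauth:token-type:jwt")
    print(json.dumps(cfg, indent=2))
```

The worker would write this JSON next to the token file and point GOOGLE_APPLICATION_CREDENTIALS at it; all identifiers in the usage are placeholders.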

| Pros | Cons |
|---|---|
| Standard ADC mechanism, no monkey-patching | Requires GCP infrastructure: identity pool + provider |
| Universal coverage (all languages) | Refresh loop for token file |
| No private keys created | Complex setup and debugging |
| Google-supported pattern | STS token exchange adds latency to first auth |

Coverage: Python libraries ✅ | gcloud CLI ✅ | Go/Java ✅ | Everything ✅


Comparison Matrix

| | Python libs | gcloud | Go/Java | Refresh | GCP infra needed | Complexity | Security |
|---|---|---|---|---|---|---|---|
| Opt 1: sitecustomize | ✅ | ✅ (env var) | ❌ | Auto (in-process) | None | Low | Good (no keys) |
| Opt 2: Token file | ❌ (no ADC format) | ❌ | ❌ | Worker loop | None | Medium | Good (short-lived) |
| Opt 3: Temp SA key | ✅ | ✅ | ✅ | None (key-based) | serviceAccountKeyAdmin | Medium | ⚠️ Key material |
| Opt 4: WIF | ✅ | ✅ | ✅ | Worker loop | Pool + provider | High | Good (federated) |

Recommendation

Option 1 (sitecustomize.py + CLOUDSDK_AUTH_IMPERSONATE_SERVICE_ACCOUNT) for now:

  • Marin tasks are exclusively Python — Go/Java coverage is not needed today
  • Zero GCP infrastructure setup
  • No security-sensitive key material
  • Auto-refreshing credentials (no worker refresh loop)
  • Simple to implement and reason about

If universal language coverage becomes necessary later, Option 3 (temporary SA key) is the pragmatic upgrade — it's the only option that covers everything without infrastructure setup. Option 4 (WIF) is the proper long-term solution if the org wants to avoid SA keys entirely, but requires upfront GCP configuration.

