@cgwalters
Last active May 13, 2026 12:03
Simplifying OpenShell: Structural Network Isolation via Proxy Sidecar

(This document is heavily AI-generated, but it is the result of a fair bit of interactive design and research work with cgwalters, plus of course a big hat tip to paude and other projects that already blazed this trail around network proxying.)

Motivation

OpenShell's current inner sandboxing — Landlock, seccomp, network namespaces, iptables, and TOFU binary verification, all running inside the sandbox container — is simultaneously too much and too little.

Too restrictive for development agents

Development agents need to install packages (dnf, apt), experiment with tools, evolve their own environments, and sometimes run nested containers (developing OpenShell in OpenShell). Inner sandboxing breaks all of this. Landlock blocks package manager writes. The custom seccomp profile breaks nested containerization. These are fundamental limitations for the primary use case.

Redundant with container runtimes

Because the current "inner" sandboxing needs higher privileges in order to reduce privileges for the inner agent, it duplicates configuration that the container runtime itself already exposes:

  • Landlock overlaps with filesystem isolation configurable in the outer container (podman run --read-only); of course, people who want to use Landlock can continue to do so.
  • Seccomp is already configurable at the container level, and there's well understood tooling for managing that.
  • Network namespaces + iptables create a nested namespace inside the container's own namespace just to force traffic through the proxy. This is why the supervisor needs root and CAP_NET_ADMIN.
  • TOFU binary verification resolves which binary is making each connection via /proc/net/tcp and verifies its hash. But if an interpreter is trusted (and agents will use interpreters), it verifies the interpreter binary, not the code being interpreted. The security value is marginal; the complexity cost (CAP_SYS_PTRACE, /proc walking, root) is high.

The fix: structural network isolation, reuse the container runtime

Move the L7 proxy out of the container. Use the container runtime's own network isolation — a Podman --internal network with no default gateway — as the enforcement boundary. The proxy sidecar sits on both the internal network and the bridge; it's the only route out. The agent can unset HTTP_PROXY all it wants — there's no route to the internet.

Inner sandboxing (Landlock, seccomp) remains available as opt-in defense-in-depth for use cases that need it, but it's no longer required for network isolation. The simpler architecture becomes the default for development agents.


Architecture

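Each sandbox becomes three Podman resources (a per-sandbox internal network, a proxy sidecar container, and an agent container), plus a shared TLS volume that carries the CA and the readiness sentinel: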

Per sandbox:
┌─ openshell-sbx-{id} network (--internal, dns_enabled=true) ──┐
│                                                                │
│  ┌─────────────────────┐      ┌──────────────────────────────┐ │
│  │  proxy sidecar       │      │  agent container              │ │
│  │  (openshell-sandbox  │      │  (user image)                 │ │
│  │   OPENSHELL_MODE=    │◀─────│  HTTP_PROXY=proxy:3128        │ │
│  │   proxy)             │      │                               │ │
│  │  - L7 proxy :3128    │ TCP  │  openshell-sandbox            │ │
│  │  - OPA engine        │ fwd  │  - SSH/exec relay             │ │
│  │  - cred injection    │──────│  - gRPC → proxy → gateway     │ │
│  │  - inference routing │:8081 │  - opt-in Landlock/seccomp    │ │
│  │  - TCP fwd gw:8081   │      │  - no network enforcement     │ │
│  │  - singleton MITM CA │      │  - MITM CA via volume (ro)    │ │
│  │  - --dns=host resolv │      │  - GIT_SSH_COMMAND disabled   │ │
│  └─────────────────────┘      └──────────────────────────────┘ │
│        │                                                       │
└────────┼───────────────────────────────────────────────────────┘
         │ also on: openshell bridge network
         ▼
    gateway (host or container, :8081)
    - sandbox lifecycle via Podman API
    - gRPC for CLI and supervisor callbacks

The proxy sidecar runs openshell-sandbox in a new OPENSHELL_MODE=proxy. It handles the L7 proxy, OPA policy, credential injection, and inference routing. It also TCP-forwards the gateway's gRPC port so the agent's supervisor can relay SSH/exec sessions.

The agent container runs openshell-sandbox with OPENSHELL_PROXY_MODE=sidecar, which tells the supervisor to skip inner network namespace and iptables setup (the topology handles that) while still honoring Landlock/seccomp policy if requested. The agent is connected only to the --internal network — no route to the internet.

This mirrors the split-pod model from #981: proxy sidecar = supervisor pod; agent container = agent pod; per-sandbox --internal network = NetworkPolicy.

What this changes

  • Network isolation moves to the topology. The --internal Podman network with no default gateway replaces the inner network namespace + iptables approach. The proxy sidecar is the sole egress path.
  • Binary identity (TOFU) is removed. The /proc/net/tcp scanning, BinaryIdentityCache, and the CAP_SYS_PTRACE requirement all go away. The check is fundamentally insecure in the presence of LD_PRELOAD and interpreted languages, and it cannot work across containers.
  • Proxy moves to sidecar. L7 proxy, OPA engine, credential injection, and inference routing run in a separate container.
  • Per-sandbox MITM CAs → singleton. A singleton CA per gateway session is sufficient.

What this preserves

  • Inner sandboxing (opt-in). Landlock and seccomp remain available as policy-driven defense-in-depth. They're no longer required for network isolation, but users can opt in for filesystem and syscall restrictions.
  • L7 proxy enforcement. OPA policy evaluation, credential injection, inference routing — unchanged in behavior, just moved to the sidecar.
  • SSH/exec relay. The agent still runs openshell-sandbox for SSH/exec. The gRPC connection routes through the proxy sidecar's TCP port-forward.
  • Per-sandbox isolation. Each sandbox gets its own internal network. Sandboxes cannot see each other.
  • Docker and Kubernetes drivers. Unaffected. This is Podman-only.

Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Isolation model | Per-sandbox --internal network | Sandboxes can't see each other. Mirrors k8s NetworkPolicy from #981. |
| Proxy location | Per-sandbox sidecar container | Mirrors #981 split-pod. Each sandbox gets its own proxy process. |
| Proxy binary | Reuse openshell-sandbox with OPENSHELL_MODE=proxy | Supervisor image already has the binary. No new image needed. |
| Agent supervisor | OPENSHELL_PROXY_MODE=sidecar | Skips inner netns/iptables. Still applies Landlock/seccomp per policy. Still runs SSH/exec. |
| Agent-to-gateway gRPC | TCP port-forward through proxy sidecar | Agent supervisor connects to proxy's internal IP as if it were the gateway. |
| Binary identity (TOFU) | Remove entirely | Insecure with LD_PRELOAD; can't work across containers. |
| MITM CA | Singleton per gateway session | No security benefit to per-sandbox CA. Shared via volume. |
| DNS on internal network | dns_enabled: true | Container-name resolution for debugging. External DNS irrelevant: proxy-aware tools send hostnames via CONNECT. |
| Proxy DNS | --dns set to host resolvers | Discovered at driver startup from /run/systemd/resolve/resolv.conf, filtering 127.x and 169.254.x. Must be set at container creation time. |
| Container IPs | Fixed/static per sandbox | Proxy=.2, agent=.3. Multi-network requires podman network connect. |
| Bridge network | Existing openshell bridge | Proxy sidecars connect here for internet. Dedicated bridge, not default podman. |
| Git-over-SSH | Disabled via GIT_SSH_COMMAND | Prevents exfiltration. Forces git over HTTPS through proxy. |
| Proxy readiness | Sentinel file on shared volume | Proxy writes .ready after CA and listeners are up. Driver polls before starting agent. |
| Podman client | Keep hand-rolled client | Extend with new methods. Bollard migration is future work. |

Prior Art

paude

paude validates the core architecture with a working implementation for Claude Code, Gemini CLI, Cursor, and OpenClaw.

  • Per-session --internal network + proxy sidecar on both networks. No inner sandboxing at all.
  • Sentinel credentials: agent sees ANTHROPIC_API_KEY=paude-proxy-managed; proxy swaps for real key in-flight. Equivalent to OpenShell's SecretResolver.
  • dnsmasq in the proxy sidecar for DNS. Our testing showed this isn't strictly needed — most proxy-aware tools send hostnames via CONNECT.
  • Fixed IPs (proxy=.2, agent=.3). Validated.
  • CA injected via exec + cat. Our shared-volume approach is cleaner.
  • Readiness protocol: poll for CA cert before starting agent. Adopted.

alcove

alcove uses a similar dual-network sidecar pattern (Go-based).

  • DNS strategy: "proxy resolves." No DNS forwarder — relies on proxy-aware tools sending hostnames in CONNECT. Validated: curl with HTTP_PROXY does not resolve DNS locally.
  • dns_enabled: true on internal network for container-name resolution.
  • SSH disable trick: GIT_SSH_COMMAND="echo 'SSH disabled' && exit 1". Adopted.
  • Comprehensive CA trust env vars: SSL_CERT_FILE, NODE_EXTRA_CA_CERTS, CURL_CA_BUNDLE, GIT_SSL_CAINFO, REQUESTS_CA_BUNDLE. Adopted.
  • Shared internal network (all sandboxes on one network). We use per-sandbox networks for stronger isolation.

Validated Podman behavior (Podman 5.8.1)

| Behavior | Result |
|---|---|
| Aardvark-dns resolves external names on --internal | No (NXDOMAIN) |
| dns_enabled: true internal network | Container names only, not external |
| Static IPs via --network name:ip=x.x.x.x | Works |
| Multi-network via comma syntax with static IPs | Broken: only first network gets IP |
| Multi-network via podman network connect | Works |
| Proxy on both networks | Default route via bridge, internet confirmed |
| Agent on internal-only | Cannot reach internet (Network unreachable) |
| curl with HTTP_PROXY, no local DNS | Works: sends CONNECT with hostname |
| wget with HTTP_PROXY, no local DNS | Fails: resolves locally first |
| resolv.conf updated by podman network connect | No: set at creation time only |
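
The curl and wget rows come down to who resolves the hostname. A proxy-aware client hands the hostname to the proxy inside an HTTP CONNECT request, so the agent never needs working external DNS. The sketch below shows roughly what that request looks like on the wire; `connect_request` is a hypothetical helper for illustration and is not part of OpenShell.

```rust
/// Build the HTTP/1.1 CONNECT preamble a proxy-aware client sends to the proxy.
/// The target hostname travels as plain text, so the proxy (not the agent)
/// performs the DNS lookup. Hypothetical helper, for illustration only.
fn connect_request(host: &str, port: u16) -> String {
    format!("CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n")
}
```

For example, `connect_request("api.github.com", 443)` yields a request whose first line is `CONNECT api.github.com:443 HTTP/1.1`. A tool that resolves the name itself before contacting the proxy, as wget did in the test above, fails earlier because the internal network has no external resolver.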

Implementation

Phase 0: Remove binary identity (TOFU)

Prep commit. Remove /proc/net/tcp identity scanning, BinaryIdentityCache, and all OPA policy fields that depend on binary_path/binary_sha256. The OPA TCP input simplifies to:

{ "host": "api.anthropic.com", "port": 443 }

L7 input keeps the request fields but drops exec:

{
    "network": { "host": "...", "port": 443 },
    "request": { "method": "GET", "path": "/v1/chat", "query_params": {} }
}

This is a standalone change — it can land independently and unblocks everything else. Key code to remove: BinaryIdentityCache, find_socket_inode_owners, parse_proc_net_tcp, file_sha256, the entrypoint_pid: Arc<AtomicU32> parameter on ProxyHandle::start_with_bind_addr, and the CAP_SYS_PTRACE requirement.

Phase 1: Podman client extensions

Add to the hand-rolled Podman client in crates/openshell-driver-podman/src/client.rs:

// Create an --internal bridge network. Idempotent.
pub async fn ensure_internal_network(&self, name: &str) -> Result<(), PodmanApiError>;

// Inspect container to get its IP on a specific network.
pub async fn container_ip(&self, container: &str, network: &str) -> Result<Option<String>, PodmanApiError>;

// Connect a running container to an additional network.
pub async fn network_connect(&self, network: &str, container: &str) -> Result<(), PodmanApiError>;

// Get the /24 subnet base (e.g., "10.89.5") for fixed IP derivation.
pub async fn network_subnet_base(&self, name: &str) -> Result<String, PodmanApiError>;

// Remove a network. Idempotent.
pub async fn remove_network(&self, name: &str) -> Result<(), PodmanApiError>;

Also extend ContainerSpec with static_ips (per-network) and dns_server (for --dns).
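
As a rough illustration of the fixed-IP derivation behind network_subnet_base, the sketch below turns a CIDR string into the three-octet base and then into the proxy and agent addresses. It assumes the network inspect result reports the subnet in CIDR form (for example "10.89.5.0/24") and that the subnet is always a /24; `subnet_base` and `fixed_ips` are hypothetical helper names.

```rust
use std::net::Ipv4Addr;

/// Derive the first three octets of an IPv4 /24 from its CIDR string,
/// e.g. "10.89.5.0/24" -> "10.89.5". Returns None for anything that is
/// not a well-formed IPv4 /24. Hypothetical helper.
fn subnet_base(cidr: &str) -> Option<String> {
    let (addr, prefix) = cidr.split_once('/')?;
    if prefix != "24" {
        return None;
    }
    let [a, b, c, _] = addr.parse::<Ipv4Addr>().ok()?.octets();
    Some(format!("{a}.{b}.{c}"))
}

/// Proxy gets .2 and agent gets .3, following the convention in this document.
/// Returns (proxy_ip, agent_ip).
fn fixed_ips(base: &str) -> (String, String) {
    (format!("{base}.2"), format!("{base}.3"))
}
```

Rejecting non-/24 subnets up front keeps the ".2"/".3" string concatenation honest; a wider or narrower subnet would need real address arithmetic instead.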

Phase 2: Proxy-only mode (OPENSHELL_MODE=proxy)

A new mode for openshell-sandbox that serves as the sidecar entry point. The implementation lives in crates/openshell-sandbox/src/proxy_mode.rs.

pub async fn run_proxy_mode(args: SandboxArgs) -> miette::Result<()> {
    // 1. Connect to gateway, fetch policy, build OPA engine
    // 2. Fetch provider env, build SecretResolver
    // 3. Load existing CA from /openshell-tls/ or generate new one
    //    (persist so restarts don't invalidate cached CA in agent)
    // 4. Build inference context
    // 5. Start L7 proxy on :3128
    // 6. Start TCP port-forward on :8081 → gateway gRPC
    // 7. Spawn policy poll loop + inference route poller
    // 8. Write /openshell-tls/.ready (only after BOTH listeners are bound)
    // 9. Wait for shutdown
}

The TCP forwarder is a simple tokio relay. One subtlety: OPENSHELL_ENDPOINT is an HTTP URL, but TcpStream::connect needs host:port:

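use std::net::SocketAddr;

use miette::{miette, IntoDiagnostic};
use tokio::net::{TcpListener, TcpStream};

async fn run_tcp_forward(listen_addr: SocketAddr, upstream_url: &str) -> miette::Result<()> {
    // OPENSHELL_ENDPOINT is a URL; reduce it to host:port for TcpStream::connect.
    let url: url::Url = upstream_url.parse().into_diagnostic()?;
    let host = url.host_str().ok_or_else(|| miette!("no host in {}", upstream_url))?;
    let port = url
        .port_or_known_default()
        .ok_or_else(|| miette!("no port in {}", upstream_url))?;
    let upstream = format!("{host}:{port}");
    let listener = TcpListener::bind(listen_addr).await.into_diagnostic()?;
    loop {
        let (mut client, _) = listener.accept().await.into_diagnostic()?;
        let upstream = upstream.clone();
        tokio::spawn(async move {
            if let Ok(mut server) = TcpStream::connect(&upstream).await {
                // Relay both directions until both sides have closed, so a
                // half-close on one side does not cut off data still in flight.
                let _ = tokio::io::copy_bidirectional(&mut client, &mut server).await;
            }
        });
    }
}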

Phase 3: Sidecar-aware agent mode (OPENSHELL_PROXY_MODE=sidecar)

When OPENSHELL_PROXY_MODE=sidecar is set, the supervisor (in lib.rs):

  • Skips inner network namespace, iptables, and proxy startup (the topology and sidecar handle these)
  • Still applies Landlock and seccomp if the sandbox policy requests them
  • Still starts SSH server and runs the workload

This is distinct from OPENSHELL_MODE=nested, which disables ALL enforcement. The sidecar mode disables only network-related enforcement.

One gotcha: apply_child_env in ssh.rs calls env_clear() before setting up child processes. The proxy env vars (HTTP_PROXY, SSL_CERT_FILE, GIT_SSH_COMMAND, etc.) set via the container spec will be dropped from SSH/exec sessions. The fix is to propagate a specific set of env vars from the supervisor's own environment into children.
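
One possible shape for that fix is an explicit allowlist applied right after env_clear(). The sketch below is an assumption about how it could look, not the actual change: `PROPAGATED_VARS`, `propagated_env`, and `apply_proxy_env` are hypothetical names, and the variable set simply mirrors the agent environment described in this document.

```rust
use std::process::Command;

/// Variables that must survive `env_clear()` so SSH/exec children keep using
/// the proxy and trusting its CA. The exact set is an assumption.
const PROPAGATED_VARS: &[&str] = &[
    "HTTP_PROXY",
    "HTTPS_PROXY",
    "SSL_CERT_FILE",
    "NODE_EXTRA_CA_CERTS",
    "CURL_CA_BUNDLE",
    "GIT_SSL_CAINFO",
    "REQUESTS_CA_BUNDLE",
    "GIT_SSH_COMMAND",
];

/// Pick the allowlisted variables out of an environment snapshot.
fn propagated_env<I>(env: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    env.into_iter()
        .filter(|(key, _)| PROPAGATED_VARS.contains(&key.as_str()))
        .collect()
}

/// Clear the child's environment, then re-add only the allowlisted variables
/// taken from the supervisor's own environment.
fn apply_proxy_env(cmd: &mut Command) {
    cmd.env_clear().envs(propagated_env(std::env::vars()));
}
```

Taking the snapshot as an iterator keeps the filtering testable without touching the process environment; only `apply_proxy_env` reads `std::env::vars()`.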

Phase 4: Podman driver — per-sandbox sidecar architecture

The sandbox creation sequence in crates/openshell-driver-podman/src/driver.rs:

async fn create_sandbox(&self, sandbox: &DriverSandbox) -> Result<...> {
    let internal_net = format!("openshell-sbx-{}", sandbox.id);

    // 1. Per-sandbox internal network
    self.client.ensure_internal_network(&internal_net).await?;

    // 2. Fixed IPs from subnet
    let base = self.client.network_subnet_base(&internal_net).await?;
    let proxy_ip = format!("{base}.2");
    let agent_ip = format!("{base}.3");

    // 3. Shared TLS volume (proxy writes CA + .ready; agent reads)
    let tls_vol = format!("openshell-tls-{}", sandbox.id);
    self.client.create_volume(&tls_vol).await?;

    // 4. Proxy sidecar: internal network + fixed IP + host DNS resolvers
    //    (resolv.conf is set at creation time; podman network connect won't update it)
    let proxy_name = format!("openshell-proxy-{}", sandbox.name);
    self.client.create_container(&build_proxy_sidecar_spec(...)).await?;

    // 5. Connect proxy to bridge (gives it internet access via default route)
    self.client.network_connect(&self.config.network_name, &proxy_name).await?;
    self.client.start_container(&proxy_name).await?;

    // 6. Wait for .ready sentinel
    self.wait_for_proxy_ready(&proxy_name, Duration::from_secs(30)).await?;

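    // 7. Agent: internal-only, fixed IP, OPENSHELL_PROXY_MODE=sidecar
    //    (name follows the openshell-agent-{name} pattern checked in Phase 5)
    let agent_name = format!("openshell-agent-{}", sandbox.name);
    self.client.create_container(&build_agent_container_spec(...)).await?;
    self.client.start_container(&agent_name).await?;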
    Ok(...)
}

Deletion is the reverse: agent → proxy → volume → network.
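
Step 6 of the creation sequence blocks until the proxy reports ready. A minimal synchronous sketch of that wait is a poll-with-deadline loop. The real driver is async and would probe the sidecar itself (for instance by checking for the .ready file), so the `check` closure here is only a stand-in and `poll_until` is a hypothetical helper.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Call `check` every `interval` until it returns true or `timeout` elapses.
/// Returns whether the check ever succeeded. The check always runs at least
/// once, even with a zero timeout. Hypothetical helper.
fn poll_until(timeout: Duration, interval: Duration, mut check: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A caller could pass something like `|| std::path::Path::new("/openshell-tls/.ready").exists()` when the volume is visible locally. On timeout the driver would presumably tear down the proxy, volume, and network it just created, in the deletion order given above.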

The proxy sidecar container is essentially:

  • Image: supervisor image (already has the openshell-sandbox binary)
  • Entrypoint: /openshell-sandbox with OPENSHELL_MODE=proxy
  • Networks: internal only at creation (bridge added via network_connect)
  • DNS: explicit --dns to host resolvers (discovered from /run/systemd/resolve/resolv.conf, filtering 127.x and 169.254.x)
  • Volume: TLS volume mounted rw
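
The resolver discovery for the proxy's --dns setting can be sketched as a small filter over resolv.conf-style text. This is an illustrative assumption about the parsing, not the driver's actual code: `host_resolvers` is a hypothetical name, and IPv6 nameserver entries are skipped for brevity.

```rust
use std::net::Ipv4Addr;

/// Extract usable upstream resolvers from resolv.conf-style text: keep IPv4
/// `nameserver` entries and drop loopback (127.0.0.0/8) and link-local
/// (169.254.0.0/16) addresses, which are not reachable from inside the
/// proxy container. Hypothetical helper; IPv6 entries are ignored.
fn host_resolvers(resolv_conf: &str) -> Vec<Ipv4Addr> {
    resolv_conf
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            match (fields.next(), fields.next()) {
                (Some("nameserver"), Some(addr)) => addr.parse::<Ipv4Addr>().ok(),
                _ => None,
            }
        })
        .filter(|ip| !ip.is_loopback() && !ip.is_link_local())
        .collect()
}
```

Filtering 127.x matters because a stub resolver address such as 127.0.0.53 points at the host's own loopback, which means something different inside the container's network namespace.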

The agent container:

  • Image: user-specified sandbox image
  • Entrypoint: sideloaded openshell-sandbox via image volumes
  • Networks: internal only (no bridge, no internet route)
  • OPENSHELL_PROXY_MODE=sidecar + all the proxy/CA env vars
  • GIT_SSH_COMMAND disabled (prevents git-over-SSH exfiltration)
  • Volume: TLS volume mounted ro
  • Capabilities: ALL dropped; SYS_ADMIN added only if policy requests Landlock

Agent container env:
  OPENSHELL_PROXY_MODE=sidecar
  HTTP_PROXY=http://{proxy_ip}:3128
  HTTPS_PROXY=http://{proxy_ip}:3128
  OPENSHELL_ENDPOINT=http://{proxy_ip}:8081
  SSL_CERT_FILE=/openshell-tls/ca.pem
  NODE_EXTRA_CA_CERTS=/openshell-tls/ca.pem
  CURL_CA_BUNDLE=/openshell-tls/ca.pem
  GIT_SSL_CAINFO=/openshell-tls/ca.pem
  REQUESTS_CA_BUNDLE=/openshell-tls/ca.pem
  GIT_SSH_COMMAND=echo 'SSH disabled — use HTTPS' && exit 1
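
A sketch of how the driver might assemble that environment from the proxy's fixed IP follows. The function name `agent_env`, the `CA_PATH` constant, and passing the GIT_SSH_COMMAND value in as a parameter are all assumptions made for illustration; only the variable names, ports, and CA path come from the listing above.

```rust
/// Path of the CA bundle on the shared TLS volume, as used in this document.
const CA_PATH: &str = "/openshell-tls/ca.pem";

/// Build the agent container environment for a given proxy IP.
/// `git_ssh_command` is the shell snippet that makes git-over-SSH fail.
/// Hypothetical helper; the real container spec builder may differ.
fn agent_env(proxy_ip: &str, git_ssh_command: &str) -> Vec<(&'static str, String)> {
    let proxy_url = format!("http://{proxy_ip}:3128");
    let mut env = vec![
        ("OPENSHELL_PROXY_MODE", "sidecar".to_string()),
        ("HTTP_PROXY", proxy_url.clone()),
        ("HTTPS_PROXY", proxy_url),
        // The agent's supervisor reaches the gateway through the proxy's TCP forward.
        ("OPENSHELL_ENDPOINT", format!("http://{proxy_ip}:8081")),
        ("GIT_SSH_COMMAND", git_ssh_command.to_string()),
    ];
    // Every common TLS stack gets pointed at the same CA bundle.
    for var in [
        "SSL_CERT_FILE",
        "NODE_EXTRA_CA_CERTS",
        "CURL_CA_BUNDLE",
        "GIT_SSL_CAINFO",
        "REQUESTS_CA_BUNDLE",
    ] {
        env.push((var, CA_PATH.to_string()));
    }
    env
}
```

Deriving every proxy-related value from the single `proxy_ip` argument keeps the three URLs consistent with the fixed-IP scheme.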

Other driver changes:

  • Add openshell.role=proxy/openshell.role=agent labels. Filter list_sandboxes on role=agent to avoid returning duplicates.
  • Monitor container events for cleanup of orphaned resources.
  • Remove build_container_spec_supervised and build_container_spec_passthrough.
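
The role-label filter in the first item could look roughly like the sketch below. `ContainerSummary` is a minimal stand-in for whatever the Podman client returns from a container listing, and `agent_containers` is a hypothetical name; only the openshell.role label and its values come from this document.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a container listing entry. The real driver type
/// will differ; all names here are hypothetical.
struct ContainerSummary {
    name: String,
    labels: HashMap<String, String>,
}

/// Keep only agent containers so each sandbox is reported once, even though
/// it now owns a proxy container as well.
fn agent_containers(all: &[ContainerSummary]) -> Vec<&ContainerSummary> {
    all.iter()
        .filter(|c| c.labels.get("openshell.role").map(String::as_str) == Some("agent"))
        .collect()
}
```

Containers with no openshell.role label are excluded too, which seems like the safer default for a listing that should only show sandboxes.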

Phase 5: Gateway script and integration tests

Update gateway-podman.sh. Test the full flow:

mise run gateway:podman
openshell sandbox create --name test

# Verify 3 resources
podman network ls | grep openshell-sbx
podman ps | grep openshell-proxy-test
podman ps | grep openshell-agent-test

# Proxy works
openshell sandbox exec test -- curl -s https://api.github.com/zen

# Bypass fails
openshell sandbox exec test -- bash -c 'unset HTTP_PROXY HTTPS_PROXY; curl --connect-timeout 5 https://api.github.com/zen'
# → Network unreachable

# SSH works
openshell sandbox connect test

# Git-over-SSH blocked
openshell sandbox exec test -- git clone git@github.com:test/repo.git
# → SSH disabled — use HTTPS

# Cleanup
openshell sandbox delete test
# No orphaned networks, containers, or volumes

Open Questions

  1. wget and non-proxy-aware tools. wget resolves DNS locally before connecting to the proxy, so it fails on the internal network. Most development tools (curl, python, node, git) work fine. Known limitation for now; DNS forwarder in the sidecar is future work if needed.

  2. Proxy code sharing. Both proxy-only mode and inner proxy mode use the same code in openshell-sandbox. A future refactor could extract shared logic into openshell-proxy-core.

  3. Split OCSF logs. Network/L7 events now come from the proxy sidecar; SSH/process events from the agent supervisor. Two separate log streams. Document as a known change; merging is future work.

  4. Proxy readiness: poll vs. healthcheck. Starting with podman exec polling for .ready. Podman healthcheck is an alternative if polling proves unreliable.

@cgwalters

I generated some commits on https://github.com/cgwalters/OpenShell/commits/main/ for this