(This document is heavily AI-generated, but it is the result of a fair amount of interactive design and research work with cgwalters, plus of course a big hat tip to paude and other projects that already blazed this trail around network proxying.)
OpenShell's current inner sandboxing — Landlock, seccomp, network namespaces, iptables, and TOFU binary verification, all running inside the sandbox container — is simultaneously too much and too little.
Development agents need to install packages (dnf, apt), experiment with tools, evolve their own environments, and sometimes run nested containers (developing OpenShell in OpenShell). Inner sandboxing breaks all of this. Landlock blocks package manager writes. The custom seccomp profile breaks nested containerization. These are fundamental limitations for the primary use case.
Because the current "inner" sandboxing needs higher privileges in order to reduce privileges for the inner agent, it also duplicates configuration that the container runtime itself already exposes:
- Landlock overlaps with filesystem isolation configurable in the outer container (podman run --read-only); of course, people who want to use Landlock can continue to do so.
- Seccomp is already configurable at the container level, and there's well-understood tooling for managing that.
- Network namespaces + iptables create a nested namespace inside the container's own namespace just to force traffic through the proxy. This is why the supervisor needs root and CAP_NET_ADMIN.
- TOFU binary verification resolves which binary is making each connection via /proc/net/tcp and verifies its hash. But if an interpreter is trusted (and agents will use interpreters), it verifies the interpreter binary, not the code being interpreted. The security value is marginal; the complexity cost (CAP_SYS_PTRACE, /proc walking, root) is high.
Move the L7 proxy out of the container. Use the container runtime's own network isolation — a Podman --internal network with no default gateway — as the enforcement boundary. The proxy sidecar sits on both the internal network and the bridge; it's the only route out. The agent can unset HTTP_PROXY all it wants — there's no route to the internet.
Inner sandboxing (Landlock, seccomp) remains available as opt-in defense-in-depth for use cases that need it, but it's no longer required for network isolation. The simpler architecture becomes the default for development agents.
Each sandbox becomes three Podman resources:
Per sandbox:

    ┌─ openshell-sbx-{id} network (--internal, dns_enabled=true) ───┐
    │                                                               │
    │ ┌─────────────────────┐      ┌──────────────────────────────┐ │
    │ │ proxy sidecar       │      │ agent container              │ │
    │ │ (openshell-sandbox  │      │ (user image)                 │ │
    │ │  OPENSHELL_MODE=    │◀─────│ HTTP_PROXY=proxy:3128        │ │
    │ │  proxy)             │      │                              │ │
    │ │ - L7 proxy :3128    │ TCP  │ openshell-sandbox            │ │
    │ │ - OPA engine        │ fwd  │ - SSH/exec relay             │ │
    │ │ - cred injection    │──────│ - gRPC → proxy → gateway     │ │
    │ │ - inference routing │:8081 │ - opt-in Landlock/seccomp    │ │
    │ │ - TCP fwd gw:8081   │      │ - no network enforcement     │ │
    │ │ - singleton MITM CA │      │ - MITM CA via volume (ro)    │ │
    │ │ - --dns=host resolv │      │ - GIT_SSH_COMMAND disabled   │ │
    │ └─────────────────────┘      └──────────────────────────────┘ │
    │        │                                                      │
    └────────┼──────────────────────────────────────────────────────┘
             │ also on: openshell bridge network
             ▼
      gateway (host or container, :8081)
        - sandbox lifecycle via Podman API
        - gRPC for CLI and supervisor callbacks
The proxy sidecar runs openshell-sandbox in a new OPENSHELL_MODE=proxy. It handles the L7 proxy, OPA policy, credential injection, and inference routing. It also TCP-forwards the gateway's gRPC port so the agent's supervisor can relay SSH/exec sessions.
The agent container runs openshell-sandbox with OPENSHELL_PROXY_MODE=sidecar, which tells the supervisor to skip inner network namespace and iptables setup (the topology handles that) while still honoring Landlock/seccomp policy if requested. The agent is connected only to the --internal network — no route to the internet.
This mirrors the split-pod model from #981: proxy sidecar = supervisor pod; agent container = agent pod; per-sandbox --internal network = NetworkPolicy.
- Network isolation moves to the topology. The --internal Podman network with no default gateway replaces the inner network namespace + iptables approach. The proxy sidecar is the sole egress path.
- Binary identity (TOFU) is removed. /proc/net/tcp scanning, BinaryIdentityCache, and CAP_SYS_PTRACE all go away: the approach is fundamentally insecure with LD_PRELOAD and interpreted languages, and can't work across containers.
- Proxy moves to sidecar. L7 proxy, OPA engine, credential injection, and inference routing run in a separate container.
- Per-sandbox MITM CAs → singleton. A singleton CA per gateway session is sufficient.
- Inner sandboxing (opt-in). Landlock and seccomp remain available as policy-driven defense-in-depth. They're no longer required for network isolation, but users can opt in for filesystem and syscall restrictions.
- L7 proxy enforcement. OPA policy evaluation, credential injection, inference routing — unchanged in behavior, just moved to the sidecar.
- SSH/exec relay. The agent still runs openshell-sandbox for SSH/exec. The gRPC connection routes through the proxy sidecar's TCP port-forward.
- Per-sandbox isolation. Each sandbox gets its own internal network. Sandboxes cannot see each other.
- Docker and Kubernetes drivers. Unaffected. This is Podman-only.
| Decision | Choice | Rationale |
|---|---|---|
| Isolation model | Per-sandbox --internal network | Sandboxes can't see each other. Mirrors k8s NetworkPolicy from #981. |
| Proxy location | Per-sandbox sidecar container | Mirrors #981 split-pod. Each sandbox gets its own proxy process. |
| Proxy binary | Reuse openshell-sandbox with OPENSHELL_MODE=proxy | Supervisor image already has the binary. No new image needed. |
| Agent supervisor | OPENSHELL_PROXY_MODE=sidecar | Skips inner netns/iptables. Still applies Landlock/seccomp per policy. Still runs SSH/exec. |
| Agent-to-gateway gRPC | TCP port-forward through proxy sidecar | Agent supervisor connects to proxy's internal IP as if it were the gateway. |
| Binary identity (TOFU) | Remove entirely | Insecure with LD_PRELOAD; can't work across containers. |
| MITM CA | Singleton per gateway session | No security benefit to per-sandbox CA. Shared via volume. |
| DNS on internal network | dns_enabled: true | Container-name resolution for debugging. External DNS is irrelevant: proxy-aware tools send hostnames via CONNECT. |
| Proxy DNS | --dns set to host resolvers | Discovered at driver startup from /run/systemd/resolve/resolv.conf, filtering 127.x and 169.254.x. Must be set at container creation time. |
| Container IPs | Fixed/static per sandbox | Proxy=.2, agent=.3. Multi-network requires podman network connect. |
| Bridge network | Existing openshell bridge | Proxy sidecars connect here for internet. Dedicated bridge, not default podman. |
| Git-over-SSH | Disabled via GIT_SSH_COMMAND | Prevents exfiltration. Forces git over HTTPS through proxy. |
| Proxy readiness | Sentinel file on shared volume | Proxy writes .ready after CA and listeners are up. Driver polls before starting agent. |
| Podman client | Keep hand-rolled client | Extend with new methods. Bollard migration is future work. |
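The "Proxy DNS" row describes filtering the host's resolver list before passing it to --dns. A minimal sketch of that filter follows, using only the standard library. The function name `usable_resolvers` is hypothetical, and the IPv6 handling is an assumption, since the table only mentions the 127.x and 169.254.x IPv4 ranges.

```rust
use std::net::IpAddr;

/// Parse resolv.conf-style text and return the nameservers that remain after
/// dropping loopback (127.x) and link-local (169.254.x) addresses, which are
/// not reachable from inside another container's network namespace.
fn usable_resolvers(resolv_conf: &str) -> Vec<IpAddr> {
    resolv_conf
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            match (parts.next(), parts.next()) {
                (Some("nameserver"), Some(addr)) => addr.parse::<IpAddr>().ok(),
                _ => None,
            }
        })
        .filter(|ip| match ip {
            IpAddr::V4(v4) => !v4.is_loopback() && !v4.is_link_local(),
            // Assumption: only IPv6 loopback is dropped; the source does not
            // say how IPv6 resolvers should be treated.
            IpAddr::V6(v6) => !v6.is_loopback(),
        })
        .collect()
}
```

The driver would read /run/systemd/resolve/resolv.conf once at startup and pass the surviving addresses into the proxy sidecar's container spec.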
paude validates the core architecture with a working implementation for Claude Code, Gemini CLI, Cursor, and OpenClaw.
- Per-session --internal network + proxy sidecar on both networks. No inner sandboxing at all.
- Sentinel credentials: agent sees ANTHROPIC_API_KEY=paude-proxy-managed; proxy swaps for real key in-flight. Equivalent to OpenShell's SecretResolver.
- dnsmasq in the proxy sidecar for DNS. Our testing showed this isn't strictly needed; most proxy-aware tools send hostnames via CONNECT.
- Fixed IPs (proxy=.2, agent=.3). Validated.
- CA injected via exec + cat. Our shared-volume approach is cleaner.
- Readiness protocol: poll for CA cert before starting agent. Adopted.
alcove uses a similar dual-network sidecar pattern (Go-based).
- DNS strategy: "proxy resolves." No DNS forwarder; relies on proxy-aware tools sending hostnames in CONNECT. Validated: curl with HTTP_PROXY does not resolve DNS locally.
- dns_enabled: true on internal network for container-name resolution.
- SSH disable trick: GIT_SSH_COMMAND="echo 'SSH disabled' && exit 1". Adopted.
- Comprehensive CA trust env vars: SSL_CERT_FILE, NODE_EXTRA_CA_CERTS, CURL_CA_BUNDLE, GIT_SSL_CAINFO, REQUESTS_CA_BUNDLE. Adopted.
- Shared internal network (all sandboxes on one network). We use per-sandbox networks for stronger isolation.
| Behavior | Result |
|---|---|
| Aardvark-dns resolves external names on --internal | No: NXDOMAIN |
| dns_enabled: true internal network | Container names only, not external |
| Static IPs via --network name:ip=x.x.x.x | Works |
| Multi-network via comma syntax with static IPs | Broken: only first network gets IP |
| Multi-network via podman network connect | Works |
| Proxy on both networks | Default route via bridge, internet confirmed |
| Agent on internal-only | Cannot reach internet (Network unreachable) |
| curl with HTTP_PROXY, no local DNS | Works: sends CONNECT with hostname |
| wget with HTTP_PROXY, no local DNS | Fails: resolves locally first |
| resolv.conf updated by podman network connect | No: set at creation time only |
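The curl result above works because a proxy-aware client hands the hostname to the proxy instead of resolving it. A small illustration of that exchange follows. Both function names are hypothetical and this is not OpenShell's proxy code; it only shows the shape of an HTTP CONNECT preamble and how a proxy can recover the target name from it.

```rust
/// Build the CONNECT preamble a proxy-aware client sends. The target is a
/// hostname, so name resolution happens on the proxy side.
fn connect_request(host: &str, port: u16) -> String {
    format!("CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n")
}

/// Extract (host, port) from the first line of a CONNECT request, as a proxy
/// would before resolving the name itself.
fn parse_connect_target(request: &str) -> Option<(String, u16)> {
    let mut parts = request.lines().next()?.split_whitespace();
    if parts.next()? != "CONNECT" {
        return None;
    }
    let (host, port) = parts.next()?.rsplit_once(':')?;
    Some((host.to_string(), port.parse().ok()?))
}
```

A client that resolves the name before contacting the proxy never gets this far on the internal network, because Aardvark-dns returns NXDOMAIN for external names there.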
Prep commit. Remove /proc/net/tcp identity scanning, BinaryIdentityCache, and all OPA policy fields that depend on binary_path/binary_sha256. The OPA TCP input simplifies to:

    { "host": "api.anthropic.com", "port": 443 }

L7 input keeps the request fields but drops exec:

    {
      "network": { "host": "...", "port": 443 },
      "request": { "method": "GET", "path": "/v1/chat", "query_params": {} }
    }

This is a standalone change: it can land independently and unblocks everything else. Key code to remove: BinaryIdentityCache, find_socket_inode_owners, parse_proc_net_tcp, file_sha256, the entrypoint_pid: Arc<AtomicU32> parameter on ProxyHandle::start_with_bind_addr, and the CAP_SYS_PTRACE requirement.
Add to the hand-rolled Podman client in crates/openshell-driver-podman/src/client.rs:

    // Create an --internal bridge network. Idempotent.
    pub async fn ensure_internal_network(&self, name: &str) -> Result<(), PodmanApiError>;

    // Inspect container to get its IP on a specific network.
    pub async fn container_ip(&self, container: &str, network: &str) -> Result<Option<String>, PodmanApiError>;

    // Connect a running container to an additional network.
    pub async fn network_connect(&self, network: &str, container: &str) -> Result<(), PodmanApiError>;

    // Get the /24 subnet base (e.g., "10.89.5") for fixed IP derivation.
    pub async fn network_subnet_base(&self, name: &str) -> Result<String, PodmanApiError>;

    // Remove a network. Idempotent.
    pub async fn remove_network(&self, name: &str) -> Result<(), PodmanApiError>;

Also extend ContainerSpec with static_ips (per-network) and dns_server (for --dns).
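The string handling behind network_subnet_base and the fixed .2/.3 addresses can be sketched without any Podman calls. This assumes the network's subnet is available in CIDR form such as "10.89.5.0/24"; the helper names `subnet_base` and `fixed_ips` are hypothetical and not part of the client.

```rust
use std::net::Ipv4Addr;

/// Derive the first three octets of an IPv4 /24 given in CIDR form,
/// e.g. "10.89.5.0/24" -> "10.89.5". Returns None for any other input.
fn subnet_base(cidr: &str) -> Option<String> {
    let (addr, prefix) = cidr.split_once('/')?;
    if prefix != "24" {
        return None;
    }
    let [a, b, c, _] = addr.parse::<Ipv4Addr>().ok()?.octets();
    Some(format!("{a}.{b}.{c}"))
}

/// Fixed addresses used by this design: proxy at .2, agent at .3.
fn fixed_ips(base: &str) -> (String, String) {
    (format!("{base}.2"), format!("{base}.3"))
}
```

Rejecting anything other than a /24 keeps the .2/.3 convention unambiguous; a real implementation might instead compute host addresses from an arbitrary prefix length.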
New mode for openshell-sandbox: the sidecar entry point. It lives in crates/openshell-sandbox/src/proxy_mode.rs.

    pub async fn run_proxy_mode(args: SandboxArgs) -> miette::Result<()> {
        // 1. Connect to gateway, fetch policy, build OPA engine
        // 2. Fetch provider env, build SecretResolver
        // 3. Load existing CA from /openshell-tls/ or generate new one
        //    (persist so restarts don't invalidate cached CA in agent)
        // 4. Build inference context
        // 5. Start L7 proxy on :3128
        // 6. Start TCP port-forward on :8081 → gateway gRPC
        // 7. Spawn policy poll loop + inference route poller
        // 8. Write /openshell-tls/.ready (only after BOTH listeners are bound)
        // 9. Wait for shutdown
        Ok(())
    }

The TCP forwarder is a simple tokio relay. One subtlety: OPENSHELL_ENDPOINT is an HTTP URL, but TcpStream::connect needs host:port:
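    use std::net::SocketAddr;

    use miette::{miette, IntoDiagnostic};
    use tokio::net::{TcpListener, TcpStream};

    async fn run_tcp_forward(listen_addr: SocketAddr, upstream_url: &str) -> miette::Result<()> {
        // Reduce the HTTP URL to host:port without panicking on malformed input.
        let url: url::Url = upstream_url.parse().into_diagnostic()?;
        let host = url.host_str().ok_or_else(|| miette!("upstream URL has no host"))?;
        let port = url
            .port_or_known_default()
            .ok_or_else(|| miette!("upstream URL has no port"))?;
        let upstream = format!("{host}:{port}");

        let listener = TcpListener::bind(listen_addr).await.into_diagnostic()?;
        loop {
            let (mut client, _) = listener.accept().await.into_diagnostic()?;
            let upstream = upstream.clone();
            tokio::spawn(async move {
                if let Ok(mut server) = TcpStream::connect(&upstream).await {
                    // copy_bidirectional relays both directions and propagates
                    // half-close, so a side that finishes sending first does
                    // not cut off the reply still in flight.
                    let _ = tokio::io::copy_bidirectional(&mut client, &mut server).await;
                }
            });
        }
    }

When OPENSHELL_PROXY_MODE=sidecar is set, the supervisor (in lib.rs):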
- Skips inner network namespace, iptables, and proxy startup (the topology and sidecar handle these)
- Still applies Landlock and seccomp if the sandbox policy requests them
- Still starts SSH server and runs the workload
This is distinct from OPENSHELL_MODE=nested which disables ALL enforcement. The sidecar mode only disables network-related enforcement.
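The difference between the two modes can be summarized as a small decision table. The sketch below is illustrative only: `EnforcementPlan` and `enforcement_plan` are hypothetical names, the precedence when both variables are set is an assumption, and the default arm assumes the current inner-sandboxing behavior.

```rust
/// Which enforcement steps the supervisor performs. Field names are illustrative.
#[derive(Debug, PartialEq, Eq)]
struct EnforcementPlan {
    inner_netns_and_iptables: bool,
    in_container_proxy: bool,
    landlock_seccomp_per_policy: bool,
}

/// Map OPENSHELL_MODE (`mode`) and OPENSHELL_PROXY_MODE (`proxy_mode`) to a plan.
fn enforcement_plan(mode: Option<&str>, proxy_mode: Option<&str>) -> EnforcementPlan {
    match (mode, proxy_mode) {
        // nested: all enforcement is disabled.
        // Assumption: nested wins if both variables are set.
        (Some("nested"), _) => EnforcementPlan {
            inner_netns_and_iptables: false,
            in_container_proxy: false,
            landlock_seccomp_per_policy: false,
        },
        // sidecar: only network-related enforcement is skipped.
        (_, Some("sidecar")) => EnforcementPlan {
            inner_netns_and_iptables: false,
            in_container_proxy: false,
            landlock_seccomp_per_policy: true,
        },
        // Assumption: with neither set, the existing inner sandboxing applies.
        _ => EnforcementPlan {
            inner_netns_and_iptables: true,
            in_container_proxy: true,
            landlock_seccomp_per_policy: true,
        },
    }
}
```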
One gotcha: apply_child_env in ssh.rs calls env_clear() before setting up child processes. The proxy env vars (HTTP_PROXY, SSL_CERT_FILE, GIT_SSH_COMMAND, etc.) set via the container spec will be dropped from SSH/exec sessions. The fix is to propagate a specific set of env vars from the supervisor's own environment into children.
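One way to express that fix is an allowlist applied right after env_clear(). The sketch below uses std::process::Command rather than whatever command type ssh.rs actually uses. `PROPAGATED_ENV`, `propagated_env`, and `clear_and_propagate` are hypothetical names, and the exact variable list is an assumption based on the agent container env in this design.

```rust
use std::process::Command;

/// Variables carried from the supervisor's environment into SSH/exec children.
/// Illustrative list; the real set is a design decision.
const PROPAGATED_ENV: &[&str] = &[
    "HTTP_PROXY",
    "HTTPS_PROXY",
    "SSL_CERT_FILE",
    "NODE_EXTRA_CA_CERTS",
    "CURL_CA_BUNDLE",
    "GIT_SSL_CAINFO",
    "REQUESTS_CA_BUNDLE",
    "GIT_SSH_COMMAND",
];

/// Select the allowlisted variables from an environment snapshot.
fn propagated_env<I>(env: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    env.into_iter()
        .filter(|(key, _)| PROPAGATED_ENV.contains(&key.as_str()))
        .collect()
}

/// Clear the child's environment, then re-add only the allowlisted variables
/// from the current process. Non-UTF-8 entries are skipped.
fn clear_and_propagate(cmd: &mut Command) {
    let snapshot = std::env::vars_os()
        .filter_map(|(k, v)| Some((k.into_string().ok()?, v.into_string().ok()?)));
    cmd.env_clear();
    cmd.envs(propagated_env(snapshot));
}
```

An allowlist keeps the intent of env_clear() intact: nothing from the supervisor leaks into child sessions unless it is named explicitly.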
The sandbox creation sequence in crates/openshell-driver-podman/src/driver.rs:

    async fn create_sandbox(&self, sandbox: &DriverSandbox) -> Result<...> {
        let internal_net = format!("openshell-sbx-{}", sandbox.id);

        // 1. Per-sandbox internal network
        self.client.ensure_internal_network(&internal_net).await?;

        // 2. Fixed IPs from subnet
        let base = self.client.network_subnet_base(&internal_net).await?;
        let proxy_ip = format!("{base}.2");
        let agent_ip = format!("{base}.3");

        // 3. Shared TLS volume (proxy writes CA + .ready; agent reads)
        let tls_vol = format!("openshell-tls-{}", sandbox.id);
        self.client.create_volume(&tls_vol).await?;

        // 4. Proxy sidecar: internal network + fixed IP + host DNS resolvers
        //    (resolv.conf is set at creation time; podman network connect won't update it)
        let proxy_name = format!("openshell-proxy-{}", sandbox.name);
        self.client.create_container(&build_proxy_sidecar_spec(...)).await?;

        // 5. Connect proxy to bridge (gives it internet access via default route)
        self.client.network_connect(&self.config.network_name, &proxy_name).await?;
        self.client.start_container(&proxy_name).await?;

        // 6. Wait for .ready sentinel
        self.wait_for_proxy_ready(&proxy_name, Duration::from_secs(30)).await?;

        // 7. Agent: internal-only, fixed IP, OPENSHELL_PROXY_MODE=sidecar
        let agent_name = format!("openshell-agent-{}", sandbox.name);
        self.client.create_container(&build_agent_container_spec(...)).await?;
        self.client.start_container(&agent_name).await?;

        Ok(...)
    }

Deletion is the reverse: agent → proxy → volume → network.
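Step 6 of the creation sequence waits for the .ready sentinel with a deadline. A minimal synchronous sketch of that polling loop follows. The names `wait_until_ready` and `sentinel_exists` are hypothetical, the loop blocks the thread where the real driver would be async, and checking a local path stands in for the `podman exec` check the design describes.

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `is_ready` until it returns true or `timeout` elapses.
/// Returns true if readiness was observed before the deadline.
fn wait_until_ready(
    mut is_ready: impl FnMut() -> bool,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

/// Readiness check against a locally visible sentinel path.
/// Assumption: the driver would run the equivalent check inside the sidecar.
fn sentinel_exists(tls_dir: &Path) -> bool {
    tls_dir.join(".ready").exists()
}
```

On timeout the caller would be expected to tear down the proxy, volume, and network it has already created, so that a failed start does not leave orphaned resources.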
The proxy sidecar container is essentially:
- Image: supervisor image (already has the openshell-sandbox binary)
- Entrypoint: /openshell-sandbox with OPENSHELL_MODE=proxy
- Networks: internal only at creation (bridge added via network_connect)
- DNS: explicit --dns to host resolvers (discovered from /run/systemd/resolve/resolv.conf, filtering 127.x and 169.254.x)
- Volume: TLS volume mounted rw

The agent container:
- Image: user-specified sandbox image
- Entrypoint: sideloaded openshell-sandbox via image volumes
- Networks: internal only (no bridge, no internet route)
- OPENSHELL_PROXY_MODE=sidecar + all the proxy/CA env vars
- GIT_SSH_COMMAND disabled (prevents git-over-SSH exfiltration)
- Volume: TLS volume mounted ro
- Capabilities: ALL dropped; SYS_ADMIN added only if policy requests Landlock
Agent container env:
OPENSHELL_PROXY_MODE=sidecar
HTTP_PROXY=http://{proxy_ip}:3128
HTTPS_PROXY=http://{proxy_ip}:3128
OPENSHELL_ENDPOINT=http://{proxy_ip}:8081
SSL_CERT_FILE=/openshell-tls/ca.pem
NODE_EXTRA_CA_CERTS=/openshell-tls/ca.pem
CURL_CA_BUNDLE=/openshell-tls/ca.pem
GIT_SSL_CAINFO=/openshell-tls/ca.pem
REQUESTS_CA_BUNDLE=/openshell-tls/ca.pem
GIT_SSH_COMMAND=echo 'SSH disabled — use HTTPS' && exit 1
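Most of this environment is derived from the proxy's fixed IP, so it can be generated in one place. The sketch below is a hypothetical helper (`proxy_and_ca_env` is not an existing function) covering the proxy, endpoint, and CA variables; GIT_SSH_COMMAND is a fixed string and is assumed to be added separately.

```rust
/// Build the proxy-related and CA-related agent environment for a proxy IP.
fn proxy_and_ca_env(proxy_ip: &str) -> Vec<(String, String)> {
    const CA_PATH: &str = "/openshell-tls/ca.pem";
    let proxy_url = format!("http://{proxy_ip}:3128");

    let mut env = vec![
        ("OPENSHELL_PROXY_MODE".to_string(), "sidecar".to_string()),
        ("HTTP_PROXY".to_string(), proxy_url.clone()),
        ("HTTPS_PROXY".to_string(), proxy_url),
        // The agent supervisor treats the sidecar's TCP forward as the gateway.
        ("OPENSHELL_ENDPOINT".to_string(), format!("http://{proxy_ip}:8081")),
    ];

    // Every CA trust variable points at the same CA file on the shared volume.
    for key in [
        "SSL_CERT_FILE",
        "NODE_EXTRA_CA_CERTS",
        "CURL_CA_BUNDLE",
        "GIT_SSL_CAINFO",
        "REQUESTS_CA_BUNDLE",
    ] {
        env.push((key.to_string(), CA_PATH.to_string()));
    }
    env
}
```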
Other driver changes:
- Add openshell.role=proxy / openshell.role=agent labels. Filter list_sandboxes on role=agent to avoid returning duplicates.
- Monitor container events for cleanup of orphaned resources.
- Remove build_container_spec_supervised and build_container_spec_passthrough.
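The role-label filter for list_sandboxes could look roughly like this. `ContainerSummary` is a stand-in type invented for the example, since the driver's real listing type is not shown here; only the label key and values come from the design.

```rust
use std::collections::HashMap;

/// Minimal stand-in for one entry of a container listing (hypothetical type).
struct ContainerSummary {
    name: String,
    labels: HashMap<String, String>,
}

/// Keep only containers labeled openshell.role=agent, so each sandbox is
/// reported once even though it now consists of two containers.
fn agent_containers(containers: &[ContainerSummary]) -> Vec<&ContainerSummary> {
    containers
        .iter()
        .filter(|c| c.labels.get("openshell.role").map(String::as_str) == Some("agent"))
        .collect()
}
```

Filtering on the label rather than on the container name avoids depending on the openshell-proxy- / openshell-agent- naming scheme.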
Update gateway-podman.sh. Test the full flow:
mise run gateway:podman
openshell sandbox create --name test
# Verify 3 resources
podman network ls | grep openshell-sbx
podman ps | grep openshell-proxy-test
podman ps | grep openshell-agent-test
# Proxy works
openshell sandbox exec test -- curl -s https://api.github.com/zen
# Bypass fails
openshell sandbox exec test -- bash -c 'unset HTTP_PROXY HTTPS_PROXY; curl --connect-timeout 5 https://api.github.com/zen'
# → Network unreachable
# SSH works
openshell sandbox connect test
# Git-over-SSH blocked
openshell sandbox exec test -- git clone git@github.com:test/repo.git
# → SSH disabled — use HTTPS
# Cleanup
openshell sandbox delete test
# No orphaned networks, containers, or volumes

- wget and non-proxy-aware tools. wget resolves DNS locally before connecting to the proxy, so it fails on the internal network. Most development tools (curl, python, node, git) work fine. Known limitation for now; a DNS forwarder in the sidecar is future work if needed.
- Proxy code sharing. Both proxy-only mode and inner proxy mode use the same code in openshell-sandbox. A future refactor could extract shared logic into openshell-proxy-core.
- Split OCSF logs. Network/L7 events now come from the proxy sidecar; SSH/process events from the agent supervisor. Two separate log streams. Document as a known change; merging is future work.
- Proxy readiness: poll vs. healthcheck. Starting with podman exec polling for .ready. Podman healthcheck is an alternative if polling proves unreliable.
I generated some commits on https://github.com/cgwalters/OpenShell/commits/main/ for this