Kata Containers: Shim-Agent Communication Threat Vector Analysis

Date: 2026-03-17
Branch: main (commit 660e3bb65)
Scope: Shim (host-side, Go) to kata-agent (guest-side, Rust) communication
Disclaimer: This report was generated using Claude Code, and full human review is TBD. Note also that real deployments should always use defense-in-depth, for example LSMs and network policies.

Architecture Overview
1. Transport Layer Threats
2. API Surface Threats (Agent RPC Methods)
3. Sandbox Escape & Privilege Escalation Vectors
4. Authorization & Policy Gaps
- 4.1 Agent Policy is Optional (Off by Default)
- 4.2 No Per-Container Authorization
5. Threat Summary Matrix
6. Recommendations
7. Agent-Policy Deep Dive: Coverage of Host-to-Guest Attack Vectors
8. Guest-to-Host Threat Analysis: Compromised Container Inside Kata VM
Key Source Files Reference

Architecture Overview

The kata-shim (host-side, Go) communicates with the kata-agent (guest-side, Rust) over TTRPC (a simplified gRPC without HTTP/2) using Protocol Buffers v3. The transport is one of vsock (QEMU/CLH), hybrid-vsock (Firecracker), or Unix domain sockets (remote hypervisor).

1. Transport Layer Threats

1.1 No Encryption (TTRPC is Plaintext)

Finding: All shim-agent communication is unencrypted. There is no TLS, mTLS, or any application-layer encryption.
File: src/runtime/virtcontainers/pkg/agent/protocols/client/client.go:91
Mitigation in place: Relies on implicit transport isolation -- vsock is hypervisor-mediated, Unix sockets are permission-restricted.
Risk: If the hypervisor is compromised or a co-tenant VM can sniff vsock traffic (e.g., via hypervisor bug), all RPC payloads -- including OCI specs, environment variables, file contents (CopyFile), and stdin/stdout streams -- are exposed in cleartext.

1.2 No Authentication or Mutual Identity Verification

Finding: The shim connects to the agent with zero credentials. No tokens, certificates, or shared secrets are exchanged.
File: src/runtime/virtcontainers/pkg/agent/protocols/client/client.go:72-98
Risk: Any process that can reach the agent's vsock port (port 1024 on the guest CID) or the Unix socket path can issue arbitrary agent RPCs. This is a confused deputy risk if another process on the host gains access to the socket.

1.3 Hybrid-VSock Handshake Weakness

Finding: Firecracker's hybrid-vsock uses a simple text-based handshake: shim sends "CONNECT <port>\n", agent responds with "OK".
File: src/runtime/virtcontainers/pkg/agent/protocols/client/client.go:387-449
Risk: No integrity check on the handshake. A MITM at the Unix socket layer could intercept and replay or inject the handshake.

1.4 Fixed, Predictable Port

Finding: Agent always listens on vsock port 1024 (vSockPort constant).
File: src/runtime/virtcontainers/hypervisor.go:80
Risk: Reduces attack complexity -- an attacker who compromises the hypervisor layer knows exactly which port to target.

2. API Surface Threats (Agent RPC Methods)

The agent exposes ~35 RPC methods defined in src/libs/protocols/protos/agent.proto. Each is a potential attack vector if an attacker can send crafted requests.

2.1 Arbitrary Process Execution -- `ExecProcess`

File: src/agent/src/rpc.rs:424
Risk: Allows spawning arbitrary processes with controlled args, env vars, capabilities, and UID/GID inside the guest. A compromised shim or rogue host process can execute anything inside the VM.

2.2 Kernel Module Loading -- `CreateSandbox`

File: src/agent/src/rpc.rs:1341
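Risk: CreateSandbox can trigger modprobe with attacker-controlled module names and parameters. Module names and parameters are passed directly to Command::new(MODPROBE_PATH) without sanitization. Because Command::new executes the binary directly rather than through a shell, shell metacharacters are not interpreted; the practical risk is option injection (for example, a name or parameter beginning with -) and the loading of arbitrary modules with attacker-chosen parameters.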

2.3 Iptables Rule Injection -- `SetIPTables`

File: src/agent/src/rpc.rs:1178-1208
Risk: Executes iptables-restore with attacker-controlled stdin data. Malicious rules could open the guest firewall, redirect traffic, or enable exfiltration channels.

2.4 File Write -- `CopyFile`

File: src/agent/src/rpc.rs:2038
Mitigation in place: Path must start with /run/kata-containers (line 2041).
Remaining risk: CopyFile supports symlink creation (lines 2097-2121) and custom file modes and ownership. A symlink created under /run/kata-containers could point elsewhere in the guest filesystem, creating an escape primitive.

2.5 Network Manipulation -- `UpdateInterface`, `UpdateRoutes`, `AddARPNeighbors`

File: src/agent/src/rpc.rs:1046, 1094, 1406
Risk: Full guest networking control: IP address injection, default route hijacking, ARP cache poisoning. An attacker can redirect all guest traffic or perform MitM within the guest network namespace.

2.6 System Clock & Entropy -- `SetGuestDateTime`, `ReseedRandomDev`

File: src/agent/src/rpc.rs:1503, 1448
Risk: Time manipulation can break TLS certificate validation, log integrity, and replay protections. Entropy injection can weaken guest RNG state.

2.7 Memory/CPU Hotplug -- `OnlineCPUMem`, `MemHotplugByProbe`, `AddSwap`

File: src/agent/src/rpc.rs:1434, 1490, 1601
Risk: Memory probe writes to /sys/devices/system/memory/probe with attacker-controlled addresses. Swap manipulation can cause DoS or expose sensitive memory pages.

3. Sandbox Escape & Privilege Escalation Vectors

3.1 CRITICAL -- Container ID Path Traversal

File: src/agent/src/rpc.rs:2202-2251 (setup_bundle())
Issue: Bundle path is constructed as Path::new(CONTAINER_BASE).join(cid). If verify_id() doesn't reject ../ sequences, the container rootfs could be bind-mounted to arbitrary guest paths.
Severity: CRITICAL -- potential guest filesystem escape.

3.2 CRITICAL -- Mount Symlink Following

File: src/agent/src/storage/mod.rs:281, src/agent/src/mount.rs:67-122
Issue: nix::mount::mount() is called without resolving symlinks in destination paths. The kernel follows symlinks during mount, so a symlink planted in a shared directory could redirect a bind mount outside the intended container boundary.
Severity: CRITICAL -- container-to-guest escape.

3.3 HIGH -- Namespace Path Injection

File: src/agent/src/rpc.rs:1859-1911
Issue: IPC, UTS, and PID namespace paths from the host are used directly (PathBuf::from(&sandbox.shared_ipcns.path)) without validating they point to legitimate namespace files. A compromised host could inject paths to host namespaces.
Severity: HIGH -- namespace confusion attack.

3.4 HIGH -- OCI Spec Constraint Stripping

File: src/runtime/virtcontainers/kata_agent.go:1056-1066
Issue: Device cgroups, PID limits, BlockIO, network limits, and CPU constraints are all set to nil before sending to the agent. This means resource isolation is not enforced inside the guest.
Severity: HIGH -- DoS via resource exhaustion within the guest.

3.5 HIGH -- VFIO Sysfs Path Traversal

File: src/runtime/virtcontainers/container.go:1362-1363
Issue: vfioGroup (derived from device path) is used directly in filepath.Join(config.SysIOMMUGroupPath, vfioGroup, "devices"). If it contains ../, it could read arbitrary sysfs paths on the host.
Severity: HIGH -- host information disclosure.

3.6 MEDIUM -- Capability Passthrough

File: src/runtime/virtcontainers/kata_agent.go:1014-1103
Issue: constrainGRPCSpec() does NOT filter Linux capabilities. If a privileged container spec is passed, full capabilities (including CAP_SYS_ADMIN, CAP_NET_RAW, etc.) are forwarded to the agent and granted inside the guest.
Severity: MEDIUM -- depends on guest kernel attack surface.

4. Authorization & Policy Gaps

4.1 Agent Policy is Optional (Off by Default)

File: src/agent/src/policy.rs:12-45, src/agent/src/rpc.rs:155-156
Issue: The agent-policy compile-time feature gate controls whether RPC authorization is enforced. Without it, every RPC method is implicitly allowed. Because the feature is off by default, deployments that do not explicitly opt in run without any RPC authorization.
Risk: Any entity with socket access has full, unrestricted control over the guest VM.

4.2 No Per-Container Authorization

Even with policy enabled, there is no per-container identity or authorization. Since callers are not authenticated at all (see 1.2), any caller that can reach the agent socket can operate on any container within the sandbox.

5. Threat Summary Matrix

Vector	Severity	Pre-Condition	Impact
Plaintext TTRPC (eavesdropping)	HIGH	Hypervisor compromise or vsock bug	Full data exfiltration
No authentication on agent socket	HIGH	Host process reaches vsock/UDS	Full VM control
Container ID path traversal	CRITICAL	Crafted container ID bypasses `verify_id`	Guest filesystem escape
Mount symlink following	CRITICAL	Symlink in shared dir before mount	Container-to-guest escape
Kernel module injection via `CreateSandbox`	CRITICAL	Compromised shim	Arbitrary kernel code in guest
Iptables stdin injection	HIGH	Compromised shim	Guest firewall bypass
Namespace path injection	HIGH	Compromised shim	Namespace confusion
VFIO sysfs path traversal	HIGH	Malformed device path	Host info disclosure
Resource constraint stripping	HIGH	By design	Guest-internal DoS
Missing agent-policy enforcement	MEDIUM	Default configuration	Unrestricted guest API
CopyFile symlink creation	MEDIUM	Valid shim access	Guest file overwrite
DNS/network manipulation	MEDIUM	Valid shim access	Guest traffic hijack
Clock/entropy manipulation	LOW	Valid shim access	Crypto weakening, log tampering

6. Recommendations

Enable agent-policy in production -- compile with agent-policy feature and deploy a restrictive allowlist of permitted RPCs.
Add path canonicalization before all mount and bundle operations (realpath / canonicalize before mount()).
Validate container IDs -- reject any ID containing /, .., or null bytes before path construction.
Sanitize modprobe parameters -- validate module names and parameters against a strict allowlist, rejecting values that begin with - as well as shell metacharacters.
Consider TTRPC-over-TLS for deployments where vsock isolation guarantees are insufficient (e.g., nested virtualization, shared hypervisor environments).
Audit CopyFile symlink handling -- disallow symlink creation or validate symlink targets stay within /run/kata-containers.
Enforce capability dropping in constrainGRPCSpec() -- strip dangerous capabilities before forwarding to the agent.

7. Agent-Policy Deep Dive: Coverage of Host-to-Guest Attack Vectors

7.1 How Agent-Policy Works

The agent-policy system is an OPA/Rego-based authorization gate built into the kata-agent. It uses regorus (a Rust OPA engine) to evaluate every incoming RPC request against a Rego policy document before execution.

Key architecture:

Engine: regorus::Engine in src/agent/policy/src/policy.rs:33
Gate function: is_allowed() in src/agent/src/policy.rs:31 -- serializes each request to JSON, then evaluates data.agent_policy.<RequestName> in the Rego engine
Enforcement point: Every AgentService trait method in src/agent/src/rpc.rs calls is_allowed(&req).await? before processing
Policy delivery: Policy is loaded from a default file (/etc/kata-opa/default-policy.rego), from initdata, or dynamically via the SetPolicy RPC
Bundled policies: src/kata-opa/allow-all.rego (permits everything), src/kata-opa/allow-all-except-exec-process.rego, src/kata-opa/allow-set-policy.rego
Production policy: src/tools/genpolicy/rules.rego -- comprehensive, per-field validation rules

7.2 The Critical Caveat: Compile-Time Optional

// src/agent/src/rpc.rs:155-158
#[cfg(not(feature = "agent-policy"))]
async fn is_allowed(_req: &impl serde::Serialize) -> ttrpc::Result<()> {
    Ok(())  // ALWAYS ALLOWS EVERYTHING
}

Without the agent-policy feature flag at compile time, is_allowed() is a no-op. This means none of the protections described below exist in a default build.

7.3 Coverage Analysis by Threat Vector

7.3.1 Well-Covered (with genpolicy rules.rego)

Threat Vector	Policy Default	Depth of Inspection
ExecProcess (arbitrary exec)	`false` (blocked)	Deep -- validates against allowlisted commands, regex patterns, container state, capabilities (`allow_exec_caps` rejects all capability sets), UID/GID, and SELinux/AppArmor labels (`rpc.rs:841`, `rules.rego:1572-1617`)
CreateContainer (OCI spec injection)	`false` (blocked)	Very deep -- validates OCI version, root readonly, annotations, namespace, sandbox name, container type, process args/env/cwd, capabilities, mounts, storages, devices, Linux namespace config (`rules.rego:60-121`)
CreateSandbox (kernel module loading)	`false` (blocked)	Strong -- explicitly requires `kernel_modules` count == 0 and `guest_hook_path` empty (`rules.rego:1532-1543`). This completely blocks the kernel module injection vector
CopyFile (arbitrary file write)	`false` (blocked)	Good -- validates path against regex allowlist AND checks for directory traversal (`../`) via `check_directory_traversal()` (`rules.rego:1491-1530`)
ReadStream (stdout/stderr exfiltration)	`false` (blocked)	Unique behavior -- the RPC still executes but redacts the response data if policy denies it (`rpc.rs:971-974, 984-987`). This prevents log/output exfiltration while keeping container plumbing functional
WriteStream (stdin injection)	`false` (blocked)	Binary allow/deny only
Devices (VFIO passthrough)	Validated per-container	Separates VFIO devices from volume devices and validates each against policy-declared device lists with CDI annotation regex matching (`rules.rego:472-498`)

7.3.2 Partially Covered

Threat Vector	Policy Default	Gap
UpdateRoutes (route hijacking)	`false` (blocked)	Has route validation with `forbidden_source_regex` and `forbidden_device_names` (`rules.rego:1629-1636`), but the validation is configurable -- a weak policy could still allow malicious routes
UpdateInterface (IP injection)	`false` (blocked)	Binary allow/deny only. No inspection of IP address values, MTU, or MAC address -- if allowed, any interface config is accepted
SetIPTables (firewall injection)	Not in genpolicy defaults	Binary allow/deny -- if allowed, arbitrary iptables rules can be injected. No content inspection of the iptables data payload
AddARPNeighbors (ARP spoofing)	`false` (blocked)	Binary allow/deny only. If allowed, arbitrary ARP entries accepted
SignalProcess	`true` (ALLOWED)	Always allowed by default in genpolicy. No validation of signal number. A compromised host can send any signal (including SIGKILL) to any process
Capabilities in CreateContainer	Validated	`allow_caps()` compares all 5 capability sets against policy -- but policy author must define them correctly. Regex matching means overly broad patterns could grant excessive capabilities (`rules.rego:1406-1430`)

7.3.3 Not Covered (Gaps)

Threat Vector	Issue
SetPolicy itself	Default `false` in genpolicy, but the bundled `allow-set-policy.rego` sets it to `true`. If `SetPolicy` is allowed, an attacker can replace the entire policy with `allow-all.rego`, completely defeating all protections. The `SetPolicy` RPC checks policy before applying (`policy.rs:37-44`), so it's self-protecting -- but only if the initial policy blocks it
AllowRequestsFailingPolicy	If set to `true` in the Rego policy, ALL policy failures are silently ignored (`policy.rs:187-190`). This is a debug flag that completely disables security. Genpolicy defaults it to `false`, but nothing prevents a policy author from enabling it
ReseedRandomDev (entropy injection)	`false` (blocked) but binary allow/deny only -- if allowed, arbitrary entropy data accepted without validation
SetGuestDateTime (clock manipulation)	`false` (blocked) but binary allow/deny only -- if allowed, any timestamp accepted
MemHotplugByProbe (memory probe injection)	`false` (blocked) but binary allow/deny only -- if allowed, arbitrary probe addresses accepted
GetIPTables (firewall enumeration)	Not in genpolicy defaults. If allowed, leaks full guest firewall rules -- information disclosure
AddSwap / AddSwapPath	`false` (blocked) but binary allow/deny only
OnlineCPUMem	`true` (ALLOWED) by default. No validation of parameters
Env var content in ExecProcess	`CreateContainer` validates env var names/values against policy patterns, but `ExecProcess` does not deeply validate environment variables
Mount source/destination paths	Policy validates storages in `CreateContainer` but mount symlink following and path canonicalization are not addressable by policy -- these are runtime bugs
Namespace path injection	Policy validates namespace type (PID, IPC, UTS) but not namespace paths. Arbitrary namespace paths from the host are accepted

7.4 Critical Architectural Weaknesses in the Policy System

7.4.1 SetPolicy is a Self-Destruct Button

SetPolicy is itself guarded by policy (policy.rs:37-44), creating a bootstrap problem: if the initial policy allows SetPolicy, the host can replace the entire policy at runtime. The bundled allow-set-policy.rego (src/kata-opa/allow-set-policy.rego) sets only SetPolicyRequest := true, so SetPolicy is the only RPC that works and everything else is denied by default; the host can nevertheless use it to install any policy it likes. Any permissive initial policy that allows SetPolicy is therefore a complete bypass vector.

7.4.2 AllowRequestsFailingPolicy Silently Disables Everything

# rules.rego:48-52
# AllowRequestsFailingPolicy := true configures the Agent to *allow any
# requests causing a policy failure*.
default AllowRequestsFailingPolicy := false

When true, every denied request is logged as a warning but still executed (policy.rs:187-190). This is documented as a debug feature but is a global policy bypass with zero audit trail beyond warn-level logs.
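
The effect of the flag can be summarized as a small truth table. The sketch below is a hypothetical model of the behavior described in this section, not code from policy.rs; the function name decide and its return values are assumptions.

```go
package main

import "fmt"

// decide models the described behavior: a request the policy allows always
// runs; a request the policy denies runs anyway (with only a warning logged)
// when AllowRequestsFailingPolicy is true, and is rejected otherwise.
func decide(policyAllows, allowRequestsFailingPolicy bool) (execute, warnOnly bool) {
	if policyAllows {
		return true, false
	}
	return allowRequestsFailingPolicy, allowRequestsFailingPolicy
}

func main() {
	for _, policyAllows := range []bool{true, false} {
		for _, flag := range []bool{false, true} {
			execute, warnOnly := decide(policyAllows, flag)
			fmt.Printf("policyAllows=%v flag=%v -> execute=%v warnOnly=%v\n",
				policyAllows, flag, execute, warnOnly)
		}
	}
}
```

With the flag set, the execute column is true in every row, which is why the flag amounts to a global bypass.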

7.4.3 Request-Level Only, No Session-Level Controls

There's no concept of caller identity, session, or connection-level authorization. Every request is evaluated independently. This means:

No rate limiting on requests
No detection of anomalous request patterns (e.g., rapid CreateContainer/DestroyContainer cycling)
A single allowed RPC can be called unlimited times

7.4.4 Binary Allow/Deny on Most Dangerous Operations

For the most dangerous RPCs (SetIPTables, UpdateInterface, AddARPNeighbors, ReseedRandomDev, MemHotplugByProbe), the policy is a simple boolean gate. If allowed, the full request payload is accepted without any content inspection. Only CreateContainer, ExecProcess, CopyFile, CreateSandbox, and UpdateRoutes have deep content validation in the genpolicy rules.

7.4.5 Serialization-Dependent Inspection

Policy evaluation works by serializing the protobuf request to JSON (policy.rs:32), then passing it to Rego. This means the policy can only inspect fields that survive JSON serialization. Binary fields (like SetIPTablesRequest.data containing raw iptables rules) are base64-encoded in JSON, making content inspection impractical in Rego.

7.5 Policy Effectiveness Summary

	With genpolicy rules.rego	With allow-all.rego	Without agent-policy feature
Arbitrary exec	Blocked (allowlisted commands only)	OPEN	OPEN
Malicious container specs	Blocked (deep OCI validation)	OPEN	OPEN
Kernel module loading	Blocked (count==0 enforced)	OPEN	OPEN
Arbitrary file writes	Blocked (path regex + traversal check)	OPEN	OPEN
Log/output exfiltration	Redacted	OPEN	OPEN
Iptables injection	Blocked (default deny)	OPEN	OPEN
Network config changes	Blocked (default deny)	OPEN	OPEN
Policy replacement	Blocked (default deny)	Self-destructible	N/A
Signal to any process	OPEN	OPEN	OPEN
CPU/memory hotplug	OPEN	OPEN	OPEN
Symlink/mount path attacks	NOT ADDRESSABLE	NOT ADDRESSABLE	NOT ADDRESSABLE
Namespace path injection	NOT ADDRESSABLE	NOT ADDRESSABLE	NOT ADDRESSABLE

7.6 Policy-Specific Recommendations

Always compile with agent-policy -- the no-op fallback makes all other protections meaningless.
Use genpolicy rules.rego as baseline -- never deploy allow-all.rego in production.
Block SetPolicy in the initial policy -- once loaded, the policy should be immutable. Use initdata or the default policy file for delivery.
Never enable AllowRequestsFailingPolicy -- even in staging, as it silently bypasses all authorization.
Add content inspection for SetIPTables -- decode and validate iptables rules in the Rego policy, or block the RPC entirely.
Restrict SignalProcess -- add signal number validation to the genpolicy rules (e.g., only allow SIGTERM/SIGKILL).
Restrict OnlineCPUMem -- add parameter bounds checking or switch default to false.
Address runtime-level bugs separately -- symlink following, namespace path injection, and mount path traversal cannot be solved by policy alone. These require code fixes in the agent's Rust implementation.

8. Guest-to-Host Threat Analysis: Compromised Container Inside Kata VM

8.1 Attack Model

A container running inside the Kata VM gains root privileges (e.g., via kernel exploit, misconfigured capabilities, or privileged: true). The attacker now has:

Full root access to the guest Linux kernel
Control over the kata-agent process (PID 1 in guest)
Access to all guest network interfaces
Ability to read/write all guest filesystems including shared mounts

8.2 TTRPC Response Poisoning (Guest -> Shim)

The agent is a TTRPC server only -- it cannot initiate reverse connections to the host. However, the shim continuously polls the agent for data, and trusts all response data without validation.

8.2.1 OOM Event Container ID Injection (HIGH)

src/runtime/pkg/containerd-shim-v2/wait.go:165 -- shim calls GetOOMEvent() in a loop
Agent returns OOMEvent.container_id from its internal cgroup monitor (src/agent/src/sandbox.rs:438-463)
A root-compromised guest can manipulate cgroups to trigger OOM events with arbitrary container IDs
Shim uses this container ID to:
- Look up s.containers[containerID] map (wait.go:177)
- Create a file at path.Join(c.bundle, "oom") on the host filesystem (wait.go:178-186)
- Publish a TaskOOM event to containerd with the attacker-controlled ID (wait.go:190-192)
The containerd event propagates to the Kubernetes event stream
Impact: Host filesystem file creation (limited to existing container bundle paths), event spoofing to Kubernetes control plane

8.2.2 Metrics Response Injection (MEDIUM)

src/runtime/pkg/containerd-shim-v2/shim_management.go:107-146 -- decodeAgentMetrics() parses agent's response as Prometheus text format
Raw agent string fed to expfmt.NewDecoder() without validation (shim_management.go:122-123)
Metric names are prefixed but not sanitized (shim_management.go:137-139)
Impact: Malformed metrics could cause DoS on Prometheus scrape endpoints; metric name injection could confuse monitoring/alerting

8.2.3 IPTables Response Passthrough (MEDIUM)

src/runtime/pkg/containerd-shim-v2/shim_management.go:258-264 -- GetIPTables() returns raw iptables-save output from guest
Written directly to HTTP response body (w.Write(buf)) without validation
A compromised guest controls what iptables-save returns (or replaces the binary entirely)
Impact: Misrepresentation of guest firewall state to management tools; if parsing tools consume this output, potential injection

8.2.4 Stats/CgroupStats JSON Deserialization (LOW)

src/runtime/virtcontainers/kata_agent.go:2146-2154 -- agent's StatsContainerResponse.CgroupStats is JSON-marshaled then unmarshaled into host-side struct
Protobuf provides structural safety, but deeply nested or huge cgroup stats could cause resource exhaustion
Impact: DoS against shim process via memory exhaustion

8.2.5 Process Exit Code Manipulation (MEDIUM)

WaitProcess RPC returns exit status from agent
A compromised agent can return arbitrary exit codes
Exit codes influence Kubernetes restart policy decisions (CrashLoopBackOff, etc.)
Impact: Can prevent pod restart (return 0) or force restart loops

8.2.6 Stream Read Length Mismatch (HIGH)

src/runtime/virtcontainers/kata_agent.go:2500-2514 -- readProcessStream():

copy(data, resp.Data)           // copies min(len(data), len(resp.Data)) bytes
return len(resp.Data), nil      // returns len(resp.Data), NOT bytes actually copied

The shim requests uint32(len(data)) bytes via ReadStreamRequest.Len, but the agent can return more bytes than requested
Go's copy() is memory-safe (copies min(len(dst), len(src)) bytes), so there is no buffer overflow
However, the function returns len(resp.Data) -- the attacker-controlled length -- not the actual number of bytes copied into the destination buffer
This io.Reader implementation (iostream.go:80-96) feeds into containerd's I/O pump
Impact: The caller's bookkeeping of bytes read will be wrong: it believes N bytes were read when only min(N, bufsize) were actually copied. This can cause log truncation, stream offset misalignment, or data duplication depending on how the consumer advances its position. A compromised agent can exploit this to corrupt container log output visible to kubectl logs.
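
A possible fix is to treat an oversized response as an error and otherwise return the count reported by copy(). The sketch below is not the code in kata_agent.go; copyStreamData is a hypothetical helper covering only the final step of readProcessStream(). It relies on the io.Reader contract, which requires 0 <= n <= len(p).

```go
package main

import "fmt"

// copyStreamData copies an agent response into the caller's buffer.
// It refuses responses longer than the buffer (the agent was asked for at
// most len(dst) bytes) and returns the number of bytes actually copied.
func copyStreamData(dst, src []byte) (int, error) {
	if len(src) > len(dst) {
		return 0, fmt.Errorf("agent returned %d bytes but only %d were requested",
			len(src), len(dst))
	}
	return copy(dst, src), nil
}

func main() {
	buf := make([]byte, 4)

	n, err := copyStreamData(buf, []byte("abc"))
	fmt.Println(n, err)

	n, err = copyStreamData(buf, []byte("oversized"))
	fmt.Println(n, err)
}
```

Rejecting the response outright, rather than silently truncating it, makes a misbehaving agent visible to the caller instead of corrupting the stream.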

8.2.7 No TTRPC Response Size Limit (HIGH)

src/runtime/virtcontainers/pkg/agent/protocols/client/client.go:91 -- ttrpc.NewClient(conn, ...) is created with no MaxRecvMsgSize option
The TTRPC library's default max message size (typically 4MB in containerd/ttrpc) applies, but no kata-specific limit is configured
The shim's grpcMaxDataSize of 1MB (kata_agent.go:133) only constrains outbound CopyFile request chunks, not inbound responses
Responses are fully deserialized into memory before the shim processes them
A compromised agent can send near-maximum-size responses to every single RPC call
RPCs called in loops (GetOOMEvent, ReadStdout, ReadStderr, GetMetrics) amplify this
Impact: Sustained large responses cause shim process memory exhaustion on the host. Since the shim is a per-pod process, this can destabilize the node if many pods are targeted simultaneously.

8.2.8 Error Message Injection (MEDIUM)

Throughout src/runtime/pkg/containerd-shim-v2/shim_management.go, error messages from failed agent RPCs are written directly to HTTP response bodies:

// shim_management.go:252-253
w.WriteHeader(http.StatusInternalServerError)
w.Write([]byte(err.Error()))    // agent error string -> HTTP body

This pattern appears for iptables (line 252), volume stats (line 170), volume resize (line 195), and policy (line 219)
A compromised agent can craft TTRPC error messages containing:
- HTTP response splitting characters (if consumed by a naive HTTP client)
- Log injection payloads (newlines, ANSI escape sequences)
- Misleading error text to mask real failures
Impact: Log poisoning on the host; potential HTTP response injection if management API consumers don't sanitize error bodies

8.2.9 Volume Stats Response Passthrough (MEDIUM)

src/runtime/pkg/containerd-shim-v2/shim_management.go:148-173 -- serveVolumeStats():

buf, err := s.sandbox.GuestVolumeStats(context.Background(), volumePath)
w.Write(buf)   // raw agent JSON written to HTTP response

kata_agent.go:2714 JSON-marshals the agent's VolumeStatsResponse and passes it directly to the HTTP response consumed by kubelet
A compromised agent can return:
- Fabricated capacity/inode numbers influencing kubelet eviction decisions
- Extremely large JSON payloads causing kubelet memory pressure
Impact: Kubelet volume management decisions based on false data; potential eviction of pods on node due to fake "disk full" reports

8.2.10 GuestDetails Version/Feature Spoofing (MEDIUM)

src/runtime/virtcontainers/kata_agent.go:2517-2527 -- getGuestDetails() returns GuestDetailsResponse containing:
- AgentDetails.Version -- used for version-based feature decisions
- AgentDetails.SupportedFeatures -- determines which RPCs the shim will attempt
- MemBlockSizeBytes -- used for memory hotplug calculations
- AgentDetails.DeviceHandlers / StorageHandlers -- determines device handling paths
A compromised agent can report false capabilities, causing the shim to:
- Use wrong code paths based on fake version info
- Skip security-relevant operations it believes are unsupported
- Miscalculate memory block sizes for hotplug, potentially causing guest instability or overcommit
Impact: Shim logic manipulation; incorrect resource accounting; potential bypass of version-gated security features

8.2.11 Interface/Route Response Divergence (LOW)

src/runtime/virtcontainers/kata_agent.go:641-644 and 673-676:

if resultInterface, ok := resultingInterface.(*pbTypes.Interface); ok {
    return resultInterface, err
}

updateInterface() and updateRoutes() return the agent's response to the caller. The shim uses this "resulting" state to update its internal network model
A compromised agent returning different interface/route data than what was requested causes the shim's internal network state to diverge from actual guest network state
Impact: Incorrect network teardown, resource leaks, or stale state preventing proper pod cleanup

8.2.12 Architectural Root Cause

The fundamental issue across all response vectors is that the shim treats the agent as a trusted component. Every RPC response is consumed without:

Response size validation or per-RPC size limits
Content sanitization or escaping before writing to HTTP/logs
Semantic validation (does the response match what was requested?)
Rate limiting on event-style RPCs (OOM, metrics, stream reads)
Bounds checking on returned lengths against request parameters

This trust model was designed for the "protect host from untrusted container" threat model -- the agent was part of the trusted computing base. However, in the compromised container scenario (root escape inside guest VM), the agent itself becomes adversarial, and neither direction has adequate validation. In the confidential computing use case (where the host is untrusted), this same trust inversion applies in the opposite direction.

8.3 Kubernetes API Server Access via Pod Network (CRITICAL)

This is the highest-impact attack vector. The guest VM has full pod network connectivity with no guest-side filtering.

8.3.1 Network Architecture -- No Guest-Side Filtering

TC filter setup (src/runtime/virtcontainers/network_linux.go:940-1010) creates transparent bidirectional redirect between host veth and guest TAP
addRedirectTCFilter() (network_linux.go:1040-1063) uses u32 match u8 0 0 (match all traffic) with TCA_EGRESS_REDIR action
No packet filtering, no egress rules, no ingress rules at the TC level
Network policies (Calico/Cilium) are enforced on the host veth -- they apply to traffic entering/leaving the pod, but a root guest has already bypassed the container's internal network namespace

Host Network Namespace                         Guest VM (Isolated Kernel)
+-------------------------------------------+  +-------------------------------------------+
|                                           |  |                                           |
|  veth<pod>  ----TC REDIRECT (no filter)----->  TAP device  --->  eth0 (guest)          |
|  (host side)  <--TC REDIRECT (no filter)-----  (VM side)   <---  (full network stack)  |
|       |                                   |  |                                           |
|  Host Network Policy (Calico/Cilium)      |  |  NO firewall rules by default             |
|  (iptables/eBPF on veth)                  |  |  Can reach: all cluster IPs               |
|       |                                   |  |  Can resolve: all cluster DNS             |
|  Physical NIC --> cluster network         |  |  Can use: ServiceAccount tokens           |
+-------------------------------------------+  +-------------------------------------------+

8.3.2 ServiceAccount Token Exposure

Kubernetes mounts SA token at /var/run/secrets/kubernetes.io/serviceaccount/token
Token is shared into guest via virtiofs/9p as a regular file (src/runtime/virtcontainers/mount.go:312, src/runtime/virtcontainers/fs_share_linux.go:37-52)
File watcher (fs_share_linux.go:817-927) even syncs token rotations from host to guest
A root guest can read the token in plaintext

8.3.3 Full Cluster API Access

Guest DNS resolves cluster services (DNS config passed via CreateSandboxRequest, kata_agent.go:854-862)
Guest can resolve kubernetes.default.svc.cluster.local to API server
With SA token + API server access: full Kubernetes API access at the pod's RBAC level
Can list secrets, create pods, exec into other pods (depending on RBAC)

8.3.4 Lateral Movement via Cluster Network

Guest can reach any ClusterIP service
Can perform service/port scanning across the cluster network
Can reach NodePort services on all nodes
Can reach external services unless egress NetworkPolicy is enforced on host
DNS tunneling available for covert data exfiltration

8.4 Shared Filesystem Escape (Guest -> Host Filesystem)

8.4.1 Virtiofs Shared Directory Access (HIGH)

virtiofsd runs as root on host, serving the entire SharedPath directory (src/runtime/virtcontainers/virtiofsd.go:191: --shared-dir= + v.sourcePath)
Host shared path: /run/kata-containers/shared/sandboxes/$SANDBOX_ID/shared/
Guest mounts this at /run/kata-containers/shared/containers/
A root guest can access all containers' shared files in the same sandbox -- no per-container isolation at virtiofs level

8.4.2 Cross-Container Data Access (HIGH)

Sandbox-wide bind mounts (sandboxMountsDir) visible to all containers
Root guest can read/modify: /run/kata-containers/shared/containers/$OTHER_CONTAINER_ID/rootfs/
Can steal data, modify binaries, or plant backdoors in other containers' rootfs

8.4.3 Symlink Attack on Host Mounts (MEDIUM)

Host uses bind mounts with slave propagation from mounts/ to shared/ (fs_share_linux.go:222-230)
If guest creates symlinks in shared directory pointing outside the mount boundary, and host-side tools follow those symlinks, host filesystem could be accessed
virtiofsd's --shared-dir scopes access, but symlink resolution within that scope is still dangerous
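
A minimal sketch of the kind of containment check a host-side tool could apply before following a path found in the shared directory. It is not taken from the Kata code base: resolveWithin and runDemo are hypothetical names, and the temporary directories stand in for the shared path. The check resolves symlinks and then compares paths, so it narrows the window but does not remove the race between check and use. On Linux, resolving relative to a directory file descriptor (for example openat2 with RESOLVE_BENEATH) avoids that window.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveWithin (hypothetical helper) resolves every symlink in candidate and
// returns the resolved path only if it still lies inside base.
func resolveWithin(base, candidate string) (string, error) {
	realBase, err := filepath.EvalSymlinks(base)
	if err != nil {
		return "", err
	}
	realPath, err := filepath.EvalSymlinks(candidate)
	if err != nil {
		return "", err
	}
	rel, err := filepath.Rel(realBase, realPath)
	if err != nil {
		return "", err
	}
	// A relative path that starts with ".." has left the base directory.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%q resolves outside %q", candidate, base)
	}
	return realPath, nil
}

// runDemo builds a throwaway "shared" directory holding one regular file and
// one symlink that points outside it, then checks both with resolveWithin.
// It returns (regular file allowed, escaping symlink blocked, setup error).
func runDemo() (bool, bool, error) {
	base, err := os.MkdirTemp("", "shared-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(base)

	outside, err := os.MkdirTemp("", "outside-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(outside)

	inside := filepath.Join(base, "data.txt")
	if err := os.WriteFile(inside, []byte("ok"), 0o600); err != nil {
		return false, false, err
	}
	link := filepath.Join(base, "escape")
	if err := os.Symlink(outside, link); err != nil {
		return false, false, err
	}

	_, errInside := resolveWithin(base, inside)
	_, errEscape := resolveWithin(base, link)
	return errInside == nil, errEscape != nil, nil
}

func main() {
	insideAllowed, escapeBlocked, err := runDemo()
	if err != nil {
		fmt.Println("demo setup failed:", err)
		return
	}
	fmt.Println("regular file allowed:", insideAllowed)
	fmt.Println("escaping symlink blocked:", escapeBlocked)
}
```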

8.4.4 ConfigMap/Secret Update Race (MEDIUM)

File watcher uses fsnotify to detect ConfigMap/Secret updates (fs_share_linux.go:839-913)
Copies updated files from host to guest on REMOVE events
TOCTOU race: guest could exploit timing between symlink target change and file copy

8.5 Virtio Device Attack Surface (Guest -> Hypervisor -> Host)

8.5.1 Virtio Device Emulation (HIGH)

QEMU exposes multiple virtio devices to guest: virtio-net, virtio-blk, virtio-vsock, virtio-fs, virtio-console
Root guest can craft malicious virtio descriptors targeting QEMU's device emulation code
Historically, QEMU virtio device emulation has been a rich source of CVEs (guest-to-host escape)

8.5.2 VFIO/IOMMU Bypass (CRITICAL if IOMMU disabled)

IOMMU protection is optional (src/runtime/virtcontainers/qemu.go:449-450): if q.config.IOMMU { ... }
Without IOMMU/vIOMMU, VFIO-passthrough devices can perform arbitrary DMA to host memory
Even with IOMMU, misconfigured IOMMU groups or IOMMUFD bugs could allow bypass
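
One way to turn the optional IOMMU setting into a hard requirement is a configuration-time guard, sketched below. The passthroughConfig type, its fields, and validatePassthrough are hypothetical and only loosely modelled on the q.config.IOMMU flag mentioned above. The real hypervisor configuration has a different shape, and this guard says nothing about misconfigured IOMMU groups.

```go
package main

import (
	"errors"
	"fmt"
)

// passthroughConfig is a hypothetical, reduced view of a hypervisor
// configuration: whether IOMMU is enabled and which VFIO devices are requested.
type passthroughConfig struct {
	IOMMU       bool
	VFIODevices []string
}

var errVFIOWithoutIOMMU = errors.New("passthrough rejected: VFIO devices requested but IOMMU is disabled")

// validatePassthrough refuses any configuration that asks for VFIO
// passthrough while IOMMU is disabled.
func validatePassthrough(cfg passthroughConfig) error {
	if len(cfg.VFIODevices) > 0 && !cfg.IOMMU {
		return errVFIOWithoutIOMMU
	}
	return nil
}

func main() {
	unsafeCfg := passthroughConfig{IOMMU: false, VFIODevices: []string{"0000:01:00.0"}}
	safeCfg := passthroughConfig{IOMMU: true, VFIODevices: []string{"0000:01:00.0"}}

	fmt.Println("IOMMU disabled:", validatePassthrough(unsafeCfg))
	fmt.Println("IOMMU enabled: ", validatePassthrough(safeCfg))
}
```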

8.5.3 Vhost-User Shared Memory (HIGH)

Vhost-user devices (blk, scsi, fs, net) use shared memory regions between guest and host
Root guest could exploit vhost-user protocol to corrupt shared memory pages
virtiofsd uses vhost-user-fs protocol -- a malicious guest could send crafted vhost-user messages

8.6 Containerd/Kubernetes Event Spoofing

8.6.1 TaskOOM Event Spoofing (MEDIUM)

As described in 8.2.1, fake OOM events with arbitrary container IDs are published to containerd as /tasks/oom events (src/runtime/pkg/containerd-shim-v2/service.go:325-326)
Kubernetes consumes these to trigger pod eviction/rescheduling
Impact: Denial of service by triggering false OOM-based pod evictions
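
The spoofing works because the shim forwards whatever container ID the agent reports. A minimal sketch of the missing check follows, assuming the shim keeps a set of the container IDs it created itself. containerSet and checkOOMEvent are hypothetical names, not functions from service.go or wait.go.

```go
package main

import (
	"fmt"
	"sync"
)

// containerSet is a hypothetical registry of container IDs that the shim
// created itself and therefore trusts.
type containerSet struct {
	mu  sync.RWMutex
	ids map[string]struct{}
}

func newContainerSet(ids ...string) *containerSet {
	s := &containerSet{ids: make(map[string]struct{})}
	for _, id := range ids {
		s.ids[id] = struct{}{}
	}
	return s
}

func (s *containerSet) contains(id string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.ids[id]
	return ok
}

// checkOOMEvent decides whether an agent-reported OOM event may be forwarded.
// It returns an error for empty IDs and for IDs the shim never created.
func checkOOMEvent(known *containerSet, reportedID string) error {
	if reportedID == "" {
		return fmt.Errorf("OOM event with empty container ID dropped")
	}
	if !known.contains(reportedID) {
		return fmt.Errorf("OOM event for unknown container %q dropped", reportedID)
	}
	return nil
}

func main() {
	known := newContainerSet("ctr-a", "ctr-b")
	for _, id := range []string{"ctr-a", "ctr-z", ""} {
		if err := checkOOMEvent(known, id); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("forwarding OOM event for %q\n", id)
	}
}
```

This only blocks events for containers outside the sandbox. A compromised agent can still report a false OOM for a container that does exist, so rate limiting remains relevant.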

8.6.2 Process Exit Code Manipulation (MEDIUM)

WaitProcess RPC returns exit status from compromised agent
Exit codes influence Kubernetes restart policy decisions (CrashLoopBackOff, etc.)
Impact: Can suppress a restart under restartPolicy: OnFailure (by reporting exit code 0) or force repeated restarts (by reporting a non-zero exit code)

8.7 Full Attack Chain: Compromised Container to Cluster

Container gains root in Kata VM
|
+-- 1. Read ServiceAccount token from shared filesystem
+-- 2. Resolve kubernetes.default via cluster DNS
+-- 3. Authenticate to Kubernetes API server
|   +-- List/steal secrets across namespace (if RBAC permits)
|   +-- Create new privileged pods on other nodes
|   +-- Exec into other pods
|   +-- Modify deployments, inject backdoors
|
+-- 4. Network lateral movement
|   +-- Scan cluster network (all ClusterIP services)
|   +-- Reach NodePort services on all nodes
|   +-- Exfiltrate data via DNS tunneling or egress
|
+-- 5. Manipulate agent responses
|   +-- Spoof OOM events -> trigger pod evictions
|   +-- Poison metrics -> corrupt monitoring
|   +-- Fake iptables output -> misrepresent security posture
|
+-- 6. Cross-container attack via shared virtiofs
|   +-- Read other containers' filesystems
|   +-- Modify other containers' binaries
|   +-- Plant symlinks for host-side TOCTOU attacks
|
+-- 7. Hypervisor escape attempts
    +-- Craft malicious virtio descriptors
    +-- Exploit QEMU device emulation CVEs
    +-- DMA attack via VFIO (if IOMMU disabled)

8.8 Guest-to-Host Threat Matrix

Vector	Severity	Pre-Condition	Impact
K8s API access via SA token + pod network	CRITICAL	Root in guest + SA token exists	Full cluster compromise (RBAC-dependent)
VFIO DMA without IOMMU	CRITICAL	IOMMU disabled + VFIO device	Arbitrary host memory read/write
Stream read length mismatch	HIGH	Compromised agent	Data corruption in container logs/streams, `kubectl logs` output manipulation
No TTRPC response size limit	HIGH	Compromised agent	Shim memory exhaustion, node destabilization
QEMU virtio device exploit	HIGH	Root in guest + unpatched QEMU	Host code execution
Cross-container virtiofs access	HIGH	Root in guest	Data theft, binary tampering
Cluster network lateral movement	HIGH	Root in guest + no egress NetworkPolicy	Service scanning, data exfiltration
OOM event spoofing to Kubernetes	MEDIUM	Root in guest (cgroup manipulation)	Pod eviction DoS, host file creation
Volume stats response fabrication	MEDIUM	Compromised agent	Kubelet eviction decisions based on false data
GuestDetails version/feature spoofing	MEDIUM	Compromised agent	Shim logic manipulation, security feature bypass
Error message injection	MEDIUM	Compromised agent	Log poisoning, HTTP response injection
Metrics/iptables response poisoning	MEDIUM	Compromised agent	Monitoring corruption
Exit code manipulation	MEDIUM	Compromised agent	Restart policy bypass
Virtiofs symlink TOCTOU	MEDIUM	Root in guest + timing	Potential host file access
ConfigMap/Secret update race	MEDIUM	Root in guest + timing	Token/secret interception
Interface/route response divergence	LOW	Compromised agent	Stale shim network state, resource leaks
Stats JSON DoS	LOW	Compromised agent	Shim memory exhaustion

8.9 Guest-to-Host Recommendations

Minimize SA token exposure -- use automountServiceAccountToken: false on pods unless strictly needed; use projected volume tokens with short TTL and audience binding.
Enforce egress NetworkPolicy -- restrict guest pod network access to only required services; block API server access unless explicitly needed.
Enable IOMMU/vIOMMU -- always enable when using VFIO device passthrough.
Validate agent response data -- add container ID validation in watchOOMEvents() against known container set; sanitize metrics strings; validate iptables output format.
Per-container virtiofs isolation -- consider separate virtiofsd instances per container or use mount namespaces within the guest to prevent cross-container access.
Harden QEMU attack surface -- use machine type with minimal device set; enable sandboxing (seccomp, AppArmor for QEMU process); keep QEMU patched.
Rate-limit agent responses -- add throttling on OOM events and metrics to prevent DoS amplification.
Restrict virtiofsd -- run with --sandbox mode, minimize shared directory scope, consider read-only shares where possible.
Use read-only rootfs -- set readOnlyRootFilesystem: true in SecurityContext to limit guest filesystem writes.
Restrict guest capabilities -- never run containers with privileged: true in Kata VMs; drop all unnecessary capabilities.
Fix stream read length mismatch -- in readProcessStream() (kata_agent.go:2514), return min(len(resp.Data), len(data)) instead of len(resp.Data) to match actual bytes copied.
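Set TTRPC max response size -- enforce an explicit upper bound (e.g., 1MB) on the responses the shim's TTRPC client accepts, to limit memory consumption from malicious responses. MaxRecvMsgSize is the gRPC-style name for such a limit; whether ttrpc.NewClient() in the vendored ttrpc version accepts an equivalent option should be verified before relying on it.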
Sanitize error messages -- escape or truncate agent error strings before writing to HTTP responses in shim_management.go to prevent log/response injection.
Validate GuestDetails responses -- sanity-check MemBlockSizeBytes, version strings, and feature lists against expected ranges before using them in shim logic.
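
The stream read length fix recommended above can be illustrated in isolation. readProcessStream() itself is not reproduced here; buggyRead and fixedRead are hypothetical stand-ins that keep only the copy-and-return pattern. Go's built-in copy already returns the number of bytes written, which is the smaller of the two slice lengths.

```go
package main

import "fmt"

// buggyRead mirrors the pattern described above: it copies into buf but
// reports the length of the agent-supplied payload, which may exceed len(buf).
func buggyRead(buf, agentData []byte) int {
	copy(buf, agentData)
	return len(agentData)
}

// fixedRead reports the number of bytes copy actually wrote, that is
// min(len(buf), len(agentData)).
func fixedRead(buf, agentData []byte) int {
	return copy(buf, agentData)
}

func main() {
	buf := make([]byte, 4)
	agentData := []byte("0123456789") // 10 bytes from an untrusted agent

	fmt.Println("buggy reports:", buggyRead(buf, agentData)) // prints 10
	fmt.Println("fixed reports:", fixedRead(buf, agentData)) // prints 4
}
```

A caller that trusts the buggy return value would treat 10 bytes of a 4-byte buffer as valid, which is the mismatch listed in the threat matrix.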

Key Source Files Reference

Component	File Path	Purpose
Proto definitions	`src/libs/protocols/protos/agent.proto`	All RPC method and message definitions
Agent RPC handlers	`src/agent/src/rpc.rs`	All agent-side RPC implementations
Agent policy gate	`src/agent/src/policy.rs`	Optional authorization enforcement
Agent policy engine	`src/agent/policy/src/policy.rs`	Regorus OPA engine, `allow_request()`, `set_policy()`
Genpolicy rules	`src/tools/genpolicy/rules.rego`	Production Rego policy with deep request validation
Allow-all policy	`src/kata-opa/allow-all.rego`	Permissive policy (all RPCs allowed)
Allow-set-policy	`src/kata-opa/allow-set-policy.rego`	Bootstrap policy (only SetPolicy allowed)
Agent device handling	`src/agent/src/device/mod.rs`	Device add/CDI processing
Agent VFIO handler	`src/agent/src/device/vfio_device_handler.rs`	VFIO PCI/AP device passthrough
Agent storage/mounts	`src/agent/src/storage/mod.rs`	Mount and storage operations
Agent mount primitives	`src/agent/src/mount.rs`	Low-level mount calls
Agent OOM monitor	`src/agent/src/sandbox.rs`	OOM cgroup event monitoring
Shim TTRPC client	`src/runtime/virtcontainers/pkg/agent/protocols/client/client.go`	Transport, dial, connection
Shim kata-agent glue	`src/runtime/virtcontainers/kata_agent.go`	OCI spec constraining, RPC dispatch
Shim OOM handler	`src/runtime/pkg/containerd-shim-v2/wait.go`	OOM event consumption, file creation
Shim management API	`src/runtime/pkg/containerd-shim-v2/shim_management.go`	Metrics, iptables HTTP endpoints
Shim container/VFIO	`src/runtime/virtcontainers/container.go`	VFIO annotation, CDI metadata
Network setup	`src/runtime/virtcontainers/network_linux.go`	TC filter, TAP, veth, namespace
Veth endpoints	`src/runtime/virtcontainers/veth_endpoint.go`	Virtual ethernet pair management
Filesystem sharing	`src/runtime/virtcontainers/fs_share_linux.go`	Virtiofs/9p share, bind mounts, watchers
Virtiofsd daemon	`src/runtime/virtcontainers/virtiofsd.go`	Host-side virtiofs daemon management
QEMU device config	`src/runtime/virtcontainers/qemu.go`	Hypervisor device setup, IOMMU config
Hypervisor socket	`src/runtime/virtcontainers/hypervisor.go`	VSock port constant
Hypervisor socket gen	`src/runtime/virtcontainers/hypervisor_linux.go`	CID generation
