NemoClaw on DGX A100 — Session Notes

Machine

Component	Details
GPUs	8× NVIDIA A100-SXM4-80GB (640 GB total VRAM)
CUDA	12.4
Driver	535.161.08
OS	Ubuntu 22.04
Docker	24.0.7 (user in `docker` group, no sudo needed)
NVIDIA container runtime	Confirmed working ✓
vLLM docker image	`vllm/vllm-openai:latest` (v0.17.1)

What We're Setting Up

NemoClaw = OpenClaw AI coding agent running inside an OpenShell sandbox, backed by local vLLM inference.

Goal: A fully local (no cloud inference) agent loop where:

The LLM (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) runs on GPU 0 via a vLLM Docker container
All agent network egress is intercepted and requires explicit approval via the OpenShell TUI
The model fits on a single A100-80GB (59 GB BF16 weights, 11.6 GB KV cache at 131k context)
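As a rough sanity check on the single-GPU claim, the two figures above can be added up. This sketch assumes vLLM's default `--gpu-memory-utilization` of 0.9 is in effect (the notes do not say whether `start-vllm.sh` overrides it) and ignores activation and CUDA-graph overhead, so it is a back-of-the-envelope estimate, not what vLLM computes internally.

```python
# Back-of-the-envelope VRAM budget for one A100-80GB.
GPU_VRAM_GB = 80.0
GPU_MEMORY_UTILIZATION = 0.9  # assumption: vLLM default, not confirmed for start-vllm.sh
WEIGHTS_GB = 59.0             # BF16 weights, from the notes
KV_CACHE_GB = 11.6            # KV cache at 131k context, from the notes

budget_gb = GPU_VRAM_GB * GPU_MEMORY_UTILIZATION
used_gb = WEIGHTS_GB + KV_CACHE_GB
headroom_gb = budget_gb - used_gb

print(f"budget={budget_gb:.1f} GB  used={used_gb:.1f} GB  headroom={headroom_gb:.1f} GB")
```

Under these assumptions the weights and KV cache together stay below the budget vLLM is allowed to use on GPU 0.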

Quick Start (after all issues below are fixed)

cd /raid/vjawa/nemo_claw_test/openshell-openclaw-plugin

# 1. Start vLLM (waits until healthy — ~80s on first load)
./scripts/start-vllm.sh

# 2. Ensure OpenShell gateway is running
openshell status   # should say "Connected"
# if not: openshell gateway start --name nemoclaw

# 3. Launch the agent walkthrough (no API key needed)
./scripts/walkthrough.sh

In the right tmux pane, press Up and edit the prompt:

openclaw agent --agent main --local --session-id live -m "Fetch the current NVIDIA stock price"

Left pane shows the OpenShell TUI — approve/deny each outbound network request.

Issues Encountered & Fixes

1. NIM API key ≠ NGC API key (401 Unauthorized)

Problem: NGC_API_KEY from ~/.ngc/config is a Docker registry credential for nvcr.io pulls. It is rejected by integrate.api.nvidia.com with HTTP 401. A separate nvapi-* key from build.nvidia.com is required for NIM inference.

→ Fix: Use local vLLM instead. No inference API key needed.

2. OpenShell gateway must start before inference configuration

Problem:

Error: × No active gateway.

openshell inference set fails if called before the gateway is up.

→ Fix: Start gateway first (takes ~30s):

openshell gateway start --name nemoclaw
# Verify: openshell status → "Connected"
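Because gateway startup takes around 30 seconds, a script that calls `openshell inference set` right after `openshell gateway start` can still hit the "No active gateway" error. A minimal polling sketch follows; it assumes the output of `openshell status` contains the word "Connected" only once the gateway is up, and the helper names `openshell_status` and `wait_for_gateway` are hypothetical, not part of the OpenShell CLI.

```python
import subprocess
import time
from typing import Callable


def openshell_status() -> str:
    """Return the output of `openshell status`, or "" if the CLI is missing or hangs."""
    try:
        proc = subprocess.run(
            ["openshell", "status"], capture_output=True, text=True, timeout=15
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout + proc.stderr


def wait_for_gateway(
    get_status: Callable[[], str] = openshell_status,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status text contains "Connected" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Assumption: "Connected" appears in the output only when the gateway is up.
        if "Connected" in get_status():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_for_gateway()` before any inference configuration step turns the ordering requirement into an explicit check instead of a fixed sleep.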

3. vLLM provider base URL must use `host.docker.internal`

Problem: The OpenShell gateway runs inside a Docker/k3s pod. Using localhost:8000 as the provider URL resolves to the pod's loopback — not the host — so inference calls never reach vLLM.

→ Fix:

openshell provider create \
  --name vllm-local \
  --type openai \
  --credential "OPENAI_API_KEY=dummy" \
  --config "OPENAI_BASE_URL=http://host.docker.internal:8000/v1"

walkthrough.sh now runs this automatically whenever a local vLLM is detected.
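The URL rewrite behind this fix can be sketched in a few lines. This is an illustration only, not how `walkthrough.sh` does it; the function name `to_host_gateway_url` is hypothetical, and the sketch drops any user:password part of the URL for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def to_host_gateway_url(url: str, host_alias: str = "host.docker.internal") -> str:
    """Replace a loopback host with an alias that resolves to the Docker host."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url  # already points somewhere other than the caller's loopback
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_host_gateway_url("http://localhost:8000/v1"))
```

URLs that already name a non-loopback host are returned unchanged, so the helper is safe to apply to any provider base URL.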

4. `openshell sandbox connect` does not support command passthrough

Problem: openshell sandbox connect <name> -- bash -c '...' rejects any argument after the sandbox name:

error: unexpected argument 'bash' found

→ Fix: Generate SSH config and use ssh -t with a pre-uploaded startup script:

openshell sandbox ssh-config "$SANDBOX_NAME" > /tmp/ssh.cfg
ssh -t -F /tmp/ssh.cfg "openshell-${SANDBOX_NAME}" 'bash /tmp/startup.sh'

walkthrough.sh does this automatically.

5. Sandbox named `nvapi-placeholder` not `nemoclaw`

Problem: nemoclaw onboard creates the sandbox with a name derived from the API key placeholder. The original walkthrough.sh hardcoded --name nemoclaw, so the right pane immediately exited.

→ Fix: Auto-detect the first Ready sandbox:

SANDBOX_NAME=$(openshell sandbox list 2>/dev/null \
  | sed 's/\x1b\[[0-9;]*m//g' \
  | awk 'NR>1 && $NF=="Ready" { print $1; exit }')

Pass an explicit name as argument to override: ./scripts/walkthrough.sh my-sandbox.
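For readers who find the sed/awk pipeline hard to follow, the same logic can be written in Python: strip ANSI colour codes, skip the header line, and return the first column of the first row whose last column is "Ready". The sample listing is hypothetical; the real column layout of `openshell sandbox list` is assumed only to match what the awk script relies on (name first, status last).

```python
import re
from typing import Optional

# Matches ANSI SGR colour sequences such as "\x1b[32m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def first_ready_sandbox(listing: str) -> Optional[str]:
    """Return the name of the first sandbox whose last column is "Ready"."""
    for line in listing.splitlines()[1:]:  # skip the header row, like NR>1 in awk
        fields = ANSI_SGR.sub("", line).split()
        if fields and fields[-1] == "Ready":
            return fields[0]
    return None


# Hypothetical output, for illustration only.
sample = (
    "NAME               STATUS\n"
    "building-one       Provisioning\n"
    "\x1b[32mnvapi-placeholder\x1b[0m  \x1b[32mReady\x1b[0m\n"
)
print(first_ready_sandbox(sample))
```

Returning `None` when nothing is Ready lets the caller print a clear error instead of launching a pane that exits immediately.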

6. Inference model ID set to `vllm-local` instead of the real model name

Problem: After onboarding, openshell inference get showed Model: vllm-local. The gateway forwarded requests with that string as the model ID, causing:

404 The model 'vllm-local' does not exist

→ Fix: walkthrough.sh now detects the actual model name from /v1/models and passes it to openshell inference set:

VLLM_MODEL=$(curl -sf http://localhost:8000/v1/models \
  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['data'][0]['id'])")
openshell inference set --no-verify --provider vllm-local --model "$VLLM_MODEL"
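The inline `python3 -c` one-liner raises a bare `IndexError` if vLLM answers before any model is registered. A slightly more defensive sketch of the same lookup is shown here; it assumes the standard OpenAI-compatible response shape (`{"data": [{"id": ...}]}`), and the function names are hypothetical.

```python
import json
import urllib.request


def fetch_models_json(base_url: str = "http://localhost:8000/v1") -> str:
    """Fetch the raw /models response body from an OpenAI-compatible server."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return resp.read().decode("utf-8")


def first_model_id(models_json: str) -> str:
    """Return the id of the first model listed in a /v1/models response."""
    payload = json.loads(models_json)
    models = payload.get("data") or []
    if not models:
        raise ValueError("no models listed; is vLLM still loading?")
    return models[0]["id"]
```

A caller would pass `first_model_id(fetch_models_json())` as the `--model` value for `openshell inference set`.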

7. vLLM tool-call parser: `llama3_json` is wrong for Nemotron

Problem (part 1): Without --enable-auto-tool-choice, vLLM rejects all tool-use requests:

400 "auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set

Problem (part 2): Nemotron uses a custom XML parameter format for tool calls — not the llama3_json or hermes JSON formats:

<tool_call>
<function=tool_name>
<parameter=arg1>
value
</parameter>
</function>
</tool_call>

Both llama3_json and hermes expect JSON inside <tool_call> and silently drop Nemotron's tool calls.

→ Fix: Write a custom nemotron_tool_parser.py that parses the XML parameter format, and patch vllm/tool_parsers/__init__.py to register it (the parser name is validated before the module is imported):

docker run -d \
  --gpus '"device=0"' \
  -p 8000:8000 \
  --shm-size 16g \
  -v "${MODEL_PARENT}:/model-parent:ro" \
  -v "${PWD}/scripts/nemotron_tool_parser.py:/usr/local/.../vllm/tool_parsers/nemotron_tool_parser.py:ro" \
  -v "${PWD}/scripts/vllm_tool_parsers_init.py:/usr/local/.../vllm/tool_parsers/__init__.py:ro" \
  vllm/vllm-openai:latest \
    --model "/model-parent/snapshots/${SNAPSHOT}" \
    --served-model-name "nvidia/nemotron-3-nano-30b-a3b" \
    --enable-auto-tool-choice \
    --tool-call-parser nemotron \
    --trust-remote-code \
    --max-model-len 131072

Shortcut: ./scripts/start-vllm.sh handles this.
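To illustrate why a JSON-oriented parser finds nothing in Nemotron's output, here is a minimal regex-based sketch that extracts tool calls from the XML parameter format shown above. It is not the contents of `scripts/nemotron_tool_parser.py` and does not implement vLLM's tool-parser interface (no streaming, no type coercion); the function name `parse_tool_calls` and the sample tool name are hypothetical.

```python
import re
from typing import Dict, List

# One <tool_call> wrapping one <function=NAME> ... </function> block.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
# <parameter=KEY> value </parameter>, trimming one newline on each side of the value.
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>\n?(.*?)\n?</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, object]]:
    """Extract tool calls as {"name": ..., "arguments": {...}} dicts.

    All argument values are kept as strings; a real parser would likely
    coerce them using the tool's JSON schema.
    """
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        arguments = {key: value for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls


sample = (
    "<tool_call>\n<function=web_fetch>\n"
    "<parameter=url>\nhttps://example.com\n</parameter>\n"
    "<parameter=method>\nGET\n</parameter>\n"
    "</function>\n</tool_call>"
)
print(parse_tool_calls(sample))
```

The body between `<function=...>` and `</function>` contains no JSON at all, which is why parsers that call a JSON decoder on it come back empty.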

8. vLLM not installed on host — must use Docker

Problem: which vllm returns nothing; vLLM is not installed on the host.

→ Fix: Use the pre-pulled vllm/vllm-openai:latest Docker image. start-vllm.sh wraps the docker run.

9. HF snapshot symlinks break when mounting the snapshot dir alone

Problem: HF cache snapshots contain relative symlinks such as config.json -> ../../blobs/.... When only the snapshot dir is mounted, those targets lie outside the mount, so the links dangle inside the container and vLLM fails with:

Invalid repository ID or local directory: '/model'

→ Fix: Mount the parent model directory (which contains both snapshots/ and blobs/) and point vLLM at the snapshot inside it:

-v "/raid/praateekm/hf_cache/hub/models--nvidia--...:/model-parent:ro"
# then: --model /model-parent/snapshots/<hash>
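A quick way to see which directory is safe to mount is to list the symlinks whose targets resolve outside it. The sketch below builds a tiny fake cache layout (the `abc123` snapshot hash and `deadbeef` blob name are made up) and checks both candidate mount roots; the helper names are hypothetical.

```python
import os
import tempfile
from typing import List


def links_escaping(root: str) -> List[str]:
    """Return symlinks under `root` whose resolved targets lie outside `root`."""
    root_real = os.path.realpath(root)
    escaping = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)
                if os.path.commonpath([root_real, target]) != root_real:
                    escaping.append(os.path.relpath(path, root))
    return sorted(escaping)


def make_fake_hf_cache(base: str) -> str:
    """Create blobs/ and snapshots/<hash>/ with one relative symlink; return the snapshot dir."""
    blobs = os.path.join(base, "blobs")
    snapshot = os.path.join(base, "snapshots", "abc123")
    os.makedirs(blobs)
    os.makedirs(snapshot)
    with open(os.path.join(blobs, "deadbeef"), "w") as fh:
        fh.write("{}")
    os.symlink("../../blobs/deadbeef", os.path.join(snapshot, "config.json"))
    return snapshot


base_dir = tempfile.mkdtemp()
snapshot_dir = make_fake_hf_cache(base_dir)
print("mount snapshot only:", links_escaping(snapshot_dir))
print("mount parent dir:   ", links_escaping(base_dir))
```

An empty list for a directory means every link resolves inside it, which is the property a read-only bind mount needs.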

10. Context overflow: `--max-model-len 32768` too small for agent sessions

Problem: OpenClaw agent sessions accumulate tool call history. With --max-model-len 32768 and max_tokens=4096, only 28,672 input tokens are usable. A medium-length coding session hits this after a few turns:

400 You passed 28673 input tokens and requested 4096 output tokens.
However, the model's context length is only 32768 tokens.

The GPU KV cache was only 0.5% utilized at 32768 — there was ample VRAM headroom.

→ Fix: Use --max-model-len 131072 (128K context). This gives:

11.6 GB KV cache
404,528 total cached tokens
14.45× max concurrency at 131K tokens per request

--max-model-len 131072

start-vllm.sh now defaults to 131072.
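The input budget in the error message follows directly from the two settings: the prompt and the requested output must together fit in the context window. A small sketch of that arithmetic, with hypothetical helper names:

```python
def usable_input_tokens(max_model_len: int, max_tokens: int) -> int:
    """Largest prompt that still leaves room for max_tokens of output."""
    return max_model_len - max_tokens


def request_fits(prompt_tokens: int, max_tokens: int, max_model_len: int) -> bool:
    """True if prompt plus requested output fits in the context window."""
    return prompt_tokens + max_tokens <= max_model_len


for context_len in (32768, 131072):
    print(context_len, usable_input_tokens(context_len, 4096))
```

With `max_tokens=4096`, the 28,673-token prompt from the error above is one token over the 32768 limit, while at 131072 the same request leaves ample room.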

Files Added / Modified

File	Description
`scripts/start-vllm.sh`	Start/stop/status for vLLM Docker container (defaults: Nemotron Nano 30B, 131k context, GPU 0)
`scripts/nemotron_tool_parser.py`	Custom vLLM tool parser for Nemotron XML tool call format
`scripts/vllm_tool_parsers_init.py`	Patched vLLM `__init__.py` that registers the `nemotron` parser name
`scripts/walkthrough.sh`	Fixed: auto-configures openshell provider, detects sandbox + model name, no API key required
`scripts/walkthrough-nim.sh`	Variant: NIM cloud/local endpoint instead of vLLM

Model weights location

/raid/praateekm/hf_cache/hub/models--nvidia--NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/
  snapshots/378df16e4b54901a3f514f38ea9a34db9d061634/   # 59 GB BF16

VibhuJawa/nemoclaw-session-notes.md
