@VibhuJawa
Last active March 16, 2026 08:09
NemoClaw on DGX A100 — Setup Session Notes

NemoClaw on DGX A100 — Session Notes

Machine

GPUs                      8× NVIDIA A100-SXM4-80GB (640 GB total VRAM)
CUDA                      12.4
Driver                    535.161.08
OS                        Ubuntu 22.04
Docker                    24.0.7 (user in docker group, no sudo needed)
NVIDIA container runtime  Confirmed working ✓
vLLM docker image         vllm/vllm-openai:latest (v0.17.1)

What We're Setting Up

NemoClaw = OpenClaw AI coding agent running inside an OpenShell sandbox, backed by local vLLM inference.

Goal: A fully local (no cloud inference) agent loop where:

  • The LLM (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) runs on GPU 0 via a vLLM Docker container
  • All agent network egress is intercepted and requires explicit approval via the OpenShell TUI
  • Model fits on a single A100-80G (59 GB BF16 weights, 11.6 GB KV cache at 131k context)

Quick Start (after all issues below are fixed)

cd /raid/vjawa/nemo_claw_test/openshell-openclaw-plugin

# 1. Start vLLM (waits until healthy — ~80s on first load)
./scripts/start-vllm.sh

# 2. Ensure OpenShell gateway is running
openshell status   # should say "Connected"
# if not: openshell gateway start --name nemoclaw

# 3. Launch the agent walkthrough (no API key needed)
./scripts/walkthrough.sh

In the right tmux pane, press Up and edit the prompt:

openclaw agent --agent main --local --session-id live -m "Fetch the current NVIDIA stock price"

Left pane shows the OpenShell TUI — approve/deny each outbound network request.
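
Step 1 above blocks until vLLM is healthy. The following is a minimal sketch of that kind of wait loop, not the contents of start-vllm.sh; the wait_until helper name is hypothetical. It re-runs an arbitrary probe command until the probe succeeds or a retry budget runs out.

```shell
# wait_until TIMEOUT_SECONDS CMD [ARGS...]
# Hypothetical helper: re-runs CMD, sleeping 1 second between attempts,
# until CMD exits 0. Returns 1 once TIMEOUT_SECONDS sleeps have elapsed.
# It counts sleep intervals, so the real wall-clock wait also includes
# however long each probe takes.
wait_until() {
  wu_timeout="$1"; shift
  wu_elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_elapsed" -ge "$wu_timeout" ]; then
      return 1
    fi
    sleep 1
    wu_elapsed=$((wu_elapsed + 1))
  done
  return 0
}
```

A typical call would be `wait_until 120 curl -sf http://localhost:8000/health`, assuming the container publishes port 8000 as in these notes and the server answers on /health.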


Issues Encountered & Fixes

1. NIM API key ≠ NGC API key (401 Unauthorized)

Problem: NGC_API_KEY from ~/.ngc/config is a Docker registry credential for nvcr.io pulls. It is rejected by integrate.api.nvidia.com with HTTP 401. A separate nvapi-* key from build.nvidia.com is required for NIM inference.

Fix: Use local vLLM instead. No inference API key needed.


2. OpenShell gateway must start before inference configuration

Problem:

Error: × No active gateway.

openshell inference set fails if called before the gateway is up.

Fix: Start gateway first (takes ~30s):

openshell gateway start --name nemoclaw
# Verify: openshell status → "Connected"
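
The ordering constraint can be enforced with a small guard. This is a sketch that assumes `openshell status` prints the word "Connected" when the gateway is up, as the comment above indicates; everything else about the output format is an assumption, and gateway_connected is a hypothetical helper name.

```shell
# gateway_connected: reads `openshell status` output on stdin and succeeds
# only if, after stripping ANSI colour codes, some line contains the whole
# word "Connected".
# Assumption: a stopped gateway never prints that exact word.
gateway_connected() {
  gc_esc=$(printf '\033')            # literal ESC character
  sed "s/${gc_esc}\[[0-9;]*m//g" | grep -qw 'Connected'
}

# Intended use (needs the openshell CLI, so it is left commented out):
#   if ! openshell status 2>&1 | gateway_connected; then
#     openshell gateway start --name nemoclaw
#   fi
```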

3. vLLM provider base URL must use host.docker.internal

Problem: The OpenShell gateway runs inside a Docker/k3s pod. Using localhost:8000 as the provider URL resolves to the pod's loopback — not the host — so inference calls never reach vLLM.

Fix:

openshell provider create \
  --name vllm-local \
  --type openai \
  --credential "OPENAI_API_KEY=dummy" \
  --config "OPENAI_BASE_URL=http://host.docker.internal:8000/v1"

walkthrough.sh now runs this automatically whenever a local vLLM is detected.
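
The same endpoint is reached under two names: localhost from the host shell and host.docker.internal from inside the gateway pod. A script that probes vLLM on the host therefore has to translate the URL before registering it. The sketch below shows one way to do that; to_gateway_url is a hypothetical helper and is not taken from walkthrough.sh.

```shell
# to_gateway_url URL
# Rewrites a loopback host (localhost or 127.0.0.1) to host.docker.internal
# and leaves every other URL untouched.
to_gateway_url() {
  printf '%s\n' "$1" \
    | sed -E 's#^(https?://)(localhost|127\.0\.0\.1)(:|/|$)#\1host.docker.internal\3#'
}
```

For example, `to_gateway_url http://localhost:8000/v1` yields the value used for OPENAI_BASE_URL above.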


4. openshell sandbox connect does not support command passthrough

Problem: openshell sandbox connect <name> -- bash -c '...' rejects any argument after the sandbox name:

error: unexpected argument 'bash' found

Fix: Generate SSH config and use ssh -t with a pre-uploaded startup script:

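openshell sandbox ssh-config "$SANDBOX_NAME" > /tmp/ssh.cfg
# -t allocates a TTY for the interactive session; /tmp/startup.sh must already exist in the sandbox
ssh -t -F /tmp/ssh.cfg "openshell-${SANDBOX_NAME}" 'bash /tmp/startup.sh'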

walkthrough.sh does this automatically.


5. Sandbox is named nvapi-placeholder, not nemoclaw

Problem: nemoclaw onboard creates the sandbox with a name derived from the API key placeholder. The original walkthrough.sh hardcoded --name nemoclaw, so the right pane immediately exited.

Fix: Auto-detect the first Ready sandbox:

SANDBOX_NAME=$(openshell sandbox list 2>/dev/null \
  | sed 's/\x1b\[[0-9;]*m//g' \
  | awk 'NR>1 && $NF=="Ready" { print $1; exit }')

Pass an explicit name as argument to override: ./scripts/walkthrough.sh my-sandbox.
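
Detection and override can be combined into one helper. The sketch below separates parsing from the CLI call so it can be exercised on captured output. The column layout it expects (one header row, name first, status last) is an assumption inferred from the awk filter above, and first_ready_sandbox is a hypothetical name.

```shell
# first_ready_sandbox: reads `openshell sandbox list` output on stdin and
# prints the name of the first sandbox whose last column is "Ready".
# Prints nothing if no sandbox is Ready.
first_ready_sandbox() {
  frs_esc=$(printf '\033')   # literal ESC; avoids relying on sed's \x1b extension
  sed "s/${frs_esc}\[[0-9;]*m//g" \
    | awk 'NR>1 && $NF=="Ready" { print $1; exit }'
}

# Intended use (needs the openshell CLI, so it is left commented out):
#   SANDBOX_NAME="${1:-$(openshell sandbox list 2>/dev/null | first_ready_sandbox)}"
#   [ -n "$SANDBOX_NAME" ] || { echo "no Ready sandbox found" >&2; exit 1; }
```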


6. Inference model ID set to vllm-local instead of the real model name

Problem: After onboarding, openshell inference get showed Model: vllm-local. The gateway forwarded requests with that string as the model ID, causing:

404 The model 'vllm-local' does not exist

Fix: walkthrough.sh now detects the actual model name from /v1/models and passes it to openshell inference set:

VLLM_MODEL=$(curl -sf http://localhost:8000/v1/models \
  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['data'][0]['id'])")
openshell inference set --no-verify --provider vllm-local --model "$VLLM_MODEL"
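
The one-liner above ends in a Python traceback if vLLM is reachable but lists no models. A slightly more defensive sketch follows, still using python3 for the JSON parsing; first_model_id is a hypothetical helper name.

```shell
# first_model_id: reads a /v1/models JSON response on stdin and prints the
# id of the first model. Exits non-zero, printing nothing, if the input is
# not a JSON object or its "data" list is missing or empty.
first_model_id() {
  python3 -c '
import json, sys
try:
    models = json.load(sys.stdin).get("data", [])
except (ValueError, AttributeError):
    sys.exit(1)          # not JSON, or not a JSON object
if not models:
    sys.exit(1)          # server is up but reports no models
print(models[0]["id"])
'
}
```

It slots into the existing flow as `VLLM_MODEL=$(curl -sf http://localhost:8000/v1/models | first_model_id) || exit 1`, so an empty model list stops the script before `openshell inference set` runs.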

7. vLLM tool-call parser: llama3_json is wrong for Nemotron

Problem (part 1): Without --enable-auto-tool-choice, vLLM rejects all tool-use requests:

400 "auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set

Problem (part 2): Nemotron uses a custom XML parameter format for tool calls — not the llama3_json or hermes JSON formats:

<tool_call>
<function=tool_name>
<parameter=arg1>
value
</parameter>
</function>
</tool_call>

Both llama3_json and hermes expect JSON inside <tool_call> and silently drop Nemotron's tool calls.
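
To make the format concrete, here is a small awk sketch that extracts the function name and parameters from one such block. It only illustrates the wire format and is not the nemotron_tool_parser.py used in this setup. It assumes each tag sits on its own line, as in the example above, and parse_nemotron_tool_call is a hypothetical name.

```shell
# parse_nemotron_tool_call: reads one <tool_call> block on stdin and prints
#   function<TAB>NAME
#   param<TAB>KEY<TAB>VALUE      (one line per parameter)
parse_nemotron_tool_call() {
  awk '
    match($0, /<function=[^>]+>/) {
      # "<function=" is 10 characters and the closing ">" is 1
      print "function\t" substr($0, RSTART + 10, RLENGTH - 11)
      next
    }
    match($0, /<parameter=[^>]+>/) {
      # "<parameter=" is 11 characters and the closing ">" is 1
      key = substr($0, RSTART + 11, RLENGTH - 12)
      value = ""; nlines = 0; in_param = 1
      next
    }
    /<\/parameter>/ {
      print "param\t" key "\t" value
      in_param = 0
      next
    }
    in_param {
      # Multi-line values are joined with newlines
      if (nlines++ > 0) value = value "\n" $0
      else value = $0
    }
  '
}
```

A JSON-oriented parser finds no JSON object between the <tool_call> tags in this layout, which is consistent with the silent drops described above.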

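Fix: Add a custom nemotron_tool_parser.py that parses the XML parameter format, and patch vllm/tool_parsers/__init__.py to register the nemotron parser name. The patch is needed because vLLM validates the --tool-call-parser value before the parser module is imported: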

docker run -d \
  --gpus '"device=0"' \
  -p 8000:8000 \
  --shm-size 16g \
  -v "${MODEL_PARENT}:/model-parent:ro" \
  -v "${PWD}/scripts/nemotron_tool_parser.py:/usr/local/.../vllm/tool_parsers/nemotron_tool_parser.py:ro" \
  -v "${PWD}/scripts/vllm_tool_parsers_init.py:/usr/local/.../vllm/tool_parsers/__init__.py:ro" \
  vllm/vllm-openai:latest \
    --model "/model-parent/snapshots/${SNAPSHOT}" \
    --served-model-name "nvidia/nemotron-3-nano-30b-a3b" \
    --enable-auto-tool-choice \
    --tool-call-parser nemotron \
    --trust-remote-code \
    --max-model-len 131072

Shortcut: ./scripts/start-vllm.sh handles this.


8. vLLM not installed on host — must use Docker

Problem: which vllm → not found.

Fix: Use the pre-pulled vllm/vllm-openai:latest Docker image. start-vllm.sh wraps the docker run.


9. HF snapshot symlinks break when mounting the snapshot dir alone

Problem: HF cache snapshots contain symlinks like config.json -> ../../blobs/.... Mounting only the snapshot dir means Docker can't resolve ../../blobs/, causing:

Invalid repository ID or local directory: '/model'

Fix: Mount the parent model directory (which contains both snapshots/ and blobs/) and point vLLM at the snapshot inside it:

-v "/raid/praateekm/hf_cache/hub/models--nvidia--...:/model-parent:ro"
# then: --model /model-parent/snapshots/<hash>
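
Given the full snapshot path, both the mount source and the snapshot hash can be derived with plain path manipulation. This is a sketch: the variable names MODEL_PARENT and SNAPSHOT match the docker run command in issue 7, while split_snapshot_path is a hypothetical helper.

```shell
# split_snapshot_path SNAPSHOT_DIR
# Sets MODEL_PARENT to the models--... directory (the one that holds both
# snapshots/ and blobs/) and SNAPSHOT to the snapshot hash.
split_snapshot_path() {
  ssp_snap="${1%/}"                                  # drop one trailing slash
  SNAPSHOT=$(basename "$ssp_snap")
  MODEL_PARENT=$(dirname "$(dirname "$ssp_snap")")   # strip <hash> and snapshots/
}
```

The results plug straight into `-v "${MODEL_PARENT}:/model-parent:ro"` and `--model "/model-parent/snapshots/${SNAPSHOT}"`, so the relative ../../blobs/ symlinks resolve inside the container.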

10. Context overflow: --max-model-len 32768 too small for agent sessions

Problem: OpenClaw agent sessions accumulate tool call history. With --max-model-len 32768 and max_tokens=4096, only 28,672 input tokens are usable. A medium-length coding session hits this after a few turns:

400 You passed 28673 input tokens and requested 4096 output tokens.
However, the model's context length is only 32768 tokens.

The GPU KV cache was only 0.5% utilized at 32768 — there was ample VRAM headroom.

Fix: Use --max-model-len 131072 (128K context). This gives:

  • 11.6 GB KV cache
  • 404,528 total cached tokens
  • 14.45× max concurrency at 131K tokens per request

--max-model-len 131072

start-vllm.sh now defaults to 131072.
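
The usable prompt budget is the context length minus the requested output tokens, which is why a 28,673-token prompt was rejected at 32768. A small worked example follows; input_budget is a hypothetical helper, and 4096 is the max_tokens value from the error above.

```shell
# input_budget MAX_MODEL_LEN MAX_TOKENS
# Prints the largest prompt size (in tokens) that still leaves room for
# MAX_TOKENS of output inside a MAX_MODEL_LEN context window.
input_budget() {
  echo $(( $1 - $2 ))
}

input_budget 32768 4096     # prints 28672
input_budget 131072 4096    # prints 126976
```

At 131072 the same request shape leaves roughly 4.4 times as much room for accumulated tool-call history.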


Files Added / Modified

File                               Description
scripts/start-vllm.sh              Start/stop/status for vLLM Docker container (defaults: Nemotron Nano 30B, 131k context, GPU 0)
scripts/nemotron_tool_parser.py    Custom vLLM tool parser for Nemotron XML tool call format
scripts/vllm_tool_parsers_init.py  Patched vLLM __init__.py that registers the nemotron parser name
scripts/walkthrough.sh             Fixed: auto-configures openshell provider, detects sandbox + model name, no API key required
scripts/walkthrough-nim.sh         Variant: NIM cloud/local endpoint instead of vLLM

Model weights location

/raid/praateekm/hf_cache/hub/models--nvidia--NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/
  snapshots/378df16e4b54901a3f514f38ea9a34db9d061634/   # 59 GB BF16