@jatinkrmalik
Last active February 3, 2026 08:14
OpenClaw Audio Binary Injection Fix // Prevents voice messages from being embedded as raw binary text (wastes tokens, breaks sessions). Works for Telegram, Discord, and all voice platforms. Temporary fix until PRs #3904/#4235 merge.

OpenClaw Voice Message Binary Injection Fix

Fixes voice messages being embedded as raw binary instead of just the transcription



The Problem

When users send voice messages through Telegram, Discord, or other platforms, OpenClaw was incorrectly processing the audio files:

  1. Voice messages were being treated as text files — specifically text/tab-separated-values
  2. Raw audio binary was embedded into the chat context — this could be 500-8000+ tokens per message
  3. Sessions would hit the token limit — after just 3-5 voice messages, sessions would hit 200K+ tokens
  4. Bot would stop responding — once the token limit was reached, the bot couldn't process new messages

Symptoms

  • Bot stops responding to voice-only messages
  • Session token count explodes to 200K+ with just a few voice notes
  • Logs show <file name="...ogg" mime="text/tab-separated-values">
  • Voice messages work when downloaded separately but fail through the bot

Impact Scale

Voice Length  | Wasted Tokens  | Effect
10-15 seconds | 500-2,000      | Annoying
30-60 seconds | 2,000-8,000+   | Session becomes unusable
2+ minutes    | 15,000-50,000+ | Immediate crash

Root Cause

Why did this happen?

OGG audio files (used by Telegram voice messages as OGG/Opus) have ASCII-heavy headers:

  1. OGG header starts with "OggS" — all printable ASCII characters
  2. Vorbis/Opus metadata contains extensive ASCII text — fields like ENCODER=, TITLE=, etc.
  3. Metadata uses tab characters — the guessDelimitedMime() function sees tabs and classifies it as TSV

The file type detection in OpenClaw works like this:

1. looksLikeUtf8Text() checks if >85% of bytes are printable ASCII
   → OGG files PASS this check (ASCII-heavy headers)

2. guessDelimitedMime() looks for tabs and commas
   → OGG metadata has tabs, so it's classified as TSV

3. File gets embedded as <file name="...ogg" mime="text/tab-separated-values">
   → Raw binary floods the context
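
The misclassification can be reproduced in miniature. The sketch below is an illustration only: `sketchLooksLikeText` and `sketchGuessDelimitedMime` are hypothetical stand-ins written from the description above, not OpenClaw's actual functions, and the sample buffer is synthetic rather than a valid OGG stream.

```javascript
// Hypothetical stand-ins for the two detection steps described above.
// The real OpenClaw functions may differ in detail.
function sketchLooksLikeText(buffer, threshold = 0.85) {
  if (!buffer || buffer.length === 0) return false;
  let printable = 0;
  for (const byte of buffer) {
    // Count tab, LF, CR and the printable ASCII range 0x20-0x7E.
    const isWhitespace = byte === 0x09 || byte === 0x0a || byte === 0x0d;
    if (isWhitespace || (byte >= 0x20 && byte <= 0x7e)) {
      printable++;
    }
  }
  return printable / buffer.length > threshold;
}

function sketchGuessDelimitedMime(text) {
  if (text.includes("\t")) return "text/tab-separated-values";
  if (text.includes(",")) return "text/csv";
  return undefined;
}

// Synthetic sample: the "OggS" capture pattern, a few binary bytes, then
// ASCII-heavy content that happens to include a tab.
const sample = Buffer.concat([
  Buffer.from("OggS"),
  Buffer.from([0x00, 0x02, 0x00, 0x00]),
  Buffer.from("OpusTags ENCODER=example\tTITLE=voice note"),
]);

const textLike = sketchLooksLikeText(sample);
const mime = textLike
  ? sketchGuessDelimitedMime(sample.toString("latin1"))
  : undefined;

console.log({ textLike, mime });
```

With only 4 non-printable bytes out of 49, the sample clears the 85% threshold, and the single tab is enough for the delimiter guess to pick TSV.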

Why existing checks weren't enough

OpenClaw had a kind === "audio" check, but this only worked when:

  • The file was correctly identified as audio BEFORE buffer reading
  • The MIME type was properly set by the sender

Telegram doesn't always send proper MIME types for voice messages, so the files would slip through the initial detection and get reclassified as text later in the pipeline.


The Fix

This patch adds three layers of defense to prevent audio files from being embedded as text:

Layer 1: Kind Skip (Early Exit)

if (!forcedTextMime && (kind === "image" || kind === "audio" || kind === "video")) {
    continue;
}

Files identified as audio, image, or video skip text extraction entirely — no buffer reading needed.

Layer 2: Extension Skip (Defense-in-Depth)

const _patchAudioExts = new Set(['.ogg', '.opus', '.mp3', '.wav', '.aac', '.flac', '.m4a', '.oga', '.webm']);
const _patchNameExt = nameHint ? path.extname(nameHint).toLowerCase() : '';
if (_patchAudioExts.has(_patchNameExt)) {
    continue;
}

Even if kind detection fails, files with audio extensions are skipped. This catches edge cases where the attachment type is misidentified.
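
One caveat worth checking in your own setup: in the patch, nameHint can fall back to attachment.url, and Node's `path.extname` keeps a query string as part of the extension, so a URL such as `voice.ogg?token=1` would not match the set. The sketch below shows that behaviour and one possible way to normalise the hint first. `hasAudioExtension` is a hypothetical helper, not part of the patch or of OpenClaw.

```javascript
const path = require("node:path");

const AUDIO_EXTS = new Set([
  ".ogg", ".opus", ".mp3", ".wav", ".aac", ".flac", ".m4a", ".oga", ".webm",
]);

// Hypothetical helper (not in the patch): strips any query string or
// fragment before looking up the extension.
function hasAudioExtension(nameHint) {
  if (!nameHint) return false;
  const withoutQuery = nameHint.split(/[?#]/, 1)[0];
  return AUDIO_EXTS.has(path.extname(withoutQuery).toLowerCase());
}

// Plain path.extname keeps the query string attached to the extension.
console.log(path.extname("voice.ogg?token=1"));
console.log(hasAudioExtension("https://example.invalid/files/voice.ogg?token=1"));
console.log(hasAudioExtension("VOICE.OGG"));
console.log(hasAudioExtension("notes.tsv"));
```

Whether this matters depends on what the platform adapters actually pass as the file name; Layer 3 still catches OGG and ID3v2-tagged files when the extension check misses.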

Layer 3: Magic Byte Detection (Final Safety Net)

function hasBinaryAudioMagic(buffer) {
    // Guard: callers may pass undefined or a very short buffer
    if (!buffer || buffer.length < 3) {
        return false;
    }
    // OGG container: "OggS" signature (4 bytes)
    if (buffer.length >= 4 &&
        buffer[0] === 0x4F && // O
        buffer[1] === 0x67 && // g
        buffer[2] === 0x67 && // g
        buffer[3] === 0x53) { // S
        return true;
    }
    // MP3 with ID3v2: "ID3" signature (3 bytes)
    if (buffer[0] === 0x49 && // I
        buffer[1] === 0x44 && // D
        buffer[2] === 0x33) { // 3
        return true;
    }
    return false;
}

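Checks the actual file bytes for the OGG ("OggS") and ID3v2 ("ID3") magic signatures. This prevents misclassification even if a file has no extension or the wrong MIME type. MP3 files without an ID3v2 tag do not start with "ID3", so they are not matched here and rely on the first two layers.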
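
To see which inputs the check accepts, it can be run against a few hand-made buffers. The sample bytes below are illustrative prefixes only, not complete audio files, and the function is repeated in compact form so the example runs on its own.

```javascript
// Same logic as the patch's hasBinaryAudioMagic, written compactly.
function hasBinaryAudioMagic(buffer) {
  if (!buffer || buffer.length < 3) return false;
  const isOgg = buffer.length >= 4 &&
    buffer[0] === 0x4f && buffer[1] === 0x67 &&
    buffer[2] === 0x67 && buffer[3] === 0x53;
  const isId3 = buffer[0] === 0x49 && buffer[1] === 0x44 && buffer[2] === 0x33;
  return isOgg || isId3;
}

const samples = {
  oggVoiceNote: Buffer.from([0x4f, 0x67, 0x67, 0x53, 0x00, 0x02]),
  id3TaggedMp3: Buffer.from([0x49, 0x44, 0x33, 0x04, 0x00]),
  // Starts directly with an MPEG frame sync (0xFF 0xFB), no ID3v2 tag.
  untaggedMp3: Buffer.from([0xff, 0xfb, 0x90, 0x00]),
  tsvText: Buffer.from("name\tvalue\n"),
  empty: Buffer.alloc(0),
};

const results = Object.fromEntries(
  Object.entries(samples).map(([label, buf]) => [label, hasBinaryAudioMagic(buf)])
);

console.log(results);
```

Only the first two samples are flagged as audio; the untagged MP3 shows why the kind and extension layers remain useful.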

What about transcription?

Transcription is unaffected. This patch only prevents the raw audio binary from being embedded; the transcription is handled separately by the audio capability pipeline and still appears in the conversation as expected.


Installation

Find your OpenClaw installation

macOS (Homebrew):

/opt/homebrew/lib/node_modules/openclaw/

Linux (nvm; replace v22.22.0 with your installed Node version):

~/.nvm/versions/node/v22.22.0/lib/node_modules/openclaw/

Linux (npm global):

/usr/lib/node_modules/openclaw/

Windows:

%APPDATA%\npm\node_modules\openclaw\

Apply the patch

Option 1: Automated (recommended)

  1. Download fix-audio-binary.patch and optional-apply.sh into the same directory.
  2. Open a terminal and cd to that directory.
  3. Run: bash optional-apply.sh
  4. The script locates your OpenClaw installation, backs up apply.js, applies the patch, and restarts the gateway.

Option 2: Manual

  1. Download the patch file
  2. Run: patch -p1 < fix-audio-binary.patch /path/to/openclaw/dist/media-understanding/apply.js
  3. Restart: openclaw gateway restart

Verification

After applying the patch:

  1. Send a voice message to your bot
  2. Check that the bot responds with the transcription
  3. Verify no binary in context — you should NOT see <file name="...ogg" mime="text/tab-separated-values"> in session files
  4. Check token count stays reasonable — a 30-second voice note should add ~500-1000 tokens for the transcription, not 8000+

Expected behavior

Before Fix                       | After Fix
Raw binary embedded              | Only transcription text
500-8000+ tokens per voice       | ~500-1000 tokens (transcription only)
Session crashes after 3-5 voices | Session handles many voices normally

Credits

This fix is based on work by the OpenClaw community:

@hclsys — Original magic byte detection approach

  • PR #3904: fix(media): detect binary audio by magic bytes to prevent text misidentification
  • Identified the root cause and implemented the OGG/MP3 magic byte detection

@null-runner — Extension-based skip and defense-in-depth

  • PR #4235: fix(media): skip audio in extractFileBlocks + hasBinaryAudioMagic defense-in-depth
  • Combined both early skip (by kind) and magic byte checks for comprehensive coverage

Community Contributors

Special thanks to everyone who reported, tested, and provided logs.


Related Issues & PRs

  • #1989 — Bug: Audio-only messages on Telegram not triggering agent response
  • #4197 — Feature: Strip raw audio attachment after successful transcription
  • #3904 — fix(media): detect binary audio by magic bytes
  • #4235 — fix(media): skip audio in extractFileBlocks + defense-in-depth

Notes

  • This is a temporary patch until the official PRs are merged
  • The patch will be overwritten on npm update — reapply after updates
  • Works for Telegram, Discord, and all voice message platforms
  • Compatible with OpenClaw v2026.1.30 and earlier

License

This patch is provided as a temporary fix for the OpenClaw project. The original code and fixes are under the OpenClaw project license.


Last updated: February 2026

# =====================================
# OpenClaw Audio Binary Injection Fix
# =====================================
#
# WHAT THIS FIXES:
# ----------------
# Voice messages (especially from Telegram) were being treated as text files
# instead of audio. This caused the raw audio binary to be dumped into the
# chat context, wasting thousands of tokens and breaking conversations.
#
# SYMPTOMS:
# - Bot stops responding after a few voice messages
# - Session token count explodes to 200K+ with just a few voice notes
# - Voice messages show as "text/tab-separated-values" in logs
#
# THE FIX:
# --------
# Skips audio files from being embedded as text. Only the transcription
# (the converted text) goes to the AI, not the raw audio file.
#
# APPLICABLE TO:
# --------------
# - OpenClaw v2026.1.30 and earlier
# - Affects: Telegram voice messages, Discord audio, all voice notes
#
# ============================================================================
# INSTALLATION INSTRUCTIONS
# ============================================================================
#
# STEP 1: Find your OpenClaw installation directory
#
# macOS (Homebrew):
# /opt/homebrew/lib/node_modules/openclaw/
#
# Linux (nvm; replace v22.22.0 with your installed Node version):
# ~/.nvm/versions/node/v22.22.0/lib/node_modules/openclaw/
#
# Linux (npm global):
# /usr/lib/node_modules/openclaw/
#
# Windows:
# %APPDATA%\npm\node_modules\openclaw\
#
# STEP 2: Apply this patch
#
# From this directory, run:
#
# patch -p1 < fix-audio-binary.patch /path/to/openclaw/dist/media-understanding/apply.js
#
# Or edit the file manually:
# - Open: dist/media-understanding/apply.js
# - Apply the changes shown in the "Patch" section below
#
# STEP 3: Restart OpenClaw
#
# openclaw gateway restart
#
# VERIFICATION:
# Send a voice note. The bot should respond normally, and you should NOT
# see <file name="...ogg" mime="text/tab-separated-values"> in the session.
#
# ============================================================================
# PATCH START - Changes to dist/media-understanding/apply.js
# ============================================================================
--- a/dist/media-understanding/apply.js
+++ b/dist/media-understanding/apply.js
@@ -97,6 +97,36 @@ function resolveUtf16Charset(buffer) {
}
return undefined;
}
+
+/**
+ * Detects binary audio files by their magic bytes (file signatures).
+ * This prevents OGG/Opus and MP3 files from being misidentified as text
+ * due to their ASCII-heavy headers passing the utf8 text check.
+ *
+ * OGG container: "OggS" (RFC 3533 Section 6)
+ * MP3 with ID3v2: "ID3" (id3.org spec Section 3.1)
+ */
+function hasBinaryAudioMagic(buffer) {
+ if (!buffer || buffer.length < 3) {
+ return false;
+ }
+ // OGG container: "OggS" signature (4 bytes)
+ if (buffer.length >= 4 &&
+ buffer[0] === 0x4F && // O
+ buffer[1] === 0x67 && // g
+ buffer[2] === 0x67 && // g
+ buffer[3] === 0x53) { // S
+ return true;
+ }
+ // MP3 with ID3v2: "ID3" signature (3 bytes)
+ if (buffer[0] === 0x49 && // I
+ buffer[1] === 0x44 && // D
+ buffer[2] === 0x33) { // 3
+ return true;
+ }
+ return false;
+}
+
function looksLikeUtf8Text(buffer) {
if (!buffer || buffer.length === 0) {
return false;
@@ -177,7 +207,7 @@ async function extractFileBlocks(params) {
}
const forcedTextMime = resolveTextMimeFromName(attachment.path ?? attachment.url ?? "");
const kind = forcedTextMime ? "document" : resolveAttachmentKind(attachment);
- if (!forcedTextMime && (kind === "image" || kind === "video")) {
+ if (!forcedTextMime && (kind === "image" || kind === "audio" || kind === "video")) {
continue;
}
if (!limits.allowUrl && attachment.url && !attachment.path) {
@@ -199,10 +229,21 @@ async function extractFileBlocks(params) {
continue;
}
const nameHint = bufferResult?.fileName ?? attachment.path ?? attachment.url;
+
+ // PATCH: Skip audio files by extension (defense-in-depth)
+ // Even if kind detection fails, audio extensions should be skipped
+ // to prevent OGG/Opus binary from being embedded as text.
+ const _patchAudioExts = new Set(['.ogg', '.opus', '.mp3', '.wav', '.aac', '.flac', '.m4a', '.oga', '.webm']);
+ const _patchNameExt = nameHint ? path.extname(nameHint).toLowerCase() : '';
+ if (_patchAudioExts.has(_patchNameExt)) {
+ continue;
+ }
+
const forcedTextMimeResolved = forcedTextMime ?? resolveTextMimeFromName(nameHint ?? "");
const utf16Charset = resolveUtf16Charset(bufferResult?.buffer);
const textSample = decodeTextSample(bufferResult?.buffer);
- const textLike = Boolean(utf16Charset) || looksLikeUtf8Text(bufferResult?.buffer);
+ const textLike = (Boolean(utf16Charset) || looksLikeUtf8Text(bufferResult?.buffer)) &&
+ !hasBinaryAudioMagic(bufferResult?.buffer);
if (!forcedTextMimeResolved && kind === "audio" && !textLike) {
continue;
}
# ============================================================================
# END OF PATCH
# ============================================================================
#
# ADDITIONAL NOTES:
# -----------------
# - This patch is temporary and will be overwritten when you update OpenClaw
# - Re-apply after updates until the official fix is released
# - Tracking issues: #1989, #4197 | PRs: #3904, #4235
# - Works for Telegram, Discord, and all voice message platforms
#
# ============================================================================
#!/bin/bash
# OpenClaw Audio Binary Fix - Auto-apply script
# Run this script to automatically patch your OpenClaw installation
# This is optional as you can also just follow the comments above in the patch file.
set -e
echo "======================================"
echo " OpenClaw Audio Binary Injection Fix"
echo "======================================"
echo ""
# Find OpenClaw installation (an explicit path may be passed as the first argument)
OPENCLAW_PATH="${1:-}"
# Check common paths
PATHS=(
"$HOME/.nvm/versions/node"/v*/lib/node_modules/openclaw
/opt/homebrew/lib/node_modules/openclaw
/usr/local/lib/node_modules/openclaw
/usr/lib/node_modules/openclaw
)
if [ -z "$OPENCLAW_PATH" ]; then
for candidate in "${PATHS[@]}"; do
# An unmatched glob stays literal and simply fails the -d test
if [ -d "$candidate" ]; then
OPENCLAW_PATH="$candidate"
break
fi
done
fi
if [ -z "$OPENCLAW_PATH" ]; then
echo "❌ Could not find OpenClaw installation."
echo ""
echo "Please specify the path manually:"
echo " bash optional-apply.sh /path/to/openclaw"
exit 1
fi
echo "✓ Found OpenClaw at: $OPENCLAW_PATH"
echo ""
TARGET_FILE="$OPENCLAW_PATH/dist/media-understanding/apply.js"
if [ ! -f "$TARGET_FILE" ]; then
echo "❌ Could not find apply.js at expected location:"
echo " $TARGET_FILE"
exit 1
fi
# Backup original file
BACKUP_FILE="$TARGET_FILE.backup-$(date +%Y%m%d-%H%M%S)"
cp "$TARGET_FILE" "$BACKUP_FILE"
echo "✓ Backed up original to: $BACKUP_FILE"
echo ""
# Apply patch
echo "Applying patch..."
patch -p1 < fix-audio-binary.patch "$TARGET_FILE"
echo ""
echo "✓ Patch applied successfully!"
echo ""
echo "Restarting OpenClaw gateway..."
if command -v openclaw &> /dev/null; then
openclaw gateway restart
elif systemctl --user is-active --quiet openclaw-gateway; then
systemctl --user restart openclaw-gateway
else
echo "⚠️ Please restart OpenClaw manually:"
echo " openclaw gateway restart"
fi
echo ""
echo "✓ Done! Send a voice note to test."