Summary
When receiving a Telegram voice note (OGG/Opus), OpenClaw sometimes misclassifies the audio bytes as delimited text (CSV/TSV) and injects a huge <file ...> block into the message body containing garbled binary-looking text. This bloats context, clutters logs/chat, and can slow replies.
What I see
Telegram message shows something like:
media:audio <file name="file_xxx.ogg" mime="text/tab-separated-values">
ζε§Θ ... (tons of garbage)
</file>
Impact
• Large garbage content appended to user message/context
• Increased latency / slower responses
• Chat transcript pollution
Steps to reproduce
- Enable Telegram channel
- Send a short voice note to the bot
- Observe that the message body sometimes contains a <file ... mime="text/csv"> or <file ... mime="text/tab-separated-values"> block with garbage
Suspected cause
In media-understanding/apply.js, file-block extraction heuristics can treat binary as UTF-8-ish text (looksLikeUtf8Text) and then guessDelimitedMime() may return text/csv or text/tab-separated-values based on commas/tabs in the first decoded "line". Binary OGG/Opus can pass the "printable" heuristic by chance.
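One possible guard, as a minimal sketch only: check for the Ogg capture pattern (the ASCII bytes "OggS" at the start of the stream) and for an audio/video/image declared type before any text sniffing runs. The names looksLikeOgg and mayBeDelimitedText are hypothetical and are not taken from apply.js; where such a check would actually plug in is an assumption.

```javascript
// Hypothetical guard; none of these names come from OpenClaw itself.
// Every Ogg page starts with the capture pattern "OggS".
const OGG_MAGIC = Buffer.from("OggS", "ascii");

/** Returns true when the buffer starts with the Ogg capture pattern. */
function looksLikeOgg(buf) {
  return (
    buf.length >= OGG_MAGIC.length &&
    buf.subarray(0, OGG_MAGIC.length).equals(OGG_MAGIC)
  );
}

/**
 * Decides whether an attachment should even be considered for
 * CSV/TSV sniffing. Known binary containers and attachments whose
 * declared type is audio/video/image are rejected up front.
 */
function mayBeDelimitedText(buf, declaredMime) {
  if (looksLikeOgg(buf)) return false;
  if (/^(audio|video|image)\//i.test(declaredMime ?? "")) return false;
  return true; // fall through to the existing text heuristics
}
```

With a check like this, a voice note would be rejected before the "printable" heuristic ever sees it, while genuine CSV/TSV attachments would still reach the existing detection.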
Also: the allowedMimes config may be bypassed because the code adds guessed MIME types to the allowlist for that attachment.
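A sketch of what stricter allowlist handling could look like, assuming the allowlist is a plain array of MIME strings: the guessed type can narrow the decision but never widens the configured list. isAllowedAttachment and normalizeMime are hypothetical helpers; only allowedMimes is a name from the actual config.

```javascript
// Hypothetical helpers; only `allowedMimes` mirrors the real config key.

/** Lower-cases a MIME string and drops parameters such as "; charset=utf-8". */
function normalizeMime(mime) {
  return mime.split(";")[0].trim().toLowerCase();
}

/**
 * Returns true only if every known type for the attachment (declared
 * and guessed) is in the configured allowlist. The allowlist itself
 * is never extended with guessed types.
 */
function isAllowedAttachment(allowedMimes, declaredMime, guessedMime) {
  const allow = new Set(allowedMimes.map(normalizeMime));
  const candidates = [declaredMime, guessedMime]
    .filter(Boolean)
    .map(normalizeMime);
  return candidates.length > 0 && candidates.every((m) => allow.has(m));
}
```

Under this rule a voice note declared as audio/ogg is rejected even when the sniffer guesses text/tab-separated-values and that type is on the allowlist. It is deliberately strict: an attachment declared as application/octet-stream would also be rejected unless that type is allowed.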