Version: 2026.1.29
What happens:
Voice message transcription works, but the raw .ogg binary is ALSO included in context, mislabeled as text/plain:
Transcript: "Раз-раз проверка..." (Russian: "One-two, testing...") ✅
<file name="file.ogg" mime="text/plain"> 杏卧Ȁ 䒵䥈 햘땦ጁ... (garbage) </file>
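A possible clue about the cause: the first characters of the garbage match an Ogg page header decoded as UTF-16LE text. This is a minimal check of that observation, not code from the project. The six header bytes are the standard Ogg magic, version byte, and beginning-of-stream flag; that the file is being read as UTF-16LE somewhere in the pipeline is only an inference from the output above.

```python
# First six bytes of an Ogg stream:
# magic "OggS", stream structure version 0, header type 0x02 (beginning of stream).
ogg_header = b"OggS\x00\x02"

# Prefix of the garbage shown in the report.
report_prefix = "杏卧Ȁ"

# Decoding the binary header as UTF-16LE reproduces the reported characters.
decoded = ogg_header.decode("utf-16-le")
print(decoded == report_prefix)  # prints True
```

If that inference holds, the attachment is probably being treated as a text file instead of being recognized as audio, which would explain the text/plain label.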
Expected:
After successful transcription, the original audio should NOT be injected into context.
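A rough sketch of the expected behavior, assuming attachments are filtered before the context is built. `Attachment`, `looks_like_audio`, and `attachments_for_context` are hypothetical names for illustration and not the project's actual API. Because the audio arrived labeled text/plain in this report, the sketch checks the Ogg magic bytes as well as the MIME type.

```python
from dataclasses import dataclass
from typing import List, Optional

OGG_MAGIC = b"OggS"  # every Ogg page starts with these four bytes


@dataclass
class Attachment:
    """Hypothetical attachment shape; the real structure may differ."""
    name: str
    mime: str
    data: bytes


def looks_like_audio(att: Attachment) -> bool:
    # Trust an audio/* MIME type, but also sniff the content,
    # since the file in this report was labeled text/plain.
    return att.mime.startswith("audio/") or att.data.startswith(OGG_MAGIC)


def attachments_for_context(
    attachments: List[Attachment], transcript: Optional[str]
) -> List[Attachment]:
    # Only the successful-transcription case from the report is handled here:
    # without a usable transcript, the attachment list is returned unchanged.
    if not transcript or not transcript.strip():
        return list(attachments)
    # Transcript exists, so the raw audio adds nothing to the context.
    return [a for a in attachments if not looks_like_audio(a)]
```

With a transcript present, a mislabeled `file.ogg` is dropped while genuine text attachments pass through. What should happen to the audio when transcription fails is left open here.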
Impact:
• Wastes 50-80% of context on binary garbage
• Forces frequent compactions
Config:
"tools.media.audio": { "enabled": true, "language": "ru", "models": [{"provider": "groq", "model": "whisper-large-v3-turbo"}] }
Channel: Telegram