Hardware: Beelink GTR9 Pro, AMD Ryzen AI Max+ 395, 128 GB unified memory
Software: Lemonade Server with the llama.cpp ROCm backend
Model: Qwen3.5 35B-A3B, 19.70 GB
I'm using a local LLM, and I can keep chatting with a bot for a long time. But the minute I give it a task that requires spawning a subagent, reading files, or doing a web search, it eats up context very fast and gets 'terminated'. Then I have to message it again to resume the task, even though the session context isn't full yet.
Is there a way to set this up correctly, or a way for bots to receive tool output gradually rather than having lots of data dumped into the context at once?
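
To make the second part of the question concrete, here is a rough sketch of what I mean by feeding tool output gradually: the tool returns one capped page at a time plus an offset, and the bot asks for the next page only if it needs it. This is not Lemonade Server or llama.cpp API. The names ToolPage and page_tool_output are made up for illustration, and the 4000-character cap is an arbitrary number I picked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolPage:
    """One capped slice of a tool's output (hypothetical helper type)."""
    text: str                   # the slice handed to the model this turn
    next_offset: Optional[int]  # where to resume, or None when nothing is left


def page_tool_output(output: str, offset: int = 0, max_chars: int = 4000) -> ToolPage:
    """Return at most max_chars characters of output, starting at offset.

    max_chars is an assumed budget in characters, not tokens; a real setup
    would probably size it from the model's context window.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if offset < 0:
        raise ValueError("offset must not be negative")
    chunk = output[offset:offset + max_chars]
    end = offset + len(chunk)
    # Only advertise a next page when there is still unread output.
    next_offset = end if end < len(output) else None
    return ToolPage(text=chunk, next_offset=next_offset)


if __name__ == "__main__":
    # Stand-in for a large file read or web search result.
    big_output = "x" * 10_000
    offset: Optional[int] = 0
    while offset is not None:
        page = page_tool_output(big_output, offset)
        print(f"gave the model {len(page.text)} chars starting at {offset}")
        offset = page.next_offset
```

The idea is that a 10,000-character file read would cost three small tool calls instead of one large dump, and the bot could stop after the first page if that already answers the question.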