DeepSeek & Auxiliary models | Nous Research

gentle kernel May 3, 2026, 3:38 PM

"Current setup is ultra fragile." What do you mean?

So your main Hermes model is set to DeepSeek, and we are not replacing that. There are also AUXILIARY tasks that handle things like selecting tools, compressing memory, and all of the "behind-the-scenes" stuff.
ISSUES WITH THIS

DeepSeek is not multimodal, so vision tasks will fail.
You are spending your paid DeepSeek tokens on these hidden tasks.
Per your graphs, DeepSeek seems to be building up a bit of context.

So, WITHOUT replacing DeepSeek as your main model, we can set a free or cheaper multimodal model as your Auxiliary model for all of your aux tasks, which solves the issues above:

StepFun 3.5 Flash and Qwen 3.5 Plus are both vision-capable.
StepFun 3.5 Flash is free, and Qwen is ~$0.12 on input.
It hands memory compression and management to an entirely different model than DeepSeek, which could have an immediate effect on your bloat here.
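To put the pricing point above in rough numbers, here is a minimal Python sketch of how moving part of a session's tokens to a cheaper or free auxiliary model changes the bill. Every price, token count, and function name in it is a hypothetical placeholder, not a quoted rate; the ~$0.12 figure is read as dollars per million input tokens, which is an assumption. Real savings depend on how many tokens the aux tasks actually consume in your setup.

```python
# Hypothetical cost model: every price and token count below is a placeholder,
# not a quoted rate for DeepSeek, StepFun, or Qwen.

def session_cost(main_tokens: int, aux_tokens: int,
                 main_price_per_m: float, aux_price_per_m: float) -> float:
    """Dollar cost of a session, given input tokens sent to the main model and
    to the auxiliary model, and each model's price per million input tokens."""
    return (main_tokens * main_price_per_m + aux_tokens * aux_price_per_m) / 1_000_000


def split_cost(total_tokens: int, aux_share: float,
               main_price_per_m: float, aux_price_per_m: float) -> float:
    """Cost when a fraction `aux_share` of all tokens goes to the aux model."""
    aux_tokens = round(total_tokens * aux_share)
    return session_cost(total_tokens - aux_tokens, aux_tokens,
                        main_price_per_m, aux_price_per_m)


if __name__ == "__main__":
    total = 1_000_000   # placeholder: input tokens used in one working session
    main_price = 0.50   # placeholder $/M input tokens for the main model
    # Aux tasks left on "auto", so everything runs on the main model.
    print(f"all on main:        ${split_cost(total, 0.0, main_price, 0.0):.2f}")
    # Half of the tokens moved to a free aux model.
    print(f"half on free aux:   ${split_cost(total, 0.5, main_price, 0.0):.2f}")
    # Half of the tokens moved to an aux model at the ~$0.12/M figure (assumed unit).
    print(f"half on $0.12 aux:  ${split_cost(total, 0.5, main_price, 0.12):.2f}")
```

The design point is simply that the aux share multiplies directly into the savings: the more of the hidden work that leaves the main model, the smaller the main model's line on the bill.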

This is how you set the auxiliary tasks to a different, cheaper, multimodal model:

Run hermes model.
Scroll to the very bottom of the list.
Select 'Set your Aux model' or whatever it's called (the very bottom option).
Every task will say auto, defaulting to DeepSeek for aux tasks.
Set StepFun 3.5 Flash for all aux tasks.
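The "auto means DeepSeek" behaviour in the steps above can be pictured with a small Python sketch. It only models the idea that an aux slot left on auto falls back to the main model; the task names, model ids, and the `resolve()` helper are all hypothetical and are not Hermes-Agent's real config keys or API.

```python
# Hypothetical illustration of "auto" aux slots falling back to the main model.
# Task names, model ids, and resolve() are invented for this sketch; they are
# not Hermes-Agent's real config keys or API.

MAIN_MODEL = "deepseek"  # placeholder id for the main model

# Before: every aux task is left on "auto".
aux_before = {"vision": "auto", "compression": "auto", "tool_selection": "auto"}
# After: every aux task pinned to a cheaper multimodal model (placeholder id).
aux_after = {task: "stepfun-3.5-flash" for task in aux_before}


def resolve(aux_config: dict[str, str], main_model: str) -> dict[str, str]:
    """Map each aux task to the model that would actually run it."""
    return {task: (main_model if choice == "auto" else choice)
            for task, choice in aux_config.items()}


if __name__ == "__main__":
    print(resolve(aux_before, MAIN_MODEL))  # every task lands on the main model
    print(resolve(aux_after, MAIN_MODEL))   # every task lands on the aux model
```

In the "before" case a vision task would be sent to the main model, which is exactly where a non-multimodal main model fails.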

Now DeepSeek is still your main model, and you still get its full intelligence, only now it's focused on just developing for you.
StepFun 3.5 Flash is now focused on supporting DeepSeek's memory, not on trying to develop. It will also make running DeepSeek a bunch cheaper 🤘🤖

winged turret May 3, 2026, 4:14 PM

Oh yeah, I set my auxiliary models all to DeepSeek Pro instead of Flash for some reason. I'll be trying StepFun for this as per your recommendation, and I also use MiMo v2.5 Flash for multimodal. I never really talked with anybody and just ran in blindly, which is why I'm facing all these issues now. I was really thinking about reverting to v0.11.0 tonight; I'm just waiting for my Claude Code to be back online in 2 hours, but I'm afraid I'm gonna break more stuff.

Concerning the ultra fragile stuff: I practically made the agent build itself. I made some custom bridge routing to memory files, configs, whatever, to try and bypass the default protocols to save on token consumption, latency, and any other vectors I could improve. At one point it hit 90%+ cache reads on average, was really fast at finding files, and replied really quickly. It's a very messy setup, but it was really great for me; I squeezed every last ounce of intelligence I could out of the model (though just for research, my wife's and my work, automation, scraping, and some normal stuff). Everything was going really well. And just like before, when I updated to 0.11.0, my stupid ass saw something shiny and jumped right in. Even though we tried to make sure everything would be fine and would transfer cleanly, everything broke, and it took me a full day of work just to get it working again, but I never encountered the cost bloat issue I'm having now with DeepSeek.

The cost explosion is just one of my problems, though. I'm also seeing a lot of degradation in output quality. It's just not like before: when I told it to do something, we didn't go back and forth with me trying to explain my intent; it just worked. Right now, this is my full working diagram. Please don't mind my naming scheme, it's corny, I know.

📎 message.txt

@gentle kernel

gentle kernel May 3, 2026, 4:21 PM

I think you're getting yourself too knotted up in metrics. You say it's a confusing, fragile, nigh incomprehensible agent structure that squeezes out every last drop of intelligence? I'm not doubting you, but this seems like the end result of maybe working harder rather than smarter.

Take a deep breath. Do it again. Start from your use case and work backwards.

winged turret May 3, 2026, 4:22 PM

Damn, that's a lot of work, boss, but I get your point.

It's also all bloated because of this stuff I did. I practically made it zero-shot, though: I'd just tell it to do something and then, bam, great results every time. It just wasn't compatible with any updates.

gentle kernel May 3, 2026, 4:26 PM

try this:

Set your Auxiliary model to StepFun 3.5 Flash to take extra context off DeepSeek and replace up to 85% of DeepSeek's token usage with free tokens.
Use skills like /ideate and /plan to round out your concepts better. Have you ever heard of rubber ducking?
Devs used to keep a rubber duck on their desk simply to test whether they themselves could explain their whole goal or project out loud, even to a dummy: a rubber duck. If they couldn't, the idea or the implementation needed more cooking (/plan, /ideate).

https://en.wikipedia.org/wiki/Rubber_duck_debugging

winged turret May 3, 2026, 4:29 PM

I hadn't heard of that at all. There's so much noise in my head that I can't wrap my head around a good plan. I already changed the auxiliary model, though, but I'm too tired to try it out now.

What's exhausting for me now is realizing that I have to try shit the right/hard way. Anyway, does it really need 30 turns of reasoning? One query jumps up to 300k tokens; that's so far from normal it's pathological (my setup, I mean).
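The 30-turns-versus-300k-tokens question has a simple arithmetic side. In a typical agent loop each reasoning turn resends the conversation so far, so total input grows roughly quadratically with the number of turns. The minimal Python sketch below assumes exactly that pattern; the 6,000-token base context and 300 tokens of growth per turn are made-up placeholders, not measurements of this setup. Prompt caching changes what those tokens cost, not how many are sent.

```python
# Illustrative only: why a long agent loop can add up to a huge token count.
# Assumes each turn resends the whole conversation so far (a common pattern in
# chat-style APIs); the base size and per-turn growth are placeholders.

def cumulative_input_tokens(turns: int, base_context: int, growth_per_turn: int) -> int:
    """Total input tokens sent across `turns` calls when the context starts at
    `base_context` tokens and grows by `growth_per_turn` tokens after each call."""
    return sum(base_context + i * growth_per_turn for i in range(turns))


if __name__ == "__main__":
    for turns in (5, 10, 30):
        total = cumulative_input_tokens(turns, base_context=6_000, growth_per_turn=300)
        print(f"{turns:>2} turns -> {total:,} input tokens")
```

With these placeholder numbers, 30 turns already sum to 310,500 input tokens, so cutting the number of turns or trimming what each turn carries both pay off faster than the per-turn size suggests.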

gentle kernel May 3, 2026, 4:39 PM

Yeah, so I would take what I said and maybe think it over.

Clear the slate, start with your use case, and develop a whole plan that you can "rubber duck," using /ideate and /plan to fill it in.

Hermes-Agent, and all AI, will work better for ya when it's not trying to guess along with you.
🤘 🤖

I'm gonna close this thread unless you have something else? I think you see the path now.