#I'm thinking of buying a Mac Studio M1
The problem is that most of the local LLMs are:
- Prone to prompt injection attacks, so risky if they evaluate untrusted content
- Not capable of using tools - so they can't run processes on the local machine
I had the same idea and had to re-think it.
Here's what I am doing:
Multi-Model Routing for OpenClaw
Match model to task. Don't run Opus for weather checks.
Providers
• Anthropic - https://console.anthropic.com - Sonnet ($3/$15 per M tokens, in/out), Opus ($15/$75 per M tokens, in/out)
• NVIDIA NIM - https://build.nvidia.com - Kimi K2 Thinking, Kimi K2.5. Free tier, 256K context
• Google Gemini - https://aistudio.google.com - Gemini 2.0 Flash. Free tier, 1M context
• Ollama (local) - https://ollama.com - Run open models on your own hardware. Zero cost
Routing
• Main session: kimi-think (free). Override to sonnet/opus when complexity demands it
• Sub-agents: gemini for web search, grok for X/Twitter, coder (Ollama Qwen 2.5 Coder 32B) for code
• Cron/scheduled: gemini - don't burn paid tokens on routine jobs
• Specialized bots: gemini with restricted tool access
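To make the routing list above concrete, here is a rough Python sketch of the idea as a plain lookup table. This is not OpenClaw configuration: the task-type keys, the `pick_model` function, and the override rule are all made up for illustration, and only the model aliases come from the list.

```python
# Hypothetical routing table: task type -> model alias.
# The keys are illustrative names, not real OpenClaw settings.
ROUTES = {
    "main": "kimi-think",    # free default for the main session
    "web_search": "gemini",  # sub-agent
    "x_twitter": "grok",     # sub-agent
    "code": "coder",         # Ollama Qwen 2.5 Coder 32B
    "cron": "gemini",        # keep scheduled jobs on a free tier
    "bot": "gemini",         # specialized bots, restricted tools
}

# Paid models are only reachable through an explicit override.
PAID_OVERRIDES = {"sonnet", "opus"}


def pick_model(task_type, override=None):
    """Return the model alias for a task, honoring an explicit paid override."""
    if override is not None:
        if override not in PAID_OVERRIDES:
            raise ValueError(f"unknown override: {override}")
        return override
    # Unknown task types fall back to the main-session model.
    return ROUTES.get(task_type, ROUTES["main"])


print(pick_model("cron"))                   # gemini
print(pick_model("main", override="opus"))  # opus
```

The point is that the choice of model is made by whatever dispatches the task, not by the model itself.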
Fallback Chain
Kimi K2 Thinking (free) → Sonnet (paid) → Ollama local (free) → Opus (expensive safety net)
Put free models in the middle. Most expensive model LAST.
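A minimal sketch of how a fallback chain like this can work, assuming you have some function that sends a prompt to a named model and raises on failure. `call_model` here is a hypothetical stand-in for that, not a real OpenClaw or provider API, and the aliases just mirror the chain above.

```python
from typing import Callable, Sequence, Tuple

# Order mirrors the chain above; aliases are illustrative.
FALLBACK_CHAIN = ["kimi-think", "sonnet", "coder", "opus"]


def complete_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> reply
    chain: Sequence[str] = FALLBACK_CHAIN,
) -> Tuple[str, str]:
    """Try each model in order; return (model, reply) from the first that succeeds."""
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # rate limit, timeout, provider outage, ...
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Because the loop stops at the first success, the most expensive model at the end is only billed when everything before it has failed.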
Avoid
• Small local models (<14B) as primary - tool calling breaks
• Expensive models for cron - adds up fast at 6x/day
• One model for everything
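For a feel of how cron costs add up, here is a quick back-of-the-envelope calculation. The prices are the Sonnet and Opus rates from the provider list; the token counts per run (5,000 in, 1,000 out) are an assumption picked for illustration, so plug in your own numbers.

```python
# USD per million tokens as (input, output), from the provider list above.
PRICES = {"sonnet": (3.00, 15.00), "opus": (15.00, 75.00)}


def monthly_cron_cost(model, in_tokens, out_tokens, runs_per_day=6, days=30):
    """Estimated monthly cost in USD of a scheduled job on a paid model."""
    price_in, price_out = PRICES[model]
    per_run = in_tokens / 1_000_000 * price_in + out_tokens / 1_000_000 * price_out
    return per_run * runs_per_day * days


# Assumed job size: 5,000 input and 1,000 output tokens per run.
for name in PRICES:
    print(f"{name}: ${monthly_cron_cost(name, 5_000, 1_000):.2f}/mo")
# sonnet: $5.40/mo
# opus: $27.00/mo
```

That is for a single small job; a few of them, or longer contexts, multiply it quickly, which is why cron stays on a free tier.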
Cost
• $0/mo: Kimi + Gemini + Ollama. Genuinely viable
• $5-20/mo: Add Sonnet. Covers 95% of tasks
• $50+/mo: Opus for heavy lifting. Max capability
How does your OpenClaw know that it needs to use a different model? My OpenClaw doesn't get that, or the gateway doesn't allow that flexibility...