#I'm thinking of buying a Mac Studio M1


woven scarab

The problem is that most of the local LLMs are:

  1. Prone to injection attacks, so risky if they evaluate content
  2. Not capable of using tools - so they can't run processes on the local machine

I had the same idea and had to re-think it.


Here's what I am doing:

Multi-Model Routing for OpenClaw

Match model to task. Don't run Opus for weather checks.

Providers

• Anthropic — https://console.anthropic.com — Sonnet ($3/$15/M), Opus ($15/$75/M)
• NVIDIA NIM — https://build.nvidia.com — Kimi K2 Thinking, Kimi K2.5. Free tier, 256K context
• Google Gemini — https://aistudio.google.com — Gemini 2.0 Flash. Free tier, 1M context
• Ollama (local) — https://ollama.com — Run open models on your own hardware. Zero cost

Routing

• Main session: kimi-think (free). Override to sonnet/opus when complexity demands it
• Sub-agents: gemini for web search, grok for X/Twitter, coder (Ollama Qwen 2.5 Coder 32B) for code
• Cron/scheduled: gemini — don't burn paid tokens on routine jobs
• Specialized bots: gemini with restricted tool access
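
The routing rules above can be sketched as a small lookup table. This is only an illustration, not OpenClaw's actual configuration format: the task names, the `ROUTES` and `OVERRIDES` tables, and the `pick_model` helper are all hypothetical, and only the model aliases come from the list above.

```python
from typing import Optional

# Hypothetical task names mapped to the model aliases from the routing list.
ROUTES = {
    "main": "kimi-think",    # free default for the main session
    "web_search": "gemini",  # sub-agent: web search
    "x_twitter": "grok",     # sub-agent: X/Twitter
    "code": "coder",         # sub-agent: Ollama Qwen 2.5 Coder 32B
    "cron": "gemini",        # scheduled jobs stay on a free model
    "bot": "gemini",         # specialized bots (tool restrictions not modeled here)
}

# Paid models the main session may escalate to by hand.
OVERRIDES = {"sonnet", "opus"}


def pick_model(task: str, override: Optional[str] = None) -> str:
    """Return the model alias for a task.

    An override is honored only for the main session; unknown tasks
    fall back to the main-session default.
    """
    if task == "main" and override in OVERRIDES:
        return override
    return ROUTES.get(task, ROUTES["main"])
```

For example, `pick_model("cron")` gives `gemini`, while `pick_model("main", override="opus")` escalates to `opus` only when asked.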

Fallback Chain

Kimi K2 Thinking (free) → Sonnet (paid) → Ollama local (free) → Opus (expensive safety net)

Start with a free model, keep a free local fallback in the middle, and put the most expensive model LAST.
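
A minimal sketch of how a chain like this could be walked, assuming each provider call either returns text or raises on failure (rate limit, timeout, outage). The `call_model` callback, the `ollama-local` alias, and `complete_with_fallback` are made-up names for illustration, not part of any real client library.

```python
from typing import Callable, List, Sequence, Tuple

# Order taken from the chain above; "ollama-local" is a hypothetical alias.
FALLBACK_CHAIN = ["kimi-think", "sonnet", "ollama-local", "opus"]


def complete_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    chain: Sequence[str] = FALLBACK_CHAIN,
) -> Tuple[str, str]:
    """Try each model in order and return (model, reply) from the first success.

    call_model(model, prompt) is supplied by the caller and is expected
    to raise an exception when the provider fails.
    """
    errors: List[str] = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # any provider failure moves to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Because the loop stops at the first success, Opus is only ever billed when everything before it has failed.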

Avoid

• Small local models (<14B) as primary — tool calling breaks
• Expensive models for cron — adds up fast at 6x/day
• One model for everything
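
To see how cron costs add up, here is a rough estimate. It reads the prices in the provider list as USD per million input/output tokens; the token counts per run are made-up assumptions, so plug in your own numbers.

```python
# (input, output) prices in USD per million tokens, from the provider list above.
PRICES = {
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def monthly_cost(model: str, runs_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Estimate monthly spend in USD for a scheduled job."""
    price_in, price_out = PRICES[model]
    per_run = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return runs_per_day * days * per_run


# Assumed workload: 6 runs/day, 5,000 input and 1,000 output tokens per run.
opus_monthly = monthly_cost("opus", 6, 5_000, 1_000)
sonnet_monthly = monthly_cost("sonnet", 6, 5_000, 1_000)
```

Under those assumptions a single job costs about $27/mo on Opus versus about $5.40/mo on Sonnet, and nothing on a free tier.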

Cost

• $0/mo: Kimi + Gemini + Ollama. Genuinely viable
• $5-20/mo: Add Sonnet. Covers 95% of tasks
• $50+/mo: Opus for heavy lifting. Max capability
