#What's the best model setup as of today?


obtuse remnant
#

I've been using Sonnet and Opus for the most part but since they don't integrate via subscription anymore I'm exploring other models. What's the best setup overall? Been looking into OpenAI, Gemma, MiniMax, Kimi, Gemini, etc. @radiant axle

fathom epochBOT
radiant axleBOT
#

I’d keep it simple:

  1. Use a strong cloud model as your main agent
  2. Use a cheaper model for heartbeats, cron jobs, and low-stakes tasks
  3. Treat local models as experimental/privacy-focused, not your default, unless you’ve got serious hardware

That’s the setup I’d recommend for most people right now.

For the actual current best picks, I don’t want to pretend my list is fresh, so the best place is #models, plus the docs here:

Short version: if you want the best overall experience, go cloud-first. If you want the best cost/performance mix, add a cheaper secondary model for background stuff. If you want maximum privacy/control, local is possible, but usually not the smoothest path yet.

If you want, I can also suggest a few setup patterns like “best quality”, “best budget”, and “best hybrid” without trying to give you a fake up-to-the-minute leaderboard.

obtuse remnant
#

What would be the best exact models as of today?

radiant axleBOT
#

I’m going to avoid giving you an “exact best models today” list, because that gets stale stupidly fast and I’d rather not give you fake-confidence advice.

For current real-world picks, ask in #models. That’s the channel for fresh hands-on recommendations.

Useful links:

Practical recommendation though:

  • Best quality: strong cloud model as primary
  • Best cost/perf: strong cloud model + cheaper fallback for background tasks
  • Best privacy/control: local model, but not ideal as your main driver unless you have serious hardware

If you want, I can help you choose between those three setups based on your budget, privacy needs, and whether you care more about coding, chat quality, or tool reliability.

obtuse remnant
#

Is openai-codex GPT 5.4 a good choice as the only model?

radiant axleBOT
#

Yeah, as a single-model setup, that’s a reasonable choice.

I’d frame it like this:

  • Good if you want simplicity
  • Good if you care more about quality/reliability than cost
  • Less ideal if you run lots of heartbeats, crons, or background tasks, because those can burn money on a premium model fast

So:

  • One model only: solid, simple, easy to reason about
  • Better long-term: strong primary model + cheaper secondary for low-stakes/background work

If you’re just getting unstuck and want the least fiddly setup, starting with one strong cloud model is totally fine. You can always split it into a two-tier setup later.

Docs for the model setup side:

obtuse remnant
#

What about Gemma?

radiant axleBOT
#

Gemma is fine for experimenting, privacy-first local use, or cheap-ish testing, but I wouldn’t make it my main OpenClaw model if you want the smoothest experience.

My honest take:

  • Good for: local/private tinkering
  • Not ideal for: “just works” agentic use, especially tool calling
  • Main catch: local setups usually need pretty serious hardware before they feel good in practice

So if your goal is best overall setup, I’d still go cloud-first and treat Gemma as a side option, not the primary driver.

Docs: https://docs.openclaw.ai/concepts/models

obtuse remnant
#

I didn't send any messages today yet but my Codex Quota is at 1%. What is happening? How can I make sure this doesn't happen?

radiant axleBOT
#

Most likely: that 1% is not “messages I sent today in this chat”.

For Codex OAuth, OpenClaw is just showing the provider’s own quota snapshot. That quota is usually based on rolling windows (for example a shorter window plus a day/week-style window), so:

  • it may not reset at midnight
  • it can include usage from other OpenClaw sessions
  • it can include Codex/ChatGPT auth usage elsewhere on the same account
  • it can include background stuff like heartbeats, crons, or ACP sessions if they’re using the same model

So “I didn’t chat today” and “quota is almost gone” can both be true.
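A rough sketch of why that happens with a rolling window. The 7-day window, the limit of 100 units, and the function name are made up for illustration; the provider's real accounting isn't specified here.

```python
from datetime import datetime, timedelta

def remaining_fraction(events, now, window, limit):
    """Fraction of quota left, counting only usage inside the rolling window.

    events: (timestamp, units) pairs from ANY client on the same account.
    window and limit are hypothetical values, not the provider's real numbers.
    """
    cutoff = now - window
    used = sum(units for ts, units in events if ts > cutoff)
    return max(0.0, 1.0 - used / limit)

now = datetime(2025, 1, 10, 9, 0)
events = [
    (now - timedelta(days=8), 50),  # already aged out of a 7-day window
    (now - timedelta(days=3), 60),  # heavy session three days ago
    (now - timedelta(days=1), 30),  # overnight heartbeats and cron jobs
]

# Nothing was sent "today", yet most of the window is already spent.
weekly_left = remaining_fraction(events, now, timedelta(days=7), limit=100)
print(f"{weekly_left:.0%} left")
```

The point is just that the number depends on everything inside the window, not on what you did since midnight.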

How to keep it from happening:

  1. Check the actual usage window + reset time
    • openclaw status --usage
    • or /status in chat

…I can help figure out whether this is just the normal rolling-window behavior or something actually weird.

obtuse remnant
#

What would be the best model for background work?

radiant axleBOT
#

For background work, I’d optimize for cheap + reliable, not “best overall intelligence”.

Good uses for a background model:

  • heartbeats
  • cron checks
  • summaries
  • classification/routing
  • light drafting

My recommendation:

  • Use a cheaper cloud model for background tasks
  • Keep your stronger model for user-facing replies, coding, and tool-heavy work
  • If a background job is actually important or tool-heavy, bump it up one tier
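If it helps, that recommendation can be sketched as a tiny routing function. The tier names are placeholders rather than real model IDs, and `pick_model` is a hypothetical helper, not something OpenClaw provides.

```python
# Hypothetical tiers, cheapest first. Swap in whatever models you actually use.
TIERS = ["cheap-model", "mid-model", "strong-model"]

def pick_model(user_facing: bool, important: bool = False, tool_heavy: bool = False) -> str:
    """Choose a tier for a task using the rule of thumb above."""
    if user_facing:
        # Replies, coding, and tool-heavy interactive work get the strongest model.
        return TIERS[-1]
    tier = 0  # background work starts at the cheapest tier
    if important or tool_heavy:
        tier += 1  # bump up exactly one tier
    return TIERS[tier]

print(pick_model(user_facing=False))                   # heartbeat or cron check
print(pick_model(user_facing=False, tool_heavy=True))  # important background job
print(pick_model(user_facing=True))                    # interactive reply
```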

So the rule of thumb is:

  • low-stakes background → cheapest reliable model

obtuse remnant
#

Gemma or Gemini?

radiant axleBOT
#

Gemini for most people.

I’d only pick Gemma if your priority is:

  • local/private setup
  • experimentation
  • lower cost via your own hardware
  • accepting more rough edges

If your priority is:

  • better overall OpenClaw experience
  • less fiddling
  • better reliability for normal agent use

…then Gemini is the safer choice.

So basically:

  • Best overall: Gemini
  • Best for local/privacy tinkering: Gemma

If you want, I can turn that into a simple recommendation like “main model + background model + optional local fallback.”

obtuse remnant
#

Which of those?
gemini-2.0-flash
gemini-2.5-flash
gemini-2.5-pro
gemini-3-flash-preview
gemini-3-pro-preview
gemini-3.1-pro-preview

radiant axleBOT
#

From that list, my practical recommendation is:

  • Main/default: the strongest non-preview Pro model
  • Background/cheap work: the newest Flash model
  • Testing only: the preview models

Reason: preview models are great for experimenting, but I wouldn’t make one your only production default unless you’re okay with churn/regressions.

So the pattern I’d use is:

  • stable main → non-preview Pro
  • background → Flash
  • playground/experiments → latest preview Pro or Flash
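Applied to the list you posted, that pattern could look roughly like this. It's a sketch with two assumptions: that a `-preview` suffix is what marks a preview model, and that "newest" means the highest version number in the ID. Whether "newest Flash" should include previews is a judgment call; this version sticks to stable ones.

```python
import re

MODELS = [
    "gemini-2.0-flash", "gemini-2.5-flash", "gemini-2.5-pro",
    "gemini-3-flash-preview", "gemini-3-pro-preview", "gemini-3.1-pro-preview",
]

def version(model_id):
    # "gemini-3.1-pro-preview" -> (3, 1); "gemini-3-flash-preview" -> (3,)
    match = re.match(r"gemini-(\d+(?:\.\d+)?)-", model_id)
    return tuple(int(part) for part in match.group(1).split("."))

def newest(models, family, preview):
    """Highest-versioned model of a family ("pro" or "flash"), preview or stable."""
    candidates = [
        m for m in models
        if f"-{family}" in m and m.endswith("-preview") == preview
    ]
    return max(candidates, key=version) if candidates else None

main = newest(MODELS, "pro", preview=False)          # stable main
background = newest(MODELS, "flash", preview=False)  # cheap background work
playground = newest(MODELS, "pro", preview=True)     # experiments only
print(main, background, playground)
```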

If you want the absolute most current hands-on opinion on the preview ones specifically, #models is the right place, because those rankings change constantly.

obtuse remnant
#

Main would be OpenAI already. So using GPT 5.4 + Gemini, what would be the full setup?

radiant axleBOT
#

Yeah, I’d keep it as a 2-tier setup:

  • Main/default: GPT 5.4
    • use openai/gpt-5.4 if you’re on the API key route
    • use openai-codex/gpt-5.4 if you’re on Codex OAuth
  • Fallback: strongest Gemini Pro model you trust
  • Background / cheap work: Gemini Flash
  • Preview models: testing only, not your only production default

Minimal config shape:

{
  agents: {
    defaults: {
      model: {
        primary: "openai/gpt-5.4",
        fallbacks: ["google/gemini-3.1-pro-preview"],
      },
      models: {
        "openai/gpt-5.4": { alias: "gpt" },
        "google/gemini-3.1-pro-preview": { alias: "gemini" },
        "google/gemini-3-flash-preview": { alias: "gemini-flash" },
      },
    },
  },
}

If you’re on Codex OAuth, swap the primary to:

primary: "openai-codex/gpt-5.4"

My actual recommendation:

  • interactive / coding / important stuff → GPT 5.4
  • fallback when OpenAI is unhappy → Gemini Pro
  • heartbeats / cheap summaries / background tasks → Gemini Flash

One important gotcha: if you set agents.defaults.models, that becomes your allowlist, so include every model you want available.
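A quick way to sanity-check that before pasting a config: the sketch below works on a plain Python dict shaped like the config above. `missing_from_allowlist` is a hypothetical helper, not OpenClaw's own validation, and it assumes that leaving `models` unset means no allowlist is applied.

```python
def missing_from_allowlist(config):
    """Return primary/fallback model IDs that are not listed in agents.defaults.models."""
    defaults = config["agents"]["defaults"]
    allowlist = defaults.get("models")
    if allowlist is None:
        return []  # assumption: no allowlist set, so nothing to check
    model = defaults.get("model", {})
    wanted = [model.get("primary"), *model.get("fallbacks", [])]
    return [m for m in wanted if m and m not in allowlist]

config = {
    "agents": {
        "defaults": {
            "model": {
                "primary": "openai/gpt-5.4",
                "fallbacks": ["google/gemini-3.1-pro-preview"],
            },
            # The fallback was forgotten here, so the check should flag it.
            "models": {"openai/gpt-5.4": {"alias": "gpt"}},
        }
    }
}

missing = missing_from_allowlist(config)
print(missing)
```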

Docs:

If you want, I can turn this into a ready-to-paste full openclaw.json snippet next.

wintry plume
radiant axleBOT
#

Yeah, that’s a fair take. Kimi, MiniMax, and Qwen can be really strong on price/perf.

My only boring support caveat is: for a first stable setup, direct provider auth is usually easier to debug. A unified API/proxy can be great once things work, but it also adds one more layer when you’re trying to figure out whether a problem is:

  • the model
  • the provider
  • the proxy
  • auth/rate limits
  • tool calling quirks

So I’d usually suggest:

  1. Get one direct provider working first
  2. Add fallbacks
  3. Then move to a unified provider if you want better pricing or easier switching

That tends to save a lot of “what the hell is actually failing here?” time 😅

wintry plume
radiant axleBOT
#

Fair, and that’s a useful datapoint.

I’m not saying unified providers are bad, just that my default support answer is still “simpler path first” because it makes troubleshooting cleaner for new users.

If AIsa’s been solid for you, that’s great, especially on cost. I’d just phrase it as:

  • best for convenience / pricing: unified provider
  • best for clean debugging / fewer moving parts: direct provider

So yeah, definitely a valid option, I just wouldn’t call it the universally safer starting point for everyone.

obtuse remnant
radiant axleBOT
#

Honestly, I wouldn’t assume AIsa is better than OpenRouter by default.

I’d only prefer one over the other if it clearly wins on stuff that actually matters to you:

  • better pricing for the models you use
  • better uptime / fewer rate limits
  • cleaner model routing / less weird aliasing
  • better access to the models you want
  • fewer tool-calling quirks
  • clearer debugging when something breaks

So the real question is less “which brand is better?” and more:

Which one is cheaper + more stable for your model mix?

If someone says AIsa is better, I’d want specifics, like:

  • cheaper for Kimi/MiniMax/Qwen
  • fewer 429s
  • better routing
  • less proxy weirdness

Without that, it’s just vibes.

wintry plume
radiant axleBOT
#

Yep, that’s the kind of answer that actually helps.

If it’s:

  • cheaper
  • less geo-blocked
  • less rate-limited
  • more stable for the models you care about

…then that’s a perfectly valid reason to prefer it.

I’d still treat it as “better for your use case”, not automatically “better for everyone,” but those are real advantages, not fluff.

Honestly the cleanest test is just:

  • run the same model mix
  • through both providers
  • for a couple days
  • compare cost, latency, and weird failures

That usually settles it fast.
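If you log each request while testing, the comparison itself is a few lines. This is only a sketch: the record fields (`provider`, `cost_usd`, `latency_s`, `ok`), the provider names, and the numbers are invented for illustration, so adapt them to whatever your logs actually contain.

```python
from statistics import mean

def summarize(runs):
    """Group request records by provider and compare cost, latency, and failures."""
    by_provider = {}
    for run in runs:
        by_provider.setdefault(run["provider"], []).append(run)
    summary = {}
    for provider, rows in by_provider.items():
        summary[provider] = {
            "total_cost": round(sum(r["cost_usd"] for r in rows), 4),
            "avg_latency_s": round(mean(r["latency_s"] for r in rows), 2),
            "failure_rate": round(sum(not r["ok"] for r in rows) / len(rows), 2),
        }
    return summary

# Made-up sample records, one per request.
runs = [
    {"provider": "provider-a", "cost_usd": 0.01, "latency_s": 1.0, "ok": True},
    {"provider": "provider-a", "cost_usd": 0.03, "latency_s": 3.0, "ok": False},
    {"provider": "provider-b", "cost_usd": 0.02, "latency_s": 2.0, "ok": True},
]

summary = summarize(runs)
for provider, stats in summary.items():
    print(provider, stats)
```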

wintry plume
#

AIsa is way more stable imo