# Looking for Help Optimizing My Model Setup


quasi gate

I’ve been tracking my usage over the past week and it has escalated quickly. On Jan 26, I was running a mix of Opus 4, Opus 4.5, and Sonnet 4: cache writes were around 284K tokens, cache reads around 2.2M, and output around 20K tokens, across roughly 8-10 calls per day. Fast forward to Feb 3 and I’m now exclusively on Opus 4.5, with cache writes at 4.5M tokens, cache reads at 83M, and output at 149K, across roughly 18-20 calls.
I thought I’d use Opus 4.5 as the orchestrator, assuming it wouldn’t rack up many turns, but that turned out not to be the case. Even with the cache read discount, running 83M tokens through Opus 4.5 adds up fast, and I’m burning through credits.
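As a rough sketch of the arithmetic behind these numbers: the token counts below are the ones quoted above, but the per-million-token prices are made-up placeholders (not real rates), and applying a single price table to the Jan 26 mix of models is a simplification. Swap in the actual prices for your account before reading anything into the dollar figures.

```python
# Token counts taken from the two snapshots described in the post.
jan_26 = {"cache_write": 284_000, "cache_read": 2_200_000, "output": 20_000}
feb_03 = {"cache_write": 4_500_000, "cache_read": 83_000_000, "output": 149_000}

# HYPOTHETICAL prices in USD per million tokens. Placeholders only, not real rates.
PLACEHOLDER_PRICE_PER_MTOK = {"cache_write": 10.0, "cache_read": 1.0, "output": 50.0}


def growth(before: dict, after: dict) -> dict:
    """Return how many times larger each token category is in `after`."""
    return {k: after[k] / before[k] for k in before}


def cost(usage: dict, prices: dict) -> float:
    """Estimate spend in USD for one usage snapshot under the given prices."""
    return sum(usage[k] / 1_000_000 * prices[k] for k in usage)


if __name__ == "__main__":
    for category, factor in growth(jan_26, feb_03).items():
        print(f"{category}: {factor:.1f}x")
    print(f"Jan 26 (placeholder prices): ${cost(jan_26, PLACEHOLDER_PRICE_PER_MTOK):.2f}")
    print(f"Feb 3  (placeholder prices): ${cost(feb_03, PLACEHOLDER_PRICE_PER_MTOK):.2f}")
```

Independent of the actual prices, the ratios show where the growth is: cache reads are up roughly 38x and cache writes roughly 16x, while output is up only about 7x.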
For those of you running multi-model setups: what’s your orchestration strategy? Are you using a cheaper model as the router and only escalating to Opus for complex tasks? Which model are you using for the orchestrator role? Any suggestions appreciated.
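For concreteness, the "cheap router, escalate when needed" idea could look something like the minimal sketch below. Everything in it is hypothetical: the tier labels, the `needs_planning` flag, the length threshold, and the `call_model` callable are stand-ins for illustration, not any real client API.

```python
from dataclasses import dataclass
from typing import Callable

# HYPOTHETICAL tier labels; map them to whatever models you actually run.
CHEAP_TIER = "cheap-router"
EXPENSIVE_TIER = "opus"


@dataclass
class Task:
    prompt: str
    needs_planning: bool = False  # set by the caller or an upstream classifier


def pick_tier(task: Task, max_cheap_chars: int = 2_000) -> str:
    """Escalate only when the task is flagged as planning-heavy or is long."""
    if task.needs_planning or len(task.prompt) > max_cheap_chars:
        return EXPENSIVE_TIER
    return CHEAP_TIER


def run(task: Task, call_model: Callable[[str, str], str]) -> str:
    """Dispatch the task; `call_model(tier, prompt)` stands in for a real client."""
    return call_model(pick_tier(task), task.prompt)


if __name__ == "__main__":
    # Fake client so the sketch runs without any API access.
    def fake_client(tier: str, prompt: str) -> str:
        return f"[{tier}] {prompt[:30]}"

    print(run(Task("rename this variable"), fake_client))
    print(run(Task("design the migration plan", needs_planning=True), fake_client))
```

The point of the shape is that the expensive model only sees tasks that pass the escalation check, so its cached context is not re-read on every routine turn.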