Hey everyone — quick pattern I’ve been debugging in Claude/OpenAI-style agent workflows:
If your output is stable but cost keeps rising, it's often token leakage (tokens spent on calls or context that add nothing to the result), not just "higher usage".
Top 3 leakage paths I keep seeing:
Duplicate calls (same task triggered multiple times)
Context bloat (too much history passed every turn)
Retry storms (aggressive retry policy during upstream instability)
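For the context bloat case, here is a rough sketch of trimming history to a token budget before each turn. Everything in it is an assumption for illustration: the `trim_history` and `rough_token_count` names, the `role`/`content` message shape, and the ~4 characters per token estimate, which you would replace with your provider's real tokenizer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def rough_token_count(text: str) -> int:
    # Assumption: roughly 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def trim_history(
    messages: List[Message],
    max_tokens: int,
    count: Callable[[str], int] = rough_token_count,
) -> List[Message]:
    """Keep all system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(count(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for m in reversed(rest):
        cost = count(m["content"])
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + kept[::-1]
```

This drops the oldest turns first, which is the simplest policy. Summarizing old turns instead of dropping them is another option, but it costs extra tokens of its own.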
Minimal per-call log fields that helped me isolate the root cause fast:
timestamp
task_id / conversation_id
input_tokens / output_tokens
error_type / status_code
retry_count
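With those fields logged per call, a small script can give a first estimate of where tokens go. This is a sketch under stated assumptions, not a definitive method: `CallRecord` and `summarize_leaks` are hypothetical names, each record is one API call attempt, `retry_count` is 0 for a first attempt, and a `task_id` is expected to need only one successful call (so any later non-retry call for the same task counts as a duplicate).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class CallRecord:
    timestamp: float
    task_id: str
    conversation_id: str
    input_tokens: int
    output_tokens: int
    status_code: int
    error_type: Optional[str] = None
    retry_count: int = 0  # assumed: 0 = first attempt, 1 = first retry, ...


def summarize_leaks(records: Iterable[CallRecord]) -> Dict[str, float]:
    total = duplicate = retry = 0
    done_tasks: Set[str] = set()
    first_input: Dict[str, int] = {}
    last_input: Dict[str, int] = {}

    for r in sorted(records, key=lambda rec: rec.timestamp):
        tokens = r.input_tokens + r.output_tokens
        total += tokens
        if r.retry_count > 0:
            retry += tokens          # tokens spent on retried attempts
        elif r.task_id in done_tasks:
            duplicate += tokens      # task already succeeded, yet it ran again
        if 200 <= r.status_code < 300:
            done_tasks.add(r.task_id)
        # Track input size of the first and latest call per conversation.
        first_input.setdefault(r.conversation_id, r.input_tokens)
        last_input[r.conversation_id] = r.input_tokens

    growth = max(
        (last_input[c] / first_input[c] for c in first_input if first_input[c] > 0),
        default=1.0,
    )
    return {
        "total_tokens": total,
        "duplicate_tokens": duplicate,
        "retry_tokens": retry,
        "max_context_growth": growth,  # last/first input_tokens, worst conversation
    }
```

Whichever of `duplicate_tokens`, `retry_tokens`, or `max_context_growth` stands out relative to `total_tokens` is a reasonable place to look first.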
Fix order that worked best: stop the loss first (hard caps on spend and retries), then identify the biggest leak source, then codify the fix as rules.
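One way the "stop loss" step could look is a small guard checked before every model call. This is a minimal in-memory sketch with hypothetical names (`StopLoss`, `BudgetExceeded`) and made-up limits; a real deployment would likely need shared state across workers.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class BudgetExceeded(RuntimeError):
    """Raised when a call should be blocked instead of sent."""


class StopLoss:
    def __init__(self, max_tokens_per_task: int, max_retries: int) -> None:
        self.max_tokens_per_task = max_tokens_per_task
        self.max_retries = max_retries
        self._spent: DefaultDict[str, int] = defaultdict(int)
        self._completed: Set[str] = set()

    def check(self, task_id: str, retry_count: int) -> None:
        """Call before each request; raises if the request should not go out."""
        if task_id in self._completed:
            raise BudgetExceeded(f"{task_id}: already completed (duplicate call)")
        if retry_count > self.max_retries:
            raise BudgetExceeded(f"{task_id}: retry limit {self.max_retries} exceeded")
        if self._spent[task_id] >= self.max_tokens_per_task:
            raise BudgetExceeded(f"{task_id}: token budget exhausted")

    def record(self, task_id: str, input_tokens: int, output_tokens: int, ok: bool) -> None:
        """Call after each response with the usage numbers from the API."""
        self._spent[task_id] += input_tokens + output_tokens
        if ok:
            self._completed.add(task_id)
```

Usage would be `guard.check(task_id, retry_count)` before the request and `guard.record(...)` after it. The point is only to cap the damage while you look for the actual leak.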
Curious what others see most in production right now:
duplicate execution?
context bloat?
retry spikes?