#I have reached out to customer support
Because Cascade supports so many models, it is not prescriptively tuned to steer any one model away from its specific tendencies/issues. We Windsurf users get very close to the "raw API" experience in many ways, just well adapted for the Cascade harness, with its exceptional fast context and other glue around the prompting experience.
This is great for teams who specialize and learn the tendencies of specific models, and set their own org/team/individual global rules and workflows to steer those models towards success in their environments. Cascade is customizable in a way that rivals the open-source agent harnesses and code editors.
However, it can be challenging for an individual onboarding with the base Cascade harness. It reminds me a lot of working with Linux: you have so much power and extensibility, but that also means you likely have months of learning, observing, using, and customizing ahead.
As frustrating as that reply was (to try creating rules and workflows), it is the path forward.
For the past 8 months, I've been keeping a running documentation project, documenting every good workflow and tweaking every rule to curb the annoying aspects of these models (particularly the Anthropic model series, as I have the most experience using their tooling and models).
And so I have a global rules file that is portable to any coding agent harness and steers Anthropic's models away from bad habits like writing new documentation, over-engineering, or creating stub implementations. It also covers language-specific processes/tools, local context for technologies so that the model doesn't try to code from training data, etc.
Every single time I observe bad behavior, I complete my work, branch off that discussion/chat/session, and ask the model: "Why did you do X? What and where in your memories/instructions/system message/global user memories (language differs per harness) influenced your decision?"
Over time you'll get a process that you can trust.
I agree with your assessment and comparison of the out-of-the-box feel for GPT-5.x and Anthropic's models. You are on the right path with observing their strengths and weaknesses. I'd say specialize in the one that is the most predictable and consistent with your style, and start reading the prompting guides from that model vendor. OpenAI and Anthropic release fantastic ones per model, with plenty of very practical guidance, patterns, and sometimes even random prompt magic, lol.
Most of the annoying behaviors are spelled out in their guides, and they offer suggestions for how to steer it away from that bad behavior.
Maybe stick with GPT-5.x models for now, as OpenAI's very steerable models and Cascade's not-overbearing instructions are a great combination! I've noticed those models perform exceptionally well inside Windsurf.
man do i feel every word of this lol the frustrations and the "oh well, fight it or learn to work around it"
the key issue is that we are at the mercy of the model providers, so we can have all of that in place and if the base model changes even slightly then all of that effort has to be restarted.
I do that exact thing, have it back engineer the logic so i can put constraints in place that steer away from the bad behavior but the moment the model provider changes the training that all goes out the window.
the most reliable thing i have found in any of these systems is honestly the OpenAI assistants playground and they are doing away with it lol
my real solution is agent chaining. since these providers are pushing for us to use the right model they provide bc its cheaper for them, my thought process was ill just use that to my advantage and have a single orchestration agent (claude) that can call on other more task oriented models like gpt 5 for completing. this sticks with claudes "fastest answer wins" training while leveraging gpts ability to stay on task.
the implementation portion im working on is how to string the two together.
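For what it's worth, here is a minimal Python sketch of that chaining idea, under some loud assumptions: a "model" is just any function from prompt text to reply text, the `Orchestrator` class and the `worker_name: instruction` reply convention are hypothetical, and the `fake_*` functions are stand-ins so it runs without any vendor SDK or API key. It is not how Windsurf, Claude, or GPT actually wire this up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A "model" here is any function from prompt text to reply text.
# Real vendor SDK wiring is deliberately left out of this sketch.
ModelFn = Callable[[str], str]


@dataclass
class Orchestrator:
    """Hypothetical orchestrator: one planner model delegates steps to workers."""

    planner: ModelFn
    workers: Dict[str, ModelFn]

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Assumed convention: the planner replies with one
        # "worker_name: instruction" pair per line.
        steps = []
        for line in self.planner(task).splitlines():
            name, sep, instruction = line.partition(":")
            if sep and name.strip() in self.workers:
                steps.append((name.strip(), instruction.strip()))
        return steps

    def run(self, task: str) -> List[str]:
        # Run each planned step, in order, on the worker the planner picked.
        return [self.workers[name](instr) for name, instr in self.plan(task)]


# Stand-in models (hypothetical) so the sketch runs offline.
def fake_planner(task: str) -> str:
    return f"coder: implement {task}\nreviewer: review {task}"


def fake_coder(instruction: str) -> str:
    return f"[coder] done: {instruction}"


def fake_reviewer(instruction: str) -> str:
    return f"[reviewer] done: {instruction}"


orchestrator = Orchestrator(
    planner=fake_planner,
    workers={"coder": fake_coder, "reviewer": fake_reviewer},
)
results = orchestrator.run("login form")
```

Swapping `fake_planner` for a real planner call and the workers for real task-focused model calls would be the "string the two together" part; lines the planner emits for unknown workers are simply skipped here, which is a design choice you may not want.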
Indeed, that orchestration pattern combined with best-of-n approaches/perspectives yields some fantastic results. At the moment Windsurf is not an orchestration tool just yet; it is the ultimate pair programmer with access to the team's software architect (fast context, deepwiki, codemaps, etc.), who knows where all the edge cases are in your code and can answer random questions in seconds that take other agent harnesses/models minutes.
But I think with the worktree feature and the new parallel Cascade session UI change, they are adapting the product to be able to support that orchestration workflow soon.
Sticking with one model vendor will help with that a lot, I feel. OpenAI's 2nd half of 2025 was quite frenetic, and keeping up with their models' purposes is very challenging even for those well-versed in their products.
There's no reason you have to jump to the latest model all the time if you are getting reliable results with older ones! In large orgs, I've noticed that many engineers will be on "last-gen" SOTA models, because that's the one they've gotten stability with.
And then you can read the guides for migrating from one model to another. That's often a section in either OpenAI's or Anthropic's docs.
I actually tried using CC for a while. It worked great at first, but I eventually ran into the same issues, and the 4-hour usage limits just didn’t make sense. At that point it was cheaper to run Windsurf and pay for extra credits.
I agree though, this applies to pretty much everything. Ignore the flashy “look at me” stuff and stick with what actually works. That’s why I’ve started tuning out most new releases. If you chase every update, you never stay with anything long enough to really get good at it or build a solid, real use case.
That doesn’t mean you shouldn’t keep an eye out, just in case. Every now and then, one tool actually does change a workflow.
I’ll usually test new models when they come out, but I stick with what’s proven. I don’t use Claude Opus Thinking because it’s way harder to control than Opus, but Opus has gotten bad enough that I’ve started going back to Sonnet. I’ll probably stay there until they get the assumptions and non-adherence under control.
And yeah, I agree, but it also shouldn’t be this hard. With all the features they’re shipping, you’re telling me I can’t have two windows open on the same project with different agents that can talk to each other? That’s not exactly a crazy ask.
literally opus right now.... lol
I'm pretty sure they are building to that workflow! The wholesale shift towards orchestration by GitHub Copilot Agent HQ, Cursor 2.0, and Antigravity is hard to ignore. And Cognition has its answer in its back pocket: Devin. We just have to wait and see how they decide to bring the Devin agent to Windsurf.
Non-reasoning models have a terribly hard time following directions. I avoid them in anything other than very linear work that requires execution only, not judgement. Opus 4.5 Thinking is much better about following instructions, even if GPT-5.2, as you say, is even more compliant.
I have exactly one non-reasoning model enabled in Cascade right now: SWE-1.5 Free. Not a single other one
well i just realized im an idiot.. lol
so apparently the rules files have to be in txt? guess i missed that memo?
windsurf docs literally say markdown, but i was talking to the customer support bot and noticed it had the file types it could read and just figured id try txt in the rules section and all of a sudden now the ui for trigger settings shows and the llm is actually following the patterns outlined. guess it never saw the file before?
Windsurf's documentation has lagged a bit behind during this busy period of model releases and new features.
I've noticed that Windsurf will often have a gitignore issue with its rules, which I helped someone through yesterday in this thread: https://discord.com/channels/1027685395649015980/1454954121877524529
My rules files are in markdown, but the gitignore issue was why I noticed at some point, like you, that my rules weren't being followed, as did the other user in the thread I linked.
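If you want to check for that quickly, here is a rough Python sketch. It assumes the problem is that the rules files are matched by a `.gitignore` pattern, which may not be the whole story. The helper names are made up for illustration, the `.windsurf/rules` path in the usage note is an assumption about where your rules live, and the only real external call is `git check-ignore -q`, which exits 0 when a path is ignored and 1 when it is not.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def git_is_ignored(path: Path) -> bool:
    """Return True if git would ignore `path`, via `git check-ignore -q`."""
    result = subprocess.run(
        ["git", "check-ignore", "-q", str(path)],
        capture_output=True,
    )
    # Exit code 0 = ignored, 1 = not ignored; anything else is an error
    # (for example, running outside a git repository).
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git check-ignore failed for {path}")
    return result.returncode == 0


def find_rule_files(rules_dir: Path) -> List[Path]:
    # Assumed layout: rules are .md or .txt files directly inside one directory.
    return sorted(p for p in rules_dir.glob("*") if p.suffix in {".md", ".txt"})


def ignored_rule_files(
    paths: Iterable[Path],
    is_ignored: Callable[[Path], bool] = git_is_ignored,
) -> List[Path]:
    """Return the subset of rule files that git ignores."""
    return [p for p in paths if is_ignored(p)]
```

Run from inside the repo, something like `ignored_rule_files(find_rule_files(Path(".windsurf/rules")))` would list any rules files that git is ignoring; an empty list means gitignore is probably not the culprit. The `is_ignored` parameter is only there so the check can be swapped out or tested without git.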