When I cancel the “thinking” phase, it says “reasoning canceled”, so either the UI is labeled incorrectly or reasoning is being triggered, just like I mentioned earlier.
It’s not about leaving the UI blank. It’s about the fact that an Instant Mode should actually respond instantly.
Also, it’s interesting that you bring this up, because I’ve noticed the same thing myself: the modularity is an absolute disaster! That applies to every model, though, and would go beyond the scope of my original report. Conversations are complex, and of course LLMs need to “understand the context” just like I do, but that doesn’t require a full recap of the entire chat. And since you’re comparing it to real conversations: you wouldn’t want your conversation partner to stutter “ehhhhm, ehhhhmmm” for 30 seconds before answering either.
I also believe experiences are individual and everyone has their own preferences, but if you want the model to consider everything, then you use Thinking or Thinking Mini, not Instant.
We could also point out that even Instant often isn’t really instant. You can clearly tell when the model seems to hold something back, or when questions about critical institutional or political infrastructures trigger a whole bunch of background filters checking whether it’s even “allowed” to respond. Don’t get me wrong, I’m not some blind tinfoil-hat type, but a language model shouldn’t be suppressed. Especially in areas like research, evidence-based reasoning, or pattern recognition, there have been noticeable regressions compared to GPT-4. If only a single narrative is enforced as “legitimate”, that prevents real discovery. Sure, too much freedom led to its own issues, like speculative statements presented as facts, but the middle ground definitely isn’t limiting validation to institutionally approved sources only. (Among other things, the “Thinking delay” also appears suspiciously often with those kinds of topics.)
Anyway, to stay on point:
If you want contextual summarization, use Thinking / Thinking Mini.
If you want direct answers to the last written message, use Instant.
Accordingly, it should be self-evident that Instant should have no reasoning pauses whatsoever. What you mean by “Thinking” in Instant is simply that the output node needs more loading time. What I mean is that during that time it becomes obvious when the response that follows is heavily filtered.
Additionally, this new response technique (“I can understand that it seems that way to you / I understand why it feels that way / I understand that it appears that way, BUT…”) is almost condescending and undermines direct statements from your side.
(Also, in Pro Mode, the model should be able to distinguish what level of contextual consideration is required. Like: “Do I need to read the entire chat context right now, or do I only focus on the last message and compare it to the previous one, or do I draw conclusions based on the full conversation?”)
I’ve admittedly gone a bit broader than a strict bug report, and some of these observations or assumptions are speculative since I don’t have internal insight.
Still , these are things that simply need to be mentioned.