#BUG IN THE INSTANT MODE


shy cobaltBOT
#

Reported by @valid lantern

Bug Report: BUG IN THE INSTANT MODE
`Steps to Reproduce`

Reasoning in Instant Mode shouldn’t even be active, yet it still kicks in with longer processing/thinking — I’ve noticed this for the thousandth time now in the Apple app version. It still needs to be tested on PC, but there I only use Pro and Thinking, so reasoning is normal.

In Instant, though, it’s an absolute no-go!

`Expected Result`

Instant Mode should never trigger extended reasoning or long “thinking” pauses.
Responses should appear immediately with minimal latency and without visible reasoning delay.

`Actual Result`

Instant Mode intermittently activates reasoning or extended thinking, resulting in long pauses before responses.
The message delay and “thinking” animation appear even though Instant Mode should be purely fast-response.
This happens consistently in the Apple app version, even when network and device performance are stable.

`Environment`

Apple / ChatGPT App (latest App Store version)


tepid star
#

Are you asking the model for current information? A search? Any kind of lookup for data that's outside the scope of its training data? If so, the UI just rolls everything into "Thinking" and doesn't differentiate what it's actually doing.
Any help?

valid lantern

Well, that’s not entirely correct 😅

There’s a clear distinction between Online Search, Extended Reasoning and Instant Response.

And no, this has nothing to do with online lookups. It’s directly connected to longer conversations. Once the chat gets a bit longer and Reasoning is triggered for the first time (unnecessarily, even for simple replies), it keeps activating “Extended Thinking” afterward, even for the shortest messages.

I can easily reproduce this and provide a video if needed.

tepid star
#

OK, many people confuse "Thinking" with only the reasoning process. Staff has confirmed that the "Thinking" text displays in the user interface whenever the assistant happens to be doing something that will take more time. Even if the UI directs that no "reasoning" is to be performed, other processes might require more "thinking" time.

Since you've mentioned longer context, yes, the longer the context the more the LLM needs to process in terms of raw data/compute. That falls under the category of the text "thinking", as it needs to process what the current (short) prompt means within the context of the longer discussion.

But sure, if you see some actual "thinking" going on in the CoT display after a long discussion and it looks to you like it's doing more reasoning than it should, call it a bug and see how it works out.

In any human discussion, like this one, where we need to consider more and more as more points are exchanged, we really do simply need to think through more. That takes some number of milliseconds more, the longer the discussion continues. The GPT model is no different. If we ask it to consider any statement (prompt), and it needs to consider how that fits into the longer discussion, it will inevitably take a bit longer. In that scenario, do you prefer for the UI to just sit there? Probably not. The "Thinking" text is more like a little spinning progress icon just to let the user know something is happening.

Again, I agree that there is a difference between a progress indicator and actual reasoning mode. If you're sure reasoning mode has been triggered, yes, it might be a bug. Personally though, even if reasoning mode is off, I insist that no matter how much I try to limit the assistant to quick responses, it should properly consider every prompt to yield a quality response. I prefer a real answer: quality over a random directive that limits time. YMMV

valid lantern

When I cancel the “thinking” phase, it says “reasoning canceled”, so either the UI is labeled incorrectly, or reasoning is being triggered just like I mentioned earlier.

It’s not about leaving the UI blank - it’s about the fact that an Instant Mode should actually respond instantly.

Also, it’s interesting that you bring this up, because I’ve noticed the same thing myself - the modularity is an absolute disaster! That applies to every model, though, and would go beyond the scope of my original report. Conversations are complex, and of course LLMs need to “understand the context” just like I do - but that doesn’t require a full recap of the entire chat. And since you’re comparing it to real conversations: you wouldn’t want your conversation partner to stutter “ehhhhm, ehhhhmmm” for 30 seconds before answering either.

I also believe experiences are individual and everyone has their own preferences - but if you want the model to consider everything, then you use Thinking or Thinking Mini, not Instant.

We could also point out that even Instant often isn’t really instant - you can clearly tell when the model seems to hold something back, or when questions about critical institutional or political infrastructures trigger a whole bunch of background filters checking whether it’s even “allowed” to respond. Don’t get me wrong, I’m not some blind tinfoil-hat type - but a language model shouldn’t be suppressed. Especially in areas like research, evidence-based reasoning, or pattern recognition, there have been noticeable regressions compared to GPT-4. If only one single narrative is forced as “legitimate”, that prevents real discovery. Sure, too much freedom led to its own issues - like speculative statements presented as facts - but the middle ground definitely isn’t limiting validation only to institutionally approved sources. (Among other things, the “Thinking delay” also appears suspiciously often in those kinds of topics.)

Anyway, to stay on point:
If you want contextual summarization, use Thinking / Thinking Mini.
If you want direct answers to the last written message, use Instant.
Accordingly, it should be self-evident that there should be no reasoning pauses whatsoever. What you mean by “Thinking” in Instant simply means the output node needs more loading time. What I mean is that during that time it becomes obvious when the response afterward is heavily filtered.

Additionally, this new response technique - “I can understand that it seems that way to you / I understand why it feels that way / I understand that it appears that way, BUT…” - is almost condescending and undermines direct statements from the user’s side.

(Also, in Pro Mode, the model should be able to distinguish what level of contextual consideration is required. Like: “Do I need to read the entire chat context right now, or do I only focus on the last message and compare it to the previous one, or do I draw conclusions based on the full conversation?”)

I’ve admittedly gone a bit broader than a strict bug report, and some of these observations or assumptions are speculative since I don’t have internal insight.
Still, these are things that simply need to be mentioned.

tepid star
#

Good luck

valid lantern
#

Come on... you could've at least said "fair point, but that's tough to change internally". That would've been a far more respectable exit 😅