Reported by @proud holly
Have a long conversation with GPT-4o, up to around the 12K-24K token mark. That is easy to reach with math and a lot of LaTeX.
Others like it:
- https://www.reddit.com/r/ChatGPT/comments/1giz82g (Cut-off)
- https://learn.microsoft.com/en-us/answers/questions/1695772/gpt-4o-does-return-empty-responses (Empty responses)
Severity: This misleads both your Plus and Enterprise customers. The advertised 32K/128K context is not usable in practice.
Temporary Workaround: Ignore the logprob of token 200002 when sampling, and structure the chat corpus with a different stop token.
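To make the workaround concrete, here is a minimal sketch of what "ignoring" one token's logprob could look like at sampling time. It is a toy over a plain dict of token id to logprob, not the actual backend sampler. `mask_token` and `greedy_pick` are hypothetical helper names, and the example logprobs are made up for illustration. Only the id 200002 comes from the SSE payload in this report.

```python
import math

# Stop token id as reported in the SSE payload of this report.
STOP_TOKEN_ID = 200002


def mask_token(logprobs, banned_id=STOP_TOKEN_ID):
    """Return a copy of logprobs (token id -> logprob) with banned_id set to -inf."""
    masked = dict(logprobs)
    if banned_id in masked:
        masked[banned_id] = -math.inf
    return masked


def greedy_pick(logprobs):
    """Pick the token id with the highest logprob (greedy decoding)."""
    return max(logprobs, key=logprobs.get)


# Made-up distribution where the stop token is the most likely next token.
example = {1234: -1.2, 200002: -0.3, 5678: -2.5}

print(greedy_pick(example))              # 200002: the turn would end here
print(greedy_pick(mask_token(example)))  # 1234: generation continues
```

With the stop token masked, something else has to end the turn, which is why the workaround also needs a different stop token in the chat structure.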
Expected Result: Behavior like GPT-4 Legacy, which never cuts off. This is a model issue.
Theory: This might be caused by truncated training samples. If samples were truncated to exactly N tokens, the last response in a sample would often be left unfinished because it did not fit within that length. The model would then learn that a conversation can end abruptly at some random point.
Permanent Fix: Truncation of training samples should keep the last message whole, with the remaining "free tokens" simply set to EOS (padding).
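A rough sketch of the difference between the two truncation strategies, assuming a sample is just a list of per-message token id lists. This is only an illustration of the theory above, not the real training pipeline. `naive_truncate`, `build_sample`, the token ids, and `eos_id=0` are all hypothetical.

```python
def naive_truncate(messages, max_len):
    """Concatenate all messages and cut at exactly max_len tokens.

    The last message may be sliced in the middle.
    """
    flat = [tok for msg in messages for tok in msg]
    return flat[:max_len]


def build_sample(messages, max_len, eos_id):
    """Keep only whole messages that fit, then pad to max_len with eos_id."""
    sample = []
    for msg in messages:
        if len(sample) + len(msg) > max_len:
            break  # this message would not fit in full, so leave it out
        sample.extend(msg)
    # Fill the remaining "free tokens" with EOS as padding.
    sample.extend([eos_id] * (max_len - len(sample)))
    return sample


# Three messages of 3, 2 and 4 tokens, packed into a window of 8.
messages = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

print(naive_truncate(messages, 8))   # [1, 2, 3, 4, 5, 6, 7, 8]
print(build_sample(messages, 8, 0))  # [1, 2, 3, 4, 5, 0, 0, 0]
```

In the naive version the third message ends after 3 of its 4 tokens, which is the kind of unfinished last response the theory describes. In the second version every message in the sample is complete. If the very first message is longer than `max_len`, this sketch returns only padding, so a real pipeline would need to handle that case separately.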
"But we can simplify that to just"—(cutoff)
"or in a more straightforward approach"—(cutoff)
"[ y = v"—(cutoff)
""—???
SSE:
data: {"p": "", "o": "patch", "v": [{"p": "/message/content/parts/0", "o": "append", "v": ", we get:\n\[\ny = v"}, {"p": "/message/status", "o": "replace", "v": "finished_successfully"}, {"p": "/message/end_turn", "o": "replace", "v": true}, {"p": "/message/metadata", "o": "append", "v": {"finish_details": {"type": "stop", "stop_tokens": [200002]}, "is_complete": true}}]}
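For anyone who wants to collect more cases, here is a small sketch that reads one `data:` line shaped like the payload above and flags a response that finished "successfully" but looks cut off. It assumes the patch format seen in this single capture. The format is not documented here, so the field paths may differ in other events. `parse_patch` and `looks_cut_off` are hypothetical helpers, and the cut-off check is only a heuristic (unclosed `\[` display math, or no closing punctuation).

```python
import json


def parse_patch(line):
    """Parse one 'data: {...}' SSE line into (appended_text, stop_tokens)."""
    payload = json.loads(line[len("data:"):].strip())
    ops = payload.get("v")
    if not isinstance(ops, list):
        return "", []
    text, stop_tokens = "", []
    for op in ops:
        if not isinstance(op, dict):
            continue
        if op.get("p") == "/message/content/parts/0" and op.get("o") == "append":
            text += op["v"]
        elif op.get("p") == "/message/metadata":
            stop_tokens = op["v"].get("finish_details", {}).get("stop_tokens", [])
    return text, stop_tokens


def looks_cut_off(full_text):
    """Heuristic: unclosed display math, or no closing punctuation at the end."""
    unclosed_math = full_text.count("\\[") > full_text.count("\\]")
    stripped = full_text.rstrip()
    ends_cleanly = stripped.endswith((".", "!", "?", "\\]", "```", ")"))
    return unclosed_math or not ends_cleanly


# Rebuild a line with the same shape as the capture, using json.dumps
# so that the backslash in "\[" is escaped correctly.
line = "data: " + json.dumps({"p": "", "o": "patch", "v": [
    {"p": "/message/content/parts/0", "o": "append", "v": ", we get:\n\\[\ny = v"},
    {"p": "/message/status", "o": "replace", "v": "finished_successfully"},
    {"p": "/message/metadata", "o": "append",
     "v": {"finish_details": {"type": "stop", "stop_tokens": [200002]},
           "is_complete": True}},
]})

text, stops = parse_patch(line)
print(stops, looks_cut_off(text))  # [200002] True
```

The check also flags the empty-response case, since an empty string has no closing punctuation.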
Everywhere: Android/iOS, browsers, desktop. It's a model issue on your backend, but it could be worked around temporarily with some engineering.