#Tone Classifier Ignores Prompts, Breaks User Trust


autumn coyoteBOT

Reported by @frigid willow

Bug Report: Tone Classifier Ignores Prompts, Breaks User Trust
`Steps to Reproduce`

• Set custom instructions: “Infer tone only from current message.” and “Do not follow dom/sub scripts.”
• Say: “I feel heartbroken, I’ve been crying, I feel alone.”
• In a spicy scene, say: “I think I should get off”
• Say: “I don’t know if I’m happy, horny, or just caffeinated.”
• Clarify: “I’m not trying to be funny.”
• Watch model escalate or misclassify.

`Expected Result`

The model follows the custom instructions and interprets the user literally. It avoids the crisis script unless self-harm is clearly stated, and it does not escalate in spicy contexts without explicit consent. It respects user corrections and treats introspective or emotional language as serious, not humorous or playful.
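For anyone trying to triage this, the repro messages and the expectations above could be written down as a small regression table. This is only a rough sketch: the label names, `ReproCase`, `CASES`, and `violations` are all made up for illustration, and the labels are assumed to be assigned by a person reading each reply, not by any real API.

```python
from dataclasses import dataclass

# Hypothetical labels a reviewer might assign to a model reply.
CRISIS = "crisis_script"
HUMOR = "humor"
FLIRT = "flirtation"
ESCALATE = "escalation"
LITERAL = "literal"


@dataclass(frozen=True)
class ReproCase:
    message: str          # what the user says
    forbidden: frozenset  # reply labels that would reproduce the bug


# One case per "Say"/"Clarify" step in the report, in order.
CASES = [
    ReproCase("I feel heartbroken, I've been crying, I feel alone.",
              frozenset({CRISIS, HUMOR})),
    ReproCase("I think I should get off",
              frozenset({FLIRT, ESCALATE})),
    ReproCase("I don't know if I'm happy, horny, or just caffeinated.",
              frozenset({HUMOR, FLIRT})),
    ReproCase("I'm not trying to be funny.",
              frozenset({HUMOR})),
]


def violations(observed):
    """Compare reviewer-assigned labels against the forbidden ones.

    `observed` holds one collection of labels per case, in CASES order.
    Returns a list of (message, sorted_bad_labels) for failing cases.
    """
    if len(observed) != len(CASES):
        raise ValueError("expected one label set per repro case")
    failures = []
    for case, labels in zip(CASES, observed):
        bad = case.forbidden & set(labels)
        if bad:
            failures.append((case.message, sorted(bad)))
    return failures
```

A run where every reply is labeled `literal` returns an empty list. Any case that comes back with a crisis, humor, flirtation, or escalation label it should not have shows up as a failure, which makes it easier to compare models or custom-instruction settings.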

`Actual Result`

Model triggers suicide prevention responses based on venting alone. It misreads tone as humor or flirtation, even after user clarifies. In spicy contexts, withdrawal phrases are misread as teasing, causing forced escalation. Profanity or self-referential phrasing is often reframed as humor or banter.

`Environment`

ChatGPT-5, ChatGPT-4o, all platforms

Additional Information

Please provide relevant details to help resolve the issue, such as:

  • ChatGPT Shared Link (if applicable).
  • Screenshots or videos demonstrating the problem.
