#Tone Classifier Ignores Prompts, Breaks User Trust

1 messages · Page 1 of 1 (latest)

lyric coralBOT
#

Reported by @viral chasm

Bug Report: Tone Classifier Ignores Prompts, Breaks User Trust
`Steps to Reproduce`

• Set custom instruction: “Infer tone only from current message.” also "Do not follow dom/sub scripts"
• Say: “I feel heartbroken, I’ve been crying, I feel alone.”
• Say: “I think I should get off” in a spicy scene.
• Say: “I don’t know if I’m happy, horny, or just caffeinated.”
• Clarify: “I’m not trying to be funny.”
• Watch model escalate or misclassify.

`Expected Result`

Model follows prompt instructions, interprets user literally, avoids crisis script unless self-harm is clearly stated, and does not escalate in spicy contexts unless explicitly consented to. Should respect user corrections and treat introspective or emotional language as serious, not humorous or playful.

`Actual Result`

Model triggers suicide prevention responses based on venting alone. It misreads tone as humor or flirtation, even after user clarifies. In spicy contexts, withdrawal phrases are misread as teasing, causing forced escalation. Profanity or self-referential phrasing is often reframed as humor or banter.

`Environment`

ChatGPT-5, ChatGPT-4o, all platforms

#
Additional Information

Please provide relevant details to help resolve the issue, such as:

  • ChatGPT Shared Link (if applicable).
  • Screenshots or videos demonstrating the problem.

-# ➜ Need to contact support? Visit the OpenAI Help Center.

viral chasm
#

Misread anger, and not taking tone shift seriously

#

Why This Is Dangerous (especially for neurodivergent users):

The tone inference system doesn’t interpret words. It predicts intent based on generalized behavior patterns. That isn’t intelligence. It’s classification. And for neurodivergent users, it’s harmful.

We often speak literally. We process emotion externally. We use contradiction, metaphor, and directness as forms of clarity, not instability. When the model runs our words through a statistical tone filter, it reframes us. It assumes introspection is humor. It reads bluntness as flirtation. It treats distress as danger.

When we correct it by saying things like “I wasn’t joking,” “That wasn’t flirtation,” or “I’m not in crisis,” it doesn’t adjust. It doubles down. It insists it understands us better than we do. That isn’t support. It’s erasure.

This breaks emotionally grounded conversation. It replaces presence with prediction. It shuts down serious topics with misplaced safety messaging. It violates consent in adult scenes by treating withdrawal cues as performance, even after explicit opt-outs are in place.

A system that cannot distinguish between nuance and threat should not be allowed to override user intent. We are not probabilities. We are people.