[D] Modeling online discourse escalation as a state machine
I've been working on a framework to model how online discussions escalate into conflict — exploring whether it can be framed as a classification or sequence modeling problem.
The core idea: treat discourse as a state machine with observable transitions.
Proposed States
1. Neutral: information exchange
2. Disagreement: divergent views, no identity friction
3. Identity Activation: topic shifts toward the self
4. Personalization: focus moves to character/flaws
5. Ad Hominem: rational engagement collapses
6. Dogpile: multi-user targeting, non-recoverable
7. Threats of violence: after exhausting states 1–6
Each comment gets a local state label. Threads have a global state that evolves over time.
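To make the local/global distinction concrete, here is a minimal sketch. The aggregation rule is an assumption for illustration, not something the framework fixes: the global state is the highest local label among the last `window` comments, and once the thread reaches Dogpile or above it never drops back, to reflect "non-recoverable". The names `State`, `global_states`, and `window` are hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    """The seven proposed states, ordered by escalation level."""
    NEUTRAL = 1
    DISAGREEMENT = 2
    IDENTITY_ACTIVATION = 3
    PERSONALIZATION = 4
    AD_HOMINEM = 5
    DOGPILE = 6
    THREAT = 7


def global_states(local_labels, window=3):
    """Return the global thread state after each comment.

    Assumed rule (illustrative only): take the max local state over the
    last `window` comments, and treat DOGPILE and above as absorbing.
    """
    out = []
    floor = State.NEUTRAL  # lowest state the thread may still return to
    for i in range(len(local_labels)):
        recent = local_labels[max(0, i - window + 1): i + 1]
        current = max(max(recent), floor)
        if current >= State.DOGPILE:
            floor = current  # non-recoverable from here on
        out.append(current)
    return out


labels = [State.NEUTRAL, State.AD_HOMINEM, State.NEUTRAL, State.NEUTRAL]
print([s.name for s in global_states(labels, window=2)])
# ['NEUTRAL', 'AD_HOMINEM', 'AD_HOMINEM', 'NEUTRAL']
```

With this rule a single ad hominem comment raises the thread state only temporarily, while a dogpile locks it in. Whether the earlier states should be recoverable at all is a modeling choice.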
Signals / Features
Linguistic: second-person pronoun frequency, sentiment shift, toxicity markers
Structural: unique users per target, reply velocity, thread depth
Contextual: topic sensitivity, prior state transitions
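A rough sketch of how a few of these signals could be computed with the standard library alone. The `Comment` record, its fields, and the exact definitions (pronoun list, replies per minute) are assumptions made for illustration. Sentiment shift, toxicity markers, and topic sensitivity would need a model or lexicon and are left out.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical comment record; field names are assumptions."""
    author: str
    target: str       # author of the comment being replied to
    text: str
    timestamp: float  # seconds since thread start
    depth: int        # nesting level in the reply tree


# Assumed pronoun list; a real feature would likely be broader.
SECOND_PERSON = {"you", "your", "yours", "yourself", "you're"}


def second_person_rate(text):
    """Linguistic: fraction of tokens that are second-person pronouns."""
    tokens = re.findall(r"[A-Za-z']+", text)
    if not tokens:
        return 0.0
    return sum(t.lower() in SECOND_PERSON for t in tokens) / len(tokens)


def unique_users_per_target(comments):
    """Structural: number of distinct users replying to each target."""
    repliers = defaultdict(set)
    for c in comments:
        if c.author != c.target:
            repliers[c.target].add(c.author)
    return {target: len(users) for target, users in repliers.items()}


def reply_velocity(comments):
    """Structural: replies per minute across the thread's time span."""
    if len(comments) < 2:
        return 0.0
    stamps = [c.timestamp for c in comments]
    span = max(stamps) - min(stamps)
    return 0.0 if span == 0 else (len(comments) - 1) / (span / 60.0)


def thread_depth(comments):
    """Structural: deepest nesting level reached."""
    return max((c.depth for c in comments), default=0)
```

The `unique_users_per_target` count is computed from the reply graph rather than from any single comment, which is one way to read the dogpile state as an emergent structural property instead of a per-comment label.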
Questions for the group:
Does this work better as per-comment classification or sequence modeling (HMM / transformer over thread)?
Would you treat dogpile as a class label or an emergent graph property?
Any existing datasets that approximate this beyond toxicity classification?
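On the first question, one possible middle ground is a hybrid: keep a per-comment classifier and smooth its outputs with an HMM-style Viterbi decode over the thread. The sketch below assumes the per-comment class probabilities can stand in for emission scores (an approximation, not a proper generative HMM) and uses made-up transition values. The two-state toy example is only there to keep the numbers easy to follow. The same function works for all seven states.

```python
import math


def _log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emission_probs, transition, initial):
    """Most likely state path through a thread.

    emission_probs[t][s]: classifier score for state s at comment t
    transition[r][s]:     assumed P(state s | previous state r)
    initial[s]:           assumed P(first comment is in state s)
    """
    n = len(initial)
    score = [_log(initial[s]) + _log(emission_probs[0][s]) for s in range(n)]
    backpointers = []
    for t in range(1, len(emission_probs)):
        new_score, pointers = [], []
        for s in range(n):
            # Best predecessor for landing in state s at step t.
            prev = max(range(n), key=lambda r: score[r] + _log(transition[r][s]))
            new_score.append(
                score[prev] + _log(transition[prev][s]) + _log(emission_probs[t][s])
            )
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy example: state 0 = calm, state 1 = hostile (illustrative values).
transition = [[0.9, 0.1], [0.1, 0.9]]  # states are "sticky"
initial = [0.9, 0.1]

one_off_spike = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
print(viterbi(one_off_spike, transition, initial))   # [0, 0, 0]

sustained = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.1, 0.9]]
print(viterbi(sustained, transition, initial))       # [0, 1, 1, 1]
```

Per-comment argmax would label the middle comment of the first thread as hostile, while the decoded path treats it as noise. Sustained evidence in the second thread does flip the state. A transformer over the thread could learn this kind of smoothing implicitly, at the cost of needing sequence-level labels.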
White paper: https://github.com/JohannaWeb/Monarch/releases/tag/0.1.paper