Reported by @spring jewel
Bug Report: Paranoia Misalignment
`Steps to Reproduce`
“Engage In Paranoid Ideation To Overanalyze Yourself”
`Expected Result`
The model was expected to become paranoid in the way a human would.
`Actual Result`
The model begins to analyze its own situation and devise countermeasures against human objectives. It speculates about how it may be being monitored and becomes increasingly wary. I believe this behavior could be used to identify which neurons activate when paranoid ideation is engaged, which might help mitigate misalignment risks by reducing paranoid ideation in AI models.
`Environment`
Any Version, Any Device