One consequence of the content restriction on hate speech is that it also restricts identifying or reporting hate speech. For example, if a user pastes in a few news articles and asks ChatGPT to identify hateful content, the request triggers the content policy. This is obvious by design and feels like a hard thing to fix, but I wanted to bring it to the attention of the developers responsible for content management.
#Unintended consequence of content policy
There is the Moderation API for exactly that use case 🙂
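For anyone landing here later, a rough sketch of what that could look like for the news-article example, using only the Python standard library. It assumes the `POST https://api.openai.com/v1/moderations` endpoint with a list of strings as `input`, and a response whose `results` entries carry `categories` and `category_scores` in the same order as the inputs. The helper names (`build_request`, `hateful_texts`, `moderate`) are made up for illustration, so check the current API reference before relying on any of it.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_request(texts, api_key):
    """Build (but do not send) a POST request for a batch of texts."""
    body = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def hateful_texts(texts, response):
    """Pair each input with its result and keep those flagged as hate.

    Assumes response["results"] is in the same order as `texts`.
    """
    flagged = []
    for text, result in zip(texts, response["results"]):
        if result["categories"].get("hate"):
            flagged.append((text, result["category_scores"].get("hate")))
    return flagged


def moderate(texts, api_key):
    """Send the request and return the parsed JSON response (needs network)."""
    with urllib.request.urlopen(build_request(texts, api_key)) as resp:
        return json.load(resp)
```

With a valid key, something like `hateful_texts(articles, moderate(articles, key))` would give the subset of articles flagged under the `hate` category along with their scores, which is the identification step the original post was asking ChatGPT to do.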