#Safety System for Open-source LLMs

3 messages · Page 1 of 1 (latest)

wintry cradle
#

Hi all, I've deployed an LLM bot (with OpenRouter API) in my Discord server, but some users asked very sensitive (i.e., edgy) questions. The LLM responded to these in detail. How can I prevent users from sending sensitive queries? Is this on the roadmap or is this already available?

faint wagon
#

You could do some prompt engineering to try and self-moderate the model by telling the model to refuse to interact with sensitive queries.

You could also send each prompt to a cheaper model to act as a moderation layer - Anthropic demonstrates the concept here using their Haiku model: https://docs.anthropic.com/claude/docs/content-moderation#using-claude-for-content-moderation

Claude

Screening user input before it reaches your main language model allows you to prevent the processing or output of harmful, offensive, or irrelevant content, saving both computational resources and potential damage to your brand reputation.In this guide, we'll explore how to use Claude to efficiently...

hearty flare
#

cc @vocal vigil