#Safety System for Open-source LLMs
3 messages · Page 1 of 1 (latest)
You could do some prompt engineering to try and self-moderate the model by telling the model to refuse to interact with sensitive queries.
You could also send each prompt to a cheaper model to act as a moderation layer - Anthropic demonstrates the concept here using their Haiku model: https://docs.anthropic.com/claude/docs/content-moderation#using-claude-for-content-moderation
Claude
Screening user input before it reaches your main language model allows you to prevent the processing or output of harmful, offensive, or irrelevant content, saving both computational resources and potential damage to your brand reputation.In this guide, we'll explore how to use Claude to efficiently...
cc @vocal vigil