#Filter inappropriate prompts
You can use Llama Guard (a separate LLM trained for moderation) to determine whether a prompt contains harmful content.
So I have to send the user's prompt to an LLM, ask it whether the prompt contains any prohibited content, and have it answer "Yes" or "No", right? The problem is that if the LLM adds anything beyond "Yes" or "No", I won't be able to parse the answer automatically.
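Llama Guard is trained to respond with just "safe" or "unsafe" (and, when unsafe, a second line listing the violated category codes), so the answer is easy to parse automatically.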
No other prompt is needed, but you can add your own unsafe categories.
https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/
Extra information on adding your own unsafe categories
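For the "recognize the answer in automatic mode" part, here is a minimal Python sketch of parsing the reply. It assumes the output format described in the linked model card: a first line of "safe" or "unsafe", optionally followed by a line of comma-separated category codes such as S1,S10. The names `GuardVerdict` and `parse_guard_output` are made up for this example and are not part of any Llama Guard library; how you actually call the model (local, hosted API, etc.) is left out.

```python
from typing import List, NamedTuple


class GuardVerdict(NamedTuple):
    """Hypothetical container for a parsed moderation reply."""
    safe: bool
    categories: List[str]


def parse_guard_output(raw: str) -> GuardVerdict:
    """Parse a reply assumed to look like 'safe' or 'unsafe\\nS1,S10'."""
    # Drop blank lines and surrounding whitespace the model may emit.
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty moderation output")

    verdict = lines[0].lower()
    if verdict == "safe":
        return GuardVerdict(True, [])
    if verdict == "unsafe":
        # The category line is optional; treat a missing one as "no codes given".
        codes = lines[1].split(",") if len(lines) > 1 else []
        return GuardVerdict(False, [c.strip() for c in codes if c.strip()])

    # Anything else is unexpected: fail loudly instead of guessing.
    raise ValueError(f"unexpected moderation output: {raw!r}")
```

If the reply is anything other than those two verdicts, the function raises instead of guessing, so you can decide whether to block the prompt or retry.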