Filter inappropriate prompts | OpenRouter | Page 1

flint fiber Mar 18, 2025, 10:59 AM

#

Hello , i m developing app and I want to filter inappropriate and illegal prompts from the user. What options are there to do this? I can use stop words, but this does not prevent the user from rephrase the prompt.

agile raven Mar 18, 2025, 11:30 AM

#

flint fiber Hello , i m developing app and I want to filter inappropriate and illegal prompt...

You can use Llama guard (another LLM) to determine whether or not a prompt contains harmful content

https://openrouter.ai/meta-llama/llama-guard-3-8b

Llama Guard 3 8B - API, Providers, Stats

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Run Llama Guard 3 8B with API

flint fiber Mar 18, 2025, 11:44 AM

#

agile raven You can use Llama guard (another LLM) to determine whether or not a prompt conta...

So I have to send a prompt from a user to LLM and ask her if there is any prohibited content in that prompt and ask her to answer "Yes" or "No", right? There will be a problem if LLM answers not just "Yes" or "No", but adds something, then I won't be able to recognize the answer in automatic mode

agile raven Mar 18, 2025, 12:19 PM

#

flint fiber So I have to send a prompt from a user to LLM and ask her if there is any prohib...

The LLM automatically responds with SAFE or UNSAFE

#

No other prompt is needed but you can add your own unsafe categories

#

https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/

Extra information on adding your own unsafe categories

#Filter inappropriate prompts