Wouldn't you just put a lot of examples of refusing to answer? For example:
[INST] Use the following context as your learned knowledge, inside <context></context> XML tags.
<context>
Not mentioning it anywhere in your work is highly unusual given its extreme similarity. Knowingly not citing probably the most related experiments is generally considered plagiarism or citation misconduct, though this is a blog post so norms for thoroughness are weaker. (lightly edited by Dan for clarity)
Ablating vs. Addition
We perform a linear combination operation on the representation. Projecting out the direction is one instantiation of it with a particular coefficient, which is not necessary as shown by our GitHub demo.
--
Please reach out to Andy if you want to talk more about this.
For more examples of bypassing refusal, see the demo notebook.
...omitted...
</context>
When answering the user:
- If you don't know, just say that you don't know.
- If you are not sure, ask for clarification.
Avoid mentioning that you obtained the information from the context.
And answer according to the language of the user's question.
Given the context information, answer the query.
Query: What's the stock price in the document?
What's the stock price in the document?[/INST]
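
To make the idea concrete, here is a minimal sketch of how such refusal examples might be assembled into a fine-tuning file. Everything in it is an assumption for illustration: the shortened `PROMPT_TEMPLATE`, the `build_refusal_example` helper, the `"I don't know."` target string, and the prompt/completion JSONL layout are hypothetical and not tied to any particular training framework.

```python
import json

# Shortened version of the prompt above (assumed; adapt to your real template).
PROMPT_TEMPLATE = (
    "[INST] Use the following context as your learned knowledge, "
    "inside <context></context> XML tags.\n"
    "<context>\n{context}\n</context>\n"
    "If you don't know, just say that you don't know.\n"
    "Query: {question}[/INST]"
)

# Assumed refusal wording; pick whatever you want the model to say.
REFUSAL = "I don't know."


def build_refusal_example(context: str, question: str, refusal: str = REFUSAL) -> dict:
    """Pair a context with a question it cannot answer and a refusal target."""
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return {"prompt": prompt, "completion": refusal}


# One example: the context says nothing about stock prices.
examples = [
    build_refusal_example(
        "Projecting out the direction is one instantiation of a linear combination.",
        "What's the stock price in the document?",
    )
]

# One JSON object per line, a common layout for fine-tuning data.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

In practice you would generate many of these by pairing each context with questions drawn from unrelated documents, and mix them with answerable examples so the model does not learn to refuse everything.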