Verification of data for respective prompt | Learn AI Together | Page 1

errant bison Jan 17, 2025, 9:23 AM

I am trying to verify if the knowledge data matches the prompt given by the user.
Backstory:

I am using smollm2 from Ollama as my main LLM and LangChain/CrewAI as my main framework.
I have a custom-built tool that searches Google and returns web URLs based on the search query produced from the user's prompt.
These URLs and the content behind them will be the main data source for the LLM to give an answer.

My main question is: how do I verify that the data source matches the prompt given by the user?

My solutions (not that great):

Created a tool using smollm2 to check the context of the data source against the prompt: very ambiguous results.
Tried similarity scores using embedding models: the scores varied too much to set a threshold.
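
For reference, the similarity-score attempt described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the vectors are made-up toy values standing in for real embedding-model output, and `filter_by_similarity` and the 0.8 threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_by_similarity(query_vec, doc_vecs, threshold):
    """Return (name, score) pairs scoring at or above the threshold, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy vectors (assumption): real embeddings would come from an embedding model.
query = [1.0, 0.0, 0.0]
docs = {"page_a": [0.9, 0.1, 0.0], "page_b": [0.0, 1.0, 0.0]}
print(filter_by_similarity(query, docs, threshold=0.8))
```

Absolute cosine scores tend to depend on the embedding model and on text length, which is consistent with the difficulty of fixing one threshold reported here.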

Pls do feel free to share your ideas or thought processes.
Thanks in advance!

errant bison Jan 21, 2025, 5:37 AM

some help pls

gleaming rapids Jan 22, 2025, 10:17 AM

errant bison I am trying to verify if the knowledge data matches the prompt given by the user...

You can simply use the top search results from Google, as they are likely to be the most relevant to the search query.

errant bison Jan 23, 2025, 4:07 AM

gleaming rapids You can simply use the top search results from Google, as they are likely to be ...

I agree but I still need a tool for the verification.

Cuz, let’s say the user asks about “Napoleon” and the Google search returns “Napoleon Bonaparte” (correct) and “Napoleon III” (incorrect).

How would I filter it out then?

gleaming rapids Jan 23, 2025, 6:16 AM

errant bison I agree but I still need a tool for the verification. Cuz, let’s say the user a...

You can simply instruct the LLM to disregard irrelevant information in the system prompt.

errant bison Jan 23, 2025, 7:08 AM

Tried it (mentioned above), got very poor results.

This is the template I have used:
verifier_template = """
You are an expert at verifying data based on the given input sentence.

Sentence: {input_sentence}

First, extract the context from the sentence. Next, analyze the provided data.

Data: {data}

Verify if the data aligns with the concept extracted from the sentence. Be precise and accurate in your evaluation.

Respond in either "Yes" or "No." Explain your thought process.

"""

gleaming rapids Jan 23, 2025, 2:05 PM

errant bison Tried it (mentioned above), got very poor results. This is the template I have ...

You don't need to call an LLM separately to filter out irrelevant information. In a RAG-based approach, you can use a single system prompt. For example:

Using the relevant information from the knowledge base, respond to the user's query.

Knowledge Base:
"""
{knowledge_base}
"""

User's Query:
"""
{user_query}
"""

However, if you still wish to use a separate LLM call to filter out irrelevant information, we can enhance the prompt by employing a relevance scale (1 to 10) instead of a binary output (Yes/No). This allows the LLM to more effectively articulate the relevance between the data and the sentence. We can subsequently establish a threshold based on preliminary tests. For instance:

You are an expert at verifying data based on the given input sentence.

Sentence: {input_sentence}

First, extract the context from the sentence. Next, analyze the provided data.

Data: {data}

Verify if the data aligns with the concept extracted from the sentence. Be precise and accurate in your evaluation.

Output a relevance score from 1 to 10, where 10 signifies the highest relevance and 1 the lowest. Explain your reasoning.
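
The scoring-and-threshold idea can be sketched as below, assuming the model's reply is free text that contains the score somewhere. `parse_relevance_score`, `keep_relevant`, the example replies, and the default threshold of 7 are all hypothetical; the threshold in particular would need to be chosen from preliminary tests, as suggested above.

```python
import re

def parse_relevance_score(reply):
    """Return the first standalone integer from 1 to 10 in the reply, or None."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None

def keep_relevant(replies_by_url, threshold=7):
    """Keep URLs whose parsed score meets the threshold.

    Replies with no parseable score are treated as irrelevant.
    """
    kept = []
    for url, reply in replies_by_url.items():
        score = parse_relevance_score(reply)
        if score is not None and score >= threshold:
            kept.append(url)
    return kept

# Made-up replies (assumption), keyed by placeholder source names.
replies = {
    "source_a": "Relevance: 9. The page is about the person in the query.",
    "source_b": "Score 3. The page covers a different person.",
    "source_c": "I cannot tell.",
}
print(keep_relevant(replies, threshold=7))
```

A small model may not always follow the output format, so treating unparseable replies as irrelevant is one conservative choice; retrying the call would be another.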
