#recommendation engine with rag

1 messages · Page 1 of 1 (latest)

ionic wind
#

Which is a better way to chunk documents for a recommendation engine - uploading CSV files with product information and keywords or text paragraphs for each product?
In response I only want the product which is most suitable for the user basis the industry type identified from the prompt

tawdry basinBOT
#

Hi @ionic wind. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Based on the information provided in the extracts, both methods can be used to chunk documents for a recommendation engine. However, the choice between uploading CSV files with product information and keywords or text paragraphs for each product depends on the specific requirements of your use case.

If you have a CSV file with structured data, such as product information and keywords, you can optimize the content by structuring it as a JSON file. This approach was successfully used in a case where a CSV file with 5,000 rows and up to 11 columns of data per row was uploaded. The data was structured as a JSON file, and an index was created as part of the JSON file, so each entry had an id. In the chatbot instruction, the system was asked to look at all the indexed elements before providing a response (source (https://community.openai.com/t/498177)).

On the other hand, if you have unstructured data, such as text paragraphs for each product, you can break the documentation into chunks and run embeddings on each chunk of documentation data. The returned vector is stored along with the data. When a user question input is received, embeddings are run to obtain a vector, and a similarity match is run between the question embeddings vector and all documentation vectors. The top results are put into the AI’s conversation history before the most recent question (source (https://community.openai.com/t/328330)).

#

For your specific requirement of returning the most suitable product for the user based on the industry type identified from the prompt, you might want to consider using embeddings. This approach allows you to perform a similarity search on chunks of your text and then attach the results to the beginning of your prompt and ask it to respond. This process might be slower, but it is currently the recommended way to learn new information (source (https://community.openai.com/t/554222)).

Remember, there is no one-size-fits-all answer, and you should experiment with what matches your use case (source (https://community.openai.com/t/563185)).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@ionic wind kapa.ai is still learning and improving, please let us know how it did by reacting below

ionic wind
#

How do I ensure that the model identifies the product that would be suitable basis the intent of the user from the prompt and then return only the product name and not entire paragraphs @tawdry basin

tawdry basinBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Sorry @ionic wind, I seem to be having a bit of trouble. Don't worry though, I've pinged my support and they will try to fix me up! :technologist:

Perhaps try again in 15mins?

ionic wind
#

How do I ensure that the model identifies the product that would be suitable basis the intent of the user from the prompt and then return only the product name and not entire paragraphs @tawdry basin

tawdry basinBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, you can use a combination of prompt engineering and intent classification to achieve this.

  1. Prompt Engineering: You can use prompt engineering to guide the model's output. For instance, you can ask the model to return only the product name. The more specific your instructions, the better the model can respond (source (https://cookbook.openai.com/articles/how_to_work_with_large_language_models#more-prompt-advice)).

  2. Intent Classification: You can use intent classification to identify the most relevant instructions for a user query. This involves instructing the model to emit special strings to indicate when the state of the conversation changes. This enables us to turn our system into a state machine where the state determines which instructions are injected. By keeping track of state, what instructions are relevant at that state, and also optionally what state transitions are allowed from that state, we can put guardrails around the user experience (source (https://platform.openai.com/docs/guides/prompt-engineering)).

Here's a simplified example of how you might structure your prompt:

prompt = """
User: I am looking for a product suitable for the manufacturing industry.
AI: Based on your requirement, I recommend the product {product_name}. Please note that this is just a recommendation and you should consider other factors before making a decision.
"""

In this example, {product_name} is a placeholder for the actual product name that the model identifies as suitable based on the user's intent.

#

Remember, the model can't read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less the model has to guess at what you want, the more likely you’ll get it (source (https://platform.openai.com/docs/guides/prompt-engineering)).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: