#How to get source from openai when using it as a RAG?


sturdy marlin
#

I am trying to build a chatbot that can chat with any GitHub repo using the OpenAI API.
I am using LangChain to split my documents and then creating embeddings from those texts, and it is working fine, but how can I get the source along with the response?

Here is my code, which splits the docs and returns the texts as an array of strings

const splitCode = async (files: Github[]) => {
  const splitter = RecursiveCharacterTextSplitter.fromLanguage("js", {
    chunkSize: 1000,
    chunkOverlap: 100,
  });

  const content = files.map((file) => file.content!);
  const chunks = await splitter.createDocuments(content);

  const texts = chunks.map((chunk) => chunk.pageContent);

  return texts;
};

And here is my function to create embeddings from those texts

export const fetchEmbedding = async (text: string[]) => {
  const response = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });
  if (!response) {
    throw new ConvexError("Unable to create embeddings from Open AI.");
  }
  return response;
};

And then I am using semantic search over the embeddings to pass the matching content to OpenAI and generate a response to the user's query.

next coralBOT
#

Hi @sturdy marlin. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Based on the information provided in the extracts, it seems you're already on the right track with splitting your documents, creating embeddings, and using semantic search. However, the extracts do not provide specific information on how to get the source along with the response when using OpenAI as a Retrieval-Augmented Generation (RAG) system.

In general, when you perform a semantic search with embeddings, you would typically keep track of the original documents (or their identifiers) that correspond to each embedding. Then, when you get a response from the semantic search, you can use the identifiers of the most relevant embeddings to retrieve the corresponding original documents.
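As a rough illustration of "keeping track of the original documents", the sketch below splits files into chunks while recording the file path on every chunk. The `RepoFile` and `SourcedChunk` types and the `splitWithSource` function are hypothetical names made up for this example, and the fixed-size splitter is only a stand-in for a real one such as LangChain's; it is a sketch under those assumptions, not the way either library does it.

```typescript
// Hypothetical shapes for this sketch; not part of LangChain or the OpenAI SDK.
interface RepoFile {
  path: string;
  content: string;
}

interface SourcedChunk {
  text: string;
  source: string; // path of the file the chunk was cut from
  chunkIndex: number; // position of the chunk within that file
}

// Naive fixed-size splitter standing in for a real text splitter.
// Each chunk starts (chunkSize - chunkOverlap) characters after the previous one.
function splitWithSource(
  files: RepoFile[],
  chunkSize = 1000,
  chunkOverlap = 100,
): SourcedChunk[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: SourcedChunk[] = [];
  for (const file of files) {
    let chunkIndex = 0;
    for (let start = 0; start < file.content.length; start += step) {
      chunks.push({
        text: file.content.slice(start, start + chunkSize),
        source: file.path,
        chunkIndex,
      });
      chunkIndex += 1;
      // Stop once this chunk has reached the end of the file.
      if (start + chunkSize >= file.content.length) break;
    }
  }
  return chunks;
}
```

Because the chunks stay in a stable order, the texts can be sent to the embeddings endpoint as one array and each returned embedding paired back with its chunk by position (each item in the response's `data` array carries an `index` for this). If you stay with LangChain's splitter, `createDocuments` should also accept a second array of per-document metadata that is copied onto each resulting chunk, which would serve the same purpose; check the version you have installed.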

However, the specifics of how to implement this would depend on the details of your application and the OpenAI API, which are not covered in the provided extracts. For more detailed guidance, you might want to refer to the OpenAI API documentation (https://platform.openai.com/docs/api-reference) or ask a question on the OpenAI Developer Forum (https://community.openai.com).


sturdy marlin
#

@next coral
What would be the best way to get source along with the response?
I am using gpt-3.5-turbo-1106

next coralBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, there isn't a direct way to get the source along with the response when using the gpt-3.5-turbo-1106 model. The OpenAI API does not inherently track or return the source of the information it uses to generate responses.

However, you can implement a workaround in your application to keep track of the source of the embeddings. Here's a general approach:

  1. When you create embeddings from your chunks, store each embedding together with its source information. This could be as simple as an array of objects, each holding an embedding and the file its chunk came from. Because one file usually produces several chunks, record the source per chunk instead of looking it up by file index.
const embeddingsWithSource = await Promise.all(
  chunks.map(async (chunk) => {  // Assuming 'chunks' is your array of { text, source } objects
    const embedding = await createEmbedding(chunk.text);  // Your function to create an embedding
    return { embedding, source: chunk.source };
  })
);
  2. When you perform a semantic search, have it return the matching entries from that data structure instead of bare embeddings. The source information is then right there with them, and no lookup by embedding equality is needed.
// Your function to perform semantic search; assumed to return the matching items, best match first
const results = performSemanticSearch(query, embeddingsWithSource);
const relevantSources = results.map((result) => result.source);

Please note that the above code is a conceptual example and may need to be adjusted based on the specifics of your application and the OpenAI API.
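To make the two steps more concrete, here is a minimal, dependency-free sketch of the search side. It ranks stored chunks by cosine similarity to the query embedding and then builds a prompt in which every context block is labelled with its source, so the sources can be shown next to the model's answer. All names here (`EmbeddedChunk`, `topKWithSources`, `buildPrompt`) are hypothetical, the embeddings are assumed to be computed already, and a real application would more likely use a vector store than an in-memory scan.

```typescript
// Hypothetical shape: a chunk of text, where it came from, and its embedding.
interface EmbeddedChunk {
  text: string;
  source: string;
  embedding: number[];
}

interface Hit {
  chunk: EmbeddedChunk;
  score: number;
}

// Cosine similarity of two vectors of equal length; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query, best match first.
function topKWithSources(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  k = 3,
): Hit[] {
  return chunks
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Build the prompt text and the de-duplicated list of sources behind it.
function buildPrompt(
  question: string,
  hits: Hit[],
): { prompt: string; sources: string[] } {
  const context = hits
    .map((hit, i) => `[${i + 1}] ${hit.chunk.source}\n${hit.chunk.text}`)
    .join("\n\n");
  const sources = hits
    .map((hit) => hit.chunk.source)
    .filter((source, i, all) => all.indexOf(source) === i);
  const prompt =
    "Answer the question using only the numbered context below, " +
    "and cite the numbers you relied on.\n\n" +
    `${context}\n\nQuestion: ${question}`;
  return { prompt, sources };
}
```

The `prompt` string would go into the chat completion request as usual, and `sources` can be returned to the user alongside the reply. The model's own citations are only as reliable as its instruction following, so the `sources` list taken from the search results is the safer thing to display.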