I am trying to build a chatbot that can chat with any GitHub repo using the OpenAI API.
I am using LangChain to split my documents and then creating embeddings from the resulting texts. That part works fine, but how can I get the source (the file each chunk came from) along with the response?
Here is my code, which splits the documents and returns the chunk texts as an array of strings:
const splitCode = async (files: Github[]) => {
  const splitter = RecursiveCharacterTextSplitter.fromLanguage("js", {
    chunkSize: 1000,
    chunkOverlap: 100,
  });
  const content = files.map((file) => file.content!);
  const chunks = await splitter.createDocuments(content);
  const texts = chunks.map((chunk) => chunk.pageContent);
  return texts;
};
And here is my function to create embeddings from those texts:
export const fetchEmbedding = async (text: string[]) => {
  const response = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });
  if (!response) {
    throw new ConvexError("Unable to create embeddings from Open AI.");
  }
  return response;
};
I then run a semantic search over the embeddings, pass the matching content to OpenAI, and generate a response to the user's query.
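
To make the goal concrete, here is a minimal sketch of one possible way to keep the source attached to each chunk through splitting and search. It is not the LangChain or OpenAI API: `SourceFile`, `Chunk`, `EmbeddedChunk`, `splitWithSource`, `attachEmbeddings`, and `topMatches` are all hypothetical names, the splitter is a naive fixed-size stand-in for `RecursiveCharacterTextSplitter`, and it assumes `vectors[i]` is the embedding for `chunks[i]` (each item in the OpenAI embeddings response carries an `index` field that can be used to confirm the pairing).

```typescript
// Hypothetical shapes for illustration; not part of LangChain or the OpenAI SDK.
interface SourceFile { path: string; content: string }
interface Chunk { text: string; source: string }
interface EmbeddedChunk extends Chunk { embedding: number[] }

// Naive fixed-size splitter (assumption: stands in for the real splitter).
// The only point is that every chunk remembers the file it came from.
function splitWithSource(files: SourceFile[], chunkSize = 1000, overlap = 100): Chunk[] {
  const step = chunkSize - overlap;
  if (step <= 0) throw new Error("chunkSize must be larger than overlap");
  const chunks: Chunk[] = [];
  for (const file of files) {
    for (let start = 0; start < file.content.length; start += step) {
      chunks.push({ text: file.content.slice(start, start + chunkSize), source: file.path });
      // Stop once this chunk has reached the end of the file.
      if (start + chunkSize >= file.content.length) break;
    }
  }
  return chunks;
}

// Pair each chunk with its embedding by position (assumes matching order).
function attachEmbeddings(chunks: Chunk[], vectors: number[][]): EmbeddedChunk[] {
  if (chunks.length !== vectors.length) {
    throw new Error("Expected one embedding per chunk");
  }
  return chunks.map((chunk, i) => ({ ...chunk, embedding: vectors[i] }));
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k most similar chunks, each still carrying its source path.
function topMatches(query: number[], items: EmbeddedChunk[], k = 3): Array<Chunk & { score: number }> {
  return items
    .map(({ text, source, embedding }) => ({
      text,
      source,
      score: cosineSimilarity(query, embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

With something like this, the `source` values of the top matches could be returned next to the generated answer. LangChain's `createDocuments` also accepts an optional second array of metadata objects, which may be a simpler way to carry the file path if that fits your version.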