# Train a ChatGPT-like bot for a specific book


onyx path

I want to train a ChatGPT-like bot on a book. The book has 3-4 volumes, so around 1,500-2,000 pages in total, and I want the bot to be able to answer questions about it.

It's basically a law book, and the bot should be able to answer questions based on those laws.

So would the OpenAI API work for this, or what would be the best approach?

cinder remnant

Custom GPTs can easily take a 2,000-page PDF, so there’s no need for a backend.

What you can do is split the book into chapters and give it a knowledge base as either PDF or JSON files. Since you can upload a maximum of 20 files, this shouldn’t be an issue.
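
The chapter split suggested here can be scripted. Below is a minimal Python sketch, assuming the book has already been extracted to plain text and that every chapter starts on its own line with "Chapter <number>". The heading pattern, the file names and all function names are hypothetical, so adjust them to the actual book. It splits the text into chapters and packs them, in order, into at most 20 JSON files that you could then upload as knowledge.

```python
import json
import re
from pathlib import Path

# Hypothetical heading pattern: assumes every chapter starts on its own line
# with "Chapter <number>". Adjust it to match the real book.
CHAPTER_RE = re.compile(r"^Chapter\s+\d+\b.*$", re.MULTILINE | re.IGNORECASE)
MAX_FILES = 20  # knowledge-file limit mentioned in this thread


def split_into_chapters(text: str) -> list[dict]:
    """Split the full book text into {"title", "text"} chapter records."""
    starts = [m.start() for m in CHAPTER_RE.finditer(text)]
    if not starts:
        return [{"title": "Full text", "text": text.strip()}]
    chapters = []
    front = text[:starts[0]].strip()
    if front:  # keep any text that appears before the first chapter heading
        chapters.append({"title": "Front matter", "text": front})
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        title, _, body = text[start:end].strip().partition("\n")
        chapters.append({"title": title.strip(), "text": body.strip()})
    return chapters


def group_into_files(chapters: list[dict], max_files: int = MAX_FILES) -> list[list[dict]]:
    """Pack chapters, in order, into at most `max_files` groups."""
    if not chapters:
        return []
    per_file = -(-len(chapters) // min(max_files, len(chapters)))  # ceiling division
    return [chapters[i:i + per_file] for i in range(0, len(chapters), per_file)]


def write_knowledge_files(text: str, out_dir: str) -> list[Path]:
    """Write one JSON file per group, ready to upload as knowledge."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, group in enumerate(group_into_files(split_into_chapters(text)), start=1):
        path = out / f"book_part_{n:02d}.json"  # hypothetical naming scheme
        path.write_text(json.dumps(group, ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_knowledge_files(book_text, "gpt_knowledge")` would produce `book_part_01.json`, `book_part_02.json`, and so on. Spot-check a few files by hand, since a body line that happens to start with "Chapter 3" would also be treated as a heading.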

As far as instructions go, just base them around the book files you’ve already given it as its local knowledge.

Just limit its answers to the book itself and it should run smoothly.


This assumes only you need to ask it questions. If others will use it as well, you’ll need to design it as a MyGPT suitable for the marketplace.


I’m now going to proceed to give my own answer a star

hoary panther

Watch some of the many YouTube videos about RAG (retrieval-augmented generation). Or consider the "new idea" of CAG (requires modifying the LLM itself).

The theory of RAG is that a script splits the book into chunks, then converts each chunk into an embedding (a vector that represents its meaning). Your question is then converted into an embedding the same way. The database compares the question embedding with the book's chunk embeddings, the script finds the chunks most similar to the question, and it sends the text of those "relevant" chunks to the LLM, including that text in the prompt that asks it to answer the question.

One of my side projects is creating a new RAG method that may be optimal for law book information retrieval. Do you need this just to pass one "open book" test, or are you a professional who needs expert knowledge on an ongoing basis? In other words, do you need this enough to invest some money in optimizing it? Send me an email with this whole dialogue at mrferran1970@gmail.com.
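
The retrieval flow described above can be sketched in a few lines of Python. This is a toy illustration, not a production RAG system and not the poster's own method. As an assumption for the sake of a self-contained example, `embed` uses plain word counts as a stand-in for a real embedding model, the "database" is just a Python list, and the chunk size, overlap, `k` and prompt wording are arbitrary choices.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def build_index(book_text: str) -> list[tuple[str, Counter]]:
    """The 'database': each chunk stored next to its embedding."""
    return [(c, embed(c)) for c in chunk(book_text)]


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    """Embed the question the same way and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Put the retrieved chunk text into the prompt that goes to the LLM."""
    context = "\n\n---\n\n".join(chunks)
    return (
        "Answer the question using only these excerpts from the law book. "
        "If they do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
```

Usage would look like `prompt = build_prompt(q, retrieve(q, build_index(book_text)))`, after which `prompt` is sent to whichever LLM you use. Word counts only match exact words, which is the weakness real embedding models are meant to improve on, so treat this as a picture of the flow rather than of retrieval quality.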


I think what Starlight is saying is that the ChatGPT user interface has a built-in RAG system that you don't have to code. You can just upload your whole book PDF split into up to 20 parts. If so, please tell us how that built-in RAG system worked out for your law book use case.