#New chat api (gpt turbo 3.5)
47 messages · Page 1 of 1 (latest)
You have to use the assistant that's what that is for so you send previous conversation to the assistant so it knows what you have been talking about. To send it with every message and you need to do something to maximize the amount of prompt space you can use so tag prior conversations and only send conversations that are relevant to the question or something like this to increase how much you can remember and work with.
Doesnt that use tokens all the time
and how is that a chat if it doesn't work like in the web or liek chatgpt
I am wondering that too, chatGPT isn't saving by itself each conversation under some unique ID for some time at least?
It's a no brainer to send everything from the conversation with each new question, that gradually increase tokens that are spent
exactly what I mean when you want to make a application that uses this chat feature in some way you'll always have to compute the whole text thus increasing cost extreamly
Im assuming that they will add fine tuning to gpt turbo later
Why do you mention fine tuning, if I understood well, that is 'training' of the model for your needs, and I only want to have a chats with it?
whats the purpose to have a chat when it doesnt work like a chat
How do you mean it doesn't work like a chat? I want it to work like regular chatGPT works - that is regular chat, I guess?
Well you can manipulate fine tuning to give chat context. If there's a message that your bot makes that's good, you write a system to add that message to the fine tuning model as a good example and give context to important messages.
The ChatGPT in the web browser defiantly have some sort of fine tuned model that can create context no matter the length of the chat
yeah but the normal chats works that the ai has access to previous comments. And can give anwsers relative to it. For example taking reference to a bigger text. You would have to pay for every api request
but you cant access context with the chat mode with the api
Yes. because the Quizlet, Snapchat, and Instacart examples wouldn't work with the character limit they have. They must have a fine tuned or context
but you cannot access the context without sending the whole previous text every time which at some point will cost large amounts of tokens
Yes, and eventually you will hit the token limit
I've thought of some work arounds like removing old message from the message list but that sacrifices context. I've thought of a system that i have to manually set chats as important so they don't get erases in the context.
Or fine tuning a model that works for my needs.
but still when trying to make a competitive system it will just be worse than chatgpt
I don't intend to reinvent the wheel, rather implement GPT technology into existing programs. and I think that's their goal with GPT Turbo
but if you want to creat a webapp or so using the api where you want to have context for better results it just simply isn't possible with gpt Turbo there is literally no benefit over DaVinci except cost
Hey all!
all these systems like quizlet, snapchat, etc work either with a fixed context memory that sits under the 4000 token limit, or they use embeddings to create memory permanence
Check this out! https://towardsdatascience.com/generative-question-answering-with-long-term-memory-c280e237b144
Thanks @keen wadi
No problem! Let me know if you have more questions in here, I've done a lot of work with memory/conversational bots and can help out
Epic
Thanks
how would this work with gpt 3.5 turbo and does this mean in chat gpt for example not everythin is saved
This works the same with GPT 3.5 turbo, it has the same context limit
for ChatGPT, not really sure, but they likely call their own API with their own conversation state system
but like for example couldn't embedding take aways crucial information?
ChatGPT is a fine tuned GPT-3 model
But yeah you would want to store conversation context in the prompt and with embeddings you can make long term memory
can you explain that with the embedding please i dont quite understand it. Like yeah you save it but at some point you will have to call it or hwo
Yeah embeddings are used to store information in numerical vector stores which captures stuff like semantic meaning of the text. Then you can query those embeddings using semantic search.
If you're using them for long term memory, every time you ask a question it would find the most relevant parts of your chat history (stored in embeddings) and input that information into the prompt, essentially mimicking long term model memory.
Lets say i every query should be bou d to a text and should only use the text as source. Wouldnt that mean that embedding doesnt work anymore
What do you mean?
You would have to embed that text source, in chunks of text
and then search through those embeddings to find the most relevant bits
which means the text i would give to the api is shorter and thus costs less?
Yes that's correct, but at the loss of lower accuracy since it doesn't have knowledge of the entire text at any given point in time if it's longer than 4096 tokens
so only do it if the text is very long got it