I’ve noticed that the twins always forget a lot between streams. Would having something (probably an ai) that, once the stream ends, takes the entire transcript of the stream, selects what to remember, summarizes it, then puts it in to long term memory be possible?
This feels very similar to how human long term memory works.
Would it cause issues with them remembering things too often?
You could probably have another system to remove redundant memories
#Automatic long-term memory management
1 messages · Page 1 of 1 (latest)
I think just save the entire transcript, the more something gets mentioned the more likely it is for AI to remember anyways, maybe a few extra memories could be decided to be added tho (again maybe with AI)
They kind of already do tbh.
We don’t know how it all works but they memorize things and put stuff in their memory databank.
Then they pull information from their memory databank.
I think what they have right now requires them to decide to memorize something. They don’t memorize things automatically (long term) which leads to them forgetting pretty much everything. If they don’t remember what they have done already, they tend to repeat topics and never expand on them which I think gets a bit repetitive.
“have you heard of evil’s anthem?”
They have a long-term memory. They have memorized things from months before, but they don’t really understand the context.
I know they have long term memory, I’m saying that not enough gets remembered, so context and stuff gets lost
Ah yeah. Vedal is working on improving their memory.
Imagine if the only things you could remember long term were the things you specifically decide to remember
Ok
Yeah I agree, although I believe they should decide which of their memories takes priority, if they don’t do that already, which they might.
I can’t wait for them to be able to remember more things. I just discovered her right after the subathon but seeing their past growth still is really fun
Yeah definitely. Vedal is already working on upgrades, and memory may be a part of that. He wants them to be more intelligent, more consistent, have better memories, be able to run everything themselves without being easily influenced by others, and be more human.
Long term memory RAG systems are difficult to make for character based AI. It’s very easy to overwhelm/direct in weird ways, especially as the memory fills up.
Yeah, Vedal has managed to do it though. There's probably a lot of their memories being deleted to save space. I don't really know how he does it.
I know that their memories are often deleted though to make space for new ones.
Yeah. I do it by essentially marking memories as too recent. For example if a topic is close enough to something in the kvcache (think short term memory) it doesn’t get allowed to be remembered. I also use techniques where you reorder the history and insert fake messages into the queue in place of statements the ai made so it isn’t as strong an impression on the more recent response, but is available for the ai to use. I’m not saying this is what Vedal does, but it’s another technique,
Hm I think I need a bit more context in the discussion title
Automatic long-term memory management
Basically the concept is this:
- Store memories in a vector DB with sentence and/or paragraph embeddings.
- On new messages for user & LLM, store message into vector DB, creating the sentence embeddings and possibly keyword embeddings of major subjects (names, nouns, etc)
- On a user prompt do a semantic search of the prompt & previous messages (LLM and human) across not just the current prompt, but n number of messages before that, excluding those that were inserted already as memories.
- The best messages, those that have high relevance to the current conversation, insert those into near history… it would look sort of like,
- User: What do you remember?
- Assistant: I remember <memories that were extracted)
- It is important the memories are not the complete matches, but relevant string matches with possibly n number of sentences that were before/after for context
- The messages that had pretty close hits, but are not used, tag them in the DB as relevant messages.
- Then insert the actual new user prompt as a message
- Over time if you get close to the context max (this is a design for “short context” with long memory) you can wipe the KV Cache, insert those close but not exact hits as actual “recent” memory, insert relevant messages, and of course 2-4 of the actual most recent hits.
- Another design you can do to reduce prompt processing (that time you see really waiting for messages) with this “big dump” when cache is reset is to preemptively insert all of this before the user actually makes a prompt. This way when the user actually makes a prompt you’re only sending the latest message and they don’t see the prompt processing.