#Automatic long-term memory management

1 messages · Page 1 of 1 (latest)

primal quarry
#

I’ve noticed that the twins always forget a lot between streams. Would having something (probably an ai) that, once the stream ends, takes the entire transcript of the stream, selects what to remember, summarizes it, then puts it in to long term memory be possible?
This feels very similar to how human long term memory works.
Would it cause issues with them remembering things too often?
You could probably have another system to remove redundant memories

grand niche
#

I think just save the entire transcript, the more something gets mentioned the more likely it is for AI to remember anyways, maybe a few extra memories could be decided to be added tho (again maybe with AI)

ashen valley
#

They kind of already do tbh.

#

We don’t know how it all works but they memorize things and put stuff in their memory databank.

#

Then they pull information from their memory databank.

primal quarry
# ashen valley They kind of already do tbh.

I think what they have right now requires them to decide to memorize something. They don’t memorize things automatically (long term) which leads to them forgetting pretty much everything. If they don’t remember what they have done already, they tend to repeat topics and never expand on them which I think gets a bit repetitive.
“have you heard of evil’s anthem?”

ashen valley
primal quarry
ashen valley
primal quarry
#

Imagine if the only things you could remember long term were the things you specifically decide to remember

ashen valley
primal quarry
#

I can’t wait for them to be able to remember more things. I just discovered her right after the subathon but seeing their past growth still is really fun

ashen valley
delicate lynx
#

Long term memory RAG systems are difficult to make for character based AI. It’s very easy to overwhelm/direct in weird ways, especially as the memory fills up.

ashen valley
#

I know that their memories are often deleted though to make space for new ones.

delicate lynx
#

Yeah. I do it by essentially marking memories as too recent. For example if a topic is close enough to something in the kvcache (think short term memory) it doesn’t get allowed to be remembered. I also use techniques where you reorder the history and insert fake messages into the queue in place of statements the ai made so it isn’t as strong an impression on the more recent response, but is available for the ai to use. I’m not saying this is what Vedal does, but it’s another technique,

primal quarry
#

Hm I think I need a bit more context in the discussion title

#

Automatic long-term memory management

delicate lynx
#

Basically the concept is this:

  1. Store memories in a vector DB with sentence and/or paragraph embeddings.
  2. On new messages for user & LLM, store message into vector DB, creating the sentence embeddings and possibly keyword embeddings of major subjects (names, nouns, etc)
  3. On a user prompt do a semantic search of the prompt & previous messages (LLM and human) across not just the current prompt, but n number of messages before that, excluding those that were inserted already as memories.
  4. The best messages, those that have high relevance to the current conversation, insert those into near history… it would look sort of like,
    1. User: What do you remember?
    2. Assistant: I remember <memories that were extracted)
    3. It is important the memories are not the complete matches, but relevant string matches with possibly n number of sentences that were before/after for context
  5. The messages that had pretty close hits, but are not used, tag them in the DB as relevant messages.
  6. Then insert the actual new user prompt as a message
  7. Over time if you get close to the context max (this is a design for “short context” with long memory) you can wipe the KV Cache, insert those close but not exact hits as actual “recent” memory, insert relevant messages, and of course 2-4 of the actual most recent hits.
  8. Another design you can do to reduce prompt processing (that time you see really waiting for messages) with this “big dump” when cache is reset is to preemptively insert all of this before the user actually makes a prompt. This way when the user actually makes a prompt you’re only sending the latest message and they don’t see the prompt processing.