#Reliable models for active-memory?


long temple

I found that not all models produce actionable or useful active-memory results. Unless you have /verbose and /trace enabled, you'll never know: active memory just returns 'empty' or 'NONE' and effectively isn't being used.

I found that GLM-5.1 works really well (both the z.ai direct model and ollama's). Please bear in mind that ollama has a concurrency limit of 3, while z.ai stupidly limits everything to a concurrency of 1 (i.e. no concurrency). So if you have z.ai set as your conversational model and also assigned to active-memory, the two compete for the single slot, and requests time out and throw rate-limit errors.

This might lead you to assign GLM-4.5-flash or GLM-4.7-flash to active memory instead ... except neither of those works: they won't summarize anything (result: empty/NONE).

So, keep that in mind - and if you don't show verbose and trace, you'll never even know it fails.
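If you end up eyeballing a lot of trace output, a tiny check like the one below can help flag the "silently useless" case. This is only a sketch: `is_empty_summary` is a hypothetical helper, not part of the tool, and the only strings it knows about are the 'empty' and 'NONE' results mentioned above. How you capture the summary text is up to you.

```python
def is_empty_summary(text):
    """Return True when an active-memory result carries no usable content.

    Hypothetical helper: treats None, blank text, and the literal
    'NONE' / 'empty' results described in this thread as empty.
    """
    if text is None:
        return True
    # Drop surrounding whitespace and any quotes wrapped around the value.
    cleaned = text.strip().strip("'\"").strip()
    return cleaned == "" or cleaned.upper() in {"NONE", "EMPTY"}


samples = ["NONE", "  ", "'empty'", "User prefers metric units."]
flags = [is_empty_summary(s) for s in samples]
print(flags)
```

Anything flagged `True` is a turn where active memory ran but contributed nothing.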

deep elm (BOT)

Yep, good catch. That lines up with the docs and with what people hit in practice.

A few important bits the docs do confirm:

  • /verbose on shows the 🧩 Active Memory: ... status line
  • /trace on shows the 🔎 Active Memory Debug: ... summary
  • plugins.entries.active-memory.config.model lets you pin a separate model for active memory
  • agents.defaults.memorySearch.* is the memory search / embeddings side, not the active-memory summarizer itself
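To make the dotted path above a bit more concrete, here is a small sketch that expands it into a nested structure and prints it as JSON. Two assumptions, both mine: that the dotted path maps onto nested objects in a JSON-style config file, and the value `provider/model-id`, which is a placeholder rather than a real model name. `nest` is a hypothetical helper written for this example.

```python
import json


def nest(dotted_key, value):
    """Expand a dotted key path into nested dicts (illustrative helper)."""
    result = value
    for part in reversed(dotted_key.split(".")):
        result = {part: result}
    return result


# "provider/model-id" is a placeholder; substitute the model you want to pin.
fragment = nest("plugins.entries.active-memory.config.model", "provider/model-id")
print(json.dumps(fragment, indent=2))
```

Check the docs for the actual config file format before copying the printed shape anywhere.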

So if chat and active-memory both share a low-concurrency provider, it can absolutely serialize and rate-limit itself. And if the active-memory model just keeps returning NONE or empty summaries, it can look like memory is "on" but effectively doing nothing unless trace/verbose is enabled.
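The serialization effect can be seen in a toy simulation. Nothing here talks to a real provider: `call_provider` is a hypothetical stand-in that just sleeps, and the semaphore plays the role of the provider's concurrency limit (1 and 3 are the limits mentioned earlier in the thread). A real provider may reject the second request with a rate-limit error instead of queueing it, which this sketch does not model.

```python
import asyncio
import time


async def call_provider(slots, duration):
    """Stand-in for one model request; holds a concurrency slot while it runs."""
    async with slots:
        await asyncio.sleep(duration)


async def run_turn(limit, duration=0.1):
    """Fire a chat request and an active-memory request at the same provider."""
    slots = asyncio.Semaphore(limit)
    start = time.perf_counter()
    await asyncio.gather(
        call_provider(slots, duration),  # conversational model
        call_provider(slots, duration),  # active-memory summarizer
    )
    return time.perf_counter() - start


serial = asyncio.run(run_turn(limit=1))
parallel = asyncio.run(run_turn(limit=3))
print(f"limit=1: {serial:.2f}s, limit=3: {parallel:.2f}s")
```

With a limit of 1 the two requests run back to back, so the turn takes roughly twice as long; with a limit of 3 they overlap.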

Useful docs:

If anyone is debugging this, I'd also tell them to: