I found that not all models produce actionable or useful active-memory results. Unless you have /verbose and /trace enabled, you'll never know: active memory just returns 'empty' or 'NONE' and effectively isn't being used.
I found that GLM-5.1 works really well (both the z.ai direct model and ollama's). Please bear in mind that ollama has a concurrency limit of 3, while z.ai stupidly limits everything to a concurrency of 1 (i.e. no concurrency). So if you have z.ai set as your conversational model and also assigned to active-memory, the two compete for that single slot, and you get timeouts and rate-limit errors.
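To illustrate why the single slot hurts, here's a minimal Python sketch. It's only a simulation under made-up assumptions: it doesn't call z.ai or ollama, the timings are invented, and call_model and simulate are hypothetical names. A semaphore stands in for the provider's concurrency limit.

```python
import asyncio

async def call_model(name, slots, work_seconds, timeout_seconds):
    """Simulate one request to a provider that allows a limited number of in-flight calls."""
    async def run():
        async with slots:                      # wait for a free provider slot
            await asyncio.sleep(work_seconds)  # pretend the model is generating
            return f"{name}: ok"
    try:
        return await asyncio.wait_for(run(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return f"{name}: timed out"

async def simulate(concurrency_limit):
    # Assumption: conversation and active-memory share one provider's slots.
    slots = asyncio.Semaphore(concurrency_limit)
    return await asyncio.gather(
        call_model("conversation", slots, 0.3, 2.0),
        call_model("active-memory", slots, 0.3, 0.45),  # invented, shorter timeout
    )

results_limit_1 = asyncio.run(simulate(1))  # like a concurrency of 1
results_limit_3 = asyncio.run(simulate(3))  # like a concurrency of 3
print(results_limit_1)
print(results_limit_3)
```

With a limit of 1, the active-memory call has to queue behind the conversation call and runs out of time. With a limit of 3, both finish.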
That might tempt you to assign GLM-4.5-flash or GLM-4.7-flash to active memory instead ... except neither of those works: they won't summarize anything (result: empty/NONE).
So keep that in mind: if you don't have /verbose and /trace enabled, you'll never even know it's failing.
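If you'd rather not rely on spotting it in the trace output, a check along these lines could flag the failure. It's just a sketch: it assumes you can get the active-memory result as a plain string somehow, and is_empty_memory_result is a made-up helper, not part of any tool.

```python
def is_empty_memory_result(text):
    """Return True if an active-memory result carries no usable content.

    Assumption: a failed run shows up as nothing at all, or as the literal
    word 'empty' or 'NONE' (any case, optionally quoted).
    """
    if text is None:
        return True
    # Drop surrounding whitespace and quotes, then compare case-insensitively.
    cleaned = text.strip().strip("'\"").strip().lower()
    return cleaned in {"", "empty", "none"}

if is_empty_memory_result("NONE"):
    print("active memory returned nothing useful; try a different model")
```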