#I built a benchmark this week to test so - guideboardlabs
fantastic, i'm about 5 pages into the paper. the claims resonate anecdotally with the long-running missions i do with agent-first coding. i also have a project building an agent orchestration layer that accumulates strongly typed, structured knowledge of the mission, including context augmentation. i'm inspired by some of what you've shared.
what was the reason behind 'model not using the correctly retrieved long-term memory?'
@sand magnet are you doing anything to tackle that?
Feel free to plug in CAG and give it a try. The full source code is in the benchmark and my repositories, and my other projects all use CAG (2 companion assistant applications, GUI/TUI, and an official CAG OpenClaw skill).
@potent widget I can DM you, this place has a 6-hour slowdown lol..
@hardy vale just skimmed through your paper, that's interesting research. Do you think the root cause is the model failing to judge which facts in context actually matter, basically an attention/salience problem rather than a retrieval one? And since this was run on small local models via Ollama, how much do you expect the gap to close with frontier models like Claude or GPT-5?
is the point of this code to test my exact openclaw setup and see how much its outcomes vary with the specific setup?