I built a benchmark this week to test so - guideboardlabs | Friends of the Crustacean 🦞🤝 | Page 1

worn ospreyBOT May 2, 2026, 3:26 PM

#

sand magnet May 3, 2026, 3:20 PM

#

fantastic, i'm about 5 pages into the paper. the claims resonate anecdotally with long running missions i do with agent-first coding. i also have a project building an agent orchestration layer that accumulates strongly-typed, structured knowledge of the mission, including context augmentation. i'm inspired by some of what you've shared.

potent widget May 3, 2026, 5:17 PM

#

what was the reason behind 'model not using the correctly retrieved long-term memory?'

@sand magnet are you doing anything to tackle that?

hardy vale May 3, 2026, 8:59 PM

#

sand magnet fantastic, i'm about 5 pages into the paper. the claims resonate anecdotally wit...

Feel free to plug in CAG and give it a try, the full source code is in the benchmark and my repositories other projects all use CAG (2 companion assistant applications GUI/TUI and an official CAG OpenClaw skill)

@potent widget I can DM you, this place has a 6 hour slowdown lol..

gray sparrow May 5, 2026, 11:56 AM

#

@hardy vale just skimmed through your paper, that's an interesting research. Do you think the root cause is the model failing to judge which facts in context actually matter — basically an attention/salience problem rather than a retrieval one? And since this was run on small local models via Ollama, how much do you expect the gap to close with frontier models like Claude or GPT-5?

unreal wind May 6, 2026, 4:25 AM

#

is the point of this code to test my exact openclaw setup for its variance to the outcomes of specific setup?

#I built a benchmark this week to test so - guideboardlabs