#I built a benchmark this week to test so - guideboardlabs

worn ospreyBOT
sand magnet
#

fantastic, i'm about 5 pages into the paper. the claims resonate anecdotally with the long-running missions i do with agent-first coding. i also have a project building an agent orchestration layer that accumulates strongly-typed, structured knowledge of the mission, including context augmentation. i'm inspired by some of what you've shared.

potent widget
#

what was the reason behind 'model not using the correctly retrieved long-term memory?'

@sand magnet are you doing anything to tackle that?

hardy vale
gray sparrow
#

@hardy vale just skimmed through your paper, that's interesting research. Do you think the root cause is the model failing to judge which facts in context actually matter, basically an attention/salience problem rather than a retrieval one? And since this was run on small local models via Ollama, how much do you expect the gap to close with frontier models like Claude or GPT-5?

unreal wind
#

is the point of this code to test my exact openclaw setup and see how much its outcomes vary because of that specific setup?