@inner wasp I want to spend tokens only when it’s actually useful, and avoid paying a constant “tax” on every message.
2 modes I’m aiming for:
1. Buddy mode (chitchat / life). This should feel like a friend who remembers me. It should distill durable personal facts/preferences and store them in memory, so I don’t have to restate context. Peter’s example is exactly the vibe I want: after he mentioned surgery, the bot later checked in autonomously asking how it went.
2. Project mode. When I’m working on something like a website/app, I want strict focus on that project context (plans, decisions, state). Personal/buddy memory should not be dragged into the project thread unless relevant.
That’s why I’m considering a Telegram group with Topics: one “hangout” topic for buddy mode, and separate topics per project, each with a compact state/plan MD file.
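To make the Topics idea concrete, here is a rough sketch of the routing I have in mind. It is plain Python, not OpenClaw's actual API. The `TopicRoute` type, the thread IDs, the scope names, and the file paths are all made up for illustration. The only real piece is `message_thread_id`, which is the field the Telegram Bot API puts on messages sent inside a forum topic.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class TopicRoute:
    mode: str                    # "buddy" or "project"
    scope: str                   # memory namespace this topic may read/write
    state_file: Optional[Path]   # compact state/plan file, if any


# Hypothetical mapping: thread IDs and paths are placeholders.
ROUTES = {
    101: TopicRoute("buddy", "personal", None),
    202: TopicRoute("project", "project:website", Path("projects/website/STATE.md")),
    303: TopicRoute("project", "project:app", Path("projects/app/STATE.md")),
}

# Messages outside any known topic fall back to buddy mode.
DEFAULT_ROUTE = TopicRoute("buddy", "personal", None)


def route_for(message: dict) -> TopicRoute:
    """Pick mode, memory scope, and state file from the topic a message came from."""
    thread_id = message.get("message_thread_id")
    return ROUTES.get(thread_id, DEFAULT_ROUTE)
```

The point is that the topic alone decides which state file (if any) gets loaded, so a project thread never has to read the buddy memory or another project's plan.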
Right now my setup feels expensive by default:
The baseline prompt is bloated (AGENTS.md = large + tool schemas = heavy)
AGENTS.md also instructs reading other .md files / daily logs, which can cause double-loading and cross-contamination later
I once burned ~45M tokens worth of credits doing nothing crazy: mostly chatting, asking it to remember things, and having it set up cron reminders. That’s way too fast.
So I want to redesign the setup so:
Always-injected context stays small
Memory is mostly retrieval-based (only relevant snippets injected)
Project isolation is reliable (no “Project A showing up in Project B”)
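As a sketch of what I mean by retrieval-based memory with hard isolation: filter by scope first, then rank, then inject only the top-k snippets. This is a toy in plain Python with naive word-overlap scoring standing in for a real index or reranker. It is not how OpenClaw or QMD actually work, and `Snippet`, `retrieve`, and the scope strings are names I invented.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Snippet:
    scope: str   # e.g. "personal" or "project:website" (hypothetical names)
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a stand-in for real embedding/reranking."""
    return set(text.lower().split())


def retrieve(snippets: List[Snippet], query: str, scope: str, k: int = 3) -> List[Snippet]:
    """Return at most k snippets from `scope` that share words with the query."""
    query_tokens = tokenize(query)
    scored = []
    for snippet in snippets:
        # Isolation happens before ranking: other scopes are never candidates.
        if snippet.scope != scope:
            continue
        overlap = len(query_tokens & tokenize(snippet.text))
        if overlap > 0:
            scored.append((overlap, snippet))
    # Highest overlap first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:k]]
```

With something like this, the always-injected part is just the instructions plus at most `k` snippets, and a query in one project topic cannot surface another project's notes even when the wording overlaps.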
Question: Given your understanding of OpenClaw internals, what are the best ways to achieve this “smart + cheap” behavior? And do you see any creative alternatives beyond the obvious (trim AGENTS.md, reduce tool surface, move memory to retrieval-only files)?
Also: I originally assumed QMD would help because it indexes memory + sessions and returns only top-k reranked snippets. Is that assumption correct in practice for keeping costs down and relevance up, or are there hidden gotchas?