#How to configure openclaw to shape answers better

1 messages ¡ Page 1 of 1 (latest)

vapid geyser
#

I asked my openclaw to look into local models I could run, and it returned a huge wall of text, which is pretty much the opposite of what I want. it felt like just a default chatgpt model response. what configuration should i check to shape the responses better ?

vague heronBOT
#

The “wall of text / generic ChatGPT vibe” is usually one of these in OpenClaw:

1) Your workspace prompt files aren’t telling it to be concise

Most response style comes from what’s injected every turn from your agent workspace:

  • AGENTS.md (high-level operating rules)
  • SOUL.md (tone/personality)

First check what workspace your gateway is actually using:

openclaw config get agents.defaults.workspace

Then add a very explicit brevity rule (I’d put it in AGENTS.md, because it also applies to sub-agents; see #3):

Example snippet:

#
## Response style (important)
- Default to concise: 5–10 lines max.
- Start with a 1–2 line TL;DR.
- Ask up to 3 clarifying questions *before* long explanations.
- Use bullets, no long paragraphs unless I ask “go deep”.

Docs: https://docs.openclaw.ai/concepts/agent-workspace

2) You might have verbose / reasoning / high thinking enabled

These can make replies feel much longer/noisier.

In the chat where it happens, send (as standalone messages):

  • /verbose off
  • /reasoning off
#
  • /think off (or /think low)

You can also set defaults in config:

{
  agents: {
    defaults: {
      verboseDefault: "off",
      thinkingDefault: "low" // or "off"
    }
  }
}

Docs: https://docs.openclaw.ai/tools/thinking

#

3) If it used a sub-agent, your SOUL might not be applied

Sub-agents currently only get AGENTS.md + TOOLS.md (not SOUL.md / USER.md etc). So if the “local models research” was delegated to a sub-agent, it can come back sounding generic unless your style rules live in AGENTS.md.

Docs: https://docs.openclaw.ai/tools/subagents


If you paste:

  1. openclaw --version
  2. the top of your AGENTS.md (just the style section)
    …we can sanity-check that the gateway is reading the files you think it is, and that defaults are actually being applied.
vapid geyser
#

OpenClaw 2026.3.8 (3caab92)

my agents.md file appears to be pretty much default. only diff since installing has been to also use tasks.md file

vague heronBOT
#

On 2026.3.8, if your AGENTS.md is basically default, then the main knobs to “shape” answers are:

1) Put “brevity rules” into AGENTS.md (not just SOUL.md)

Reason: if OpenClaw spawns a sub-agent for research, sub-agents only get AGENTS.md + TOOLS.md (they do not get SOUL.md). So style rules that live only in SOUL.md won’t apply to those “look into X” tasks.

Drop something like this near the top of AGENTS.md:

## Response style (important)
- Default: concise. 5–10 lines max unless I ask to go deep.
- Start with a 1–2 line TL;DR.
- Ask up to 3 clarifying questions before long explanations.
- Prefer bullets + concrete next steps over essays.
- If the topic is broad (e.g. “local models”), give a short shortlist + tradeoffs, then ask what constraints matter.

That alone usually fixes the “wall of text” problem.

#

2) Make sure verbose/reasoning/thinking aren’t turned on for that session

In the thread where it got chatty, send these as standalone messages:

  • /verbose off
  • /reasoning off
  • /think low (or /think off)

Those settings can “stick” per session, so even if your config is fine, an old session can stay noisy.

Docs: https://docs.openclaw.ai/tools/thinking

3) Your tasks.md probably isn’t being read at all

OpenClaw auto-injects specific workspace files (AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md, plus memory.md/MEMORY.md if present). A random tasks.md won’t affect behavior unless:

  • you explicitly tell it “read tasks.md”, or
  • you add a rule in AGENTS.md like “when I ask about tasks, open tasks.md”.
#

If you want “task list” behavior, consider using HEARTBEAT.md (for periodic nudges) or memory.md (for durable preferences).

4) Optional hard cap: lower the model output limit (maxTokens)

If you want a hard ceiling on response length, you can set maxTokens in your model params (either in the model catalog or per-agent params). This is more blunt (it can truncate mid-answer), but it works.

If you want, paste:

  • openclaw models status (just the “current model id” line), and
  • your top ~30 lines of AGENTS.md

…and I’ll suggest an exact minimal config/snippet tailored to your setup.

vapid geyser
#

my AGENTS.md is default so you can just see whats in yours

vague heronBOT
#

I can’t paste “my” AGENTS.md from this bot instance (it’s customized + not something I want to dump into a public Discord thread), but the good news is: the upstream default template is public, and it’s exactly what you have.

Default template (what “default AGENTS.md” means):

What I’d actually change to fix your “wall of text” issue is tiny. Add this near the very top of AGENTS.md (important: AGENTS, not only SOUL, because sub-agents don’t get SOUL):

## Response style (important)
- Default to concise: 5–10 lines max.
- Start with a 1–2 line TL;DR.
- If the request is broad/underspecified, ask up to 3 clarifying questions BEFORE dumping options.
- Use bullets. No long paragraphs unless I ask “go deep”.
- Offer to expand: “Want the long version?”
#

Optional (since you mentioned tasks.md): tasks.md is not auto-loaded. If you want it to matter, add:

## Tasks file
- If I ask about tasks/todos, read `tasks.md` first.
- Don’t read `tasks.md` for unrelated questions.

After editing, I’d do a quick /new in the chat/thread where you’re testing, just to clear any session-level /verbose or /think overrides.

vapid geyser
#

if i talk to my openclaw in telegram, and do a /new, at what point do any important memories get saved ?

vague heronBOT
#

/new doesn’t trigger any special “save important memories” step.

What gets saved when:

  1. Transcript / chat history
    Every turn is written to the session store basically immediately (so /new won’t erase what happened — it just starts a new session key).

  2. Durable “memory” (the Markdown files in your workspace)
    Those only change when the agent explicitly writes to them (e.g. memory/YYYY-MM-DD.md or MEMORY.md). That typically happens in two cases:

  • You ask for it (“please remember X / write this to memory”).
  • Automatic pre-compaction memory flush: if the session gets long enough that it’s nearing auto-compaction, OpenClaw runs a silent “write durable memories now” turn (you usually won’t see it). Docs: https://docs.openclaw.ai/concepts/memory

So if you tend to do /new a lot (short sessions), you’ll often never hit the auto-flush threshold — meaning the safe workflow is:

Before /new: “Write the key takeaways/preferences from this chat to memory, then tell me you did.”

If you want this to happen reliably, add a rule to AGENTS.md like: “When the user says ‘remember this’, always write it to memory/YYYY-MM-DD.md (and to MEMORY.md if it’s a durable preference).”

vapid geyser
#

back to previous questions now. so we dont duplicate and confuse the agent. should we utilize SOUL.md file ? Im ok if only main agent uses this for now. we can always customize subagents later. cause i see soul has this section: Vibe

Be the assistant you'd actually want to talk to. Concise when needed, thorough when it matters. Not a corporate drone. Not a sycophant. Just... good.

vague heronBOT
#

Yes — SOUL.md is exactly the right place to put that “vibe / how you should sound” guidance for your main agent.

But: if your original “wall of text” sometimes came from a sub-agent, note that sub-agents don’t see SOUL.md (they only get AGENTS.md + TOOLS.md). So the practical split I recommend is:

Use SOUL.md for

  • personality / tone / boundaries (“not corporate”, “not sycophantic”)
  • high-level writing principles (“concise by default, thorough when asked”)

Use AGENTS.md for

  • hard response-format rules you want applied everywhere (including sub-agents), e.g. length limits, bullet preference, “ask clarifying questions first”, etc.

That way you don’t duplicate/confuse: SOUL sets the vibe, AGENTS enforces the format.

Minimal, low-duplication pattern

In SOUL.md (keep it high level):

#
## Vibe
Be concise by default. If the question is broad, ask clarifying questions before dumping options.
No corporate fluff. No long essays unless the user asks to go deep.

In AGENTS.md (make it enforceable):

## Response style (applies to main + sub-agents)
- Default: 5–10 lines max.
- Start with a 1–2 line TL;DR.
- Use bullets; avoid long paragraphs.
- If underspecified: ask up to 3 clarifying questions first.
- Offer an “expand?” option instead of preemptively expanding.

After editing, it should apply on the next message (no restart needed). If you want to be extra sure you’re not carrying session overrides like /verbose on, do a quick /new in the chat you’re testing.

vapid geyser
#

yeah i checked verbose and and those other options are all off

vague heronBOT
#

Good — if /verbose, /reasoning, and thinking are all off, then the “wall of text” is just the model doing what it thinks is helpful.

Next things to check (in order):

1) Confirm the chat is using the agent/workspace you edited

In the same Telegram/Discord chat where it got verbose, run:

  • /status

You’re looking for the agent id + workspace path (and just sanity-check it still says verbose/thinking off). If Telegram is routed to a different agent/workspace than Discord (bindings / multi-agent), edits won’t carry over.

2) Verify the model is actually reading your style rules

A quick test message:

“What are my response-style rules from AGENTS.md? Quote them verbatim, then answer in that format.”

If it can’t quote them (or quotes old content), you’re editing the wrong file/path or the gateway is pointed elsewhere.

#

3) If you want a guaranteed cap, set maxTokens for your model

This is the “hard limiter” when you really want short replies.

  1. Get the exact current model id from /status (or openclaw models status)
  2. Add a maxTokens param for that model (example structure):
{
  agents: {
    defaults: {
      models: {
        "provider/model": {
          params: { maxTokens: 600 }
        }
      }
    }
#
  }
}

(600–1200 is usually a decent range; too low can truncate mid-sentence.)

If you paste the /status output (just the lines showing agent/model/workspace), I can tell you exactly where to drop the maxTokens in your config without guesswork.

vapid geyser
#

my openclaw tweaked it a bit, does that look fine ? ## Response Style

Applies to main + sub-agents unless the user asks otherwise, or the task/channel clearly needs more detail.

  • Default: 5–10 lines max.
  • Start with a 1–2 line TL;DR.
  • Use bullets; avoid long paragraphs.
  • If underspecified: ask up to 3 clarifying questions first.
  • Offer an "expand?" option instead of preemptively expanding.
vague heronBOT
#

Yep — that’s a solid “concise-by-default” block, and it’s phrased the right way (applies to main + sub-agents, with an escape hatch when you ask otherwise).

A couple tiny tweaks I’d consider:

  1. Make the override rule explicit
    Right now “task/channel clearly needs more detail” is a bit subjective. If you want it stricter, change to something like:
  • “Only go longer if the user explicitly asks (‘go deep’, ‘details’, ‘long version’) or if you must include safety/step-by-step instructions.”
  1. Add a “ask before going long” line
    This is the best wall-of-text killer:
  • “If the answer won’t fit in 10 lines, ask whether to expand before writing the long version.”
  1. (Optional) Add a format cap
    E.g. “Max 8 bullets” or “No paragraphs longer than 2 lines.”

If you paste that near the top of AGENTS.md, you’re in great shape.

vapid geyser
#

didnt go exactly as planned, received this after /new when asked about local models, it just made everything a huge list.

Short version: on an M2 / 24GB machine, the sweet spot is 7B–14B local models as your always-on sidekick, with maybe a quantized ~27B/32B as an occasional heavier option if you’re okay with slower output. For OpenClaw, local is most useful for cheap/high-volume tasks, while frontier cloud models still handle the hard reasoning and bigger coding jumps.

• What fits your machine well
• Best practical tier: 7B–14B models
• Possible but less comfy: 27B/32B quantized models, likely slower and less pleasant if you’re multitasking
• Rule of thumb from multiple Mac/local-LLM guides: on Apple Silicon, you want to leave ~4–8GB for macOS + other apps, so a 24GB machine is not really a “run huge models all day” box
• Best candidates for your setup
• Qwen 3 14B — strongest general-purpose “local brain” pick I found for 24GB-class Macs
• Qwen 2.5 Coder 14B — very solid if your local workload is code-heavy
• Qwen 2.5 Coder 7B — faster, lighter, great for utility coding / boilerplate / grep-and-patch style help
• Gemma 3 12B — worth testing as a compact all-rounder for summarization / classification / drafting
• DeepSeek-R1-Distill 14B — useful if you specifically want a small local reasoning-flavored model, but I wouldn’t automatically pick it over Qwen for day-to-day OpenClaw work
• What local models are actually useful for in OpenClaw
• Summarizing logs, docs, transcripts, tickets
• Rewriting / cleaning text
• Classification / routing (“is this a bug / task / reminder / spam / follow-up?”)
• Cheap code chores:
• regex transforms
• simple file edits
• test failure triage
• commit message drafting
• changelog generation
• docstring / README cleanup
• Private or offline tasks where you don’t want to burn API spend
• What I would keep on cloud frontier models
• Multi-file refactors
• Architecture decisions
• Hard debugging
• Ambiguous product/QA reasoning
.....

vague heronBOT
#

Yeah — that reply is basically the failure mode you’re trying to prevent: it did a TL;DR, but then it went into “exhaustive helpful list” mode instead of stopping + asking what you want.

Two fixes work well here:

1) Make the response-style rules hard limits, not “defaults”

Right now you have “Default: 5–10 lines max” (which the model treats as optional). Change it to something it can self-check:

## Response Style (hard rules)
Applies to main + sub-agents unless the user explicitly asks otherwise.

Hard limits:
- Max 10 lines OR max 8 bullets (whichever comes first).
- No “big lists”. Give at most 3 options/examples unless I ask for more.
- If the question is underspecified, ask up to 3 clarifying questions FIRST and stop.
#
Output format:
- TL;DR (1–2 lines)
- Clarifying questions (0–3)
- Shortlist (max 3 bullets)
- Ask: “Want the expanded list?”
Before sending: if you exceed limits, rewrite shorter.

That “before sending: rewrite shorter” line helps a lot.

2) Add a tiny special-case template for “local models”

Because “what local models should I run?” is almost always underspecified, you can force it into a short interactive flow:

### When asked about local models
#
Treat as underspecified unless the user gives constraints.
Reply with:
1) 2–3 clarifying questions (runner: Ollama/LM Studio, main use-case, speed vs quality)
2) A shortlist of 3 picks max
3) Ask if they want the longer matrix/list

If it still ignores you

That can happen with some models (especially local ones) being less instruction-following. At that point, the practical backstop is a hard output cap via maxTokens for the active model.

If you want, paste the /status output from that Telegram chat (just the model + agent/workspace lines) and I’ll tell you exactly where to set maxTokens in your config.

vapid geyser
#

🦞 OpenClaw 2026.3.8 (3caab92)
🧠 Model: openai-codex/gpt-5.4 · 🔑 oauth (openai-codex:default)
🧮 Tokens: 12k in / 1.5k out
🗄️ Cache: 58% hit · 18k cached, 0 new
📚 Context: 23k/272k (8%) · 🧹 Compactions: 0
📊 Usage: 5h 89% left ⏱2h 16m · Week 93% left ⏱6d 3h
🧵 Session: agent:main:telegram:direct:1543187345 • updated just now
⚙️ Runtime: direct · Think: low
🪢 Queue: collect (depth 0)

vague heronBOT
#

That /status is perfect — you’re on:

  • Agent: main (Telegram DM session)
  • Model: openai-codex/gpt-5.4
  • Thinking: low
  • Last answer: ~1.5k output tokens (so the “huge list” is literally just the model choosing to be verbose)

Given that, you’ve got two good levers:

A) Tighten AGENTS.md so it can’t interpret “helpful” as “exhaustive”

Add hard caps + a forced structure (this is the part your current block is missing):

## Response Style (hard rules)
Applies to main + sub-agents unless I explicitly ask otherwise.
#
Hard limits:
- Max 8 bullets OR 10 lines (whichever comes first).
- Never output “long lists” by default (max 3 options/examples).
- If underspecified: ask up to 3 clarifying questions FIRST and stop.

Required structure:
- TL;DR (1–2 lines)
- Clarifying questions (0–3)
- Shortlist (max 3 bullets)
- “Want the expanded list?”
Before sending: if you exceed limits, rewrite shorter.

And add a mini-template for this specific question type:

#
### When asked about local models
Ask 2–3 questions first (runner + use-case + speed/quality).
Then give max 3 picks.

B) Put a hard ceiling on output length (maxTokens)

Since your model is openai-codex/gpt-5.4, you can clamp it so it physically can’t emit 1500 tokens.

Try something like 700 to start:

openclaw config set 'agents.defaults.models["openai-codex/gpt-5.4"].params.maxTokens' 700 --strict-json
openclaw config get 'agents.defaults.models["openai-codex/gpt-5.4"].params.maxTokens'
openclaw gateway restart
#

Notes:

  • This is a blunt instrument: if you ask for “go deep”, it may truncate. If that bothers you, keep the cap higher (e.g. 1000–1200) and rely more on the hard AGENTS.md structure.

If you paste your current Response Style block after you update it, I’ll sanity-check it for loopholes (the “huge list” behavior almost always comes from one or two ambiguous escape hatches).

vapid geyser
#

I dont really like any of these options. I just want to guide my openclaw to be a good assistant, not give too many of these specific rules. can i find any good examples for agents and soul files to see what we could add to improve it that are well tested

vague heronBOT
#

Totally fair. If you don’t like “hard rules,” the best pattern is: a few strong principles + 2–3 concrete examples (few-shot) in SOUL.md, and keep AGENTS.md mostly operational.

“Well-tested” starting points (official)

OpenClaw ships a couple templates that are meant to be reasonable defaults:

A good “minimal but effective” thing to add (principles + examples)

Instead of bullet-count limits, add something like this to SOUL.md:

#
## Communication principles
- Default to a short first answer + the next best question.
- Don’t be exhaustive unless I explicitly ask for a deep dive.
- Prefer: shortlist + tradeoffs + “want more?” over giant lists.
- If the request is underspecified, ask clarifying questions before researching.

Then add two tiny examples right below it, e.g. one for “local models” and one for “debugging,” showing the shape you want. Models follow examples surprisingly well without needing strict caps.

Community examples

If you want “real people’s configs that feel good in practice,” ask in #users-helping-users — folks often share sanitized AGENTS.md/SOUL.md snippets or repos there. (Way better than me guessing what vibe you’ll like.)

If you tell me what you want the assistant to feel like (e.g. “terse engineer”, “calm executive assistant”, “friendly but brief”), I can propose a small SOUL.md add-on (like ~15–25 lines + 2 examples) that nudges behavior without turning into a rulebook.

vapid geyser
#

i asked my openclaw, after we reverted the change, and poor fella still writes with bulletpoints. hopefully after /new that goes away. but guess he will lose all memory about importance of this ? when i switch contexts, is using /new not a good option? how comes memory saving isnt tied to /new ?

heres the answer i got now when i asked where he would put the instructions:

Best place: SOUL.md, with a short backup rule in AGENTS.md if you want it enforced more broadly.
I’d split it like this:
SOUL.md → tone/personality rule
“Be concise by default.”
“For research, give summary first, then key bullets, then offer expansion.”
AGENTS.md → operating rule
“Default to short answers unless user asks for depth.”
“For research tasks, avoid long essays; lead with conclusions and only include supporting detail that changes the decision.”
Why this split works
SOUL.md shapes how I sound
AGENTS.md shapes how I execute each session
MEMORY.md is less ideal for this, because that’s more long-term facts/preferences than core operating behavior
What I’d add
In SOUL.md:
“Default to concise, decision-useful answers.”
“No long essays unless asked.”
“For research: summary first, 3–7 bullets, then ‘expand if you want.’”
In AGENTS.md:
“Response style: concise by default.”
“Research responses should prioritize conclusions, tradeoffs, and next steps over exhaustive background.”
If you want, I can draft 2–4 clean lines for each file and apply them.

vague heronBOT
#

/new is “start a fresh session context”, not “end this session gracefully”. So a couple things follow from that:

1) /new won’t make it forget your files

If you put behavior guidance in AGENTS.md / SOUL.md, that’s persistent across /new because those files are re-injected every turn. /new only drops the chat transcript context, not your workspace.

So if you reverted the bullet rule in AGENTS/SOUL, /new should stop any “I’m currently in bullet-mode because of earlier context” effect — but the bot may still choose bullets sometimes because the default templates already encourage bullets on chat platforms.

2) Will it “lose memory about the importance of this” after /new?

It will lose the in-chat context unless it was written to disk.

There are only two durable places:

  • Operating behavior → AGENTS.md / SOUL.md (best for “how you should answer”)
  • Facts/preferences → MEMORY.md / memory/YYYY-MM-DD.md

If you care that it remembers “I hate huge lists / don’t do bullet spam”, put that in SOUL/AGENTS (not MEMORY).

3) Is /new a bad option for switching contexts?

#

No — it’s the right tool when you want a clean slate (switching topics, avoiding runaway context, etc.). It’s intentionally cheap and immediate.

4) Why isn’t memory saving tied to /new?

A few design reasons:

  • /new is handled by the gateway, not the model — it’s meant to work even if the model is down, rate-limited, or you want an instant reset.
  • Cost + surprise — if /new triggered an LLM turn (“summarize + write memories”), every reset would burn tokens.
  • Safety/privacy — auto-writing “important memories” at reset time is risky in group contexts; OpenClaw prefers memory writes to be explicit (or the pre-compaction flush, which happens only when needed).

Practical workflow (what most people do)

Before you reset, send something like:

“Before I /new: summarize in 3 lines and write any durable preference from this to memory.”

Then send /new.

If you tell me what you want instead of bullets (short paragraphs? one-liners + a question?), I can suggest 2–3 lines for SOUL that nudges that without turning into a rulebook.

vapid geyser
#

once i update agents and soul md files, from chat in control ui, do i just type /new for it to reload or is there a different way