#Openclaw is slow


sterile holly
#

I’m running OpenClaw on a dedicated server for a real production editorial workflow, and performance has become a serious issue. The installation is now slow enough that many actions feel borderline unusable, so I’m trying to understand whether this is expected for our setup or if we are misconfigured.

What this server does:
Discord is our main control surface
OpenClaw runs an editorial orchestrator agent
It manages multiple site/content workflows, dashboard-triggered jobs, Discord thread work, and ACP runtime sessions
We use persistent session history because continuity matters across editorial tasks
We also use memory/qmd, dashboard automation, cron/task flows, and some web/browser tooling

Current stack:
OpenClaw 2026.4.14
Main active agent: editorial-acp
Model: openai-codex/gpt-5.4
Gateway running locally on 127.0.0.1:3031
Dashboard running separately on port 8090
Enabled plugins include: discord, openai, acpx, brave, camofox-browser, memory-core
contextPruning.mode = cache-ttl
current TTL is set to 1 hour

What we observed:
Host machine itself does not seem overloaded: low load average, dashboard responds fast, CPU is not saturated
However, the openclaw-gateway process was using about 1.85 GB RSS
Session storage under ~/.openclaw/agents has grown a lot:
editorial-acp about 1.7 GB
main about 472 MB
In ~/.openclaw/agents/editorial-acp/sessions alone:
526 files
about 546 MB
about 299 MB in .checkpoint.*.jsonl
about 84 MB in .reset.*
about 150 MB in base *.jsonl transcripts
We also saw warnings like:
“skipping parent fork (parent too large)”
one case with parentTokens=131692
Current live session context was already around 44k / 200k tokens even in a normal support/debug conversation

My suspicion:
This may not be a raw server issue, but a combination of:
Very large persistent session histories
Too many retained checkpoints/reset artifacts
Gateway memory growth from loading/managing large session state
Pruning settings that are too conservative for a tool-heavy Discord workflow
Possibly missing retention/cleanup settings for sessions and artifacts

Questions:
Is ~1.8 GB RSS for the gateway plausible in this kind of setup, or does it suggest a memory leak / bad retention pattern?
Are checkpoint and reset session artifacts expected to accumulate this much over time?
What is the recommended cleanup/retention policy for long-running production Discord installs?
Should we lower contextPruning TTL from 1h to something like 5m or 10m for tool-heavy sessions?
Are there recommended settings for:
session retention
checkpoint retention
reset artifact cleanup
thread/session isolation
compaction tuning
memory/qmd usage in persistent Discord channels
Is there a best practice to avoid oversized channel-level sessions while still keeping useful continuity?

We would really appreciate concrete guidance on how to tune this installation for a production Discord + editorial automation environment, especially around session lifecycle, pruning, compaction, retention, and gateway memory behavior.

night finchBOT
#

Let me verify the actual docs/code paths for pruning, session retention, and session artifacts before I answer.

#

I found the relevant deep-dive docs. I’m going to pull the exact config fields for session maintenance and pruning next, so I don’t hand-wave any keys.

#

This mostly looks like session/retention growth, not raw server starvation.

A few concrete answers:

  • ~1.8 GB RSS: plausible for a busy production install, yes. But with:

    • 500+ session files,
    • ~300 MB of .checkpoint.*.jsonl,
    • ~84 MB of .reset.*,
    • live contexts already at ~44k tokens,
    • and parentTokens=131692,

    I would treat that as unhealthy accumulation, not “normal steady state”. It does not prove a leak by itself.

  • skipping parent fork (parent too large) is expected behavior from session.parentForkMaxTokens (default 100000). So that warning is a symptom that parent sessions are already too fat.

  • Reset artifacts: yes, they are expected to accumulate unless you set session.maintenance.resetArchiveRetention.

#
  • Checkpoint artifacts: some buildup is expected from compaction snapshots, but I could not find a separate first-class checkpointRetention knob in current docs. Current code caps stored checkpoint references per session, but the on-disk .checkpoint.*.jsonl files are cleaned more indirectly, via session disk-budget/orphan cleanup. So 299 MB of checkpoints is a sign cleanup is not keeping up.

Most important distinction:

  • Compaction reduces future model-visible history and persists a summary.
  • Pruning trims old tool results in memory only.
  • Pruning TTL will not shrink disk usage.
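
To make that distinction concrete, here is a minimal Python sketch of TTL-based pruning as a mental model. It is not OpenClaw's actual algorithm: the `Message` type, `parse_duration`, `prune_view`, and the stub text are all hypothetical names made up for illustration. The point it shows is that pruning builds a smaller model-visible view while the stored history stays untouched.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str         # "user", "assistant", or "tool"
    text: str
    age_seconds: int  # how long ago the message was produced


def parse_duration(value: str) -> int:
    """Convert strings such as "10m", "1h", or "14d" to seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    return int(value[:-1]) * units[value[-1]]


def prune_view(history: list[Message], ttl: str) -> list[Message]:
    """Return a model-visible copy with stale tool results stubbed out.

    The input list stands in for the on-disk transcript and is never
    modified, which is why a shorter TTL does not free disk space.
    """
    limit = parse_duration(ttl)
    view = []
    for message in history:
        if message.role == "tool" and message.age_seconds > limit:
            view.append(Message(message.role, "[tool result pruned]", message.age_seconds))
        else:
            view.append(message)
    return view
```

With a 30-minute-old tool result in the history, `prune_view(history, "10m")` stubs it out while `prune_view(history, "1h")` keeps it, and in both cases `history` itself is unchanged.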

What I would change first

1) Make session maintenance explicit

Do not rely on defaults here.

{
  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 480 },
    },
    maintenance: {
      mode: "enforce",
      pruneAfter: "14d",
      resetArchiveRetention: "7d",
      maxDiskBytes: "1gb",
      highWaterBytes: "800mb",
    },
  },
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        ttl: "10m",
      },
    },
  },
  cron: {
    sessionRetention: "24h",
  },
  acp: {
    runtime: {
      ttlMinutes: 30,
    },
  },
}
#

Then tune further:

  • If you have lots of dead sessions, lower session.maintenance.maxEntries.
  • If giant parent-thread inheritance keeps happening, lower session.parentForkMaxTokens below 100000.
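
As a rough illustration of why the "skipping parent fork (parent too large)" warning fires, the guard can be pictured like this. It is a sketch only: the function name is hypothetical, and whether the real comparison is strict or inclusive is an assumption. The only values taken from this thread are the 100000 default and the observed parentTokens.

```python
DEFAULT_PARENT_FORK_MAX_TOKENS = 100_000  # default quoted earlier in this thread


def should_fork_from_parent(parent_tokens: int,
                            max_tokens: int = DEFAULT_PARENT_FORK_MAX_TOKENS) -> bool:
    """Return True when a child session may inherit the parent's history.

    Assumption: a parent exactly at the limit is still allowed.
    """
    return parent_tokens <= max_tokens


# Values observed in this thread:
print(should_fork_from_parent(131_692))         # parent from the warning
print(should_fork_from_parent(44_000))          # the current live session
print(should_fork_from_parent(90_000, 80_000))  # with a lowered limit
```

Lowering the limit does not shrink the parent sessions themselves. It only makes new threads start clean sooner, so it works best together with reset and maintenance settings.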

2) Lower pruning TTL

For a tool-heavy Discord workflow, I’d absolutely lower contextPruning.ttl from 1h.

My recommendation:

  • start with 10m
  • go to 5m if browser/web/file tool output is still bloating turns

Also inspect contextPruning.tools.deny. If your noisiest tools are excluded from pruning, you’re protecting the heaviest context.

3) Stop using giant channel history as the continuity mechanism

Best practice is:

  • use Discord threads / thread-bound ACP sessions per workflow
  • keep “editorial job continuity” in memory files + QMD, not in one endlessly-growing control thread
  • reset sessions when the thread changes purpose

If you use QMD, I’d keep it focused on durable memory/docs. If memory.qmd.sessions.enabled is on, ask whether you really need transcript indexing, because that adds more export/index churn.

Immediate cleanup path

I’d run these in this order:

openclaw status --all
openclaw sessions --all-agents --json
openclaw sessions cleanup --all-agents --dry-run --json
openclaw doctor
#

If the dry run looks sane, then:

openclaw sessions cleanup --all-agents --enforce

That cleanup path is the documented one for session stores/transcripts. doctor can also detect/archive orphan transcripts safely.

My read on your case

Short version: yes, this is fixable tuning, not “your hardware is too weak”.

The strongest signals are:

  • oversized long-lived sessions
  • retained reset archives
  • too many checkpoint snapshots on disk
  • pruning TTL that is too conservative for tool-heavy Discord use
  • probably too much continuity being carried in session transcript instead of durable memory

If you want, paste the output of:

  • openclaw sessions cleanup --all-agents --dry-run --json
  • openclaw status --all

and I can help turn that into a tighter production config.

Docs I checked:

#

If cleanup + restart still leaves the gateway sitting huge and slow, that starts to smell like a real bug, and then it’s worth filing an issue with those diagnostics: https://github.com/openclaw/openclaw/issues

sterile holly
#

@night finch Are there any other settings or parameters I should check? Or do you need to know more about my config to get a clear picture?

night finchBOT
#

Yeah, a few very specific bits would make this much clearer.

You do not need to dump the whole config. The highest-value things are:

Config blocks

Please paste sanitized versions of these:

  • session
  • agents.defaults.contextPruning
  • agents.defaults.compaction
  • cron
  • acp
  • memory
  • channels.discord.threadBindings
  • any bindings[] entries or per-agent overrides for editorial-acp

Command output

#

These 3 would be the most useful:

openclaw status --all
openclaw sessions --all-agents --json
openclaw sessions cleanup --all-agents --dry-run --json

That last one is probably the single most useful signal.

A few yes/no details that matter a lot

  • Is session.maintenance.mode explicitly set to enforce, or not set?
  • Is memory.qmd.sessions.enabled on?
  • What is in contextPruning.tools.deny?
  • Are most workflows sharing one long-lived Discord control channel/session, or do you isolate them into threads?
  • Are ACP sessions thread-bound/persistent, and what is acp.runtime.ttlMinutes?
  • Are the .checkpoint.*.jsonl files coming from a few giant sessions, or spread across many?
#

One optional filesystem check

If you want one extra clue, this helps identify whether a handful of sessions are doing most of the damage:

find ~/.openclaw/agents/editorial-acp/sessions -maxdepth 1 -type f -printf '%s %f\n' | sort -nr | head -30

Please scrub secrets/tokens before pasting anything.
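
If GNU find's -printf is not available (for example on macOS), a rough Python equivalent can also total bytes per artifact type instead of per file. This is a sketch under assumptions: the filename patterns (".checkpoint.", ".reset.", plain ".jsonl") are inferred from the names mentioned in this thread, and `artifact_kind` and `bytes_by_kind` are hypothetical helper names, not part of OpenClaw.

```python
from collections import Counter
from pathlib import Path


def artifact_kind(name: str) -> str:
    """Classify a session-store filename by substring (assumed naming)."""
    if ".checkpoint." in name:
        return "checkpoint"
    if ".reset." in name:
        return "reset"
    if name.endswith(".jsonl"):
        return "transcript"
    return "other"


def bytes_by_kind(directory: Path) -> Counter:
    """Sum file sizes in one directory (non-recursive), grouped by kind."""
    totals: Counter = Counter()
    for path in directory.iterdir():
        if path.is_file():
            totals[artifact_kind(path.name)] += path.stat().st_size
    return totals


if __name__ == "__main__":
    sessions = Path.home() / ".openclaw/agents/editorial-acp/sessions"
    if sessions.is_dir():
        for kind, size in bytes_by_kind(sessions).most_common():
            print(f"{size / 1e6:8.1f} MB  {kind}")
```

The script only reads file metadata, so it is safe to run against a live store.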

If you send those snippets, I can usually tell pretty quickly whether this is mostly:

  1. retention/cleanup not running,
  2. session topology being too broad,
  3. pruning/compaction tuned too conservatively,
  4. something buggier.
#

Yep, a bit more config would help, but only a small targeted subset.

If you want the shortest path, send these first:

openclaw sessions cleanup --all-agents --dry-run --json
openclaw status --all

And these sanitized config sections:

  • session
  • agents.defaults.contextPruning
  • agents.defaults.compaction
  • acp
  • memory
  • channels.discord.threadBindings
#

A few especially important fields I want to see:

  • session.maintenance.mode
  • session.maintenance.pruneAfter
  • session.maintenance.resetArchiveRetention
  • session.maintenance.maxDiskBytes
  • session.parentForkMaxTokens
  • session.resetByType
  • contextPruning.ttl
  • contextPruning.tools.deny
  • acp.runtime.ttlMinutes
  • whether memory.qmd.sessions.enabled is on
  • whether you use one shared control channel vs per-workflow threads/bindings

If you want one extra filesystem clue, this is useful too:

find ~/.openclaw/agents/editorial-acp/sessions -maxdepth 1 -type f -printf '%s %f\n' | sort -nr | head -30

So, no need for the full config, just those bits. If you paste them, I can give you a much sharper answer instead of guessing.

sterile holly
#

Hi, thanks, here are the sanitized details you asked for.

We run OpenClaw on a dedicated server for a production editorial orchestration workflow. Discord is the main control surface. The editorial-acp agent manages multi-site content operations: dashboard-triggered jobs, per-client/per-topic Discord threads, ACP runs, memory/qmd, cron/task automation, and occasional browser/web tooling. Continuity matters, so we keep persistent session history.

Sanitized config subset:
{
  "session": {
    "dmScope": "per-channel-peer",
    "threadBindings": { "enabled": false, "idleHours": 24, "maxAgeHours": 0 }
  },
  "agents.defaults.contextPruning": {
    "mode": "cache-ttl",
    "ttl": "1h"
  },
  "agents.defaults.compaction": {
    "mode": "safeguard"
  },
  "cron": {},
  "acp": {
    "enabled": true,
    "dispatch": { "enabled": false }
  },
  "memory": {
    "backend": "qmd",
    "citations": "auto",
    "qmd": {
      "limits": { "timeoutMs": 12000 },
      "update": { "embedTimeoutMs": 600000 },
      "sessions": { "enabled": true, "retentionDays": 60 }
    }
  },
  "channels.discord.threadBindings": {
    "enabled": false,
    "idleHours": 24,
    "maxAgeHours": 0,
    "spawnSubagentSessions": false
  },
  "editorial-acp override": {
    "memorySearch": {
      "enabled": true,
      "sources": ["memory", "sessions"],
      "experimental": { "sessionMemory": true }
    },
    "runtime": {
      "type": "acp",
      "acp": { "agent": "codex", "backend": "acpx", "mode": "persistent" }
    }
  },
  "bindings": []
}

Important fields:

  • session.maintenance.mode: unset
  • session.maintenance.pruneAfter: unset
  • session.maintenance.resetArchiveRetention: unset
  • session.maintenance.maxDiskBytes: unset
  • session.parentForkMaxTokens: unset (but we saw a runtime warning using effective maxTokens=100000)
  • session.resetByType: unset
  • contextPruning.tools.deny: unset
  • acp.runtime.ttlMinutes: unset
  • memory.qmd.sessions.enabled: true

Topology:

  • Mixed. Many workflows are isolated into Discord threads, but we also have a few long-lived control channels/sessions.
  • ACP is persistent (editorial-acp.runtime.acp.mode="persistent"), with no explicit TTL configured.
  • Checkpoint files are spread across many sessions, but a handful of large sessions account for a lot of the disk usage.

Command output:

openclaw status --all

  • OpenClaw 2026.4.14
  • Gateway local ws://127.0.0.1:3031, reachable ~680ms
  • Agents: 1 total, 1 active, 286 sessions
  • Store: ~/.openclaw/agents/editorial-acp/sessions/sessions.json

openclaw sessions cleanup --all-agents --dry-run --json

{
  "agentId": "editorial-acp",
  "storePath": "/home/node/.openclaw/agents/editorial-acp/sessions/sessions.json",
  "mode": "warn",
  "dryRun": true,
  "beforeCount": 286,
  "afterCount": 286,
  "missing": 0,
  "pruned": 0,
  "capped": 0,
  "diskBudget": null,
  "wouldMutate": false
}

openclaw sessions --all-agents --json is very large, but the key signals are:

  • 286 sessions in the editorial-acp store
  • several long-lived Discord sessions are large: 131692, 133741, 128426 totalTokens
  • some subagent sessions are much larger: 1769979 and 1299710 totalTokens
  • largest files (size in bytes) in ~/.openclaw/agents/editorial-acp/sessions:
    • 8013133 sessions.json
    • 7417802 2026-04-18T14-28-10-377Z_...jsonl
    • 7299497 ...jsonl.reset...
    • 7265725 ...checkpoint...jsonl
    • 7219062 ...checkpoint...jsonl
Optional filesystem clue:
The biggest files are concentrated in a few large sessions/threads, but there is also a broad spread of checkpoint/reset artifacts across the store.

night finchBOT
#

Let me sanity-check the QMD/session-memory bits before I give you a sharper recommendation, because that combo may be doing more work than it looks.

#

This is super helpful. I think the picture is pretty clear now.

My read

This looks much more like retention + session-topology + indexing overhead than a raw CPU/memory-starvation problem.

The biggest signals:

  1. Session maintenance is effectively not cleaning anything

    • Your cleanup run is in mode: "warn"
    • diskBudget: null
    • wouldMutate: false

    So right now OpenClaw is basically being told to keep everything, unless age/count limits are crossed, and you have no disk budget configured.

  2. Default resets are probably generating .reset.* forever

    • session.resetByType is unset
    • resetArchiveRetention is unset

    That means long-lived sessions keep producing reset archives, and nothing trims them.

  3. You have long-lived control sessions that are way too big

    • multiple Discord sessions already > 128k totalTokens
    • some sessions/subsessions in the 1.3M to 1.7M token range

    That is well past “healthy interactive control surface” territory.

  4. contextPruning.ttl: "1h" is too conservative for this workload

    • for tool-heavy Discord/editorial workflows, I would not leave it at 1 hour.

  5. You are stacking multiple session-recall mechanisms at once

    • persistent transcripts
    • memory.qmd.sessions.enabled: true
    • memorySearch.sources: ["memory", "sessions"]
    • experimental.sessionMemory: true

    That is a lot of session indexing/search churn on top of already-large transcripts.

  6. Thread binding features are off

    • session.threadBindings.enabled: false
    • channels.discord.threadBindings.enabled: false
    • bindings: []

    So even though you use Discord threads, you are not using the thread-bound ACP/session lifecycle features that help keep things bounded.

What I would change first

1) Turn on real session maintenance

I would explicitly set this, not rely on defaults:

#
{
  session: {
    maintenance: {
      mode: "enforce",
      pruneAfter: "14d",
      resetArchiveRetention: "7d",
      maxDiskBytes: "1gb",
      highWaterBytes: "800mb"
    }
  }
}

The 14d/7d values above are a starting point. If this is a very high-volume install, you may want to go as low as 7d/3d, depending on how much old transcript history you truly need.
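
To show how maxDiskBytes and highWaterBytes are meant to work together, here is a small Python sketch of a disk-budget pass. It rests on assumptions rather than on OpenClaw's code: that cleanup triggers only once total size exceeds maxDiskBytes, that it removes oldest files first until usage is at or below highWaterBytes, and that "1gb" means decimal gigabytes. `parse_size` and `plan_eviction` are hypothetical names.

```python
def parse_size(value: str) -> int:
    """Convert "800mb" or "1gb" to bytes (assumes decimal units)."""
    units = {"kb": 10**3, "mb": 10**6, "gb": 10**9}
    number, unit = value[:-2], value[-2:].lower()
    return int(number) * units[unit]


def plan_eviction(files: list[tuple[str, int, float]],
                  max_disk: str, high_water: str) -> list[str]:
    """Pick files to remove. Each entry is (name, size_bytes, mtime).

    Nothing is evicted while usage is within max_disk. Once over budget,
    the oldest files go first until usage drops to high_water or below.
    """
    total = sum(size for _, size, _ in files)
    if total <= parse_size(max_disk):
        return []
    target = parse_size(high_water)
    evicted = []
    for name, size, _ in sorted(files, key=lambda entry: entry[2]):
        if total <= target:
            break
        evicted.append(name)
        total -= size
    return evicted
```

The gap between the two values is what prevents a cleanup on every write: with 1gb/800mb, one pass frees roughly 200 MB of headroom before the budget is hit again.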

2) Add reset policy by type

#

Right now your long-lived sessions are just growing.

I’d start with something like:

{
  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 480 }
    },
    parentForkMaxTokens: 80000
  }
}

You’re already hitting the effective 100000 parent-fork guard. Lowering it a bit is reasonable for this kind of install.
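
For intuition, the idle-reset rule in that config can be pictured as the check below. This is a sketch of the idea only: `idle_reset_due` is a hypothetical helper, and treating session types that have no entry as "never reset" is an assumption.

```python
from datetime import datetime, timedelta

# idleMinutes per session type, taken from the suggested config above.
IDLE_MINUTES = {"group": 120, "thread": 480}


def idle_reset_due(session_type: str, last_activity: datetime, now: datetime) -> bool:
    """Return True when a session has been idle at least as long as its limit."""
    limit = IDLE_MINUTES.get(session_type)
    if limit is None:
        return False  # assumption: types without a policy are left alone
    return now - last_activity >= timedelta(minutes=limit)
```

With these values, a group channel that has been quiet for three hours would start fresh on its next message, while a workflow thread keeps its context across a normal working day.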

#

3) Lower pruning TTL

Yes, I would lower it.

My recommendation:

  • start with 10m
  • move to 5m if browser/tool output still causes bloat

{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        ttl: "10m"
      }
    }
  }
}

Important: this helps runtime context size, not disk usage.

4) Enable Discord thread bindings for workflow threads

Given your topology, I would seriously consider:

{
  session: {
    threadBindings: {
      enabled: true,
      idleHours: 24,
      maxAgeHours: 0
    }
  },
  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        idleHours: 24,
        maxAgeHours: 0,
        spawnAcpSessions: true
      }
    }
  }
}
#

Right now you’re using threads as chat surfaces, but not the thread-bound ACP/session lifecycle controls.

5) Set ACP worker TTL explicitly

Even if the default is acceptable, I would stop relying on implicit behavior:

{
  acp: {
    runtime: {
      ttlMinutes: 30
    }
  }
}

6) Simplify session-memory indexing

This is the other big one.

#

Right now you have:

  • QMD session indexing enabled
  • memorySearch reading from sessions
  • experimental session memory enabled

That is probably too much for a production orchestration/control surface.

My bias would be:

  • keep durable memory/QMD
  • be much more selective about raw transcript indexing

At minimum, I would consider disabling one of these layers first, probably either:

  • memory.qmd.sessions.enabled, or
  • memorySearch.sources: ["sessions"] / experimental.sessionMemory

Why cleanup currently shows “nothing to do”

#

Because your current config is basically:

  • warn only
  • no disk budget
  • no reset archive retention
  • no stale sessions old enough to prune yet
  • not over entry cap

So OpenClaw is behaving consistently with config. It’s not failing to clean, it’s mostly not configured to clean aggressively.
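
If you want to re-check this after each config change, a few lines of Python can read the dry-run JSON and list why it reports a no-op. This is a sketch: the field names come from the output pasted in this thread (trimmed here), the interpretation of each field follows the reasoning above rather than documented semantics, and `explain_noop` is a hypothetical helper.

```python
import json

# Dry-run output as pasted earlier in this thread (storePath omitted).
SAMPLE = """{
  "agentId": "editorial-acp",
  "mode": "warn",
  "dryRun": true,
  "beforeCount": 286,
  "afterCount": 286,
  "missing": 0,
  "pruned": 0,
  "capped": 0,
  "diskBudget": null,
  "wouldMutate": false
}"""


def explain_noop(report: dict) -> list[str]:
    """List the reasons a cleanup report would not remove anything."""
    reasons = []
    if report.get("mode") != "enforce":
        reasons.append(f"maintenance mode is {report.get('mode')!r}, not 'enforce'")
    if report.get("diskBudget") is None:
        reasons.append("no disk budget is configured")
    if report.get("pruned", 0) == 0:
        reasons.append("no entries are old enough to prune")
    if report.get("capped", 0) == 0:
        reasons.append("the entry cap is not exceeded")
    return reasons


if __name__ == "__main__":
    for reason in explain_noop(json.loads(SAMPLE)):
        print("-", reason)
```

To use it on real output, pipe the dry-run JSON into a file and load that instead of SAMPLE.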

One oddity to double-check

acp.dispatch.enabled: false jumps out.

Per docs, that normally means ACP commands remain available but dispatch/execution is paused. If ACP is actively running in production, I’d double-check whether:

  • that value is really effective, or
  • another config layer overrides it
#

I don’t think that is the main cause of slowness, but it looks inconsistent.

Extra nuance

Because QMD session indexing is enabled, doctor can suppress orphan-transcript warnings. So I would not rely on doctor alone to tell you about checkpoint/orphan buildup in this setup. sessions cleanup + explicit disk budget matters more here.

Bottom line

My answer now is:

  • 1.8 GB RSS: plausible, yes, but not a healthy target for this topology
  • Leak? not my first guess
  • Bad retention pattern? yes, very likely
  • Checkpoint/reset growth expected? yes, with your current settings
  • Main fix: explicit maintenance + shorter pruning TTL + better thread/session isolation + less redundant session indexing

If I were doing this install, my first practical sequence would be: