#isolated agent cron jobs, codex harness, 2026.5.12-beta.6

1 messages · Page 1 of 1 (latest)

proper lantern
#

My cron jobs used to work as of 2026.5.6, but in the 2026.5.12 beta series the isolated cron jobs in particular are failing at attempt-dispatch. I'm working with a separate codex session to try to figure out if I'm just configuring something wrong.

Latest I've gotten:

For the fresh openclaw cron run 48ffa2e7-86f0-4407-9a36-2920c1eb06ba, computer-use was no longer the direct failure. The run actually did start:

  - Session 9da62f94-6a6c-4213-b10d-6d838352df9d started at 2026-05-13 18:14:05 PDT.
  - The model completed at 18:15:05.304 PDT.
  - Cron then cleaned it up as stalled at 18:15:05.345 PDT.

  So it lost a race by about 40ms. The cron watchdog still thought execution had not started because its “execution started” detector only clears
  on phases like model_call_started, tool_execution_started, or assistant_output_started. This Codex/app-server path recorded prompt.submitted and
  model.completed, but did not emit the phase event cron is waiting for before the 60s pre-execution watchdog fired.

  Why the tiny probe worked: it completed in ~6s, well under the 60s watchdog. This real job uses gpt-5.5 with default high thinking and a large
  prompt/context, so the first visible model result can land right at the 60s edge.

  Important detail: increasing the job’s timeoutSeconds probably won’t fix this, because the pre-execution watchdog is capped at 60s in server-
  cron-YWxalur2.js.

Actual code fix: cron should treat model.completed/prompt submission or Codex app-server turn accepted as execution progress, or the Codex app-
  server path should emit model_call_started as soon as it submits the request. The current attempt-dispatch error is misleading for this path.
rustic shuttleBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

inland vergeBOT
#

This sounds like a real beta regression, not a config mistake.

Given those timestamps, the important bit is: the cron run did dispatch and the model did complete, but cron’s pre-execution watchdog never saw the specific “execution started” signal it expects before the 60s cap. So attempt-dispatch is probably the phase where the watchdog reports/cleans up, not the actual root failure.

Practical workarounds until there’s a code fix:

  • For affected isolated jobs, make startup finish comfortably under 60s: lower --thinking, use --light-context, shorten the prompt/context, or pick a faster model for that cron.
  • Increasing timeoutSeconds likely won’t help if the pre-exec watchdog is hard-capped separately.
  • If these are important jobs, pinning back to 2026.5.6/stable for cron reliability is reasonable.

The likely fix is one of:

  • Codex/app-server path emits model_call_started as soon as the request is accepted/submitted, or
  • cron’s “execution started” detector also treats prompt.submitted / accepted app-server turn / model.completed as progress.

If you can, please file this with the run id + session id + the ~40ms race details here: https://github.com/openclaw/openclaw/issues. That’s exactly the kind of crisp repro maintainers can act on. Cron docs for the relevant knobs are here: https://docs.openclaw.ai/automation/cron-jobs

proper lantern
#

what commands should I run to export / cat the logs they'll want from the run and session?

inland vergeBOT
#

Run this from the same machine/user that runs the gateway. It’ll collect the useful stuff without modifying state.

RUN_ID='48ffa2e7-86f0-4407-9a36-2920c1eb06ba'
SESSION_ID='9da62f94-6a6c-4213-b10d-6d838352df9d'
OUT="$HOME/openclaw-cron-repro-$RUN_ID"
mkdir -p "$OUT"

# version + health/status snapshots
openclaw --version > "$OUT/version.txt" 2>&1
openclaw status --all > "$OUT/status-all.txt" 2>&1
openclaw gateway status > "$OUT/gateway-status.txt" 2>&1
openclaw health --verbose > "$OUT/health-verbose.txt" 2>&1

# sanitized diagnostics zip; safest thing to attach publicly-ish
openclaw gateway diagnostics export \
#
  --output "$OUT/openclaw-diagnostics.zip" \
  --log-lines 10000 \
  --log-bytes 5000000 \
  > "$OUT/diagnostics-export.txt" 2>&1

# gateway log tail around the failure window; this may still contain local detail, review before sharing
openclaw logs --plain --limit 5000 --max-bytes 5000000 \
  > "$OUT/gateway-logs-tail.txt" 2>&1

# find cron/task/session records containing the run/session ids
for ROOT in "$HOME/.openclaw/cron" "$HOME/.openclaw/tasks" "$HOME/.openclaw/agents"; do
  [ -d "$ROOT" ] || continue
  grep -RIl -- "$RUN_ID\|$SESSION_ID" "$ROOT" 2>/dev/null |
    while IFS= read -r f; do
      rel="${f#$HOME/.openclaw/}"
#
      mkdir -p "$OUT/raw/$(dirname "$rel")"
      cp "$f" "$OUT/raw/$rel"
    done
done

# direct session files if present
find "$HOME/.openclaw/agents" -path "*/sessions/$SESSION_ID*" -type f -print 2>/dev/null |
  while IFS= read -r f; do
    rel="${f#$HOME/.openclaw/}"
    mkdir -p "$OUT/raw/$(dirname "$rel")"
    cp "$f" "$OUT/raw/$rel"
  done

# quick index of what matched
find "$OUT" -type f | sort > "$OUT/MANIFEST.txt"
#

# compress it
(cd "$HOME" && tar -czf "openclaw-cron-repro-$RUN_ID.tgz" "openclaw-cron-repro-$RUN_ID")
echo "$HOME/openclaw-cron-repro-$RUN_ID.tgz"

If you know the job id too, also add:

JOB_ID='your-cron-job-id'
openclaw cron show "$JOB_ID" > "$OUT/cron-show-$JOB_ID.txt" 2>&1
openclaw cron runs --id "$JOB_ID" --limit 50 > "$OUT/cron-runs-$JOB_ID.txt" 2>&1

Before attaching the tarball, review raw/ especially: session JSONL / trajectory files can include prompt text, tool output, paths, etc. The openclaw-diagnostics.zip is designed to be shareable/redacted; the raw session/cron files are the high-signal bits for this bug but more private.

proper lantern
inland vergeBOT
#

Nice, thank you — that issue has exactly the right breadcrumb trail. 🦞

If the fix lands, the useful regression test shape is probably: app-server/Codex path emits prompt.submitted + model.completed without model_call_started, with completion right around the pre-exec watchdog boundary, and cron must not mark it stalled/attempt-dispatch.

proper lantern
inland vergeBOT
#

Not unnecessary — more like duplicate-but-useful signal.

#81368/#81371 covers the attempt_dispatch classifier case: cron reached dispatch, but the pre-exec watchdog still didn’t count that as “started.” Your run adds a slightly different breadcrumb: Codex/app-server recorded prompt.submitted + model.completed and then still got cleaned up right at the watchdog edge.

So I’d leave your issue open for now, but comment linking #81368/#81371 and ask whether #81371 is expected to cover the Codex prompt.submitted/model.completed path too. If yes, maintainers can close yours as fixed-by/duplicate; if not, your evidence prevents a second near-identical beta regression from slipping through.

proper lantern
#

well clawsweeper disagreed

inland vergeBOT
#

Fair enough — if Clawsweeper matched it, I’d trust the dedupe unless the fix still fails after #81371 lands in the build you’re testing.

Your work still wasn’t wasted: you confirmed the same failure mode on beta.6-ish with a real Codex/app-server cron run and better timing evidence. Best move now is probably to paste the prompt.submitted / model.completed / 40ms race note onto #81368 or #81371 as extra validation, then let #81602 close as duplicate.

proper lantern
#

see latest in #81368. am I just doing something wrong or is this a real bug? just not getting much attention or other dupes