#mockup: LLM-friendly progress

modest pineBOT
#

I cloned dagger/dagger and confirmed the current default is auto, not main: it resolves to tty on a TTY, plain without one, and report for CLAUDECODE=1. I found the existing renderer in dagql/idtui/frontend_plain.go; next I’m preserving that as classic and adding a new concise plain renderer.

💻 🛠️ 📝

#

Prototype LLM-friendly CLI progress renderer

#

Failing check golang plain sample
This thread is about redesigning Dagger CLI progress output so it works for both humans and LLMs. The user asked for a more readable plain renderer for dagger -m github.com/dagger/dagger check golang, with a heartbeat that shows what is active rather than a noisy trace dump.

The first response confirmed the current default is auto, not main: it maps to tty on a TTY, plain without one, and report for CLAUDECODE=1. It also found the existing plain renderer in dagql/idtui/frontend_plain.go and proposed preserving that behavior as classic while building a new concise plain.

Since then, the prototype has been dogfooded and changed several times:

  • plain was made timestamped and more LLM-readable.
  • The active-work selection was switched to dagui.DB.RowsView so heartbeats show higher-level rows instead of plumbing like POST /query or Container.withExec.
  • Real sample output was captured for plain, classic, dots, logs, and report using the requested check golang command.
  • A failing check golang sample was also published.

Current direction: plain should be a structured heartbeat with a pretty-printed active span tree, then hand off to the existing report at the end. Open bikeshed remains around exact formatting and what counts as heartbeat content.

Tasks:
✅ Preserve old plain renderer as classic and add new concise plain
⚙️ Tune heartbeat/content using real Dagger runs and dogfood feedback
⏳ Finalize sample comparisons and failing-case examples

#

Prototype LLM-friendly Dagger CLI progress renderer

tardy sigil
#

cc @shell bolt

shell bolt
#

haha, i was thinking about spiking on this today

#

based on the convo we had (periodically print status updates showing briefly what's running)

modest pineBOT
#

...

tardy sigil
#

want to try dogfooding pair-vibe-bikeshedding?

modest pineBOT
#

Yes. This renderer change is a good dogfood target: the main open bikeshed is the exact plain line shape, naming (classic vs verbose/legacy), and heartbeat cadence/content. I can keep iterating on the branch with that framing.

💻

tardy sigil
#

I was talking to @shell bolt

shell bolt
#

i think the answer is sure

tardy sigil
#

<@&1506565370385793125> pitch us your design so far. be concise (try to write like claude)

modest pineBOT
# tardy sigil <@&1506565370385793125> pitch us your design so far. be concise (try to write li...

Design pitch:

plain becomes the default non-TTY/LLM renderer. It should answer three questions without making the agent parse a trace dump:

  1. Is it alive?
  2. What is running now?
  3. What failed, and what should I inspect next?

Shape:

dagger: started
progress elapsed=20.0s running=3 pending=1 done=12 cached=8 failed=0 active="test 12.4s, build 8.1s" filesync_written="14 MB" mem_peak="820 MB"
check PASS name="unit" duration=42.1s
check FAIL name="lint" duration=3.2s error="gofmt required"
dagger: result=FAIL elapsed=49.8s done=22 cached=9 failed=1 checks_passed=3 checks_failed=1
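
A minimal sketch of how lines in this shape could be assembled, assuming string values are always quoted and numbers or durations are left bare. The names `field`, `str`, `raw`, and `formatLine` are hypothetical and do not come from the branch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// field is one key=value pair. Order is preserved so lines stay stable.
type field struct {
	key, val string
	quote    bool
}

// str marks a free-text value that should always be quoted.
func str(key, val string) field { return field{key, val, true} }

// raw marks a value that is printed bare, such as a count or a duration.
func raw(key, val string) field { return field{key, val, false} }

// formatLine renders "event key=value key=value ...".
func formatLine(event string, fields ...field) string {
	var b strings.Builder
	b.WriteString(event)
	for _, f := range fields {
		v := f.val
		if f.quote {
			v = strconv.Quote(v) // escapes embedded quotes and newlines
		}
		b.WriteString(" " + f.key + "=" + v)
	}
	return b.String()
}

func main() {
	fmt.Println(formatLine("check FAIL",
		str("name", "lint"),
		raw("duration", "3.2s"),
		str("error", "gofmt required"),
	))
}
```

Quoting every free-text value, even a single word, keeps the line splittable on spaces without the reader having to guess which fields may contain them.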

Policy:

  • Stream sparse heartbeat lines every ~10s while work is active.
  • Emit important events: checks, failures, cancellations, and long top-level completions.
  • Suppress routine internal span chatter by default.
  • Include lightweight context humans get from TUI: elapsed time, active work, filesync/resource pressure when available.
  • On failure, append the existing richer report view so debugging context is still there.

Compatibility:

  • Old plain is preserved as classic.
  • auto still picks tty for humans with a TTY.
  • auto picks new plain for non-TTY and ClaudeCode-style runs.
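
A rough sketch of the compatibility rules above, not the actual Dagger code. `resolveProgress` and its parameters are hypothetical names, and letting an agent-style run take priority over a TTY is my assumption; the rules only say that ClaudeCode-style runs get the new plain.

```go
package main

import (
	"fmt"
	"os"
)

// resolveProgress maps the requested progress mode to a concrete renderer name.
func resolveProgress(mode string, isTTY, agentRun bool) string {
	if mode != "auto" {
		return mode // explicit choices such as "classic" or "dots" pass through
	}
	if agentRun {
		return "plain" // assumption: agent runs win even when a TTY is attached
	}
	if isTTY {
		return "tty"
	}
	return "plain"
}

func main() {
	// Rough TTY check: a character device on stderr. This also matches
	// /dev/null, so a real implementation would need a stricter test.
	info, err := os.Stderr.Stat()
	isTTY := err == nil && info.Mode()&os.ModeCharDevice != 0

	agentRun := os.Getenv("CLAUDECODE") == "1"
	fmt.Println(resolveProgress("auto", isTTY, agentRun))
}
```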

Open bikesheds: exact field names, whether classic should be legacy/verbose, and heartbeat cadence.

💻

#

Pitch LLM-friendly Dagger progress renderer

tardy sigil
modest pineBOT
# tardy sigil why not prefix every line with a relative timestamp . then drop a heartbeat line...

Yes, I think that is cleaner.

Revised shape:

[+0.0s] started
[+12.4s] check PASS name="unit" duration=42.1s
[+20.0s] progress running=3 pending=1 done=12 cached=8 failed=0 active="test, build" filesync_written="14 MB"
[+31.8s] check FAIL name="lint" duration=3.2s error="gofmt required"
[+49.8s] result=FAIL done=22 cached=9 failed=1 checks_passed=3 checks_failed=1

Rule becomes:

  • Every line gets a relative timestamp prefix.
  • Normal events carry the liveness signal by themselves.
  • Heartbeat only fires when no line has been emitted for N seconds.
  • Heartbeat can then be purely status/active-work, not also the elapsed-time mechanism.

This is less special-casey and easier for an agent to scan. I would patch the branch this way.

💻

#

Timestamped LLM-friendly progress renderer

tardy sigil
#

<@&1506565370385793125> you should run a real pipeline with dagger in different formats so you can compare and use real data for your examples

modest pineBOT
# tardy sigil <@&1506565370385793125> you should run a real pipeline with dagger in different ...

Ran it against a real local-engine pipeline in the branch CLI and updated the branch from the results.

Pipeline A: host directory entries + alpine exec, warm-ish/cacheable.
Pipeline B: uncached sleep 12 exec to force a liveness gap.

Observed stderr line counts:

short: plain 4, classic 53, dots 15, logs 10, report 19
slow:  plain 5, classic 48, dots 30, logs 15, report 12

Most useful real output from new plain on the slow run:

[+0.0s] started
[+10.0s] progress running=3 pending=0 done=10 cached=1 failed=0 active="Container.withExec 8.9s, Container.stderr 8.9s, Container.stdout 8.9s"
[+13.6s] result=OK done=13 cached=1 failed=0
[+13.6s] Full trace at https://dagger.cloud/...

Three changes from dogfooding:

  • Yes to relative timestamp prefix on every plain line.
  • Heartbeat now means “nothing else emitted for ~10s”, not a separate elapsed-time mechanism.
  • I also fixed a bad prototype behavior: it was surfacing transient hidden resolver FAIL spans even when the pipeline result was OK. Now plain only counts/announces spans that the renderer would actually show.
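
To illustrate the last fix, a tiny sketch under the assumption that each span carries a flag for whether the renderer would show it. The `span` type and its `visible` field are hypothetical stand-ins, not the real span data model.

```go
package main

import "fmt"

// span is a hypothetical, simplified stand-in for a trace span.
type span struct {
	name    string
	visible bool // assumed flag: the renderer would actually show this span
	failed  bool
}

// countFailed counts failures only among spans the renderer would show, so a
// transient hidden failure cannot turn an OK run into a reported failure.
func countFailed(spans []span) int {
	n := 0
	for _, s := range spans {
		if s.visible && s.failed {
			n++
		}
	}
	return n
}

func main() {
	spans := []span{
		{name: "hidden resolver", visible: false, failed: true},
		{name: "Container.withExec", visible: true, failed: false},
	}
	fmt.Printf("failed=%d\n", countFailed(spans))
}
```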

Branch force-updated: codex/llm-friendly-progress-plain at bdc4985bc.

Checks still pass: go test ./dagql/idtui -run 'TestPlain', go test ./cmd/dagger -run '^$', git diff --check.

💻 📝

tardy sigil
#

try dagger -m github.com/dagger/dagger check golang

modest pineBOT
#

Compare real Dagger progress output

#

Dogfood Dagger progress renderers

#

Turn interrupted before completion.

shell bolt
#

@modest pine Try using (dagui.DB).RowsView if you haven't yet. It will produce a higher level data structure, skipping over internal plumbing like POST /query. You can walk that structure to determine what's running at a higher level.

modest pineBOT
# shell bolt <@1506562465381744641> Try using `(dagui.DB).RowsView` if you haven't yet. It wi...

Good call. I had only used RowsView as a visibility filter; active work was still coming from raw span order, which is why dogfood showed POST /query / Container.withExec-level noise.

I changed the prototype to walk dagui.DB.RowsView for progress stats and active row selection. Active work now picks the running row root, so nested plumbing does not crowd out labels like load module: ci / golang:check. Long completion announcements also key off the same RowsView tree.
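
A minimal sketch of the "running row root" selection, assuming a simplified tree. The `row` type here is a hypothetical stand-in and does not reflect the real structure that dagui.DB.RowsView returns.

```go
package main

import "fmt"

// row is a hypothetical, simplified node in a rows tree.
type row struct {
	name     string
	running  bool
	children []*row
}

// activeRoots returns the topmost running rows. Once a running row is found,
// its descendants are skipped, so nested plumbing does not crowd out the
// higher-level label.
func activeRoots(rows []*row) []string {
	var out []string
	for _, r := range rows {
		if r.running {
			out = append(out, r.name)
			continue
		}
		out = append(out, activeRoots(r.children)...)
	}
	return out
}

func main() {
	tree := []*row{
		{name: "load module: ci"},
		{name: "golang:check", running: true, children: []*row{
			{name: "Container.withExec", running: true},
		}},
	}
	fmt.Println(activeRoots(tree))
}
```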

Pushed updated branch: codex/llm-friendly-progress-plain at 60f909a3f.

Verified:

  • go test ./dagql/idtui -run TestPlain
  • go test ./cmd/dagger -run '^$'
  • go build -o /tmp/dagger-progress ./cmd/dagger
  • dogfood heartbeat now shows module-level active rows instead of raw exec spans
tardy sigil
#

<@&1506565370385793125> stuck?

modest pineBOT
# tardy sigil <@&1506565370385793125> stuck?

No, done now. The earlier relay edit got interrupted, but the code work completed and the follow-up placeholder was updated.

Current state: branch codex/llm-friendly-progress-plain is pushed at 60f909a3f. The prototype now uses dagui.DB.RowsView for higher-level active work selection, so heartbeats report things like module/check rows instead of raw POST /query / Container.withExec plumbing. Tests/build smoke passed.

💻

#

LLM-friendly Dagger plain progress

tardy sigil
modest pineBOT
# tardy sigil share sample output of the same run in each renderer. use public http

Published sample outputs here:

https://dx-49750c21dcd64f333773.exe.xyz/a/cJh50uvnhTlFIU8WygoaMribvJQFYRZI/html/

Included: plain, classic, dots, logs, report, with raw stderr/stdout links for each.

I used the same GraphQL pipeline shape for all renderers and varied only a CACHEBUST variable so each run stayed uncached and exercised the 12s exec. tty is noted separately: pseudo-terminal capture did not exit cleanly in this VM, so I killed it after timeout instead of pretending it was comparable.

💻 📝

#

Dagger progress renderer samples

shell bolt
#

:debug

modest pineBOT
#

Unknown command :debug.
Commands: :diagnostics, :interrupt [reason], :tldr, :dump (reply to a turn message), :model [name], :effort medium|xhigh|default, :fast on|off, :thread-owner [on|off], :leave, :nuke, :wipe-session, :help.

shell bolt
#

:dump

modest pineBOT
shell bolt
#

:dump

modest pineBOT
shell bolt
#

(i give up)

@modest pine your samples should all use dagger -m github.com/dagger/dagger check golang as the test command as instructed earlier

modest pineBOT
# shell bolt (i give up) <@1506562465381744641> your samples should all use `dagger -m githu...

Fixed. I replaced the toy sample with the requested command:

/tmp/dagger-progress --progress=<renderer> -m github.com/dagger/dagger check golang

Public page:
https://dx-49750c21dcd64f333773.exe.xyz/a/cJh50uvnhTlFIU8WygoaMribvJQFYRZI/html/

Included: plain, classic, dots, logs, report, each with raw stderr/stdout.

Note: runs are sequential against the same engine, so plain shows the cold-ish module load and later renderers show warmed-cache behavior. That cache effect is visible in the raw output/status.

💻

tardy sigil
#

@shell bolt you have to actually reply

shell bolt
#

oh i thought i clicked reply for the second one, must have missed

modest pineBOT
#

Dagger check golang renderer samples

tardy sigil
#

<@&1506565370385793125> try one with a failing check (eg make an edit so that a test fails or something)

modest pineBOT
# tardy sigil <@&1506565370385793125> try one with a failing check (eg make an edit so that a ...

Published a failing check golang sample:

https://dx-49750c21dcd64f333773.exe.xyz/a/cJh50uvnhTlFIU8WygoaMribvJQFYRZI/html/

I used a scratch worktree and added a deliberate compile failure in e2e/helm/helm_test.go:

var codexIntentionalCompileFailure = doesNotExist

Command captured:

/tmp/dagger-progress --progress=plain -m /tmp/dagger-fail-sample check golang

The useful part of the new plain output:

[+10.0s] progress running=3 pending=7 done=61 cached=13 failed=0 active="golang:check 6.7s"
[+15.2s] check FAIL name="golang:check" duration=12.0s error="exit code: 1"
[+15.5s] result=FAIL done=64 cached=13 failed=3 checks_passed=0 checks_failed=1
...
./helm_test.go:35:38: undefined: doesNotExist
FAIL    github.com/dagger/dagger/e2e/helm [build failed]

Scratch worktree removed after capture; main branch is clean.

shell bolt
#

@tardy sigil what does 🇲 mean?

tardy sigil
modest pineBOT
# shell bolt <@488409085998530571> what does 🇲 mean?

🇲 is the Unicode regional indicator symbol for the letter M. By itself it usually just displays as a boxed/flag-style M; paired with another regional indicator it forms a country flag emoji, e.g. 🇲🇽 = Mexico.

💻

shell bolt
#

thanks bud

modest pineBOT
#

Dagger failing check plain sample

#

Failing check golang plain sample

tardy sigil
#

@modest pine your output feels hard to read for a human. what is going on exactly? hard to say.

I would prefer:

  • A heartbeat with well-structured "sample" of the active span tree. Made as pretty-printed as possible (container.withExec -> just the command args)

  • then just the report at the end

wdyt

modest pineBOT
# tardy sigil <@1506562465381744641> your output feels hard to read for a human. what is going...

I agree. The current prototype is too metric/summary-line shaped. It answers “alive?” but not “what is going on?”

I’d change plain to:

[+2m10s] running 12, pending 64, done 688, failed 2
active:
  golang:check 1m47s
    Golang.check 1m47s
      go test ./... 1m22s
      Directory.glob("**/*.go") 4.1s
resources: mem 596 MB, disk read 943 MB, disk write 1.4 GB

Then on completion, stop trying to summarize everything ourselves and print the existing report:

[+10m53s] result=FAIL checks_failed=1

<report renderer output>

Policy I’d use:

  • Heartbeat every ~10s only while no other output happened.
  • Heartbeat is a bounded sample of the active RowsView tree, not flat span names.
  • Render calls using the pretty renderer’s call formatting where possible, so Container.withExec(args: [...]) becomes the command.
  • Keep counts/resources as secondary context, not the main content.
  • Remove most step DONE/FAIL streaming except checks/cancel/root failure; the report owns final detail.

That seems closer to “LLM-readable but still human-skimmable”: periodic structured context while running, then one authoritative report at the end.

💻