#mockup: LLM-friendly progress
I cloned dagger/dagger and confirmed the current default is auto, not main: it resolves to tty on a TTY, plain without one, and report for CLAUDECODE=1. I found the existing renderer in dagql/idtui/frontend_plain.go; next I’m preserving that as classic and adding a new concise plain renderer.
💻 🛠️ 📝
Prototype LLM-friendly CLI progress renderer
Failing check golang plain sample
Thread is about redesigning Dagger CLI progress output so it works for both humans and LLMs. The user asked for a more readable plain renderer for dagger -m github.com/dagger/dagger check golang, with a heartbeat that shows what is active, not a noisy trace dump.
The first response confirmed the current default is auto, not main: it maps to tty on a TTY, plain without one, and report for CLAUDECODE=1. It also found the existing plain renderer in dagql/idtui/frontend_plain.go and proposed preserving that behavior as classic while building a new concise plain.
Since then, the prototype has been dogfooded and changed several times:
- plain was made timestamped and more LLM-readable.
- The active-work selection was switched to dagui.DB.RowsView so heartbeats show higher-level rows instead of plumbing like POST /query or Container.withExec.
- Real sample output was captured for plain, classic, dots, logs, and report using the requested check golang command.
- A failing check golang sample was also published.
Current direction: plain should be a structured heartbeat with a pretty-printed active span tree, then hand off to the existing report at the end. Open bikeshed remains around exact formatting and what counts as heartbeat content.
Tasks:
✅ Preserve old plain renderer as classic and add new concise plain
⚙️ Tune heartbeat/content using real Dagger runs and dogfood feedback
⏳ Finalize sample comparisons and failing-case examples
Prototype LLM-friendly Dagger CLI progress renderer
cc @shell bolt
haha, i was thinking about spiking on this today
based on the convo we had (periodically print status updates showing briefly what's running)
...
want to try dogfooding pair-vibe-bikeshedding?
Yes. This renderer change is a good dogfood target: the main open bikeshed is the exact plain line shape, naming (classic vs verbose/legacy), and heartbeat cadence/content. I can keep iterating on the branch with that framing.
💻
I was talking to @shell bolt
<@&1506565370385793125> pitch us your design so far. be concise (try to write like claude)
Design pitch:
plain becomes the default non-TTY/LLM renderer. It should answer three questions without making the agent parse a trace dump:
- Is it alive?
- What is running now?
- What failed, and what should I inspect next?
Shape:
dagger: started
progress elapsed=20.0s running=3 pending=1 done=12 cached=8 failed=0 active="test 12.4s, build 8.1s" filesync_written="14 MB" mem_peak="820 MB"
check PASS name="unit" duration=42.1s
check FAIL name="lint" duration=3.2s error="gofmt required"
dagger: result=FAIL elapsed=49.8s done=22 cached=9 failed=1 checks_passed=3 checks_failed=1
Policy:
- Stream sparse heartbeat lines every ~10s while work is active.
- Emit important events: checks, failures, cancellations, and long top-level completions.
- Suppress routine internal span chatter by default.
- Include lightweight context humans get from TUI: elapsed time, active work, filesync/resource pressure when available.
- On failure, append the existing richer report view so debugging context is still there.
Compatibility:
- Old plain is preserved as classic.
- auto still picks tty for humans with a TTY.
- auto picks new plain for non-TTY and ClaudeCode-style runs.
Open bikesheds: exact field names, whether classic should be legacy/verbose, and heartbeat cadence.
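A minimal Go sketch of the auto resolution described under Compatibility. The name resolveAuto and its parameters are hypothetical and are not taken from the Dagger codebase. Checking CLAUDECODE before the TTY test is also an assumption, since the thread does not say which wins when both apply.

```go
package main

import "fmt"

// resolveAuto is a hypothetical helper (not Dagger's actual code) that maps
// the "auto" progress mode to a concrete renderer name.
// Assumption: an agent-style run (CLAUDECODE=1) takes precedence over a TTY.
func resolveAuto(isTTY bool, getenv func(string) string) string {
	if getenv("CLAUDECODE") == "1" {
		return "plain" // concise renderer for agent-style runs
	}
	if isTTY {
		return "tty" // interactive humans keep the TUI
	}
	return "plain" // pipes, CI logs, redirected stderr
}

func main() {
	none := func(string) string { return "" }
	agent := func(k string) string {
		if k == "CLAUDECODE" {
			return "1"
		}
		return ""
	}
	fmt.Println(resolveAuto(true, none))
	fmt.Println(resolveAuto(false, none))
	fmt.Println(resolveAuto(true, agent))
}
```

Passing the environment lookup as a function keeps the decision testable without touching the real process environment.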
💻
Pitch LLM-friendly Dagger progress renderer
why not prefix every line with a relative timestamp. then drop a heartbeat line if nothing happens. less messy and more consistent?
Yes, I think that is cleaner.
Revised shape:
[+0.0s] started
[+12.4s] check PASS name="unit" duration=42.1s
[+20.0s] progress running=3 pending=1 done=12 cached=8 failed=0 active="test, build" filesync_written="14 MB"
[+31.8s] check FAIL name="lint" duration=3.2s error="gofmt required"
[+49.8s] result=FAIL done=22 cached=9 failed=1 checks_passed=3 checks_failed=1
Rule becomes:
- Every line gets a relative timestamp prefix.
- Normal events carry the liveness signal by themselves.
- Heartbeat only fires when no line has been emitted for N seconds.
- Heartbeat can then be purely status/active-work, not also the elapsed-time mechanism.
This is less special-casey and easier for an agent to scan. I would patch the branch this way.
💻
Timestamped LLM-friendly progress renderer
<@&1506565370385793125> you should run a real pipeline with dagger in different formats so you can compare and use real data for your examples
Ran it against a real local-engine pipeline in the branch CLI and updated the branch from the results.
Pipeline A: host directory entries + alpine exec, warm-ish/cacheable.
Pipeline B: uncached sleep 12 exec to force a liveness gap.
Observed stderr line counts:
short: plain 4, classic 53, dots 15, logs 10, report 19
slow: plain 5, classic 48, dots 30, logs 15, report 12
Most useful real output from new plain on the slow run:
[+0.0s] started
[+10.0s] progress running=3 pending=0 done=10 cached=1 failed=0 active="Container.withExec 8.9s, Container.stderr 8.9s, Container.stdout 8.9s"
[+13.6s] result=OK done=13 cached=1 failed=0
[+13.6s] Full trace at https://dagger.cloud/...
Two design changes from dogfooding:
- Yes to relative timestamp prefix on every plain line.
- Heartbeat now means “nothing else emitted for ~10s”, not a separate elapsed-time mechanism.
- I also fixed a bad prototype behavior: it was surfacing transient hidden resolver FAIL spans even when the pipeline result was OK. Now plain only counts/announces spans that the renderer would actually show.
Branch force-updated: codex/llm-friendly-progress-plain at bdc4985bc.
Checks still pass: go test ./dagql/idtui -run 'TestPlain', go test ./cmd/dagger -run '^$', git diff --check.
💻 📝
try dagger -m github.com/dagger/dagger check golang
Compare real Dagger progress output
Dogfood Dagger progress renderers
Turn interrupted before completion.
@modest pine Try using (dagui.DB).RowsView if you haven't yet. It will produce a higher level data structure, skipping over internal plumbing like POST /query. You can walk that structure to determine what's running at a higher level.
Good call. I had only used RowsView as a visibility filter; active work was still coming from raw span order, which is why dogfood showed POST /query / Container.withExec-level noise.
I changed the prototype to walk dagui.DB.RowsView for progress stats and active row selection. Active work now picks the running row root, so nested plumbing does not crowd out labels like load module: ci / golang:check. Long completion announcements also key off the same RowsView tree.
Pushed updated branch: codex/llm-friendly-progress-plain at 60f909a3f.
Verified:
- go test ./dagql/idtui -run TestPlain
- go test ./cmd/dagger -run '^$'
- go build -o /tmp/dagger-progress ./cmd/dagger
- dogfood heartbeat now shows module-level active rows instead of raw exec spans
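A rough Go sketch of the "pick the running row root" idea. The Row type and activeRoots function below are hypothetical stand-ins, not the real dagui.DB.RowsView data structures, whose fields are not shown in this thread. The only point illustrated is that the walk stops descending once it finds a running row, so nested plumbing never reaches the heartbeat.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one row of a higher-level view tree.
// It is NOT the actual dagui type.
type Row struct {
	Name     string
	Running  bool
	Children []*Row
}

// activeRoots returns the names of the top-most running rows. Once a running
// row is found, its children are skipped so plumbing does not crowd it out.
func activeRoots(rows []*Row) []string {
	var active []string
	for _, r := range rows {
		if r.Running {
			active = append(active, r.Name)
			continue // do not descend into nested plumbing
		}
		active = append(active, activeRoots(r.Children)...)
	}
	return active
}

func main() {
	tree := []*Row{
		{Name: "load module: ci"}, // already finished
		{Name: "golang:check", Running: true, Children: []*Row{
			{Name: "Container.withExec", Running: true, Children: []*Row{
				{Name: "POST /query", Running: true},
			}},
		}},
	}
	fmt.Println(activeRoots(tree))
}
```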
<@&1506565370385793125> stuck?
No, done now. The earlier relay edit got interrupted, but the code work completed and the follow-up placeholder was updated.
Current state: branch codex/llm-friendly-progress-plain is pushed at 60f909a3f. The prototype now uses dagui.DB.RowsView for higher-level active work selection, so heartbeats report things like module/check rows instead of raw POST /query / Container.withExec plumbing. Tests/build smoke passed.
💻
LLM-friendly Dagger plain progress
💻
share sample output of the same run in each renderer. use public http
Published sample outputs here:
https://dx-49750c21dcd64f333773.exe.xyz/a/cJh50uvnhTlFIU8WygoaMribvJQFYRZI/html/
Included: plain, classic, dots, logs, report, with raw stderr/stdout links for each.
I used the same GraphQL pipeline shape for all renderers and varied only a CACHEBUST variable so each run stayed uncached and exercised the 12s exec. tty is noted separately: pseudo-terminal capture did not exit cleanly in this VM, so I killed it after timeout instead of pretending it was comparable.
💻 📝
Dagger progress renderer samples
:debug
Unknown command :debug.
Commands: :diagnostics, :interrupt [reason], :tldr, :dump (reply to a turn message), :model [name], :effort medium|xhigh|default, :fast on|off, :thread-owner [on|off], :leave, :nuke, :wipe-session, :help.
:dump
Reply to a bot turn message with dump or :dump.
:dump
Reply to a bot turn message with dump or :dump.
(i give up)
@modest pine your samples should all use dagger -m github.com/dagger/dagger check golang as the test command as instructed earlier
Fixed. I replaced the toy sample with the requested command:
/tmp/dagger-progress --progress=<renderer> -m github.com/dagger/dagger check golang
Public page:
https://dx-49750c21dcd64f333773.exe.xyz/a/cJh50uvnhTlFIU8WygoaMribvJQFYRZI/html/
Included: plain, classic, dots, logs, report, each with raw stderr/stdout.
Note: runs are sequential against the same engine, so plain shows the cold-ish module load and later renderers show warmed-cache behavior. That cache effect is visible in the raw output/status.
💻
@shell bolt you have to actually reply
oh i thought i clicked reply for the second one, must have missed
:dump
Dump for turn 1507451985933369364:
https://dx-49750c21dcd64f333773.exe.xyz/a/cJh50uvnhTlFIU8WygoaMribvJQFYRZI/html/dumps/turn-1507451985933369364.html
Dagger check golang renderer samples
<@&1506565370385793125> try one with a failing check (eg make an edit so that a test fails or something)
Published a failing check golang sample:
https://dx-49750c21dcd64f333773.exe.xyz/a/cJh50uvnhTlFIU8WygoaMribvJQFYRZI/html/
I used a scratch worktree and added a deliberate compile failure in e2e/helm/helm_test.go:
var codexIntentionalCompileFailure = doesNotExist
Command captured:
/tmp/dagger-progress --progress=plain -m /tmp/dagger-fail-sample check golang
The useful part of the new plain output:
[+10.0s] progress running=3 pending=7 done=61 cached=13 failed=0 active="golang:check 6.7s"
[+15.2s] check FAIL name="golang:check" duration=12.0s error="exit code: 1"
[+15.5s] result=FAIL done=64 cached=13 failed=3 checks_passed=0 checks_failed=1
...
./helm_test.go:35:38: undefined: doesNotExist
FAIL github.com/dagger/dagger/e2e/helm [build failed]
Scratch worktree removed after capture; main branch is clean.
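To illustrate why the timestamped key=value shape is easy for an agent to scan, here is a small Go sketch of a consumer-side parser for lines like the ones in this sample. Line and parseLine are hypothetical names, and the sketch assumes quoted values never contain escaped quotes, which the thread does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// Line is a hypothetical parsed form of one plain-renderer line.
type Line struct {
	Offset string            // relative timestamp, e.g. "15.2s"
	Words  []string          // bare words, e.g. "check", "FAIL"
	Fields map[string]string // key=value pairs with quotes stripped
}

// parseLine splits "[+15.2s] check FAIL name=\"x\" k=v" into its parts.
// Assumption: quoted values do not contain escaped quotes.
func parseLine(s string) (Line, bool) {
	line := Line{Fields: map[string]string{}}
	end := strings.Index(s, "] ")
	if !strings.HasPrefix(s, "[+") || end < 0 {
		return line, false
	}
	line.Offset = s[2:end]
	rest := s[end+2:]
	for {
		rest = strings.TrimLeft(rest, " ")
		if rest == "" {
			return line, true
		}
		i := strings.IndexAny(rest, " =")
		if i < 0 { // trailing bare word
			line.Words = append(line.Words, rest)
			return line, true
		}
		if rest[i] == ' ' { // bare word followed by more tokens
			line.Words = append(line.Words, rest[:i])
			rest = rest[i+1:]
			continue
		}
		key := rest[:i]
		rest = rest[i+1:]
		if strings.HasPrefix(rest, `"`) { // quoted value
			j := strings.Index(rest[1:], `"`)
			if j < 0 {
				return line, false // unterminated quote
			}
			line.Fields[key] = rest[1 : 1+j]
			rest = rest[j+2:]
			continue
		}
		j := strings.Index(rest, " ") // unquoted value
		if j < 0 {
			j = len(rest)
		}
		line.Fields[key] = rest[:j]
		rest = rest[j:]
	}
}

func main() {
	l, ok := parseLine(`[+15.2s] check FAIL name="golang:check" duration=12.0s error="exit code: 1"`)
	fmt.Println(ok, l.Offset, l.Words, l.Fields["name"], l.Fields["error"])
}
```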
@tardy sigil what does 🇲 mean?
medium effort
🇲 is the Unicode regional indicator symbol for the letter M. By itself it usually just displays as a boxed/flag-style M; paired with another regional indicator it forms a country flag emoji, e.g. 🇲🇽 = Mexico.
💻
thanks bud
@modest pine your output feels hard to read for a human. what is going on exactly? hard to say.
I would prefer:
- A heartbeat with well-structured "sample" of the active span tree. Made as pretty-printed as possible (container.withExec -> just the command args)
- then just the report at the end
wdyt
I agree. The current prototype is too metric/summary-line shaped. It answers “alive?” but not “what is going on?”
I’d change plain to:
[+2m10s] running 12, pending 64, done 688, failed 2
active:
golang:check 1m47s
Golang.check 1m47s
go test ./... 1m22s
Directory.glob("**/*.go") 4.1s
resources: mem 596 MB, disk read 943 MB, disk write 1.4 GB
Then on completion, stop trying to summarize everything ourselves and print the existing report:
[+10m53s] result=FAIL checks_failed=1
<report renderer output>
Policy I’d use:
- Heartbeat every ~10s only while no other output happened.
- Heartbeat is a bounded sample of the active RowsView tree, not flat span names.
- Render calls using the pretty renderer’s call formatting where possible, so Container.withExec(args: [...]) becomes the command.
- Keep counts/resources as secondary context, not the main content.
- Remove most step DONE/FAIL streaming except checks/cancel/root failure; the report owns final detail.
That seems closer to “LLM-readable but still human-skimmable”: periodic structured context while running, then one authoritative report at the end.
💻