#Eval: go-programm-qa

rapid gull
#

🧵

#

(cc @daring snow in case you're interested)

#

Issues in the first run:

#
  1. Regression in the withExec error propagation: the agent is not getting stderr anymore
#
  1. Model is way more verbose than before. Might be a model upgrade, maybe I'm getting gpt 4.5? Note to self: pin model version 🙂
#
  1. There's definitely something wrong on the "qa" part: prompts and model replies don't appear. I see "naked" container.withExec calls but I don't know where they're coming from (I gave it a high-level Workspace object). This one could be a mix of TUI + regression in my module
hardy relic
#

also it'd be really helpful to link to the traces for these

rapid gull
#

Since BBI itself is still in active development, and it's not an immutable truth that tool calls & dagger function calls will map so perfectly 1-1 (even now it's not a 100% perfect mapping, there is the flattening trick to support chaining etc) -> I think keeping the 🤖💻 span would be useful

hardy relic
#

i think that'll hinder readability by breaking the illusion of chaining, which is kind of Dagger's bread and butter

#

since it's passthrough it'll always reveal whatever spans it ran, even if it's not 1:1

#

but, keeping it on the radar anyway. unfortunately bringing back the "thinking" phase also breaks chaining, since in reality there's an API roundtrip in between all those

#

maybe with a different BBI it'd be submitting fully chained calls instead of doing one at a time

rapid gull
#

Oh I see, the issue is the auto-chaining?

#

Ha ha that kind of brings back the topic of function / artifact views 🙂 That's a problem specific to the trace view (not to sidetrack - obviously we need all this to work great with the trace view)

hardy relic
#

yeah, the fact that they chain in the UI right now is dependent on those spans appearing directly adjacent to one another, which currently works because of the passthrough trick - if we wrap them in another non-passthrough span, that'll go away

rapid gull
#

Is there a quick fix for the stderr pass through?

#

Or a repro?

#

I have a demo in 10mn

#

(just realized)

rapid gull
#

Observation: all the spans are there in the web UI. Seems like a TUI issue that some spans are not visible

hardy relic
# rapid gull Is there a quick fix for the stderr pass through?

nothing quick, sorry - it might be the case that it doesn't work when a module does the container sync, and only works if the model directly calls Container.withExec.[something]. nothing should have regressed there, I can see it working with the repro above:

llm | with-container $(container | from alpine | with-exec sh --stdin 'echo sdfsdfsdfsdf; exit 1') | with-prompt "you are in the context of a container that printed a message and exited 1. what was printed to stdout?" | loop | with-prompt "show me the raw tool call result that you read that from" | loop
rapid gull
#

@hardy relic repro:

.cd github.com/dagger/agents/toy-programmer/toy-workspace
llm | with-toy-workspace $(write main.go 'wrong') | with-prompt "call the build tool and report back with exactly the result you received" | last-reply
#

@daring snow I have to join my call (swyx...) but in case you're still around: could you push a workaround to toy-workspace/main.go? basically add expect: any to withExec in build() so that stderr is in the error? 🙏

#

otherwise my demo will bomb

#

(I will work around it)