#hard-to-read errors in shell/cloud
1 messages ยท Page 1 of 1 (latest)
looks like the data is in the right place (all of these spans have the right error_origin), but:
- I still need to update Cloud to dedupe the same errors
dagger shellmight be a blind spot, will investigate
@shell jetty can you copy/paste the whole TUI output, if it's still around?
ah nvm i can derive it from your repro, no worries, pardon the ping
no worries
I was going to try with dagger call
looks like call shows the error correctly
oof yeah shell looks awful
mmm, dagger -c also shows it...
The problem is specifically when I run the command interactively in the shell
yeah seeing the same. i'm surprised it's auto-collapsing, too
I haven't checked if the rendering issue in cloud is only triggered when running the shell interactively
update: just checked. It's indeed only triggered from interactive shell
Web trace when run from -c: https://dagger.cloud/dagger/traces/0f95423497ee678d40876ef03e18ed1e
figured out the shell portion - for some reason it was setting reveal + passthrough which effectively hid all the other spans 
here it is with that commented out:
that's here: https://github.com/dagger/dagger/blob/e035126279e30913831033156f0b9d544873f896/cmd/dagger/shell_exec.go#L199-L200
cc @spring oasis (sorry didn't realize you were PTO) - maybe it's just trying to set passthrough=false and the reveal=true is unintentional?
reveal is more intense, maybe it should be called focus or something instead. worth bikeshedding in https://github.com/dagger/dagger/pull/9327
related explanation: https://github.com/dagger/dagger/pull/9327#discussion_r2192779217
@hidden star I meant to leave a follow-up comment in that status PR, but: I find reveal and passthrough to be super dense and confusing, as a naive non-god-tier developer ๐
Since the intent seems to be getting around pollution by noisy spans, wouldn't it make sense to have an API to mark those noisy spans instead? Eg. a debug or system or just noisy argument? Then other spans/statuses don't have to declare anything special
that was what i tried originally, but it becomes hard to apply rules in the UI unilaterally to them. for example, sometimes you want to hide a system span entirely including its children, and sometimes it's just an intermediary thing and you're still interested in what's beneath it
down to bikeshed this some more, but yeah, eventually I landed on just letting the attributes explicitly control what the UI does, rather than try to have the UI interpret things semantically, where I fear we might come up with about 3 different words to express 'this is internal[, but don't hide its children][, but show it when it fails][, ...]'
BUT. naming is hard, and I named these quickly to get the job done and without anyone judging them ๐
i agree they might be a bit too brainiac / compositional
i should probably at the very last mark the API experimental
we could also turn them into implementation details of higher level APIs, gradually as users request them, and just not expose them for now
PR for the shell fix: https://github.com/dagger/dagger/pull/10698
@winter elm another trace where I had to spelunk in the web viewer (was run in CI): https://dagger.cloud/dagger/traces/846c8a5e69123a3cfb27a86102a63768?listen=d0efcd23c08784ab&listen=91acd858e309f09a&listen=238ea34dba4ca25c&listen=4e6a8767a95eb94e
did you try the 'error' button?
also the logs were hoisted up to the very top already?
-
I didn't see the ERROR button, it sends me to a span further in the stack, but that span doesn't have the error message either
-
Ah yes, the error log is at the bottom of the page, but before I expand the trace, the bottom was pretty close (just below the fold in my case). I just saw a red span and immediately read its contents, then clicked on it to drill down once I didn't find what I was looking for
Clicking ERROR brought me here - maybe it should auto-expand the target? (had a similar thought before but wasn't 100% sure)
for 2. - yeah, it's at the bottom because normally the top part is less tall, and because it's heuristic-driven sometimes it hoists less-relevant stuff up and risks pushing the trace body way too far down
maybe we can build on this new error origin tracking mechanism to get rid of that heuristic now 
Or maybe it stays since sometimes the hoisting is very helpful (if e.g. a bunch of integration tests failed you'll see all their spans and logs). And we just need the 100% certain root cause error to be shown up top