Telemetry with links | Dagger | Page 1

wooden rain May 15, 2025, 10:16 AM

#

Hey @tribal crypt! 👋 How would you represent in telemetry the relationship between commands in a pipeline, like:

directory | with-file | entries

The handlers for these start concurrently but get blocked on reading stdin before the previous handler has finished (and thus closed its stdout). In this example with-file should fail due to missing arguments, in which case I want entries to be marked as canceled.

I’m trying to use the trace.WithLinks() to link one handler to the previous, but it’s been tough trying to get it all to work as I want it so looking for advice because I may be pushing against the current.

#

This is what I have working, before adding the links:

Each handler call has it’s own span, initialized with telemetry.Passthrough()
Only shows if there’s an error by default (by span.SetAttributes()) ✅
Can see all of them with debug=1 in cloud (great to debug)
If stdin shows the previous handler failed, the span is set to cancelled ✅

#

After adding links, I can see the parent > child relationship but the error status and canceled attribute are bubbling up to the parent, i.e., up the link.

#

Could be easier to explain on a call if you can.

tribal crypt May 15, 2025, 3:28 PM

#

Yeah links are overloaded atm to mean cause / effect, like a delayed/lazy/async continuation of the original span. Sequencing is just represented as flat sequential spans

We should probably make that particular interpretation of links opt in, but I think in this case just starting them as normal and canceling the later ones would be enough? Like it doesn't seem like you'd need any fancy nesting or links, it boils down to the same UI challenges as regular telemetry since the chained withExecs all appear and then the later ones stay pending (just in your case I guess you want them to be canceled)

#

Also I'm on PTO, back Tuesday (no worries for the ping)

tribal crypt May 15, 2025, 3:52 PM

#

(oh but maybe the issue is you're not able to control the order that the telemetry appears in?)

wooden rain May 15, 2025, 4:01 PM

#

tribal crypt Also I'm on PTO, back Tuesday (no worries for the ping)

Sorry about that 😅 I'll ping again when you come back. Enjoy the PTO! 🙂

wooden rain May 21, 2025, 5:15 PM

#

tribal crypt Yeah links are overloaded atm to mean cause / effect, like a delayed/lazy/async ...

Hey @tribal crypt 👋 Welcome back! 🙂

The problem, if I don't link, is for example this:

container | from alpine | with-exec echo hello | stdout &
container | from alpine | with-exec echo there | stdout &
.wait

Then you'll see something like:

container
container
from
from
with-exec
with-exec
stdout
stdout

I would rather have them flat but grouped instead of nested, like a normal API pipeline, but it's not an option.

Yeah links are overloaded atm to mean cause / effect, like a delayed/lazy/async continuation of the original span. Sequencing is just represented as flat sequential spans

These are in effect a continuation of the original span in another goroutine, with only stdin/stdout connecting them. So sounds like that's what I'd want here.

Not sure what you mean in the last sentence because when I use links they're represented as nested. That would be ok except what confused me was that changing a span to be cancelled or errored, it propagates up so the spans that link to it also get the same attribute.

tribal crypt May 21, 2025, 5:20 PM

#

That would be ok except what confused me was that changing a span to be cancelled or errored, it propagates up so the spans that link to it also get the same attribute.
yeah, that's kind of what i mean - when span A links to span B, to the UI that means "span B is completing the work outlined by span A, so if it fails, represent it there too" - which is not quite what you have here, yes it's continuing but it's more like a separate step in a pipeline, it's not so tightly coupled as e.g. Container.withExec(foo bar) -> exec foo bar

#

is it possible to create a span at the highest statement level in the script? like a span for each of these three:

container | from alpine | with-exec echo hello | stdout &
container | from alpine | with-exec echo there | stdout &
.wait

wooden rain May 21, 2025, 5:23 PM

#

I see. Not really. Are you thinking that span could be the grouping thing I'm missing?

tribal crypt May 21, 2025, 5:23 PM

#

yeah

#

seems like that would make script telemetry easier to grok overall

tribal crypt May 21, 2025, 5:27 PM

#

tribal crypt > That would be ok except what confused me was that changing a span to be cancel...

(also sorry had the arrows reversed here - span B (exec foo) links to span A (Container.withExec), not the other way around)

wooden rain May 21, 2025, 5:28 PM

#

I can see that container is the first command in a chain. I can also see that from alpine follows some other span, but at that point container is done. Is there something that can be done in from alpine to do this? Or maybe just always create a passthrough span on the first one and have the others be children. That means if it's just one command it would show as one span inside another. If it errors, both show, right?

#

What would be the combination of attributes such that an error in a command only shows the enclosing span if it's not the first in the chain?

tribal crypt May 21, 2025, 5:31 PM

#

I don't think there should be any nesting in foo | bar | baz since nesting always means the children were run by (within / during) the parent

#

the missing grouping is what's killing the UI imo. maybe the library needs a PR for instrumentation?

#

it already has call/exec handlers, maybe it needs a StmtHandler?

#

the handler could also check if it's a simple statement (.wait) and skip the extra grouping

wooden rain May 21, 2025, 5:37 PM

#

No, I don't mean nesting between the steps of the pipeline, I meant nesting between the grouping span and the steps, which would be sequential/flat.

For example:

container | from alpine | with-exec echo ok | stdout

On success I want none to show. But on error:

container | from alpine | with-exec false | stdout

The parent span could show as errored, and all the child spans shown, with with-exec marked as errored too. Maybe only show the other sibling spans on increased verbosity, same with the parent span. I.e., by default could show just the span that originally errored.

But if just a single one like:

.help not-exists

Then I don't want the parent span to show since there's no need for a group here.

Trying to figure out what attributes I need to set where to make what I just described.

wooden rain May 21, 2025, 5:41 PM

#

tribal crypt the handler could also check if it's a simple statement (`.wait`) and skip the e...

I rather always create the parent/grouping span but hide it than go through the parsing statements route to conditionally create it. Don't want to overcomplicate since this is just for telemetry. But it helps to have proper spans that capture the right logs and output rather than having it all on the root span. Makes it easier to debug/troubleshoot. To be clear, having a span on handlers is new. Currently, you either get one span for a whole script if not interactive, or a span per prompt in interactive.

tribal crypt May 21, 2025, 5:46 PM

#

Well, I'd imagine the statements are already parsed by then so you'd just be looking for the handful of statement types that need grouping (or handful of ones that don't, whichever is simpler). Sounds less complicated to me than something that has to be dynamically interpreted based on states in the UI. Like if foo | bar | baz succeeds I personally wouldn't expect that to be dramatically different from when it fails - the group's span would just be successful instead of red, just like shell mode

#

I don't think what you're describing is currently possible with attributes

#

there are too many things interacting imo

#

but maybe I'm misunderstanding

#

here's all of em if you wanna throw the kitchen sink 😂 https://github.com/dagger/dagger/blob/main/sdk/go/telemetry/attrs.go

#

encapsulate/encapsulated are kind of close since they relate to failure states

wooden rain May 21, 2025, 5:53 PM

#

It's not easier with parsing, just because of the extra code needed to properly navigate the parsed document, and that's done outside/before the runner that calls the handlers. It's much easier to set attributes dynamically in the handlers.

Here's what I'm thinking:

If command has no stdin, create a grouping span
Create the span for the handler as a child of the grouping span
Before saving the state to stdout, save the grouping span's SpanContext in the state
If a handler's stdin is not nil it means it's not the first in a chain, so we should have access to the grouping span's SpanContext through the input state. Create the span for the handler as a child of that grouping span.
If the handler errors, since I have access to the grouping span, I can also set attributes on it if I need to.

Sounds doable, no?

#

Hmmm... problem is when to end the grouping span. It would have to be in the last span. That's when the state is actually resolved, which happens outside of the handler's code. Could be tricky to ensure it always runs, need to experiment with that.

tribal crypt May 21, 2025, 6:09 PM

#

Just to make sure we're on the same page, here's what I'm thinking, which still seems much simpler to me, but would need changes upstream:

func (h *shellCallHandler) Stmt(
    ctx context.Context,
    stmt *syntax.Stmt,
    run func(context.Context) error,
) error {
    switch stmt.Cmd.(type) {
    case *syntax.CallExpr: // add other "simple" ones
        // simple call; no need to group it
        return run(ctx)
    default:
        // more complicated than a single call; wrap in a span named after the expr
        p := syntax.NewPrinter()
        buf := new(strings.Builder)
        p.Print(p, stmt)
        ctx, span := Tracer().Start(ctx, buf.String())
        err := run(ctx)
        telemetry.End(span, func() error { return err })
        return err
    }
}

func (h *shellCallHandler) Initialize(ctx context.Context) error {
    r, err := interp.New(
                // ...
        interp.StmtHandler(h.Stmt),

#

the most controversial part there might be the error return value since the shell package does exit codes instead, but that's a pretty small tweak, just wrote what seemed like more idiomatic Go

wooden rain May 21, 2025, 6:31 PM

#

But requiring the upstream change is it's own blocking beast and this was just a side quest when resolving something else. So not sure if it's worth the trouble. Could be a later improvement/refactor. Depends on what @high parcel has to say, whether it's viable or not to have something like that. Or what a better solution would be.

high parcel May 21, 2025, 6:50 PM

#

Sorry can you summarize what change upstream you are suggesting? Or book a call with me at https://cal.com/mvdan

tribal crypt May 21, 2025, 7:09 PM

#

high parcel Sorry can you summarize what change upstream you are suggesting? Or book a call ...

essentially something like this code mockup - #1372517942528577556 message

basically some way to instrument statements before/after they run, so that a script's evaluation can be represented with something like OpenTelemetry, which in our case is what drives all of our UIs (the TUI, Cloud web UI, etc)

high parcel May 21, 2025, 7:10 PM

#

I see. Just for statements, or for any node? I will need to look through the code a bit.

tribal crypt May 21, 2025, 7:18 PM

#

high parcel I see. Just for statements, or for any node? I will need to look through the cod...

hmm, good question - I reached for statements, but low conviction. The goal is to be able to group things like foo | bar | baz & - but if that recurses into foo, bar, and baz as well it's probably fine for the instrumentation handler to be called for those too, as long as there's a way for the instrumentation to decide when it's "too noisy" (similar to what the code example above does by avoiding wrapping CallExprs)

high parcel May 23, 2025, 1:51 PM

#

I had a look. I think this could be a middleware func for interpreting any node, akin to a more generic ExecHandler. An API like

type NodeHandlerFunc func(ctx context.Context, node syntax.Node) error
func NodeHandlers(middlewares ...func(next NodeHandlerFunc) NodeHandlerFunc) RunnerOption

Then your middleware code could be like

if node is statement { trace begin }
err := next(ctx, node}
trace end
return err

#

This is highly configurable as it can work on any node. And you can even choose to e.g. not let the interpreter do its thing for a node by skipping the call to next.

tribal crypt May 23, 2025, 3:28 PM

#

high parcel I had a look. I think this could be a middleware func for interpreting any node,...

looks perfect! i think! 😄

high parcel May 23, 2025, 3:36 PM

#

It will take me a few hours this weekend to do this I think

wooden rain May 23, 2025, 3:37 PM

#

Yeah @high parcel, that could work 😃 So by not calling next, would it prevent the handler exec func from running, or just the node handler func?

high parcel May 23, 2025, 3:38 PM

#

This might be restricted a bit once I try to implement it. For example, we just return an error here, which works for statements and commands. But for some nodes like words which expand to strings, that won't work.

high parcel May 23, 2025, 3:39 PM

#

wooden rain Yeah <@680028836250910855>, that could work 😃 So by not calling next, would it ...

I'm not sure. I think day one I would not support "skip the default intetpreter logic" as that could introduce too many edge cases.

wooden rain May 23, 2025, 3:39 PM

#

Yeah, I have no need for that, so not requesting it 🙂

high parcel May 23, 2025, 3:40 PM

#

Out of curiosity, why not trace via ExecHandler?

#

I guess to trace full statements like pipelines, redirects, etc?

wooden rain May 23, 2025, 3:42 PM

#

high parcel Out of curiosity, why not trace via ExecHandler?

Think about a pipeline like:

cmd1 | cmd2 | cmd3

Each of those is its own exec handler, connected only by stdin/stdout. What we're looking for is a simple way to know to create a new context that groups those handlers. So the node handler func would need to create a new context but then the exec handler for each of cmd1, cmd2, and cmd3, would need to get that same context (or at least "inherited").

#

Context passed to interp.Run -> node handler func could add to context -> Interpreter adds exec handler context -> exec handler

high parcel May 23, 2025, 3:48 PM

#

Gotcha. So you're really just after the statements.

An idea: why not walk the syntax tree before you run the interpreter, to collect that context upfront (e.g. map from CallExpr to Stmt), then run the interpreter and use it?

This would need the CallHandler to receive the *CallExpr it came from, but we can add that.

#

I suggest this because more handlers/middleware makes the api and code more complex, as I need to worry about the combined use

#

One last idea: a way to ask the runner what is the stack (chain) of parent nodes being interpreted.

wooden rain May 23, 2025, 3:50 PM

#

That's what I was thinking could be the "statement" based solution today, but it adds a bit of complication. How would I connect a context created through the syntax parsing, which runs before interp.Run, and use it in an exec handler?

high parcel May 23, 2025, 3:51 PM

#

So e.g. File -> Stmt -> BinaryCmd (pipeline) -> Stmt -> CallExpr (where you would be in a CallHandler).

You could walk up the chain and find the parent statements this way.

#

(This is about the last idea of mine, asking the runner for the stack of parent nodes)

#

I think this avoids the issue you bring up about doing an upfront analysis step

wooden rain May 23, 2025, 3:55 PM

#

Oh, you mean if CallHandler had the CallExpr it came from. So for cmd2 in cmd1 | cmd2 | cmd3 it would get the syntax node for cmd2, but from there I can go up and see that it's a part of that "pipeline"?

#

But that's similar to what I'm doing with the exec handler. What Alex was suggesting was a way to wrap a begin; defer end; run when cmd1 | cmd2 | cmd3 happens, but not when it's just one command.

#

Only think that's important is that the exec handler has the right context.Context, and that we properly start an event on the beginning (with cmd1), and end it on the end (with cmd3).

high parcel May 23, 2025, 4:03 PM

#

I mean if CallHandler knew what CallExpr it came from, as well as all the parent nodes up to the root File

#

With this you could know if you are at the start or end of a pipeline, and run code before/after

wooden rain May 23, 2025, 4:10 PM

#

high parcel With this you could know if you are at the start or end of a pipeline, and run c...

I was typing exactly that 🙂 It's similar to what I was going to do, the difference is I'd know if I want to create the grouping span or not. Otherwise I'd always need to create it and just hide it. That would already be an improvement, but not as good as what Alex suggested, which would make the code more robust and easier to understand. Totally understand your concerns about adding complexity to your library, just trying to figure out what's possible or not 🙂

high parcel May 23, 2025, 4:20 PM

#

With a node visitor like #1372517942528577556 message, it might be hard to know if you're in the outer statement (pipeline) or inside one of the three inner statements

wooden rain May 23, 2025, 4:24 PM

#

If possible, it's better if the "parent" can add to the context and "wrap" the execution of the children, passing that context, instead of the child having to know about the parent and coordinate with siblings.

wooden rain May 23, 2025, 4:39 PM

#

This would be useful for us whenever you'd intuitively think about multiple exec handlers to be a part of the same "group" (as a user). BinaryCmd being the best example, although that doesn't include the whole chain, it's just left and right sides of one pipe.

#Telemetry with links