Telemetry: cache miss reasons | Dagger

dreamy cloak Jan 28, 2025, 3:46 PM

#

Hey @left void! 👋 What are span "effects"?
https://github.com/dagger/dagger/blob/ef3400788bf37c1c96cf41d1890b5d82f556c5f7/dagql/dagui/spans.go#L498

#

Trying to understand the reason for cache misses in graphviz. Is the output of `CachedReason()` always accurate, or only in certain cases?

left void Jan 28, 2025, 4:09 PM

#

dreamy cloak Hey @left void! 👋 What are span "effects"? https://github.com/dagger...

"effects" are usually e.g. the LLB ops installed by withExec (the "cause") - they are correlated by cause/effect ID which for LLB ops is the vertex digest, but the string can be anything

#

nowadays span links are mostly used for this instead of cause/effect attrs, but the code is still there for UI backwards compat with older engines
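A minimal sketch of the cause/effect correlation described above, not the actual dagui code: `Span`, `DB`, `Add`, `EffectsOf`, and the field names are hypothetical stand-ins. A causal span declares the effect IDs it expects, and each effect span is filed under its own ID when it arrives.

```go
package main

// Span is a hypothetical, simplified span record (not the real dagui type).
type Span struct {
	Name      string
	EffectIDs []string // effects this span is expected to cause
	EffectID  string   // set when this span is itself an effect
}

// DB indexes effect spans by effect ID; the ID is treated as an opaque string.
type DB struct {
	EffectSpans map[string][]*Span
}

// Add files an effect span under its effect ID; other spans are ignored.
func (db *DB) Add(s *Span) {
	if s.EffectID == "" {
		return
	}
	if db.EffectSpans == nil {
		db.EffectSpans = map[string][]*Span{}
	}
	db.EffectSpans[s.EffectID] = append(db.EffectSpans[s.EffectID], s)
}

// EffectsOf returns the effect spans seen so far for a causal span.
func (db *DB) EffectsOf(cause *Span) []*Span {
	var out []*Span
	for _, id := range cause.EffectIDs {
		out = append(out, db.EffectSpans[id]...)
	}
	return out
}
```

Because the ID is just a string key, the same lookup works whether it is an LLB vertex digest or anything else.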

dreamy cloak Jan 28, 2025, 4:11 PM

#

So for a newer engine, you'd look into span links for cache miss reasons?

left void Jan 28, 2025, 4:13 PM

#

eh, no - effect IDs are probably better for that, since they say "here are all the possible effects you should see for me in the future"

#

span links only appear when the effect's span arrives - which might be a cache hit, might not, and it might never actually arrive (if it's a dependency of a cache hit for example, you won't see a span for all the transitive dependencies)
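To illustrate the difference between declared effect IDs and effect spans that actually arrived, here is a small sketch. `splitEffects` and both of its arguments are hypothetical simplifications, not part of the Dagger codebase.

```go
package main

// splitEffects partitions the effect IDs a causal span declared into those
// whose span has arrived and those that have not been seen (yet, or ever).
// Both arguments are hypothetical simplifications of the telemetry data.
func splitEffects(declared []string, arrived map[string]bool) (seen, missing []string) {
	for _, id := range declared {
		if arrived[id] {
			seen = append(seen, id)
		} else {
			missing = append(missing, id)
		}
	}
	return seen, missing
}
```

A missing ID is not necessarily a problem: as noted above, it may belong to a transitive dependency of a cache hit, in which case its span never arrives.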

left void Jan 28, 2025, 4:14 PM

#

dreamy cloak Trying to understand the reason for cache misses in graphviz. Is the output of `...

it should be pretty accurate, but it's complicated as hell to implement so there's always a chance I missed a fringe scenario

left void Jan 28, 2025, 4:15 PM

#

left void span links only appear when the effect's span arrives - which might be a cache h...

this comment is related to the last bit I mentioned: https://github.com/dagger/dagger/blob/ef3400788bf37c1c96cf41d1890b5d82f556c5f7/dagql/dagui/spans.go#L506-L511

dreamy cloak Jan 28, 2025, 4:18 PM

#

left void it should be pretty accurate, but it's complicated as hell to implement so there...

Ok, that's good to know 🙂 I can see you get a span from an effectID via span.db.EffectSpans[effect]. Can it match a span from a function call, or is it always to something lower level like an LLB op?

left void Jan 28, 2025, 4:23 PM

#

it's always something "lazy" like an LLB op. also we don't even show the effect spans, the UI just merges them with the originating/causal span instead, so e.g. withExec(sleep 30) shows a 30s duration
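One way to picture the merge, as a sketch only: take the earliest start and the latest end across the causal span and its effect spans. `Interval` and `mergedMs` are hypothetical, timestamps are plain Unix milliseconds to keep the example dependency-free, and the real UI may compute this differently.

```go
package main

// Interval is a hypothetical start/end pair in Unix milliseconds.
type Interval struct {
	StartMs, EndMs int64
}

// mergedMs returns the time covered from the earliest start to the latest
// end across a causal span and its effect spans, in milliseconds.
func mergedMs(cause Interval, effects []Interval) int64 {
	start, end := cause.StartMs, cause.EndMs
	for _, e := range effects {
		if e.StartMs < start {
			start = e.StartMs
		}
		if e.EndMs > end {
			end = e.EndMs
		}
	}
	return end - start
}
```

Under this assumption, a withExec call that returns in a few milliseconds but whose exec op sleeps for 30s ends up with a merged duration of roughly 30s.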

dreamy cloak Jan 28, 2025, 4:27 PM

#

Would you be surprised if a "higher level" span (say in terms of the nodes in the graphviz output from the CLI) has cached parents, but isn't itself cached?

#

Assuming all function args are pure

left void Jan 28, 2025, 4:31 PM

#

it's technically possible - sometimes the effects just never run. for example, if a module's constructor applies a default like container.From("golang:1.20") that never ends up being used, the constructor will look like it's pending forever

dreamy cloak Jan 28, 2025, 4:38 PM

#

So a pending span shows up as not cached in `CachedReason()`?

left void Jan 28, 2025, 4:55 PM

#

dreamy cloak So a pending span shows up as not cached in `CachedReason()`?

it should yeah - they're derived from the same data. (there's also a PendingReason())
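A rough sketch of how both answers can fall out of the same data. This is not the real `CachedReason()` or `PendingReason()`: `EffectState`, `cachedReason`, and `pendingReason` are hypothetical, and the rule "cached only if every declared effect was seen and was a cache hit" is an assumption made for illustration.

```go
package main

// EffectState is a hypothetical summary of what telemetry knows about one effect.
type EffectState struct {
	ID     string
	Seen   bool // a span for this effect has arrived
	Cached bool // that span was reported as a cache hit
}

// cachedReason reports whether a causal span counts as cached, with reasons
// when it does not. Assumption: every declared effect must have been seen
// and been a cache hit. An empty list is vacuously cached in this sketch.
func cachedReason(effects []EffectState) (bool, []string) {
	cached := true
	var reasons []string
	for _, e := range effects {
		switch {
		case !e.Seen:
			cached = false
			reasons = append(reasons, "effect "+e.ID+" has not been seen")
		case !e.Cached:
			cached = false
			reasons = append(reasons, "effect "+e.ID+" was not a cache hit")
		}
	}
	return cached, reasons
}

// pendingReason reports whether the span is still waiting on unseen effects.
func pendingReason(effects []EffectState) (bool, []string) {
	var reasons []string
	for _, e := range effects {
		if !e.Seen {
			reasons = append(reasons, "effect "+e.ID+" has not been seen")
		}
	}
	return len(reasons) > 0, reasons
}
```

In this sketch an effect that never arrives makes the span both pending and not cached, which matches the behavior described in the thread.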

#

btw - if you press '?' in the TUI, or add ?debug to the URL in the web UI, it'll surface all that info

dreamy cloak Jan 28, 2025, 4:56 PM

#

Ooh, didn't know about ?debug!

left void Jan 28, 2025, 4:56 PM

#

a little buggy though, the debug icon shows up for nested spans since it's far off in the margin 😛

dreamy cloak Jan 28, 2025, 5:00 PM

#

Ok, I can see the effectIDs in that debug, but how can I relate them to something?

left void Jan 28, 2025, 5:01 PM

#

it also lists the effect spans and their statuses below the <pre> - they don't show up normally so can't really relate it to other spans

dreamy cloak Jan 28, 2025, 5:02 PM

#

I'm looking at a span in cloud ui where the snapshot shows a bunch of effectids, but in the Effects section only one is shown. Which ones go in there?

left void Jan 28, 2025, 5:03 PM

#

only ones that we've seen a span for

#

got a link?

dreamy cloak Jan 28, 2025, 5:05 PM

#

https://v3.dagger.cloud/dagger/traces/11f607251522f72f8a8096cb2079fd83?debug=&listen=c8cc4cc7967be237&listen=24de9b8da7b5a782&listen=7ffa6a19f105fba4&listen=794cce0398eebcd2&listen=de3b728a571ad200&listen=99a1bd740569e2c4

#

Some IDs are in xxh3 and others are in sha256. Does that tell you something about what kind of "span" it is?

left void Jan 28, 2025, 5:09 PM

#

noticed that too, I think it's just a side effect of @bitter ibex's recent refactor (https://github.com/dagger/dagger/pull/9204) - if it's sha256 that means it's a custom digest i guess?

#

part of the march towards dagger/dagql native content-addressing

#

hopefully that only shows up in the inner plumbing like this

#

hmm looks like it's used for the context dir too

#

basically anywhere that you used to see blob(...)
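If you want to group IDs by hash while digging through the debug output, a tiny helper like this would do. It assumes the IDs are written as `algorithm:hex` (for example a `sha256:` or `xxh3:` prefix); `digestAlgorithm` is a hypothetical name, and what the algorithm implies about the span is only the guess made above.

```go
package main

import "strings"

// digestAlgorithm returns the algorithm prefix of an "algo:hex" digest
// string, or "" if the string has no prefix.
func digestAlgorithm(digest string) string {
	algo, _, ok := strings.Cut(digest, ":")
	if !ok {
		return ""
	}
	return algo
}
```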

dreamy cloak Jan 28, 2025, 5:14 PM

#

If a span HasLogs, it's not cached?

left void Jan 28, 2025, 5:14 PM

#

yeah, since that implies something ran

dreamy cloak Jan 28, 2025, 5:14 PM

#

Oh, I see.

left void Jan 28, 2025, 5:15 PM

#

hmm it seems like the list of effects is incomplete

#

for example this span beneath codegen

dreamy cloak Jan 28, 2025, 5:15 PM

#

Can't tell why it ran, though.

left void Jan 28, 2025, 5:15 PM

#

same digest is listed in the effect IDs of the parent codegen call, but the uv lock span isn't listed for some reason

left void Jan 28, 2025, 5:16 PM

#

dreamy cloak Can't tell why it ran, though.

yeah, this part is implicit with cache misses still, theoretically it comes down to a change in the inputs from last time
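Since the miss reason is implicit, one way to investigate is to record the input digests of a call on each run and diff them yourself. The sketch below assumes you have captured such maps on your own; `changedInputs` and the input names are hypothetical, and nothing here is provided by Dagger's telemetry.

```go
package main

import "sort"

// changedInputs lists the input names whose digest differs between two runs,
// including inputs that were added or removed. Names and digests are
// hypothetical values captured by the caller.
func changedInputs(prev, curr map[string]string) []string {
	var changed []string
	for name, digest := range curr {
		if old, ok := prev[name]; !ok || old != digest {
			changed = append(changed, name)
		}
	}
	for name := range prev {
		if _, ok := curr[name]; !ok {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}
```

An empty result means none of the recorded inputs changed, so the miss would have to come from something the maps do not capture.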

#

are you trying to determine why codegen re-ran?

dreamy cloak Jan 28, 2025, 5:16 PM

#

Yeah, this withExec

left void Jan 28, 2025, 5:17 PM

#

i see

#

wonder if it has to do with this? https://github.com/dagger/dagger/issues/8955

#

cc @brisk walrus - saw it mentioned in another thread
