#Telemetry: cache miss reasons
Trying to understand the reason for cache misses in graphviz. Is the output of CachedReason() always accurate, or only in certain cases?
"effects" are usually e.g. the LLB ops installed by withExec (the "cause") - they are correlated by cause/effect ID which for LLB ops is the vertex digest, but the string can be anything
nowadays span links are mostly used for this instead of cause/effect attrs, but the code is still there so the UI stays backwards compatible with older engines
So for a newer engine, you'd look into span links for cache miss reasons?
eh no effect IDs are probably better for that - since they say "here are all the possible effects you should see for me in the future"
span links only appear when the effect's span arrives - which might be a cache hit, might not, and it might never actually arrive (if it's a dependency of a cache hit for example, you won't see a span for all the transitive dependencies)
it should be pretty accurate, but it's complicated as hell to implement so there's always a chance I missed a fringe scenario
this comment is related to the last bit I mentioned: https://github.com/dagger/dagger/blob/ef3400788bf37c1c96cf41d1890b5d82f556c5f7/dagql/dagui/spans.go#L506-L511
Ok, that's good to know. I can see you get a span from an effectID via span.db.EffectSpans[effect]. Can it match a span from a function call, or is it always to something lower level like an LLB op?
it's always something "lazy" like an LLB op. also we don't even show the effect spans, the UI just merges them with the originating/causal span instead, so e.g. withExec(sleep 30) shows a 30s duration
Would you be surprised if a "higher level" span (say in terms of the nodes in the graphviz output from the CLI) had cached parents but was not itself cached?
Assuming all function args are pure
it's technically possible - sometimes the effects just never run. for example, a module whose constructor applies a default like container.From("golang:1.20") that never ends up being used, will make the constructor look like it's pending forever
So a pending span shows up as not cached in CachedReason()?
it should yeah - they're derived from the same data. (there's also a PendingReason())
btw - if you press '?' in the TUI, or add ?debug to the URL in the web UI, it'll surface all that info
Ooh, didn't know about ?debug!
a little buggy though, the debug icon shows up for nested spans since it's far off in the margin
Ok, I can see the effectIDs in that debug, but how can I relate them to something?
it also lists the effect spans and their statuses below the <pre> - they don't show up normally so can't really relate it to other spans
I'm looking at a span in cloud ui where the snapshot shows a bunch of effectids, but in the Effects section only one is shown. Which ones go in there?
Some IDs are in xxh3 and others are in sha256. Does that tell you something about what kind of "span" it is?
noticed that too, I think it's just a side effect of @bitter ibex's recent refactor (https://github.com/dagger/dagger/pull/9204) - if it's sha256 that means it's a custom digest i guess?
part of the march towards dagger/dagql native content-addressing
hopefully that only shows up in the inner plumbing like this
hmm looks like it's used for the context dir too
basically anywhere that you used to see blob(...)
If a span HasLogs, it's not cached?
yeah, since that implies something ran
Oh, I see.
hmm it seems like the list of effects is incomplete
for example this span beneath codegen
Can't tell why it ran, though.
same digest is listed in the effect IDs of the parent codegen call, but the uv lock span isn't listed for some reason
yeah, this part is implicit with cache misses still, theoretically it comes down to a change in the inputs from last time
are you trying to determine why codegen re-ran?
Yeah, this withExec
i see
wonder if it has to do with this? https://github.com/dagger/dagger/issues/8955
cc @brisk walrus - saw it mentioned in another thread