#verbose tui trace confusion

1 messages · Page 1 of 1 (latest)

snow lake
#

@tardy ridge am i just confused here? i feel like i'm just confused, but regardless of this is expected or unexpected tui behavior i'd like to understand what's going on

tardy ridge
#

do you have a cloud link? won't have metrics but some parts will be easier to grok than screenshots

snow lake
#

lemme repro again

#
#

(focusing on the corepack use call here, but the same thing happens in other contexts)

#

(some part of me is hoping that i'm not just confused here and there's some un-instrumented timecost to the 0-duration spans, like maybe copies or something)

tardy ridge
#

so, one thing that happens is that in a call like this, the File argument is passed by ID, and that ID gets loaded by just re-evaluating it and taking the resulting object - so that's why you see its calls repeated beneath the withFile call. those should be dagql-level in-memory cache hits

#

i think at one point I wanted to just not show spans at all for dagql cache hits, but that led to weirdness with parallelism (like call chains being split arbitrarily)

snow lake
#

so 0-duration spans are re-eval-dagql-cache-hits, but CACHED spans are exclusively bk vertex cache hits?

i also found this comment which implies that we'd like the re-eval-dagql-cache-hits to read as CACHED but they don't for some reason? is the comment referring to these 0-duration cache hits or different ones?

GitHub

An engine to run your pipelines in containers. Contribute to cwlbraa/dagger development by creating an account on GitHub.

GitHub

An engine to run your pipelines in containers. Contribute to cwlbraa/dagger development by creating an account on GitHub.

tardy ridge
#

CACHED spans are exclusively bk vertex cache hits?
correct

#

but, now that i look at the code, it seems like we actually shouldn't see spans at all for dagql cache hits

#

at the very least, cached is always false for the callback

#

oh my local repo checkout is out of date

#

one sec, rebasing

#

rebased, and it seems to have the same intention (telemetry is only emitted if we're NOT a dagql cache hit) - so I'm not sure why those spans are showing up multiple times

#

unless for some reason it's tainted

#

hmm would be cool if the UI highlighted tainted calls

snow lake
#

wouldn't there be logs and durations if they were tainted?

tardy ridge
#

not necessarily, the API calls themselves don't really do anything that takes time and don't print logs on their own, that would only happen if the LLB ops themselves re-ran

snow lake
#

ah, false assumption on my part that tainted implies re-run, then?

tardy ridge
#

tainted just means it won't be cached in memory by dagql, so the resolver will always run

#

which would explain why we see the spans again

snow lake
#

gotcha

tardy ridge
#

but, i'm not sure if that's what's happening

snow lake
#

thought experiment for myself, if the resolver wasn't re-running, what would we want the traces to look like? like in this screenshot, we corepack use for real right before yarn install, then save off the lockfile. the call right after does .withFile(lockfile) and continues onwards... if the resolver wasn't re-running up to corepack use, what container-producing span would that withFile parent to?

#

it kinda feels like the only way we have to visualize these dagql "continuances" is to either redraw the parent or do some weird nesting, although nesting feels like it'd read weird because it's not like the next-step-in-chain forked shit is down-stack from its parent

snow lake
#

@dim panther kinda curious your thoughts on this as well, especially coming off of work moving more things to dagql cache... with the -vvv and the entailed exec metrics printing, it's kinda weird how it'll report the utilization multiple times (subsequent ones having 0-duration) ... im kinda inclined to look for a way to either:
a. mark these as CACHED even though they're not bk-cached but dagql cached
b. "hide" these calls by having their down-chain referencers point back at the previous call (no idea how to visualize this though... maybe in the webgui we use those little "chips" like we use for directory and file params, but for the method-root object, but in the tui it'd have to be by ID? so like Container(CACHED: $IDXYZ).withExec(...) instead of reprinting the whole chain of already-exec'd calls that led to that point? am i missing some detail of purity re-execs that makes this weird?)

tardy ridge
#

i feel like i'm seeing more duplicate call spans lately notsureif - e.g. dagger -m github.com/vito/bass call integration --runtime Buildkit

tardy ridge
#

note the repeated apko.wolfi calls

#

seems to happen repeatedly too, maybeinterestingly this one is only twice

tardy ridge
dim panther
dim panther
# tardy ridge i feel like i'm seeing more duplicate call spans lately <:notsureif:102505866723...

I get an error loading the module when running that (go: conflicting replacements for go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc), so can't reach the point where you were seeing dupes.
I was just gonna look at it w/ -vvvvvvvvvv so it showed the digests of the calls. I'm presuming that if we are seeing duplicates it could plausibly be that we're sending the same telemetry but w/ different call digests?

tardy ridge
tardy ridge
snow lake
#

squint could that have perf implications?

#

4 months feels like it potentially lines up with some CI slowdown

dim panther
tardy ridge
#

yeah, span ID; I think that'll show up on max verbosity

#

considering those functions are marked impure now, it makes sense to me that you would see two spans if there are two things trying to evaluate it, since they won't be deduped by the caching logic

dim panther
tardy ridge
#

not really hyperthinkspin i actually don't know why it shows up so many times (4x by the looks)

#

my guess is they're actually inside multiple IDs being passed in which end up being evaluated in parallel somehow, maybe server side

#

oh, wait - does client-side resolve objects to IDs in parallel? maybe that's it

#

i kind of remember that somewhere in the go sdk

dim panther
#

Okay seems like I got a quick fix working locally. Can get rid of the dupes w/ the bass example while also still passing the Secret integ tests that cover the original reason that impurity was added 4 months ago.

I just replaced the Impure: true with a custom cache key func using the stuff in that PR I merged the other day. End result is function calls are cached in dagql per-client, which is granular enough to allow the secret corner cases to still be covered, while not as granular as Impure:true such that they appear multiple times.

Will send out a PR in a min

tardy ridge
#

sweet!

dim panther
#

https://github.com/dagger/dagger/pull/9483, some subtlety to this of course so hopefully CI catches anything I didn't think of. Also curious to see if there's any discernable performance impact amongst the noise since I agree @snow lake this probably is a performance improvement (just not sure if it's enough of a bottleneck to be noticeable)

snow lake