#a little bit - it means I can't do


ruby frigate
#

Yeah happy to wire it up to honeycomb if you want! I don't need anything debugged on my end so it would be strictly for your research purposes

stone shore
#

thanks!

ruby frigate
#

Actually I'm about to PR this code so maybe that's easier?

stone shore
#

actually tbh I can probably just poke through the data in clickhouse

#

so feel free to punt

#

that way i'm not reliant on LLM making the exact same choices 😛

#

i just need to take a closer look at the span attributes

#

oh wait. i did a whole debug UI. good job past alex

they do indeed have different effect IDs 😭

https://v3.dagger.cloud/kpenfound/traces/3dabde63ed3a0b6c3cfe80b69881f022?listen=6a1fad3e2b3fec60&listen=6b3bdb9b5dbe28f6&listen=258644302e896d58&listen=811cf5e7e48dbeb5&showHidden=f73a7d9cc2efd45c&showHidden=1a880d1ed0669f4a&span=dff09196d48b1955&debug

  "EffectIDs": [
    "sha256:3095f90aaeacb47f9f4be5b1b2292f39fd2cc142e499bf02bc40319b4d1ff8e1"
  ],

vs.

  "EffectIDs": [
    "sha256:eb2fd9ae179c7f8e8971dcee6f734713f4e6e91a83b891bafca0df962f30d5d8"
  ],
#

cc @subtle pelican @rare hornet not urgent, but while we're ship-of-theseusing out of Buildkit - would it be possible to expose the actual cache key that Buildkit uses for deduping?

Here's the scenario:

  • Container A ran against directory.withNewFile("a", "foo")
  • Container B ran against directory.withNewFile("a", "foo").withNewFile("a", "foo") (same content)
  • Container A ran and failed, logs showed up at the withExec like normal
  • Container B "ran" but its withExec just shows pending and has no logs, yet its stdout returned the error immediately

So: different call ID (unavoidable really), but also a different LLB vertex digest (sad panda), and Buildkit's solver must have deduped based on the actual cache key. We never see a span for Container B's effect ID (aka vertex digest).
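
To make the scenario concrete, here is a minimal Go sketch of why Container B never gets a span. It is a toy model, not Buildkit's solver: `op`, `recipeDigest`, `contentDigest`, and `runner` are hypothetical names. `recipeDigest` stands in for the LLB vertex digest (effect ID) and `contentDigest` stands in for the cache key the solver dedupes on.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// op is a hypothetical stand-in for one withNewFile(path, contents) step.
type op struct {
	path, contents string
}

// recipeDigest hashes the sequence of steps, so a redundant step changes it.
// It plays the role of the vertex digest / effect ID.
func recipeDigest(ops []op) string {
	h := sha256.New()
	for _, o := range ops {
		fmt.Fprintf(h, "withNewFile(%q,%q)\n", o.path, o.contents)
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil))
}

// contentDigest hashes the resulting file tree, so recipes that produce the
// same files collide. It plays the role of the solver's cache key.
func contentDigest(ops []op) string {
	files := map[string]string{}
	for _, o := range ops {
		files[o.path] = o.contents
	}
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	h := sha256.New()
	for _, p := range paths {
		fmt.Fprintf(h, "%q=%q\n", p, files[p])
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil))
}

// runner dedupes executions by content digest and records which recipe
// digests actually emitted a span.
type runner struct {
	ran   map[string]string // content digest -> recipe digest that executed
	spans map[string]bool   // recipe digests that got a span
}

// exec returns the recipe digest that really ran and whether the call was
// deduped. Returning that digest is the "expose the cache key" idea: it lets
// telemetry link a deduped call to the span that exists.
func (r *runner) exec(ops []op) (executedAs string, deduped bool) {
	key := contentDigest(ops)
	if first, ok := r.ran[key]; ok {
		return first, true
	}
	id := recipeDigest(ops)
	r.ran[key] = id
	r.spans[id] = true
	return id, false
}

func main() {
	r := &runner{ran: map[string]string{}, spans: map[string]bool{}}
	a := []op{{"a", "foo"}}
	b := []op{{"a", "foo"}, {"a", "foo"}}

	_, dedupedA := r.exec(a)
	ranAs, dedupedB := r.exec(b)

	fmt.Println("A deduped:", dedupedA)
	fmt.Println("B deduped:", dedupedB)
	fmt.Println("B has its own span:", r.spans[recipeDigest(b)])
	fmt.Println("B maps to A's span:", ranAs == recipeDigest(a))
}
```

In this model B is deduped and has no span under its own digest, but the value returned by `exec` points at A's digest, which is the link the UI is currently missing.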

ruby frigate
subtle pelican
#

Well I don't want to spend much (or ideally any) time/effort exposing more buildkit stuff vs. just replacing it with our own stuff. But if you want to see if there's some quick hack to expose that cache key you could just apply it to the dagger/buildkit fork. I think the cache key you want is the one used in this interface: https://github.com/sipsma/buildkit/blob/a1a1cc0e6c4538424d619c8c454186a2f9061e9f/solver/cachemanager.go#L64-L64

But that interface is extremely convoluted, so I couldn't say off the top of my head precisely where to hook in. There's some internal docs on parts of it here

In terms of the new Theseus world, this will all just be handled by call ID digests. Either an ID can have multiple digests (i.e. the recipe digest vs content-based one) or we "dag-ify everything" so that there's no "built-in" notion of laziness and what's modeled today w/ effect IDs just become different fields. Sort of hard to describe quickly, not sure if that makes sense 😅

#

Justin and I have two possible approaches in mind for how to deal with all the stuff today that is lazy LLB and uses effect IDs. We haven't reached the point where we've hit that crossroads yet, but making sure telemetry can deal with it in a headache-free way will obviously be an important factor

stone shore
subtle pelican
#

As an addendum, for this very particular scenario I actually think we'd be able to normalize pretty easily so that directory.withNewFile("a", "foo").withNewFile("a", "foo") just straight up returns the same ID as directory.withNewFile("a", "foo"). The withNewFile impl would just see "I'm the same operation as the previous one, nothing to do, just return parent dagql.Instance[core.Directory]"

But those sorts of normalizations are case by case, we can't possibly handle all of them like that, so the more general problem stands
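
A rough Go sketch of that normalization, and of where it stops helping. This is not the dagql API: the `directory` type, its `lastCall` field, and the ID scheme are made up for illustration, standing in for `dagql.Instance[core.Directory]` and call IDs.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// directory is a hypothetical stand-in for a directory instance with an ID.
type directory struct {
	id       string // digest of the call chain that produced this directory
	lastCall string // canonical form of the call that produced it
}

// withNewFile returns the receiver unchanged when the call is identical to
// the one that produced it; otherwise it derives a new ID from the parent ID
// plus the call.
func (d *directory) withNewFile(path, contents string) *directory {
	call := fmt.Sprintf("withNewFile(%q,%q)", path, contents)
	if d.lastCall == call {
		// Same operation as the previous one: nothing to do, return parent.
		return d
	}
	sum := sha256.Sum256([]byte(d.id + "." + call))
	return &directory{id: fmt.Sprintf("sha256:%x", sum), lastCall: call}
}

func main() {
	root := &directory{id: "sha256:root"}

	once := root.withNewFile("a", "foo")
	twice := once.withNewFile("a", "foo")
	fmt.Println("back-to-back repeat normalized:", once.id == twice.id)

	// Same final contents as ab, but the repeat is not adjacent, so this
	// simple check does not catch it.
	ab := once.withNewFile("b", "bar")
	aba := ab.withNewFile("a", "foo")
	fmt.Println("non-adjacent repeat normalized:", ab.id == aba.id)
}
```

The second case still produces a different ID for identical contents, which is the general problem that per-operation normalization can't cover.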