#cache mount id


frosty frigate
#

@bleak wharf

#

I think we should hide the IDs (working theory)

#
  • We control buildkit so hypothetical conflict with non-dagger client (already low) is even lower now
#
  • We can give mounts their own ID just like containers and directories
#
  • it’s not weird to reference a mount by id, same pattern as for everything else (was not so in Cue)
bleak wharf
#

hm interesting, so would we expose a MountID and Mount type?

#

fwiw I have an initial implementation without that, withMountedCache just grabs the parent container ID + mount target path to come up with a cache ID

#

will push soon

void burrow
frosty frigate
#

oooh right it’s that kind of ID…

void burrow
#

I think that's the problem, there's not a good "content-addressable" way of creating IDs for these (that I can immediately think of)

frosty frigate
#

sorry I totally missed that part

void burrow
#

No I mean there may be some way of doing it, it's just a lot trickier

#

Cache mounts in general go against the whole philosophy of content-addressable LLB, they introduce tons of weirdness like this

frosty frigate
#

observation: if we don’t support sharing cache directories between containers, everything gets easier

#

then we can use some internal magic to infer cache id without exposing it to devs

#

for example it could be the graphql “path” of the mount

bleak wharf
#

so container { bustsCache { withMountedCache { usesCache } } }

bleak wharf
frosty frigate
bleak wharf
#

the ID is calculated at withMountedCache time, not LLB generation time, so it's just a string by that point

frosty frigate
#

oh I see you’re not computing the ID from the actual llb

bleak wharf
frosty frigate
#

Am I correct in thinking that the only way to get no weird cache misses at all, and no weird implicit API to learn, would be to require an individual API call to create each mount directory, then use their ID in a container call?

#

So that would be the verbose, but most correct design

bleak wharf
#

would that be creating unique mount IDs?

void burrow
#

Here's an example of where we are currently using cache mounts for npm dependencies (saves a ton of time): https://github.com/sipsma/dagger/blob/57685be3dd4eb5f95ef0e38d46ce4b35a1d03cf8/examples/yarn/index.ts#L65-L65

If we change the cache ID to be the container ID, that will include the LLB of the rootfs and any other mounts in the exec, right? And so wouldn't that mean that every time the input mount changed, a new empty cache mount would be used?

I'm just not sure if cache mounts are very useful anymore if that's true.
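
To make the concern above concrete, here is a minimal sketch of a cache ID derived from the parent container ID plus the mount target path. It is not the actual implementation: `digest`, `container_id`, and `derived_cache_id` are hypothetical helpers, and the container ID is modeled as a plain hash over the rootfs and its mounted inputs.

```python
import hashlib


def digest(*parts):
    """Short stable hash of the given strings (illustrative only)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()[:12]


def container_id(rootfs, mounts):
    # Hypothetical stand-in for a content-addressed container ID:
    # it covers the rootfs definition and every mounted input.
    return digest(rootfs, *(f"{path}={src}" for path, src in sorted(mounts.items())))


def derived_cache_id(parent_container_id, target):
    # Cache ID inferred from parent container ID + mount target path.
    return digest(parent_container_id, target)


before = container_id("node:16", {"/src": "source@commit-a"})
after = container_id("node:16", {"/src": "source@commit-b"})

cache_before = derived_cache_id(before, "/cache/yarn")
cache_after = derived_cache_id(after, "/cache/yarn")

# The mounted source changed, so the derived cache ID changed too:
# under this scheme the exec would start from an empty cache directory.
print(cache_before != cache_after)
```

Under these assumptions, any change to an input mount yields a fresh cache ID, which is the cache-miss behavior being questioned here.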

frosty frigate
#

from experience with europa. A little verbosity on top of a robust primitive is WAY easier

#

undebuggable cache mount issues were a good 50% of all developer support cycles I’d say

#

aggravated by cue dx to be sure, but it’s not all cue’s fault

bleak wharf
#

what's the difference between having a mount creation API (ultimately for ID generation) and just generating IDs client-side?

frosty frigate
#

Consistency of DX I’d say

#

Everything else is an object with server-side generated IDs

bleak wharf
#

right, but those are all content-addressed

frosty frigate
#

for now

bleak wharf
#

I'm not sure how the client will know when to generate a mount ID and when to re-use one (or even know how to get back to it)

frosty frigate
#

but secrets may not be

#

and services definitely aren’t

#

we could call them “volumes” instead to clear up the confusion. makes it more clear that it’s not content addressed, similar to services and secrets

void burrow
frosty frigate
#

ah right

#

volumeish

#

I think calling it a CacheDirectory would work

#

you want to reuse the same cache content, reuse the same cache directory

#

CacheVolume ?

void burrow
# bleak wharf what's the difference between having a mount creation API (ultimately for ID gen...

I guess one nice part is that it will make it easier to support different ways of generating an ID in the longer-term. So, we could hypothetically start with the same (confusing, bad) interface today where users just provide an ID. But then in the future we could add a different way of creating cache mounts where the ID is derived on behalf of the user and then deprecate the old way.

I think it's easier to do that with a separate API than it would be to continue with the current approach where it's all embedded in the Container api (I could be wrong though, maybe it's just easier to think about but implementation-wise it's arbitrary?)

void burrow
bleak wharf
void burrow
#

I think that's sort of like what GHA cache does iirc

bleak wharf
#

oh right that's probably a good comparison

#

GHA caches also have 'restore-keys' and special prefix semantics which is interesting

#

with a key like 'foo-bar-1234' you can configure restore-keys as [foo-, foo-bar-] and then when you're creating 'foo-bar-5678' it'll find 'foo-bar-1234' and restore it first, instead of starting from scratch

#

i have no idea how we would implement that, but i've found it useful in the past
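
For illustration only, the prefix fallback described above could be sketched as a small lookup. This is a simplified model of GHA-style `restore-keys` matching, not how GHA or Dagger implements it; the `restore` function and the `stored` mapping are hypothetical names.

```python
def restore(key, restore_keys, stored):
    """Pick which stored cache to restore for `key`.

    `stored` maps cache key -> creation time (any comparable value).
    Returns the chosen cache key, or None to start from scratch.
    """
    if key in stored:
        return key  # exact hit
    for prefix in restore_keys:  # tried in the order given
        matches = [k for k in stored if k.startswith(prefix)]
        if matches:
            # Prefer the most recently created match for this prefix.
            return max(matches, key=lambda k: stored[k])
    return None


stored = {"foo-bar-1234": 1, "foo-baz-9999": 2}
hit = restore("foo-bar-5678", ["foo-bar-", "foo-"], stored)
print(hit)  # foo-bar-1234
```

Because prefixes are tried in order, listing the most specific prefix first matters in this sketch: with `["foo-", "foo-bar-"]` the broader `foo-` prefix would match first and return the newer `foo-baz-9999`.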

bleak wharf
#

your wish is my command

void burrow
frosty frigate
#

@fallow gust 👆

fallow gust
nocturne charm
#

e.g. caching on platform + runtime version + dependency file checksum
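
A key like that could be composed roughly as follows. This is a sketch under assumptions: the `cache_key` helper, the key layout, and the sample lockfile contents are all made up for illustration.

```python
import hashlib


def cache_key(platform, runtime_version, lockfile):
    # Hash the dependency file so the key changes whenever dependencies do.
    checksum = hashlib.sha256(lockfile).hexdigest()[:16]
    return f"{platform}-{runtime_version}-{checksum}"


key_a = cache_key("linux-amd64", "node16", b"left-pad@1.3.0\n")
key_b = cache_key("linux-amd64", "node16", b"left-pad@1.3.1\n")

# Same platform and runtime, different lockfile: the keys differ only
# in the checksum suffix.
print(key_a != key_b)
```

Keeping platform and runtime as a readable prefix means the checksum is the only part that varies between dependency changes, which also fits prefix-based fallback schemes.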

frosty frigate
#

The current plan is to hide raw buildkit cache mount IDs from direct developer control, because it's too easy to shoot yourself in the foot, and it's very different from the rest of the API

#

so if you need 4 "cache volumes" (current name for them) for 4 different platform/runtime combinations, then you'd make 4 calls to the Dagger API, to create 4 different cache volumes, and track their IDs in your code the usual way. Then use the IDs to mount the right volume in the right place

#

Slightly more tedious because you need to make 1 separate API call for each cache volume you want to create (as opposed to seamlessly configuring the mount directly in your container creation call).

#

But way more reliable (we think)
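
The flow described above (one call per cache volume, then mount by ID) might look like the following sketch. It uses an in-memory stand-in rather than the real Dagger API: `Engine`, `Container`, `create_cache_volume`, and `with_mounted_cache` here are hypothetical Python names chosen to mirror the discussion.

```python
import uuid


class Engine:
    """Hypothetical in-memory stand-in for the API server."""

    def __init__(self):
        self._volumes = {}  # volume ID -> cached files (path -> contents)

    def create_cache_volume(self):
        # One explicit call per volume; the server picks the opaque ID.
        volume_id = uuid.uuid4().hex
        self._volumes[volume_id] = {}
        return volume_id

    def files(self, volume_id):
        return self._volumes[volume_id]  # KeyError for an unknown ID


class Container:
    def __init__(self, engine, mounts=None):
        self.engine = engine
        self.mounts = dict(mounts or {})  # target path -> volume ID

    def with_mounted_cache(self, target, volume_id):
        self.engine.files(volume_id)  # reject IDs the engine never issued
        return Container(self.engine, {**self.mounts, target: volume_id})

    def cache(self, target):
        return self.engine.files(self.mounts[target])


engine = Engine()
combos = [(p, r) for p in ("linux/amd64", "linux/arm64") for r in ("node16", "node18")]
volumes = {combo: engine.create_cache_volume() for combo in combos}

first = Container(engine).with_mounted_cache("/cache/yarn", volumes[("linux/amd64", "node16")])
first.cache("/cache/yarn")["left-pad"] = "1.3.0"

# A different container that mounts the same volume ID sees the same contents;
# a container using another combination's volume starts empty.
second = Container(engine).with_mounted_cache("/cache/yarn", volumes[("linux/amd64", "node16")])
other = Container(engine).with_mounted_cache("/cache/yarn", volumes[("linux/arm64", "node16")])
```

In this model, cache reuse depends only on which volume ID the caller passes, not on the container's contents, so changing the rootfs or source mount does not reset the cache.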

nocturne charm
#

That makes sense, but how do you reuse that cache across engines?

frosty frigate
#

(full context in this thread)

frosty frigate
#

Or, do you mean how to persist e.g. in between CI runs?

nocturne charm
#

Between CI runs

frosty frigate
#

Well that's a different question, you have the same problem regardless of the API to use cache mounts

nocturne charm
#

I guess it's not distinct from a regular cached layer at that point

frosty frigate
#

sorry I misunderstood that code you were showing, thought it was setting up Dagger cache mounts

#

Cache persistence in CI environments remains a PITA and is actually unchanged in the cloak design

#

We basically haven't touched that part

#

(I think)

#

The solution for CI remains: 1) slow the bleeding for now; 2) perhaps make it easier to hand off runs to persistent worker machines, à la bass loop; 3) one day run a magical caching service that makes all the pain go away

nocturne charm
#

yeah I think it's mostly not important to persist the cache volume across CI runs since you'd have the whole layer cached. It would be potentially useful for different pipelines that could reuse the same cache but that's way more complicated

frosty frigate
#

if your cache volumes get wiped in between runs, your runs will be slower though

nocturne charm
frosty frigate
#

yes that will be possible, although with a slightly redesigned API per my earlier comments. In a CI environment if you don’t arrange for cache persistence, your yarn cache will always be empty

nocturne charm
#

Right with the new api, cool. Yeah the reason it came up was because we were talking about putting together extensions for the popular language package managers to handle all the caching scary bits for the users. So part of that will have to be help on setting up the underlying CI cache

frosty frigate
#

Got it. With dagger those 2 are decoupled:

  1. Dagger extensions to handle language-specific caching (CI-agnostic)

  2. CI-specific configuration to persist Dagger cache (language agnostic)