#discuss cache volume impurity
moving the conversation to thread to avoid noise in main channel
and Module FOO:
func (f *Foo) GetCacheVolumeId(ctx context.Context) (string, error) {
	cacheVolume := dag.CacheVolume("volume-name")
	_, err := dag.Container().
		From("alpine:latest").
		WithMountedCache("/foo", cacheVolume).
		WithExec([]string{"sh", "-c", "echo -n 'hello foo' > /foo/bar.txt"}).
		Sync(ctx)
	if err != nil {
		return "", err
	}
	id, err := cacheVolume.ID(ctx)
	return string(id), err
}

func (f *Foo) UseCacheVolumeAcrossModuleUsingId(ctx context.Context, id string) (string, error) {
	return dag.Bar().UseCacheVolumeID(ctx, id)
}
so essentially mod A calls a mod B function and passes the cacheVolumeId
to reproduce the prob i am trying to explain:
on a fresh install
- if I call modB.UseCacheVolumeName("volume-name"), and try to list "bar.txt", it fails.
- if I call modA.GetCacheVolumeId() - it populates the cache
- if I call modB.UseCacheVolumeName("volume-name"), and try to list "bar.txt", it still fails.
- now if i call modA.UseCacheVolumeAcrossModuleUsingId("id") - it works. Notice this calls modB.UseCacheVolumeId()
- Now if I call modB.UseCacheVolumeName("volume-name"), and try to list "bar.txt", it works. I was expecting this to fail
this repo has a reproducible module: https://github.com/rajatjindal/dagger-same-cache-volume-id/
yeah, i don't know - i don't have any idea how you've implemented anything so far, it could be anything 😛
how are you getting the id when you're calling "UseCacheVolumeAcrossModuleUsingId"
oh i see you're chaining them
sorry, i was cleaning up the branch to make it easy to observe relevant changes:
https://github.com/dagger/dagger/compare/main...rajatjindal:dagger:cache-volume-id-issue?expand=1
repro logs here: https://gist.github.com/rajatjindal/7b737644053726d2e2eb1eea302e5bda
okay, i have no idea, but something is clearly wrong
what i would do - add logging statements around every call to CacheVolume in the schema, and also in WithExec and print out which volume keys are actually being used
ohh
i actually know what this is
buildkit is caching the first and the second operations
if you change the mountpoint between the two it'll work as expected
but cache volume names aren't treated as cachebusters
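To make the point above concrete, here is a toy model of a cache key that covers the image, the mountpoint and the command but leaves out the cache volume name. It is only a sketch of the behaviour described in this thread, not buildkit's actual key derivation; `execOp` and `cacheKey` are made-up names for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// execOp is a hypothetical stand-in for an exec operation with one cache mount.
// It is NOT a buildkit or dagger type.
type execOp struct {
	Image      string
	MountPoint string
	VolumeName string
	Args       []string
}

// cacheKey hashes the image, mountpoint and args, and deliberately ignores
// VolumeName, mimicking "volume names aren't treated as cachebusters".
func cacheKey(op execOp) string {
	h := sha256.New()
	for _, part := range []string{op.Image, op.MountPoint, strings.Join(op.Args, "\x00")} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // separator between fields
	}
	return hex.EncodeToString(h.Sum(nil))[:12]
}

func main() {
	readFoo := execOp{
		Image:      "alpine:latest",
		MountPoint: "/bar",
		VolumeName: "foo",
		Args:       []string{"sh", "-c", "ls /bar/bar.txt"},
	}

	// Same operation, different volume name.
	readBar := readFoo
	readBar.VolumeName = "bar"

	// Same operation, different mountpoint.
	readMoved := readFoo
	readMoved.MountPoint = "/baz"

	fmt.Println(cacheKey(readFoo) == cacheKey(readBar))   // true: cached result is reused
	fmt.Println(cacheKey(readFoo) == cacheKey(readMoved)) // false: mountpoint busts the cache
}
```

In this model, a second exec against a differently named volume hits the same key and returns the earlier result, while changing the mountpoint produces a new key.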
hm, understanding that i'm trying to work out if i think this is unexpected:
- if cachevolumeid and cachevolumename were in different modules, i think we should probably prevent them from doing this caching behavior
- but given they're in the same module, i actually think this is maybe the right behaviour
if cachevolumeid and cachevolumename were in different modules, i think we should probably prevent them from doing this caching behavior
even if cachevolumeid function was called as a dependency and cachevolumename function was called directly.
thinking out loud about this: let's say we have a module that populates some sensitive data in a cache volume. then two different modules can import it as a dependency and can end up sharing the same cache volume.
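One hypothetical way to keep two modules from landing on the same volume is to scope the volume key by the module that requested it. The sketch below only illustrates that idea; `scopedVolumeKey` and its parameters are assumptions made up for this example, not an existing dagger or buildkit API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// scopedVolumeKey is a hypothetical helper: it derives a volume key from both
// the requesting module's name and the user-supplied volume name, so the same
// volume name used from two modules maps to two different keys.
func scopedVolumeKey(moduleName, volumeName string) string {
	sum := sha256.Sum256([]byte(moduleName + "\x00" + volumeName))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	fromFoo := scopedVolumeKey("foo", "volume-name")
	fromBar := scopedVolumeKey("bar", "volume-name")

	fmt.Println(fromFoo == fromBar)                               // false: modules are isolated
	fmt.Println(fromFoo == scopedVolumeKey("foo", "volume-name")) // true: stable within a module
}
```

Passing a volume ID explicitly across modules, as in UseCacheVolumeAcrossModuleUsingId, would then be the deliberate way to share.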
let me see if we can cache the volume by id instead. that may work
also, i think we will need to consider the CacheSharingMode.
thanks for your inputs Justin. I'll check more
@lean seal when you say:
but cache volume names aren't treated as cachebusters
are you saying that from buildkit perspective or dagger's perspective?
from buildkit's
ah, that might explain this behavior then. (as you pointed out earlier)
am I correct to relate this log line in our logs:
time="2024-10-11T14:23:30Z" level=debug msg="load cache for [internal] exec sh -c ls /bar/bar.txt with sgx7cni7ddzhlyrvz0r7yoi50::qbg43chw3hwbvh6s29ziaeazq"
to this line in buildkit's code base:
https://github.com/moby/buildkit/blob/2534310fd4c59018ae3874e8d0fad493086e2575/solver/edge.go#L897
Fyi @lucid grove curious if you have strong opinions on this, not sure if there's an obvious right way that this would be done
@lean seal @lucid grove Rajat was hitting the scenario where buildkit doesn't invalidate the layer cache even if the cache volume name changes across executions. Is that something we have more control over now, and that we could or should do something about? I understand why this happens: after all, the cache volume should be a "cache", and re-running the exact same operation on a different volume name shouldn't "theoretically" yield a different result. Having said that, we know that sometimes users use and abuse cache volumes and I understand how they'd expect the example below to work.
Small repro of what I mean:
package main

import (
	"context"
)

type Test struct{}

func (m *Test) Read(ctx context.Context, name string) (string, error) {
	return dag.Container().
		From("alpine:latest").
		WithMountedCache("/bar", dag.CacheVolume(name)).
		WithExec([]string{"sh", "-c", "ls /bar/bar.txt"}).
		Stdout(ctx)
}

func (m *Test) Write(ctx context.Context, name string) (string, error) {
	return dag.Container().
		From("alpine:latest").
		WithMountedCache("/bar", dag.CacheVolume(name)).
		WithExec([]string{"sh", "-c", "touch /bar/bar.txt"}).
		Stdout(ctx)
}
dagger call read --name foo
> fails
dagger call write --name foo
dagger call read --name foo
> succeeds
dagger call read --name bar
> also succeeds when it should have failed
i think this is expected - the cache volume is just being used here to accelerate what's going on, changing the name isn't a cachebuster imo
Having said that, we know that sometimes users use and abuse cache volumes
I think this is the heart of the issue - users are feeling the need to do this. Cache volumes are very confusing, don't provide a lot of guarantees, and it's really really difficult to actually reason about what should be in one
I think we need to work out why/where people are mis-using cache volumes for things - and we should provide better APIs for that
e.g. I know one is around interacting with services after they've started - I can't get content out of the filesystem of the service after it's running. I think we should have an API for actually doing this.
I think if we can remove those cases, then this cache-volume confusion will matter less - I'd rather focus efforts on making cleaner APIs for users, instead of making it easier to hack around with one of the most confusing parts of the API
Thx for the inputs Justin 🥰
do you have a specific list of things where people are using cache volumes weirdly? i'm thinking it would be good to make an issue with these