#Dagger stops using cache


sleek brook
#

I've raised this before, but saw it again over the weekend. At some point, image pulls stop being served from the cache and are "always" pulled again. The workaround is to nuke dagger-engine and the volume tied to it. Then things go back to normal.

iirc, this might be due to garbage collection? It would be good to get a warning when the volume is, say, 80% full (configurable), so that if this happens the user learns about the known potential issues and the various solutions (via a link to learn more).
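To make the 80% idea concrete, here is a minimal sketch of such a check. It is not a Dagger feature: the function names, the warning text, and the path passed to `check_volume` are all hypothetical, and the threshold only mirrors the "80% full (configurable)" suggestion above.

```python
import shutil
from typing import Optional


def used_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the fraction of the volume that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - free_bytes / total_bytes


def should_warn(total_bytes: int, free_bytes: int, threshold: float = 0.8) -> bool:
    """True when usage is at or above the (configurable) threshold."""
    return used_fraction(total_bytes, free_bytes) >= threshold


def check_volume(path: str, threshold: float = 0.8) -> Optional[str]:
    """Return a warning string for the volume holding `path`, or None.

    `path` is an assumption: point it at wherever the engine volume
    is mounted on your machine.
    """
    usage = shutil.disk_usage(path)
    if should_warn(usage.total, usage.free, threshold):
        pct = 100 * used_fraction(usage.total, usage.free)
        return (
            f"warning: volume for {path} is {pct:.0f}% full; "
            "cache entries may be garbage collected"
        )
    return None


if __name__ == "__main__":
    # "." is only a placeholder path so the sketch runs anywhere.
    print(check_volume(".") or "volume usage is below the threshold")
```

Keeping the threshold test separate from the `shutil.disk_usage` call means it can also be fed byte counts taken from elsewhere, for example from engine logs.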

snow narwhal
#

seems like this is the case @sleek brook. When this happens again, can you check your engine logs to see if you spot any GC messages? It should say something there.

#

GC is an area where we've just started to focus on improving UX, and so far it's basically BuildKit's default behavior plus log messages about it. We're planning to tackle overall caching visibility along with some other feedback from users, starting here: https://github.com/dagger/dagger/issues/5601

GitHub

What are you trying to do? Part of the story around Dagger's appeal is its cache-first approach, backed by Buildkit. Dagger Cloud, for those who have alpha access, offers some visibility featur...

sleek brook
#

Is there a way to mark things as no-gc, or a way to set a TTL on a cache entry? (some way to set priority or importance)

snow narwhal
#

the buildkit config has quite a few knobs you can tune for GC. Haven't played with those myself

still turret
#

I experienced similar issues. I have a bigger pipeline (a cold run without cache takes about 45 mins). If I rerun the pipeline on my local machine, it takes about 2 mins. But if I rerun the same pipeline/commit on my buildkit Kubernetes instance, it takes at least 7 mins because some layers are always pulled....
By the way @proper galleon, what about the dagger-graph tool? Does this tool still work for identifying busted/re-pulled caches?

still turret
#

But I still often experience flaky cache invalidation issues on my gitlab>socket>buildkit setup

snow narwhal
#

@still turret can you check if you can spot anything in the engine logs about the GC kicking in?

still turret
#

@snow narwhal yes, as the build starts the gc is kicking in hard

time="2023-08-29T11:32:31Z" level=debug msg="removed snapshot" key=buildkit/6542/iiw5u1voilbn9ihuk3folqbpj snapshotter=overlayfs
time="2023-08-29T11:33:26Z" level=debug msg="engine metrics" cpu-count=6 cpu-idle=2987097575 cpu-iowait=1598904 cpu-irq=0 cpu-nice=478 cpu-softirq=11557284 cpu-steal=3349834 cpu-system=116372077 cpu-total=3261099480 cpu-user=141123328 dagger-server-count=0 disk-available-/=35066859520 disk-available-/var/lib/dagger=35066859520 disk-free-/=35066859520 disk-free-/var/lib/dagger=35066859520 disk-size-/=128283815936 disk-size-/var/lib/dagger=128283815936 goroutine-count=21 loadavg-1=2.37 loadavg-15=2.71 loadavg-5=3.22 mem-active=3627749376 mem-available=13507457024 mem-buffers=39436288 mem-cached=7233597440 mem-committed=21699035136 mem-free=5050580992 mem-inactive=10925416448 mem-mapped=1425022976 mem-page-tables=65044480 mem-shmem=83173376 mem-slab=3827691520 mem-swap-cached=0 mem-swap-free=0 mem-swap-total=0 mem-total=25252737024 mem-vmalloc-used=0 proc-self-mem-anonymous=16379904 proc-self-mem-private-clean=42463232 proc-self-mem-private-dirty=16379904 proc-self-mem-pss=58843136 proc-self-mem-referenced=58...
time="2023-08-29T11:33:27Z" level=debug msg="snapshot garbage collected" d=56.455054067s snapshotter=overlayfs
time="2023-08-29T11:33:27Z" level=debug msg="gc cleaned up 9223105639 bytes"
time="2023-08-29T11:34:26Z" level=debug msg="engine metrics" cpu-count=6 cpu-i
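For anyone else checking their own engine logs for GC activity, here is a rough sketch that scans log lines like the ones above and adds up the sizes from the "gc cleaned up N bytes" messages. The helper names are hypothetical, and it assumes the `key=value` layout with quoted `msg` fields seen in this paste; other engine versions may log differently.

```python
import re
import shlex

# Matches the message text seen in the log excerpt above.
GC_CLEANED = re.compile(r"gc cleaned up (\d+) bytes")


def parse_logfmt(line: str) -> dict:
    """Split a key=value log line into a dict; quoted values keep their spaces."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes, e.g. a line cut off mid-value: skip it.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # ignore fragments without "=" (truncated lines)
            fields[key] = value
    return fields


def gc_bytes_cleaned(lines) -> int:
    """Sum the byte counts of all 'gc cleaned up N bytes' messages."""
    total = 0
    for line in lines:
        match = GC_CLEANED.search(parse_logfmt(line).get("msg", ""))
        if match:
            total += int(match.group(1))
    return total


if __name__ == "__main__":
    sample = [
        'time="2023-08-29T11:33:27Z" level=debug msg="snapshot garbage collected" '
        "d=56.455054067s snapshotter=overlayfs",
        'time="2023-08-29T11:33:27Z" level=debug msg="gc cleaned up 9223105639 bytes"',
    ]
    print(f"{gc_bytes_cleaned(sample) / 2**30:.2f} GiB cleaned by GC")
```

The same `parse_logfmt` helper can pull fields such as `disk-available-/var/lib/dagger` and `disk-size-/var/lib/dagger` out of the "engine metrics" lines, which gives a view of how full the volume was around the time GC ran.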
still turret
#

No, I'm running the engine with the default settings as far as I know

snow narwhal