We use Dagger Cloud caching heavily in CI, persisting our Go test cache and Go build cache. Both currently grow indefinitely. This is a known challenge when a persisted cache is reused as the base for the next one, and we faced the same issue before Dagger, when we used GitHub Actions caches for the same purpose. We managed it by deleting our GitHub Actions caches entirely every two weeks so that our VMs would not run out of disk space. The consequence was that the next workflow to run paid the penalty of not having those caches, and then the cycle continued. We hit disk space issues today, with 100GB disks on our runners. I'm not sure how to handle this other than disabling the existing Dagger caches and/or renaming the cache keys. Open to any thoughts on the matter.
#Dagger Cloud Cache too large
Could I run a routine dagger function that wipes the cache path? I know we can't export from cache.
Just tested that, definitely can't rm -rf a mounted cache path
cc @hardy ore @zenith meadow
Hey @nocturne forge. At the moment we don't have an easy and accessible way for users to wipe the cache, but we can do that for you upon request.
You can't remove the path itself, but you should be able to remove all the contents inside of it. If you run rm -rf /cache_mount/*, that should wipe everything in the cache mount, and it will be "remotely wiped" once the engine synchronizes it to Dagger Cloud.
Unfortunately I'm not having any success with this approach. The Dagger engine logs show "skipped pushing cache mount <blah>" and, very surprisingly, "syncing cache mount remotely <blah>" for caches that I know are disabled in the Dagger Cloud UI.
We absolutely need the ability to blow away the caches. We're now seeing 3 minutes added to every single GHA run due to downloading every cache. Up from 90s-120s this morning. I will need to disable dagger cloud connectivity on our runners first thing tomorrow morning if we're not able to get rid of the caches.
If possible, as soon as possible, could we please have our org's caches entirely deleted? Like, nothing listed in the UI? I'm also going to revisit the liberal use of
dagger.ContainerWithMountedCacheOpts{
Sharing: dagger.Shared,
},
as well as trying out go clean permutations instead of rm -rf /cache_mount/*
Here is a trace for the cleanup caches dagger function I'm running that the engine skips pushing: https://dagger.cloud/Tallied-Technologies-Inc/traces/a5d963aae6619940580b8424ae3c6ea7
@nocturne forge I'll wipe your caches now and work on something so you perform that yourself
done! all the cache volumes have been wiped @nocturne forge
Thank you! Much appreciated
I'm seeing a big reduction in cache pull on startup, so it looks like that did the trick there. But now we're seeing 90s initialize module durations. https://dagger.cloud/Tallied-Technologies-Inc/traces/49ff82984e8e84fb2f17553d33782c9c
@nocturne forge can you please validate if this happens in subsequent builds? I just want to check if this is mainly a consequence of starting from a cold cache
It happened a few more times, but I'm not seeing it every time, thankfully. I don't know why it would take over a minute to run .withoutFile(path: "cicd/dagger.gen.go")
yes, I saw the same and we're discussing what could be causing this. Trying to replicate ATM
btw, regarding removing the modules yourself, we've tried the rm -rf approach ourselves and it worked for us
Dagger engine logs show skipped pushing cache mount <blah> and the very surprising syncing cache mount remotely <blah> for caches i know that are disabled in the dagger cloud UI
I know what's happening here also. Will have a fix for that next week as well.
package main

import (
	"time"

	"dagger/rmcache/internal/dagger"
)

type Rmcache struct{}

// Call writes 50 MiB of random data into the "testing-rm" cache volume.
func (m *Rmcache) Call() *dagger.Container {
	return dag.Container().
		From("alpine:latest").
		WithMountedCache("/cache", dag.CacheVolume("testing-rm")).
		WithExec([]string{"sh", "-c", "head -c 50m /dev/urandom > /cache/file.txt"})
}

// Rm deletes the contents of the cache volume; the mount point itself stays.
func (m *Rmcache) Rm() *dagger.Container {
	return dag.Container().
		From("alpine:latest").
		WithMountedCache("/cache", dag.CacheVolume("testing-rm")).
		WithExec([]string{"sh", "-c", "rm -rf /cache/*"})
}

// Check lists the cache volume. CACHE_BUST gets a new value on every call,
// so the listing is not served from a previously cached result.
func (m *Rmcache) Check() *dagger.Container {
	return dag.Container().
		From("alpine:latest").
		WithEnvVariable("CACHE_BUST", time.Now().String()).
		WithMountedCache("/cache", dag.CacheVolume("testing-rm")).
		WithExec([]string{"sh", "-c", "ls -lh /cache/"})
}
This is the example we used to repro the cache mount removal!
Thanks! My clear cache mount module looks the same; it's just that the dagger engine isn't pushing it back up.
@hardy ore Good morning/afternoon! Is the expectation that I should be able to clear caches with the above strategy? I'm doing what matipan is doing in the example above, but I'm still seeing 2-3 minute cache pulls.
hey! GM! yes, the above strategy works. We tested it ourselves. Let me check the size of your cache mounts here and report back via DM
@nocturne forge seems like you're running your CI in multiple AWS regions? I can see us-east-1 and us-west-2 here. Which is the one that you're trying to remove the volumes from?
seems like us-west-2 is the one that has the cache_volumes. Can you confirm please?
Hmm, our CI runs exclusively in AWS us-west-2. Our developer machines also use dagger cloud caches and we have folks in California, Oregon, Montana, Colorado, Nebraska, Toronto, and Poland.
Here is the trace of the most recent cache clean I ran: https://dagger.cloud/Tallied-Technologies-Inc/traces/730d6dacafb27b2bbdf05b6da2510a78
It mounts every 'enabled' cache we have by name, in sharing mode, and does rm -rf {path}/* . Each Engine (CI or dev machine) is given up to 6 minutes to gracefully shut down.
As we briefly touched on before, your devs' cache and your CI cache are not currently shared, as they run in different environments. So if you remove a cache volume from your local machine, your CI won't pick up that change.
I can check the cache size for your account; I just need to know which one I should look at: your devs' environment or your CI environment.
Yep, understood. The trace above was run from CI. When I ran it locally and restarted my local engine, cache pull was much faster.
The CI environment is the current priority, aws us-west-2.
awesome
let me check your CI environment really quick