#Dagger Cloud Cache too large


nocturne forge
#

We use Dagger Cloud caching heavily in CI, persisting our Go test cache and Go build cache. Both currently grow indefinitely. This is a known challenge when a reused cache is persisted again as a new cache, and we faced the same issue before Dagger, when we used GitHub Actions caches for the same purpose. Back then we managed it by deleting our GitHub Actions caches entirely every two weeks so that our VMs would not run out of disk space. The consequence was that the next workflow to run paid the penalty of not having those caches, and then the cycle continued. We hit disk space issues today, with 100GB disks on our runners. I'm not sure how to handle this other than disabling the existing Dagger caches and/or renaming the cache keys. Open to any thoughts on the matter.
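One way to picture the "renaming the cache keys" idea is to derive the cache volume name from a rotation window, so that every run in the same two-week window shares a key and the next window starts cold. This is only a sketch: `rotatingCacheKey`, `rotationPeriodSeconds`, and the `go-build` / `go-test` base names are hypothetical, and whether the engine stops syncing the old volumes is not something this thread settles.

```go
package main

import (
	"fmt"
	"time"
)

// rotationPeriodSeconds is a hypothetical rotation window of two weeks,
// matching the manual "delete every two weeks" routine described above.
const rotationPeriodSeconds int64 = 14 * 24 * 60 * 60

// rotatingCacheKey appends the index of the rotation window that contains
// unixSeconds to base. All runs inside one window get the same key.
func rotatingCacheKey(base string, unixSeconds int64) string {
	return fmt.Sprintf("%s-%d", base, unixSeconds/rotationPeriodSeconds)
}

func main() {
	now := time.Now().Unix()
	// These names would be passed wherever the cache volume name is set.
	fmt.Println(rotatingCacheKey("go-build", now))
	fmt.Println(rotatingCacheKey("go-test", now))
}
```

The trade-off is the same as with the manual purge: the first run in each window pays the cold-cache penalty, and the volumes from earlier windows still have to be removed by some other means.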

nocturne forge
#

Could I run a routine dagger function that wipes the cache path? I know we can't export from cache.

nocturne forge
#

Just tested that, definitely can't rm -rf a mounted cache path

main elbow
#

cc @hardy ore @zenith meadow

zenith meadow
#

Hey @nocturne forge. At the moment we don't have an easy and accessible way for users to wipe the cache, but we can do that for you upon request.

hardy ore
nocturne forge
#

Unfortunately I'm not finding any success with this approach. Dagger engine logs show "skipped pushing cache mount <blah>" and the very surprising "syncing cache mount remotely <blah>" for caches I know are disabled in the Dagger Cloud UI.

#

We absolutely need the ability to blow away the caches. We're now seeing 3 minutes added to every single GHA run due to downloading every cache. Up from 90s-120s this morning. I will need to disable dagger cloud connectivity on our runners first thing tomorrow morning if we're not able to get rid of the caches.

#

If possible, as soon as possible, could we please have our org's caches entirely deleted? Like, nothing listed in the UI? I'm also going to revisit the liberal use of

    dagger.ContainerWithMountedCacheOpts{
        Sharing: dagger.Shared,
    },

as well as trying out go clean permutations instead of rm -rf /cache_mount/*
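For reference, the `go clean` permutations mentioned here come down to three flags: `-cache` removes the build cache, `-testcache` expires cached test results, and `-modcache` removes the module download cache. The sketch below only assembles the argument list; `goCleanArgs` is a hypothetical helper, and it assumes the cache mount sits where `GOCACHE` (and `GOMODCACHE`, if used) point inside the container.

```go
package main

import (
	"fmt"
	"strings"
)

// goCleanArgs builds the argv for a `go clean` invocation. Each flag is a
// real `go clean` flag; which combination suits a pipeline is a judgment call.
func goCleanArgs(buildCache, testCache, modCache bool) []string {
	args := []string{"go", "clean"}
	if buildCache {
		args = append(args, "-cache") // remove the entire build cache
	}
	if testCache {
		args = append(args, "-testcache") // expire cached test results
	}
	if modCache {
		args = append(args, "-modcache") // remove the module download cache
	}
	return args
}

func main() {
	// Clear build and test caches, keep downloaded modules.
	fmt.Println(strings.Join(goCleanArgs(true, true, false), " "))
}
```

The returned slice has the shape that an exec step taking a `[]string` expects.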

hardy ore
#

@nocturne forge I'll wipe your caches now and work on something so you can do that yourself

#

done! all the cache volumes have been wiped @nocturne forge

nocturne forge
#

Thank you! Much appreciated

nocturne forge
hardy ore
#

@nocturne forge can you please validate if this happens in subsequent builds? I just want to check if this is mainly a consequence of starting from a cold cache

#

🙏

nocturne forge
#

It happened a few more times, but I'm not seeing it every time, thankfully. I don't know why it would take over a minute to run .withoutFile(path: "cicd/dagger.gen.go")
hardy ore
#

btw, regarding removing the modules yourself, we've tried the rm -rf approach ourselves and it worked for us

#

Dagger engine logs show "skipped pushing cache mount <blah>" and the very surprising "syncing cache mount remotely <blah>" for caches I know are disabled in the Dagger Cloud UI

I know what's happening here also. Will have a fix for that next week as well 🙏

zenith meadow
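#

package main

import (
    "dagger/rmcache/internal/dagger"
    "time"
)

// Rmcache is a small module for reproducing cache volume removal.
type Rmcache struct{}

// Call fills the "testing-rm" cache volume with a file of random data
// (roughly 50 MB) so there is something to remove.
func (m *Rmcache) Call() *dagger.Container {
    return dag.Container().
        From("alpine:latest").
        WithMountedCache("/cache", dag.CacheVolume("testing-rm")).
        WithExec([]string{"sh", "-c", "head -c 50m /dev/urandom > /cache/file.txt"})
}

// Rm mounts the same cache volume and deletes everything under /cache.
func (m *Rmcache) Rm() *dagger.Container {
    return dag.Container().
        From("alpine:latest").
        WithMountedCache("/cache", dag.CacheVolume("testing-rm")).
        WithExec([]string{"sh", "-c", "rm -rf /cache/*"})
}

// Check lists the contents of the cache volume. CACHE_BUST changes on every
// call so the listing is not served from a previously cached exec result.
func (m *Rmcache) Check() *dagger.Container {
    return dag.Container().
        From("alpine:latest").
        WithEnvVariable("CACHE_BUST", time.Now().String()).
        WithMountedCache("/cache", dag.CacheVolume("testing-rm")).
        WithExec([]string{"sh", "-c", "ls -lh /cache/"})
}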

This is the example we used to repro the cache mount removal!

nocturne forge
#

@hardy ore Good morning/afternoon! Is the expectation that I should be able to clear caches with the above strategy? I'm doing what matipan is doing in the example above, but I'm still seeing 2-3 minute cache pulls 😦

hardy ore
#

@nocturne forge seems like you're running your CI in multiple AWS regions? I can see us-east-1 and us-west-2 here. Which is the one that you're trying to remove the volumes from?

#

seems like us-west-2 is the one that has the cache_volumes. Can you confirm please?

nocturne forge
#

Hmm, our CI runs exclusively in AWS us-west-2. Our developer machines also use dagger cloud caches and we have folks in California, Oregon, Montana, Colorado, Nebraska, Toronto, and Poland.

#

It mounts every 'enabled' cache we have by name, in sharing mode, and does rm -rf {path}/*. Each Engine (CI or dev machine) is given up to 6 minutes to gracefully shut down.
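One detail worth noting about `rm -rf {path}/*`: a plain shell glob does not match hidden entries, so dotfiles at the top of the mount would survive. A possible variant, sketched below, uses `find` instead and refuses obviously dangerous paths. `wipeArgs` is a hypothetical helper, and the sketch assumes the container image's `find` supports `-mindepth` and `-delete`.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// wipeArgs returns an argv that deletes everything inside mountPath,
// including hidden entries, while leaving the mount point itself in place.
// It rejects relative paths and the filesystem root as a safety check.
func wipeArgs(mountPath string) ([]string, error) {
	clean := path.Clean(mountPath)
	if !path.IsAbs(clean) || clean == "/" {
		return nil, errors.New("refusing to wipe: need an absolute, non-root mount path")
	}
	return []string{"find", clean, "-mindepth", "1", "-delete"}, nil
}

func main() {
	args, err := wipeArgs("/cache")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(args)
}
```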

hardy ore
#

I can check the cache size for your account, but I need to know which one I should look at: your devs or your CI environment 🙏

nocturne forge
#

Yep, understood. The trace above was run from CI. When I ran it locally and restarted my local engine, cache pull was much faster.

#

The CI environment is the current priority, aws us-west-2.

hardy ore