#Question about cache garbage collection behavior


mild spindle

Hello Dagger team,

We are using Dagger extensively with a shared engine across multiple workloads.

We have configured garbage collection, and because disk space on the engine fills up quickly, it is triggered roughly twice a day.

From our understanding, Dagger relies on different kinds of cache:
• Function cache
• Layer cache
• Volume cache

We would like to better understand how garbage collection behaves across these different cache types.

Context

We suspect we are hitting an issue related to cache eviction:
• The volume cache seems to be pruned by garbage collection
• However, some commands responsible for populating those volumes appear to still be cached at the layer level
• As a result, those commands are not re-executed, and the execution fails because the expected data in the volume is no longer present
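The failure mode described above can be reproduced with a toy model. This is a minimal Go sketch, not how the Dagger engine is implemented: a hypothetical `layerCache` remembers which commands already ran, a hypothetical `volume` map stands in for a cache volume, and `gcPruneVolumes` simulates GC evicting only the volume contents. Because the volume write is a side effect that is not part of the cached result, a layer cache hit skips the command and the data never comes back.

```go
package main

import (
	"errors"
	"fmt"
)

// engine is a toy model (hypothetical, for illustration only).
type engine struct {
	layerCache map[string]bool   // commands whose results are cached as layers
	volume     map[string]string // contents of a single cache volume
}

func newEngine() *engine {
	return &engine{layerCache: map[string]bool{}, volume: map[string]string{}}
}

// exec runs cmd only if it is not already in the layer cache.
// Its side effect (writing into the volume) is NOT part of the cached result.
func (e *engine) exec(cmd string, sideEffect func(vol map[string]string)) {
	if e.layerCache[cmd] {
		return // cache hit: the command is skipped entirely
	}
	sideEffect(e.volume)
	e.layerCache[cmd] = true
}

// gcPruneVolumes simulates GC evicting cache-volume data while keeping layers.
func (e *engine) gcPruneVolumes() {
	e.volume = map[string]string{}
}

// readData is a later step that expects the volume to be populated.
func (e *engine) readData(key string) (string, error) {
	v, ok := e.volume[key]
	if !ok {
		return "", errors.New("expected data missing from volume")
	}
	return v, nil
}

func populate(vol map[string]string) { vol["deps"] = "downloaded" }

// run performs one pipeline run and returns the error from the read step.
func run(e *engine) error {
	e.exec("fetch-deps", populate)
	_, err := e.readData("deps")
	return err
}

func main() {
	e := newEngine()
	fmt.Println("first run:", run(e)) // succeeds: the command ran and filled the volume
	e.gcPruneVolumes()
	fmt.Println("after GC:", run(e)) // fails: the command is a cache hit, so the volume stays empty
}
```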

Questions
1. Are all cache types (function, layer, volume) garbage-collected in the same way and with the same lifecycle?
2. Is it possible for volume cache to be pruned while the corresponding layers remain cached, leading to this kind of inconsistency?
3. If this behavior is expected, what is the recommended way to avoid this situation?

Thanks in advance for your help and for any guidance you can provide.

mild spindle
Hello, a gentle bump 🙂

charred iris

👋 To answer your questions:

  1. Yes, for the most part. Function caching has TTLs (the default is 7 days), which is its own special thing separate from the other GC logic, but otherwise it's all essentially the same.
  2. Yes, very possible. Cache volumes are meant to be best effort and are free to be pruned at any time.
  3. You may be able to improve this via the engine config (https://docs.dagger.io/reference/configuration/engine/#garbage-collection).

For example, you could try a policy like this one, which keeps cache volumes for longer and is more biased towards pruning exec layers:

  {
    "gc": {
      "policies": [
        {
          "filters": [
            "type==regular",
            "type==source.local",
            "type==source.git.checkout"
          ],
          "keepDuration": "48h",
          "maxUsedSpace": "512MB"
        },
        {
          "filters": [
            "type==exec.cachemount"
          ],
          "keepDuration": "168h",
          "maxUsedSpace": "20GB"
        },
        {
          "keepDuration": "1440h",
          "reservedSpace": "10GB",
          "maxUsedSpace": "75%",
          "minFreeSpace": "20%"
        },
        {
          "reservedSpace": "10GB",
          "maxUsedSpace": "75%",
          "minFreeSpace": "20%"
        },
        {
          "all": true,
          "reservedSpace": "10GB",
          "maxUsedSpace": "75%",
          "minFreeSpace": "20%"
        }
      ]
    }
  }

That'd be biased towards keeping up to 20GB of cache mounts; if you need more than that, you can bump the "20GB" setting.
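Before deploying a policy like this, it can help to check the file locally. The following is a minimal Go sketch, not an official validator: it decodes the `gc` section using only the fields that appear in this thread (`policies`, `filters`, `keepDuration`, `maxUsedSpace`, `reservedSpace`, `minFreeSpace`, `sweepSize`, `all`), rejects unknown keys so typos surface, checks that each `keepDuration` parses as a Go duration, and checks that percentages fall within 0 to 100. The struct and the duration format are assumptions; if the engine supports more fields or formats, extend the struct from the engine configuration reference.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// gcPolicy mirrors only the fields used in this thread (assumption).
type gcPolicy struct {
	All           bool     `json:"all,omitempty"`
	Filters       []string `json:"filters,omitempty"`
	KeepDuration  string   `json:"keepDuration,omitempty"`
	ReservedSpace string   `json:"reservedSpace,omitempty"`
	MaxUsedSpace  string   `json:"maxUsedSpace,omitempty"`
	MinFreeSpace  string   `json:"minFreeSpace,omitempty"`
}

type gcConfig struct {
	ReservedSpace string     `json:"reservedSpace,omitempty"`
	MaxUsedSpace  string     `json:"maxUsedSpace,omitempty"`
	MinFreeSpace  string     `json:"minFreeSpace,omitempty"`
	SweepSize     string     `json:"sweepSize,omitempty"`
	Policies      []gcPolicy `json:"policies"`
}

// checkPercent reports an error if s is a percentage outside 0..100.
// Absolute sizes such as "20GB" are not checked here.
func checkPercent(field, s string) error {
	if !strings.HasSuffix(s, "%") {
		return nil
	}
	v, err := strconv.ParseFloat(strings.TrimSuffix(s, "%"), 64)
	if err != nil || v < 0 || v > 100 {
		return fmt.Errorf("%s: invalid percentage %q", field, s)
	}
	return nil
}

// lintGC decodes the "gc" section strictly and checks durations and percentages.
// Other top-level sections of the file are ignored.
func lintGC(raw []byte) (*gcConfig, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(raw, &top); err != nil {
		return nil, err
	}
	gcRaw, ok := top["gc"]
	if !ok {
		return nil, fmt.Errorf(`no "gc" section`)
	}
	dec := json.NewDecoder(bytes.NewReader(gcRaw))
	dec.DisallowUnknownFields() // a typo such as "keepDurration" becomes an error
	var cfg gcConfig
	if err := dec.Decode(&cfg); err != nil {
		return nil, err
	}
	for i, p := range cfg.Policies {
		if p.KeepDuration != "" {
			if _, err := time.ParseDuration(p.KeepDuration); err != nil {
				return nil, fmt.Errorf("policy %d: keepDuration: %w", i, err)
			}
		}
		for field, v := range map[string]string{
			"maxUsedSpace": p.MaxUsedSpace, "minFreeSpace": p.MinFreeSpace,
		} {
			if err := checkPercent(fmt.Sprintf("policy %d %s", i, field), v); err != nil {
				return nil, err
			}
		}
	}
	return &cfg, nil
}

func main() {
	cfg := []byte(`{"gc": {"policies": [
		{"filters": ["type==exec.cachemount"], "keepDuration": "168h", "maxUsedSpace": "20GB"},
		{"all": true, "reservedSpace": "10GB", "maxUsedSpace": "75%", "minFreeSpace": "20%"}
	]}}`)
	parsed, err := lintGC(cfg)
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("policies:", len(parsed.Policies))
}
```

This only catches structural mistakes in the file; it says nothing about whether a running engine actually loaded it.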


Let me know if that helps!

rich oyster

One thing I'd add to Erik's comment is that you should not use cache volumes as persistent storage. Cache volumes are best-effort cache directories, generally used to speed up specific parts of the pipeline when they're present. Good examples are directories populated by package managers, such as node_modules, $GOPATH, and Maven's .m2 folder.

Instead of using cache volumes in your case, you might want to pass a *dagger.Directory or *dagger.File across your pipeline steps so you don't run into this issue.
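Why passing outputs helps can be seen in another toy model. In this Go sketch (hypothetical types, not the Dagger SDK), `directory` stands in for *dagger.Directory and `resultCache` for the layer cache. Because the cached entry is the output itself, a cache hit always comes with its data, and if GC prunes the entry the step simply runs again.

```go
package main

import "fmt"

// directory is a toy stand-in for *dagger.Directory: a value produced by one
// step and passed explicitly to the next.
type directory map[string]string

// resultCache is a toy layer cache whose entries ARE the step outputs,
// so there is no side channel whose data can be lost independently.
type resultCache map[string]directory

// runs counts how many times the build step actually executes.
var runs int

// build returns its output as a value, reusing a cached result when present.
func build(c resultCache, src string) directory {
	key := "build:" + src
	if out, ok := c[key]; ok {
		return out // cache hit returns the output itself
	}
	runs++
	out := directory{"app": "binary built from " + src}
	c[key] = out
	return out
}

// runTests consumes the directory it is handed instead of a shared volume.
func runTests(dir directory) error {
	if _, ok := dir["app"]; !ok {
		return fmt.Errorf("app missing")
	}
	return nil
}

func main() {
	c := resultCache{}
	fmt.Println(runTests(build(c, "main.go"))) // build executes
	fmt.Println(runTests(build(c, "main.go"))) // cache hit, data still present
	c = resultCache{}                          // GC prunes the cache entirely
	fmt.Println(runTests(build(c, "main.go"))) // build re-runs, tests still pass
	fmt.Println("build executions:", runs)
}
```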

mild spindle
nova egret

Thanks for all the information. We are currently testing this configuration on a 2TB disk:

  gc_max_used_space: "85%"
  gc_policy:
    reservedSpace: "40GB"
    maxUsedSpace: "85%"
    minFreeSpace: "25%"
    sweepSize: "20%"
    policies:
      - filters: ["type==source.local", "type==source.git.checkout"]
        keepDuration: "172h"
        maxUsedSpace: "5%"
      - filters: ["type==regular"]
        keepDuration: "172h"
        maxUsedSpace: "30%"
      - filters: ["type==exec.cachemount"]
        keepDuration: "172h"
        maxUsedSpace: "50%"
      - keepDuration: "720h"
        maxUsedSpace: "80%"
        reservedSpace: "10GB"
        minFreeSpace: "15%"
      - all: true
        maxUsedSpace: "80%"
        reservedSpace: "10GB"
        minFreeSpace: "15%"
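To sanity-check percentage limits like these, it helps to convert them to absolute sizes. A small Go sketch, assuming the percentages are taken of the total disk size and treating 2 TB as 2×10^12 bytes (both are assumptions; the engine configuration reference is the authority on the exact semantics):

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// diskBytes assumes a 2 TB disk measured in decimal units (2 x 10^12 bytes).
const diskBytes int64 = 2_000_000_000_000

// percentOf converts a limit such as "30%" into bytes of a disk of size total.
// Absolute sizes such as "10GB" are rejected because they need no conversion.
func percentOf(p string, total int64) (int64, error) {
	if !strings.HasSuffix(p, "%") {
		return 0, fmt.Errorf("not a percentage: %q", p)
	}
	v, err := strconv.ParseFloat(strings.TrimSuffix(p, "%"), 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(v * float64(total) / 100)), nil
}

func main() {
	// Percentage limits copied from the configuration above.
	limits := []struct{ name, value string }{
		{"gc_policy maxUsedSpace", "85%"},
		{"gc_policy minFreeSpace", "25%"},
		{"sources maxUsedSpace", "5%"},
		{"regular maxUsedSpace", "30%"},
		{"exec.cachemount maxUsedSpace", "50%"},
		{"fallback maxUsedSpace", "80%"},
	}
	for _, l := range limits {
		b, err := percentOf(l.value, diskBytes)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-30s %4s = %5.0f GB\n", l.name, l.value, float64(b)/1e9)
	}
}
```

On these numbers the three per-type caps (5% + 30% + 50%) add up to the global 85% limit, while minFreeSpace: 25% would keep usage at or below roughly 75% of the disk, so that limit looks like the tighter one in practice. How the engine reconciles the two is worth confirming against the garbage-collection reference.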

However, we currently have no way to confirm that our configuration is actually applied, or to check whether it is correct. Do you have any information that would help us inspect all types of cache entries?
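However the entries are obtained, checking them against the policy is mostly bookkeeping: sum sizes per record type, then compare each policy's combined usage with its cap. The Go sketch below assumes a hypothetical `entry` type with a record type and a size in bytes; it does not query the engine, and its field names are illustrative, not part of any Dagger API. Note that one policy can cover several record types (the sources policy above matches both `source.local` and `source.git.checkout`), so caps are checked per policy rather than per type.

```go
package main

import "fmt"

// entry is a hypothetical cache entry, as you might export it from the engine.
type entry struct {
	RecordType string // e.g. "regular", "exec.cachemount"
	SizeBytes  int64
}

// policyCap is a hypothetical summary of one GC policy.
type policyCap struct {
	Name  string
	Types []string // record types matched by the policy's filters
	Cap   int64    // maxUsedSpace in bytes
}

// usageByType sums entry sizes per record type.
func usageByType(entries []entry) map[string]int64 {
	totals := map[string]int64{}
	for _, e := range entries {
		totals[e.RecordType] += e.SizeBytes
	}
	return totals
}

// overCap lists the policies whose combined usage exceeds their cap.
func overCap(totals map[string]int64, policies []policyCap) []string {
	var over []string
	for _, p := range policies {
		var used int64
		for _, t := range p.Types {
			used += totals[t]
		}
		if used > p.Cap {
			over = append(over, fmt.Sprintf("%s: %d > %d bytes", p.Name, used, p.Cap))
		}
	}
	return over
}

func main() {
	const gb = int64(1_000_000_000)
	// Caps derived from the 2 TB configuration above (5%, 30%, 50%).
	policies := []policyCap{
		{"sources", []string{"source.local", "source.git.checkout"}, 100 * gb},
		{"regular", []string{"regular"}, 600 * gb},
		{"cache mounts", []string{"exec.cachemount"}, 1000 * gb},
	}
	// Made-up entries for illustration only.
	entries := []entry{
		{"regular", 450 * gb},
		{"exec.cachemount", 1200 * gb},
		{"source.local", 20 * gb},
	}
	for _, msg := range overCap(usageByType(entries), policies) {
		fmt.Println("over cap:", msg)
	}
}
```

Since GC runs periodically, usage can exceed a cap between runs, so a single snapshot over the cap does not by itself prove the policy is misapplied; a type that stays over its cap right after a GC run is the stronger signal.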

rich oyster
nova egret
That would be great for us.

rich oyster