#Dagger engine caching


waxen fern
#

Hi all,
I noticed that most of the time consumed by my dagger call is actually spent pulling the Dagger engine image.
Is there a way of caching this on the local filesystem?
ATM I'm running it in GHA on self-hosted CodeBuild runners with an S3 cache backend. I'm already caching the entire /var/lib/docker directory, but that does not seem to be enough.

jagged narwhal
waxen fern
#

The image is not there. It seems like an issue with GHA on CodeBuild. I'm starting to think it's not possible to cache /var/lib/docker with the S3-backed CodeBuild cache.

jagged narwhal
waxen fern
#

I saw that. But if I'm not mistaken, Dagger has no option to specify a target to cache your layers to?

shy radish
waxen fern
#

Thanks for clarifying. I'll try to figure out where and how to set the BuildKit cache export options.

Big changes to the underlying infra are not possible because of our use case.
We are supporting multiple different CI/CD solutions ATM and I'm leveraging Dagger to distribute functions that should run on every possible CI/CD solution.

PS: not sure what you mean by bk export tbh...

shy radish
#

bk export:

  1. Doesn't support merging cache results. The last run wins. If you're not careful you can easily make performance worse than with no caching. So you must very carefully tune your export configuration (which bucket & manifest to write to, for which workflow). This causes infra & app concerns to be tightly coupled, which is brittle and slows you down.

  2. Doesn't persist cache volumes. Depending on where your pipelines spend the most time, you may not get the boost you're hoping for.

  3. Generally has bugs and edge cases, since it's not used as widely as the rest of docker build
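
To make the first point concrete, here is a minimal sketch of keeping workflows from overwriting each other's cache manifest by giving each one its own `name=` attribute. It assumes the BuildKit S3 cache backend's `name` attribute (the manifest name) is passed through unchanged; the helper names `manifestName` and `s3CacheConfig` and the sample workflow name are hypothetical. It only separates workflows from one another: within a single workflow the last run still wins.

```go
package main

import (
	"fmt"
	"strings"
)

// manifestName (hypothetical helper) turns a workflow identifier into a value
// that is safe to embed in the comma-separated config: lowercase, with every
// character outside [a-z0-9-] replaced by '-'.
func manifestName(workflow string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(workflow) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' {
			b.WriteRune(r)
		} else {
			b.WriteRune('-')
		}
	}
	return b.String()
}

// s3CacheConfig (hypothetical helper) builds an S3 cache config string with a
// per-workflow manifest name, so two workflows never write the same manifest.
func s3CacheConfig(bucket, region, workflow string) string {
	return fmt.Sprintf("type=s3,mode=max,region=%s,bucket=%s,name=%s",
		region, bucket, manifestName(workflow))
}

func main() {
	// Sample values only; bucket and region are taken from this thread.
	fmt.Println(s3CacheConfig("cachebucket", "eu-west-1", "CI / Build & Test"))
}
```

This is also where the coupling mentioned above shows up: the infra side (bucket, manifest naming) has to know about every workflow on the app side.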

jagged narwhal
waxen fern
#

I tried to implement the S3 cache by passing the bucket as an argument to the module and creating a container with the cache config env var set to run my function:

func New(
    //+optional
    bucket string,
) *Metrics {
    container := dag.Container().From("gcr.io/distroless/static-debian12")

    var cacheEnv string
    if bucket != "" {
        cacheEnv = "type=s3,mode=max,region=eu-west-1,use_path_style=true,bucket=" + bucket
        container = container.WithEnvVariable("_EXPERIMENTAL_DAGGER_CACHE_CONFIG", cacheEnv)
    }
    return &Metrics{
        Container: container,
        CacheEnv:  cacheEnv,
    }
}

So I run dagger call --bucket cachebucket metrics ...., hoping it would leverage BuildKit's S3 caching mechanism.
Although I can see the env var is set on the container I use by default, nothing is landing in the bucket and there is no trace of the image being uploaded in the run logs.

Container.withEnvVariable(name: "_EXPERIMENTAL_DAGGER_CACHE_CONFIG", value: "type=s3,mode=max,region=eu-west-1,use_path_style=true,bucket=cachebucket"): Container!
#

Any hints where to look?

shy radish
waxen fern
#

Thank you for helping out. But I feel really stupid now 😄
I configured my GH action to have the env var as you pointed out, but nothing is happening or being logged related to caching... am I missing something really stupid here?
This is how I set up the GHA step:

      - name: Dagger
        uses: dagger/dagger-for-github@8.0.0
        env:
          AWS_ACCESS_KEY_ID: ${{ env.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ env.AWS_SECRET_ACCESS_KEY }}
          AWS_SESSION_TOKEN: ${{ env.AWS_SESSION_TOKEN }}
          AWS_ACCOUNT_ID: ${{ env.AWS_ACCOUNT_ID }}
          _EXPERIMENTAL_DAGGER_CACHE_CONFIG: "type=s3,mode=max,region=eu-west-1,use_path_style=true,bucket=gh-cache-prod,access_key_id=$AWS_ACCESS_KEY_ID,secret_access_key=$AWS_SECRET_ACCESS_KEY,session_token=$AWS_SESSION_TOKEN"
        with:

And after this I call my function as I always did, but it's not caching the Dagger engine container...
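
One thing worth ruling out in a step like the one above: GitHub Actions passes `env:` values through literally, so `$AWS_ACCESS_KEY_ID` inside the quoted string is not shell-expanded (only `${{ ... }}` expressions are). A minimal sketch of assembling the value at runtime instead, in a small Go wrapper around the dagger CLI, could look like this. The wrapper itself and the `cacheConfig` helper are hypothetical, and whether the experimental env var accepts every attribute shown is an assumption. Note also that this setting concerns the build cache, not the engine image pull.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Experimental variable named in this thread; read in the environment of the
// dagger CLI, not inside a container created by a module.
const cacheEnvName = "_EXPERIMENTAL_DAGGER_CACHE_CONFIG"

// cacheConfig (hypothetical helper) builds the S3 cache config and appends
// credentials only when the corresponding variables are actually set.
func cacheConfig(bucket, region string, getenv func(string) string) string {
	parts := []string{
		"type=s3", "mode=max", "region=" + region,
		"use_path_style=true", "bucket=" + bucket,
	}
	creds := []struct{ attr, env string }{
		{"access_key_id", "AWS_ACCESS_KEY_ID"},
		{"secret_access_key", "AWS_SECRET_ACCESS_KEY"},
		{"session_token", "AWS_SESSION_TOKEN"},
	}
	for _, c := range creds {
		if v := getenv(c.env); v != "" {
			parts = append(parts, c.attr+"="+v)
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: withcache <dagger args...>")
		return
	}
	// Run the dagger CLI with the cache config added to its environment.
	cmd := exec.Command("dagger", os.Args[1:]...)
	cmd.Env = append(os.Environ(),
		cacheEnvName+"="+cacheConfig("gh-cache-prod", "eu-west-1", os.Getenv))
	cmd.Stdout, cmd.Stderr, cmd.Stdin = os.Stdout, os.Stderr, os.Stdin
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The same wrapper can be reused on any CI system that exposes the AWS credentials as environment variables, which keeps the per-platform configuration small.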

shy radish
jagged narwhal
#

@waxen fern the Dagger image (and any other OCI image) can't be efficiently stored in the free-tier GHA cache. Even if you store a tar version of the image in the cache and restore it on every run, it'll take about the same time as, or even longer than, pulling it directly from registry.dagger.io. Downloading the tar from the cache and importing it into the engine just isn't significantly faster than a plain pull.

shy radish
#

Assuming you use the default docker provisioner (the CLI calls docker run registry.dagger.io/engine), some CI platforms have a proprietary "docker engine cache" feature, which reuses the same docker state between runners. If that feature exists on your CI and you enable it, it could help speed up Dagger engine initialization, since the image wouldn't have to be re-downloaded every time. I think AWS CodeBuild has that, for example.

jagged narwhal
#

Skimming through CodeBuild's docs, it seems like the "local Docker layer cache" is what Solomon might be referring to: https://docs.aws.amazon.com/codebuild/latest/userguide/caching-local.html. From a first impression, which is also more or less confirmed by this re:Post (https://repost.aws/questions/QUr8_kFPLjRa2n69BEdR9ZuQ/how-can-i-make-docker-layer-cache-permanent-when-i-build-docker-images-in-codebuild#ANY5vOlKw3T6alIf3SEUfPHg), it seems to be a "quick hack" where they try to schedule all the same builds within a given interval on the same host so you can reuse the cache. It kind of works but, as stated in the AWS docs, if your builds are not very frequent, it's the same as having no cache at all.

waxen fern
#

Okay wow thanks, this is great info.
I was hoping to cache to S3 and alter the CodeBuild setup as little as possible, because I’ll need to distribute the setup across the teams.
But yeah, I’m aware of the Docker layer caching and, even more interesting, the new Docker build server in AWS.
Again thank you very much for helping me out.

jagged narwhal
#

@waxen fern one thing that just occurred to me after skimming through the CodeBuild docs, and which might speed up the engine pull phase: you could create a custom runtime image that includes the dagger-engine-image.tar file. Then, in the entrypoint of that runtime, you could docker load that tar file so the engine image is already present when the CodeBuild pipeline starts.

#

You'd have to benchmark it, but I'd assume that will be a bit faster than pulling and unpacking the image when the pipeline starts.
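
A minimal sketch of such an entrypoint step, written in Go so it can be timed for the benchmark: it checks whether the engine image is already in the local Docker store and otherwise runs docker load on the tar. The image reference and tar path are passed as arguments because the engine tag and the file location are not given in this thread; `inspectArgs` and `loadArgs` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// inspectArgs returns the docker arguments that check whether an image exists
// locally; `docker image inspect` exits non-zero when the image is missing.
func inspectArgs(ref string) []string { return []string{"image", "inspect", ref} }

// loadArgs returns the docker arguments that import an image tarball.
func loadArgs(tarPath string) []string { return []string{"load", "-i", tarPath} }

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: preload <image-ref> <path-to-tar>")
		return
	}
	ref, tarPath := os.Args[1], os.Args[2]

	// Skip the load when the image is already present, for example on a
	// runner that kept its Docker state from a previous build.
	if exec.Command("docker", inspectArgs(ref)...).Run() == nil {
		fmt.Println("engine image already present, skipping load")
		return
	}

	start := time.Now()
	out, err := exec.Command("docker", loadArgs(tarPath)...).CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "docker load failed: %v\n%s", err, out)
		os.Exit(1)
	}
	// Compare this duration with the time a plain pull takes on the same runner.
	fmt.Printf("docker load took %s\n", time.Since(start).Round(time.Millisecond))
}
```

It would be invoked with the engine reference matching your CLI version and the tar baked into the runtime image, e.g. `preload registry.dagger.io/engine:<version> /path/to/dagger-engine-image.tar`.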