#Use with_directory() with included but ignored path


fleet yew
#

Hello all 🙂

It might be that I am working against some general concepts, so I am very open to alternatives.

My challenge:

I have a mono repository with some applications, terraform, documentation and so on.
The individual "modules" are separated by using the with_directory() function for caching/only building what changed.
Therefore I use the include_path=[] argument to express dependencies on other modules, artifacts, and so on.

I use just a lot for the actual actions and I use git describe for my versioning. Therefore I have to include the .git directory for most of my modules.

This leads to the issue that every time I git add/commit/amend something, the cache is invalidated and all "modules" which include the .git folder are rebuilt.

What I am looking for:

A way to include dependencies which do not invalidate the cache.

I know that this sounds controversial and one might even say that the current behavior is the correct one, since I really have a new version and I should rebuild things... But in reality only my git sha changed and nothing else.
This is especially a problem for development. During the actual CI build it is exactly what I want...

So as I said in the beginning, I am also up for alternative approaches to reach the same goal 🙂

Thanks
Paul

buoyant acorn
#

@fleet yew seems like what you need for the dev workflow is to use the .git folder as a cache volume instead.

You can accomplish it by doing something like this:

    gitCache := client.CacheVolume("git")
    git := client.Host().Directory(".git")

    // Populate the cache volume with the current contents of .git.
    _, err = client.Container().
        From("alpine").
        WithWorkdir("/app").
        WithDirectory(".git", git).
        WithMountedCache("gitcache", gitCache).
        WithExec([]string{"cp", "-a", ".git/.", "gitcache"}).
        Sync(ctx)
    if err != nil {
        panic(err)
    }

    // Read .git from the cache volume instead of from a host directory.
    _, err = client.Container().
        From("alpine").
        WithWorkdir("/app").
        WithMountedCache(".git", gitCache).
        WithExec([]string{"ls", "-la", ".git"}).
        Sync(ctx)
    if err != nil {
        panic(err)
    }

#

this way you can modify whatever you want in your .git folder and that won't invalidate the cache in the pipelines that require the .git volume

fleet yew
#

Thanks a lot, this is doing what I wanted 👍. But I am wondering a little bit why this is the case 😄

Given the example with the node modules: Wouldn't I want to rebuild my application in case any of the external dependencies changed? The way you provided is hiding this fact isn't it?

I could build some logic to update the cache-volume in case any of my package.json or package-lock.json or node_modules changed, but all the downstream containers would ignore the actual change in the cache?

Maybe I miss out something here?

Here is my working example with the python SDK in case it is interesting for anybody else:

async def build_info(client, root_dir, git_cache):
    # Mount the cache volume over /src/.git, then run the version step.
    build_info = (
        ci_ctr(client)
        .with_directory("/src/", root_dir)
        .with_mounted_cache("/src/.git", git_cache)
        .with_exec(["date"])
        .with_exec(["ls", "-la"])
        .with_exec(["cat", "/etc/os-release"])
        .with_exec(["just", "version"])
    )
    await build_info


async def update_git_cache(client, root_dir, git_cache):
    # Copy the host's .git directory into the cache volume.
    ctr = (
        client.container()
        .from_("alpine")
        .with_directory("/src/", root_dir, include=[".git"])
        .with_mounted_cache("gitcache", git_cache)
        .with_exec(["cp", "-a", "/src/.git/.", "gitcache"])
    )
    await ctr


async def main():
    config = dagger.Config(log_output=sys.stdout)
    async with dagger.Connection(config) as client:
        root_dir = client.host().directory(utils.ROOT_DIR, exclude=[
            "**/.terraform",
            "docs/src/**",
            "docs/book/**",
            ".devcontainer",
            ".vscode"
        ])

        git_cache = client.cache_volume("git")
        await update_git_cache(client, root_dir, git_cache)
        await build_info(client, root_dir, git_cache)
buoyant acorn
#

Given the example with the node modules: Wouldn't I want to rebuild my application in case any of the external dependencies changed? The way you provided is hiding this fact isn't it?

what's the example with the node_modules?

#

generally in a typical node build flow, since package.json and package-lock.json will change, that will invalidate the cache and force the subsequent operations to re-run

#

I think your use-case is special in a way that you need your pipeline to work in a specific way locally and differently in CI

#

that's why I've suggested this "workaround" to pre-populate the cache beforehand

fleet yew
#

Sorry, this is what I was talking about: https://docs.dagger.io/quickstart/635927/caching/

And you are right, I overlooked that for node_modules the manifests are included and npm install is run, so only the deltas (if there are any) are updated and everything is fine.

But I think it is still worth pointing out that a change in a mounted volume is not invalidating the cache if I see it right?

buoyant acorn
#

But I think it is still worth pointing out that a change in a mounted volume is not invalidating the cache if I see it right?

yes, that's correct. If we're missing that from the docs we should definitely add a note. cc @rugged grove
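
A toy model may help picture the difference. It is not Dagger's real implementation: the `step` type and `cacheKey` function are made up for illustration, and the only assumption carried over from this thread is that directory contents feed the cache key while the contents of a mounted cache volume do not.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// step is a toy stand-in for one pipeline operation. It is NOT Dagger's
// data model; it only illustrates which inputs feed the cache key.
type step struct {
	command     []string          // exec arguments
	directories map[string]string // mount path -> content (stand-in for a file tree)
	cacheMounts map[string]string // mount path -> cache volume NAME
}

// cacheKey hashes the command, the contents of regular directories and the
// names of cache volumes. Cache volume contents are deliberately left out.
func cacheKey(s step) string {
	h := sha256.New()
	h.Write([]byte(strings.Join(s.command, "\x00")))
	for _, p := range sortedKeys(s.directories) {
		fmt.Fprintf(h, "\x00dir:%s=%s", p, s.directories[p])
	}
	for _, p := range sortedKeys(s.cacheMounts) {
		fmt.Fprintf(h, "\x00cache:%s=%s", p, s.cacheMounts[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// sortedKeys returns the map keys in a stable order so the hash is deterministic.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	cmd := []string{"just", "version"}

	// .git attached as a regular directory: a new commit changes the key.
	before := step{command: cmd, directories: map[string]string{"/src/.git": "commit-a"}}
	after := step{command: cmd, directories: map[string]string{"/src/.git": "commit-b"}}
	fmt.Println("directory, same key:", cacheKey(before) == cacheKey(after)) // false

	// .git attached as a cache volume: only the volume name is hashed.
	volumes := map[string]string{"git": "commit-a"}
	viaCache := step{command: cmd, cacheMounts: map[string]string{"/src/.git": "git"}}
	k1 := cacheKey(viaCache)
	volumes["git"] = "commit-b" // the volume's contents change...
	k2 := cacheKey(viaCache)    // ...but the key does not
	fmt.Println("volume now holds:", volumes["git"])
	fmt.Println("cache volume, same key:", k1 == k2) // true
}
```

In this model, a step that reads from the volume is only re-run if something else in its inputs changes, which is the behaviour described above.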

indigo current
#

I'm attempting to use a CacheVolume as noted above. Is the CacheVolume supposed to be initialized empty every time I run the code? The SDK says that a CacheVolume is a "directory whose contents persist across runs". Is a "run" the lifetime of a pipeline, and will it therefore start empty on the next pipeline run? Can I persist a cache volume locally while developing?

buoyant acorn
#

Is the CacheVolume supposed to be initialized empty every time I run the code?

no, the cache volume is persistent across runs as long as you're using the same cache volume name.

#

CacheVolume is a "directory whose contents persist across runs".

what this is trying to say is that the cache volume will not be cleaned up across runs.

#

can I persist a cache volume locally while developing?

yes, this is how it works by default

indigo current
#

ok, that's what I was expecting but I'm seeing it as empty every time. looking at other examples, I'm not sure what I'm missing

buoyant acorn
#

what SDK are you using?

indigo current
#

golang. dagger v0.9.3

#

I hit a discord character limit .. one minute..

#
buildUtilsRepo := client.Git(os.ExpandEnv("https://testing:${DAGGER_TOKEN}@private-gitlab/code/build-utils.git"),
        dagger.GitOpts{KeepGitDir: true})
buildUtilsCheckout := buildUtilsRepo.Branch("main")
commitId, err := buildUtilsCheckout.Commit(ctx)
if err != nil {
        panic(err)
}
virtualenvImage := client.Container(dagger.ContainerOpts{Platform: dagger.Platform(archPlatform)}).WithRootfs(buildBaseImage.Directory("/")).
        WithFile("/home/user/.pip/pip.conf",
                buildContext.File("/pip.conf"),
                dagger.ContainerWithFileOpts{Owner: "user"}).
        WithUser("user")
virtualenvImage.WithMountedCache("/tmp/build-utils",
        daggerCache,
        dagger.ContainerWithMountedCacheOpts{Owner: "user", Sharing: dagger.Shared}).
        WithExec([]string{"/bin/sh", "-c", "ls -al /tmp/build-utils"}).Stdout(ctx)
t_c := virtualenvImage.WithMountedCache("/tmp/build-utils",
        daggerCache,
        dagger.ContainerWithMountedCacheOpts{Owner: "user", Sharing: dagger.Shared}).
        WithExec([]string{"/bin/sh", "-c", "cd /tmp/build-utils; git log --format=%H -1"})
cachedCommit, err := t_c.Stdout(ctx)
if err != nil || strings.TrimSpace(cachedCommit) != commitId {
        fmt.Printf("WARNING: %s_%s_\n", commitId, cachedCommit)
        virtualenvImage.
                WithDirectory("/tmp/build-utils",
                        buildUtilsCheckout.Tree(),
                        dagger.ContainerWithDirectoryOpts{Owner: "user"}).
                WithMountedCache("gitcache",
                        daggerCache,
                        dagger.ContainerWithMountedCacheOpts{Owner: "user", Sharing: dagger.Shared}).
                WithExec([]string{"/bin/sh", "-c", "ls -l /tmp/build-utils; ls -al ; ls -al gitcache; cp -a  /tmp/build-utils/. gitcache"}).Stdout(ctx)
}
buoyant acorn
#

it's ok if you exceed the char limit. You can upload as a file.

Here's an example similar to what you're doing, to illustrate that the cache persists across runs:

package main

import (
    "context"
    "fmt"
    "os"
    "time"

    "dagger.io/dagger"
)

func main() {
    ctx := context.Background()
    c, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
    if err != nil {
        panic(err)
    }
    defer c.Close()

    // A cache volume is identified by its name; reusing the name reuses its contents.
    vc := c.CacheVolume("virualenvCache")

    // Each run adds one timestamped file, then lists everything accumulated so far.
    _, err = c.Container().From("alpine").
        WithMountedCache("/tmp/build-utils", vc, dagger.ContainerWithMountedCacheOpts{Sharing: dagger.Shared}).
        WithExec([]string{"touch", fmt.Sprintf("/tmp/build-utils/foo-%s.txt", time.Now().String())}).
        WithExec([]string{"ls", "-la", "/tmp/build-utils/"}).
        Sync(ctx)
    if err != nil {
        panic(err)
    }
}

^ if you run that multiple times, you'll see that the ls at the end will print new files each time

indigo current
#

yes I do see multiple files with multiple runs. hmm

#

oh, on the 6th run, now it's empty

buoyant acorn
#

hmm it could be that the engine is garbage collecting that volume if you're running out of space

#

how are you running the engine? docker?

indigo current
#

harkening back to my first help question.. yes. dagger-engine is running in docker. I see it in docker ps.

#

I checked docker volume ls before and after running it with a new cache volume name and I don't see a new entry. should I see the cache volume in that output?

#

added wrinkle: I set up nerdctl w/rootless containers on this box a while ago, but I don't think I see dagger interacting with it

buoyant acorn
#

I checked docker volume ls before and after running it with a new cache volume name and I don't see a new entry. should I see the cache volume in that output?

this is expected. Dagger cache volumes are not docker volumes; they're unrelated

#

can you run docker logs $engine_container and see if you spot any "removed snapshot" message in there?

#

that means the Dagger engine is deleting some things since it's trying to free space

#

what OS are you running this on? Mac, Linux, Windows?

#

what this probably means is that your docker setup is running out of space

indigo current
#

yes I see "removed snapshot", no timestamps so can't really tell when the last one was. but I see output related to my recent runs.
I also see this, but I haven't correlated these with caching failures, yet:

Dec 06 10:34:32 linuxnuc dockerd[1524]: time="2023-12-06T10:34:32.687409698-06:00" level=error msg="Error running exec f11c74c3582c8f3c10605ea10a1f514e11d93118de8dc4b65b46a84c005919ae in container: exec attach failed: error attaching stdout stream: write unix /run/docker.sock->@: write: broken pipe"
Dec 06 10:44:45 linuxnuc dockerd[1524]: time="2023-12-06T10:44:45.146548752-06:00" level=error msg="Error running exec 2486bc1466f292decdbfbad13063ec90a1a353523f50d841c32952dace047899 in container: exec attach failed: error attaching stdout stream: write unix /run/docker.sock->@: write: broken pipe"
#

also for the "cache failure" case, I see that the alpine:latest takes longer to resolve, this 10.29s step takes < 1.0s normally:

│ ▽ from alpine
│ █ [10.29s] resolve image config for docker.io/library/alpine:latest
│ █ [0.01s] pull docker.io/library/alpine:latest
│ ┣ [0.01s] resolve docker.io/library/alpine@sha256:34871e7290500828b39e22294660bee86d966bc0017544e848dd9a255cdf59e0
│ ┣─╮ pull docker.io/library/alpine:latest
│ ┻ │
█◀──╯ [0.24s] exec touch /tmp/build-utils/foo-2023-12-06 10:50:29.237931056 -0600 CST m=+0.002512914.txt
buoyant acorn
#

also for the "cache failure" case, I see that the alpine:latest takes longer to resolve, this 10.29s step takes < 1.0s normally:

this could happen for many reasons. You could be getting throttled by Docker Hub

#

2486bc1466f292decdbfbad13063ec90a1a353523f50d841c32952dace047899 in container: exec attach failed: error attaching stdout stream: write unix /run/docker.sock->@: write: broken pipe"

this message is specific to your pipeline, not sure what you're doing there

#

going back to the root of the issue. Seems like your docker setup is low on disk space and Dagger is removing volumes as part of the garbage collection mechanism

#

that's why you see that sometimes the volumes are empty when you re-start the pipeline

indigo current
#

hmm, I have a 400G /var/lib/docker with 30% free right now. I can probably clean up more in it..

buoyant acorn
#

can you run this? docker exec $(docker ps --filter name=dagger-engine- -q) df -h /var/lib/dagger?

indigo current
#

I just ran docker system prune, so now there's more free:

/dev/mapper/cl-docker
                        412.8G    217.6G    177.2G  55% /etc/hostname
#

this is the volume that dagger is using:

60G     /var/lib/docker/volumes/102a855e1d7084da9608f535ab0db2bf6e997b68a91cedfc0ee4f82483b9bc3c/_data
#

should I see something in /var/lib/dagger/runc-overlayfs/cachemounts in the dagger-engine container?

indigo current
#

I've run the pipeline three consecutive times and each time the volume is empty. also, I earlier piped docker logs $dagger_container into less and I missed a bunch of logs on stderr. these are the logs concerning the snapshot cleanup:
[attaching]

#

just before that:

time="2023-12-06T17:51:31Z" level=debug msg="created new ref for cache dir \"EAqpCRcIuismE7KcMZcp72rpTyp++y3KJV3AbVv9d0c=\": rktx6dyp7afscyn0cdvr8qfew" span="exec touch /tmp/build-utils/foo-2023-12-06 11:51:26.552990547 -0600 CST m=+0.009079624.txt"
time="2023-12-06T17:51:31Z" level=debug msg="returning network namespace vg6umyqp0lm1hy3qfb0t3ik8u from pool" span="exec touch /tmp/build-utils/foo-2023-12-06 11:51:26.552990547 -0600 CST m=+0.009079624.txt"
time="2023-12-06T17:51:31Z" level=debug msg="> creating qnrkvv70pj6m2w5mdaqyfjmy9 [touch /tmp/build-utils/foo-2023-12-06 11:51:26.552990547 -0600 CST m=+0.009079624.txt]" span="exec touch /tmp/build-utils/foo-2023-12-06 11:51:26.552990547 -0600 CST m=+0.009079624.txt"
time="2023-12-06T17:51:31Z" level=debug msg="reusing ref for cache dir \"EAqpCRcIuismE7KcMZcp72rpTyp++y3KJV3AbVv9d0c=\": rktx6dyp7afscyn0cdvr8qfew" span="exec ls -la /tmp/build-utils/"
#

I restarted dagger-engine. I also noticed the load average was high but no CPU bound processes running which I think indicated I/O contention somewhere.

time="2023-12-06T18:01:16Z" level=debug msg="gc cleaned up 26276455859 bytes"
#

and now I have 10 consecutive runs where the cache is persisting prior runs. the slow image config resolution does appear unrelated. I guess the engine got into a bad state?

indigo current