#First FROMs are always pulled, not caching

zinc swift
#

The first two FROMs in my Dagger pipeline are always pulling and extracting every time I run go run ./test/dagger/main/hof.go

The code is effectively:

func (R *Runtime) BaseContainer() *dagger.Container {

    c := R.Client.Container().From("golang:1.20")

    // add tools
    c = R.AddDockerCLI(c)

    // setup workdir
    c = c.WithWorkdir("/work")

    return c
}

func (R *Runtime) AddDockerCLI(c *dagger.Container) *dagger.Container {
    dockerCLI := R.Client.Container().From("docker:24").
        File("/usr/local/bin/docker")

    c = c.WithFile("/usr/local/bin/docker", dockerCLI)

    return c
}

Am I missing something?

#

Does using cntr.Sync(ctx) impact this?

lethal dagger
#

I don't see that behavior when I

git clone https://github.com/hofstadter-io/hof.git
cd hof
git checkout gen-path-simplifications
cd test/dagger
go run ./main/dagger-in-dagger.go
#

If I run

go run ./test/dagger/main/hof.go
#

from the repo root on that branch, it looks like those images are cached. Easier to see with dagger run (ensure your dagger cli is v0.6.2 for Go SDK v0.7.2)

dagger run go run ./test/dagger/main/hof.go
#

There is a bit of time spent resolving the image tags; I believe you can cut that out by pinning the full image digest (SHA).

#

0.04s vs 0.51s

lethal prawn
zinc swift
#

I can see differences if I move the cache volumes around, moving lines like the following to different points in the pipeline

    // setup mod cache
    modCache := R.Client.CacheVolume("gomod")
    c = c.WithMountedCache("/go/pkg/mod", modCache)
#

I see both, one, or none cached, on my cloud vm

#

the green means it is actually pulling, doesn't it?

#

Another example: if I run it twice in a row, it rebuilds the binary anyway, without code changes. But the container that copies out the binary for runtime does get cached, so is it not getting the new binary? It's a little opaque, though the new UI helps a lot.

zinc swift
#

hmm, possibly I'm hitting weird buildkit cache issues. Is there some auto image pruning that might happen as a disk gets full?

If I reset all the docker containers and anon volumes, things behave in line with how I'd expect them to. I was able to play around by changing various files and see the magic. I've increased the VM resources and the situation has improved.

lethal prawn
#

cache mounts and image pulls should not interfere with each other… that’s surprising and maybe indicates a bug?

lethal prawn
zinc swift
#

I did not have a very large disk attached, the volumes were about 10G themselves

lethal dagger
#

After killing my dagger engine and clearing all volumes and re-running, this is what I get after the first run. Holds steady on subsequent runs.

#

3 GB volume

wintry raven
remote edge
#

I have hit this too and assumed that buildkit is doing some pruning. Because I hit this typically when I run app A and then app B and then get back to app A after some time.

#

I've observed this for images not the cache volumes afaict

wintry raven
#

@zinc swift you should see some pruning events in the engine logs when this happens

zinc swift
lethal prawn
#

TIL buildkit does pruning ootb

#

I wonder what the pruning strategy is, and whether it affects build times when it occurs?

wintry raven
zinc swift
#

tbf, this only showed up for me when running dagger-in-dagger on a small cloud vm (20G disk)

#

seems fine in GHA, so we are good for now

gentle burrow
#

This is cropping up for me and hampering my workflow. Even with a specific SHA reference instead of a label, the container is pulled every time. As a 15GB container, this takes a while. Is there any way to configure the GC to leave it cached?

gentle burrow
#

Dagger uses its own fork of buildkit, though, right? So I would need to create /etc/buildkit/buildkitd.toml inside the Dagger engine container?

zinc swift
#

you can just mount it in, yes. I have some setups that require this

wintry raven
# gentle burrow Dagger uses its own fork of buildkit, though, right? So I would need to create ...

we don't use a fork, we use official upstream buildkit. Generally we avoid forks and send patches upstream, since the buildkit team is usually quick to merge them. We have unified the config file into an engine.toml that you can supply to the engine on creation. More info about this here: https://github.com/dagger/dagger/blob/main/core/docs/d7yxc-operator_manual.md

gentle burrow
#

Thanks! Lots of useful info in there.

gentle burrow
#

Well, unfortunately, even with

[worker.oci]
gckeepstorage = 128000
[[worker.oci.gcpolicy]]
all = true
keepBytes = "128GB"

... in a file mounted to /etc/dagger/engine.toml (confirmed with docker exec dagger-engine-... cat /etc/dagger/engine.toml), the @sha256:...-referenced from() container is still re-pulled on subsequent runs.

Not sure what to try next short of digging through Buildkit's source code.

lethal dagger
#

@ornate shuttle any ideas?

gentle burrow
#

Upon closer inspection, it seems like it's re-pulling a specific layer (and not the biggest one - weird?). So there is some improvement at least. But not completely there.

ornate shuttle
wintry raven
#

In addition to what tibor is saying, the engine logs should print something when performing GC IIRC

remote edge
#

Interestingly, I'm seeing this too. Certain layers are always pulled. Even for subsequent builds. I'll try to do a repro if possible.

gentle burrow
#

Thanks for the diagnosis tips. I'll take a look when I get the chance.

pulsar rampart
#

@floral flint

#

@snow oriole

wintry raven