# Large Python packages seem to invalidate cache


velvet bloom
#

Hi folks,

First, let me say that I love Dagger and I think it's the most amazing build technology I've come across recently. Compared to GHA and other solutions, the concept is so clean, and the team loves being able to run build pipelines locally without having to go through Git and esoteric YAML. We've been using it for a couple of weeks as the build & deploy pipeline for our microservices-based app.

That being said, one thing I've spent a while chasing down is why the cache seems to invalidate for unclear reasons. Doing a lot of black-box debugging, I finally narrowed it down essentially (not tested in isolation) to a "RUN pip install langchain" and "RUN pip install sentence-transformers" that break the cache. This confused me greatly, because simple RUN statements are... cacheable! In fact, I could pip install most Python packages without invalidating the cache, but these ones do.

So after reading through the Discord a bit, I opened the black box and peered inside the Dagger engine logs, and I suspect that BuildKit garbage collection is deciding to throw away very large cache images. Those two libraries together come to about 1.5 GB, so they are very large.

I'm going to modify the BuildKit TOML config to see if I can disable GC or raise its threshold, and I'll let you all know if it works out. Apparently that means running the Dagger engine manually, pointing at it with the _EXPERIMENTAL_DAGGER_RUNNER_HOST variable, and mounting a custom TOML config. In the meantime...

I guess to the dagger team & community, what are your thoughts on this? Have you all seen it before, is there a plan to modify dagger's default built-in behavior to address stuff like this - especially relevant when dealing with deep learning/large-language-model packages?

frank current

hey @velvet bloom! first and foremost thx for taking the time to share your journey and for investing in giving Dagger a try. re: cache invalidation and GC, yes! we have it mapped and prioritized in the roadmap to provide more visibility (probably in Dagger Cloud) and to let your team tune these knobs in a more automatic and UX-friendly way. Seems like you're interested in this particular feature, so would it be ok to ping you as soon as we have something to share, to get your feedback on it?

cc @thorn granite

velvet bloom
#

Hi Marco, sure, please feel free to ping me on that matter!

velvet bloom
#

Update:

I volume-mounted a custom TOML file where all I did was delete some default-looking stuff and set gckeepstorage = "90%" (and keepBytes = "90%" in the GC policies) wherever I found those keys, and that seems to have stopped the cache invalidation from occurring. I didn't do anything intelligent, just set that stuff to 90% and saw that it worked. For posterity: within the engine's Docker container, the BuildKit config file is /etc/dagger/engine.toml, rather than the default path of /etc/buildkit/buildkitd.toml. Here is the config I ended up with:

[worker.oci]
gc = true
gckeepstorage = "90%"
max-parallelism = 8
cniPoolSize = 16

[[worker.oci.gcpolicy]]
keepBytes = "90%"
keepDuration = "48h"
filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout"]

[worker.containerd]
address = "/run/containerd/containerd.sock"
enabled = true
platforms = [ "linux/amd64" ]
namespace = "buildkit"
gc = true

# gckeepstorage sets storage limit for default gc profile, in bytes.
gckeepstorage = "90%"

# maintain a pool of reusable CNI network namespaces to amortize the overhead
# of allocating and releasing the namespaces
cniPoolSize = 16

# configure the containerd runtime

[worker.containerd.runtime]
name = "io.containerd.runc.v2"
options = { BinaryName = "runc" }

[[worker.containerd.gcpolicy]]
keepBytes = "90%"
keepDuration = 172800
filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout"]
[[worker.containerd.gcpolicy]]
all = true
keepBytes = "90%"