Hi folks,
First, let me say that I love Dagger and I think it's the most amazing build technology I've come across recently. Compared to GHA and other solutions, the concept is so clean, and the team loves being able to run build pipelines locally without having to go through Git and esoteric YAML. We've been using it for a couple of weeks as the build & deploy pipeline for our microservices-based app.
That being said, one thing I've spent a while chasing down is why the cache seems to invalidate for unclear reasons. After a lot of black-box debugging, I narrowed it down (though not tested in isolation) to two steps that break the cache: "RUN pip install langchain" and "RUN pip install sentence-transformers". This confused me greatly, because simple RUN statements are... cacheable! In fact, I can pip install most Python packages without invalidating the cache, but these two do.
So after reading through the Discord a bit, I opened the black box, peered inside the Dagger engine logs, and now suspect that BuildKit garbage collection is deciding to throw away very large cache entries. Those two libraries together come to about 1.5 GB, so they are very large.
I'm going to modify the BuildKit TOML config to see if I can disable GC or raise its threshold, apparently by running the Dagger engine manually, pointing the client at it with the _EXPERIMENTAL_DAGGER_RUNNER_HOST variable, and mounting a custom TOML config. I'll let you all know if it works out, but in the meantime...
To the Dagger team & community: what are your thoughts on this? Have you seen it before, and is there a plan to change Dagger's default built-in behavior to address it? This seems especially relevant when dealing with deep learning/large-language-model packages.
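For reference, here's roughly the TOML I plan to mount into the manually started engine. It's an untested sketch: the `[worker.oci]` table and the `gc` key come from BuildKit's buildkitd.toml format, and I'm assuming the Dagger engine honors the same settings when it's given a custom config file (I still need to confirm the exact path it expects inside the container).

```toml
# Untested sketch of a custom engine config, in BuildKit's buildkitd.toml format.
# Assumption: the Dagger engine applies these BuildKit worker settings.
[worker.oci]
  # Turn off automatic garbage collection so large cached layers are never pruned.
  gc = false
```

If disabling GC entirely eats too much disk, the alternative would be to leave `gc = true` and raise `gckeepstorage` in the same table, though I'd have to check which units the engine's BuildKit version expects for that value.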