This may just be due to my limited understanding of containers, but Dagger has a concept of publishing an image (or a partial image) to something like Docker or AWS ECR, and also a file cache. It seems to me that (assuming you have access to a private image repo) you would always want to publish a partial image and use that as a partial build cache. I am thinking especially of NPM dependencies. But I am wondering if I am thinking about this wrong: should the image cache really only be for system-level dependencies, and the file cache be more for things like NPM dependencies?
#Different use cases for file cache vs publishing an image
👋 I think this post will give you more insight into the different types of caching and how we're addressing them in Dagger (https://discord.com/channels/707636530424053791/1124399722640183346). In practice, whether caching happens by image or by file shouldn't be a concern for the user. We're building our distributed cache solution so you can specify what you want cached (WithCachedVolume through our SDKs), and then we can take care of making that cache work in the best way anywhere.
If you want to know more about how our caching solution works and give it a try, I'm happy to provide more info. cc @sweet patio
Ah ok, due to the name "cache volume" I assumed this was storing/persisting files directly, but it makes sense that the implementation could be either. So it sounds like generally pointing people towards withCacheVolume rather than publishing a partial image is the Dagger Way (although both seem to work).
It depends on what kind of data you want to cache. Do you mind sharing a bit more about your use case, to make sure I get the context right?
A few thoughts:
- Dagger caches all operations by default. Whether it’s executing a command, copying or downloading files, etc: if dagger has performed the same operation in the past with the same inputs, it will fetch the outputs from cache instead of re-running. This happens implicitly without you having to request it in your code
- Sometimes Dagger executes a tool that has its own caching feature: usually package managers or compilers like npm, maven, go etc. For those tools to cache properly they need their own cache data (usually a directory) to be persisted between runs. So Dagger needs to pass these input directories to the container in a special way, because a regular directory will not have its contents change from one run of the tool to the next. That’s why there’s a special type of directory called CacheVolume. Those are not meant to be shared outside of Dagger: they are for persisting specific parts of the internal state of your pipeline, for optimal use of your tool’s native caching features.
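To make the first point above concrete, here is a toy model of input-keyed operation caching in Python. It is only a sketch of the general idea, not Dagger's actual implementation: the `OperationCache` class, its `run` method, and the `install` function are hypothetical names invented for illustration.

```python
import hashlib
import json


class OperationCache:
    """Toy in-memory cache keyed by a digest of the operation and its inputs.

    Hypothetical illustration only; this is not how Dagger is implemented.
    """

    def __init__(self):
        self._store = {}
        self.executions = 0  # counts how many times an operation really ran

    def run(self, op_name, inputs, fn):
        # Same operation name + same inputs -> same key -> cached output.
        payload = json.dumps([op_name, inputs], sort_keys=True).encode()
        key = hashlib.sha256(payload).hexdigest()
        if key not in self._store:
            self.executions += 1
            self._store[key] = fn(**inputs)
        return self._store[key]


def install(lockfile_digest):
    # Stand-in for an expensive step such as installing dependencies.
    return f"node_modules for {lockfile_digest}"


cache = OperationCache()
first = cache.run("npm ci", {"lockfile_digest": "abc"}, install)
second = cache.run("npm ci", {"lockfile_digest": "abc"}, install)  # cache hit
third = cache.run("npm ci", {"lockfile_digest": "def"}, install)  # new inputs
```

In this sketch the second call never invokes `install`, because its inputs match the first call. Only a change in inputs (here, a different lockfile digest) triggers a real run.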
Right now just caching NPM dependencies, basically per repo/branch
Oh this is great & I have noticed it locally, but I am not sure how/if it would persist across machines/CI runs right now, which is really what I am after. Although my guess is this is what hooks into Dagger Cloud?
Yeah, in the past I have put a container layer in ECR named after the SHA of the package-lock, for example, as a kind of cache. This actually worked pretty cleanly and is kind of the behavior I was going after here, vs. caching a directory or something and persisting/mounting it between runs
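For readers unfamiliar with the lockfile-SHA approach described above, a minimal Python sketch of deriving such a tag might look like the following. The function name `lockfile_cache_tag`, the `npm-deps` prefix, and the 16-character truncation are assumptions made for illustration; the thread does not say how the original tag was built.

```python
import hashlib
from pathlib import Path


def lockfile_cache_tag(lockfile: Path, prefix: str = "npm-deps") -> str:
    """Derive a deterministic image tag from the lockfile's contents.

    Hypothetical helper: identical lockfiles yield the same tag, so a
    previously published dependency layer can be looked up and reused.
    """
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    # Truncate to keep the tag short; the length is an arbitrary choice.
    return f"{prefix}-{digest[:16]}"
```

A CI job could compute `lockfile_cache_tag(Path("package-lock.json"))`, try to pull an image with that tag from the private registry, and only rebuild and publish the dependency layer when the pull fails.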
yes exactly. Dagger Cloud takes care of all that
oh ok cool, maybe it's a no-op for me then if Dagger Cloud will just handle it
that's my ideal scenario 🙂
We are recruiting testers at the moment if you’re interested 🙂 Or have you already been contacted about that in another thread? Sorry there’s been a sudden surge of Dagger Cloud discussions on discord so I’m starting to lose track 😅