#Different use cases for file cache vs publishing an image

silver grove
#

This may just be my limited understanding of containers, but Dagger has both a concept of publishing an image (or a partial image) to a registry such as Docker Hub or AWS ECR, and a file cache. It seems to me that, assuming you have access to a private image repo, you would always want to publish a partial image and use that as a partial build cache. I am thinking especially of NPM dependencies. But I wonder whether I am thinking about this wrong: should the image cache really only be for system-level dependencies, and the file cache be for things like NPM dependencies?

hot smelt
#

👋 I think this post will give you more insight into the different types of caching and how we're addressing them in Dagger (https://discord.com/channels/707636530424053791/1124399722640183346). In practice, caching by image or by file shouldn't be a concern of the user. We're building our distributed cache solution so that you can specify what you want to be cached (WithCachedVolume through our SDKs), and then we take care of making that cache work in the best way anywhere.

#

If you want to know more about how our caching solution works and give it a try, I'm happy to provide more info. cc @sweet patio

silver grove
#

Ah ok, because of the name "cache volume" I assumed this meant storing/persisting files directly, but it makes sense that the implementation could be either. So it sounds like pointing people towards withCacheVolume rather than publishing a partial image is generally the Dagger Way (although both seem to work).

dusty jay
#

A few thoughts:

  • Dagger caches all operations by default. Whether it’s executing a command, copying or downloading files, etc: if dagger has performed the same operation in the past with the same inputs, it will fetch the outputs from cache instead of re-running. This happens implicitly without you having to request it in your code
#
  • Sometimes Dagger executes a tool that has its own caching feature: usually package managers or compilers like npm, maven, go etc. For those tools to cache properly they need their own cache data (usually a directory) to be persisted between runs. So Dagger needs to pass these input directories to the container in a special way, because changes a tool makes to a regular directory do not carry over from one run of the tool to the next. That’s why there’s a special type of directory called CacheVolume. Those are not meant to be shared outside of Dagger: they are for persisting specific parts of the internal state of your pipeline, for optimal use of your tool’s native caching features.
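
To make the two points above concrete, here is a toy model in plain Python. It does not use the Dagger SDK and is not how Dagger is implemented; `ToyEngine`, `toy_install`, and the volume name `npm-cache` are all made up for illustration. It only shows the idea: an operation is skipped when its inputs are unchanged, and a named cache volume lets the tool reuse its own cache when the inputs do change.

```python
import hashlib
import json


class ToyEngine:
    """Toy model (hypothetical): memoizes operations by their inputs."""

    def __init__(self):
        self.op_cache = {}       # operation key -> cached output
        self.cache_volumes = {}  # volume name -> mutable store that survives runs
        self.executions = 0      # how many times an operation actually ran

    def cache_volume(self, name):
        # The same name always returns the same mutable store.
        return self.cache_volumes.setdefault(name, {})

    def run(self, command, inputs, tool, volume_name=None):
        # The key covers the command and its regular inputs only;
        # the cache volume's contents are deliberately not part of the key.
        key = hashlib.sha256(
            json.dumps([command, inputs], sort_keys=True).encode()
        ).hexdigest()
        if key in self.op_cache:
            return self.op_cache[key]  # same inputs: do not re-run
        self.executions += 1
        volume = self.cache_volume(volume_name) if volume_name else {}
        output = tool(inputs, volume)
        self.op_cache[key] = output
        return output


def toy_install(inputs, tool_cache):
    """Stand-in for a package manager that keeps its own download cache."""
    downloaded = []
    for pkg in inputs["dependencies"]:
        if pkg not in tool_cache:
            tool_cache[pkg] = f"tarball:{pkg}"
            downloaded.append(pkg)
    return {"installed": sorted(inputs["dependencies"]), "downloaded": downloaded}


engine = ToyEngine()
deps = {"dependencies": ["left-pad", "react"]}
first = engine.run("install", deps, toy_install, "npm-cache")
again = engine.run("install", deps, toy_install, "npm-cache")  # served from op cache
more = {"dependencies": ["left-pad", "react", "lodash"]}
changed = engine.run("install", more, toy_install, "npm-cache")  # re-runs, reuses volume
```

In this sketch the second call never executes the tool, and the third call executes it but only has to fetch the one new package, because the earlier downloads are still in the volume.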
silver grove
#

Right now I'm just caching NPM dependencies, basically per repo/branch

silver grove
#

oh ok cool, maybe it's a no-op for me then if Dagger Cloud will just handle it

#

that's my ideal scenario 🙂

dusty jay
#

We are recruiting testers at the moment if you’re interested 🙂 Or have you already been contacted about that in another thread? Sorry, there’s been a sudden surge of Dagger Cloud discussions on Discord, so I’m starting to lose track 😅

silver grove
#

yup we are in there! I think @whole summit is going to try to set it up this week

#

as part of us trying to improve perf stuff, so it's all happening. Appreciate the input