# OCI Registry Module Distribution
Well, currently you’ve optimized for teams that have their own daggerverse repository, which you can cache, and that will work fine.
But teams that have invested in a monorepo will store their modules in much larger repositories, where the Dagger module code is anywhere from 1% to 10% of the codebase.
Of course you can cache that, but how many multi-GB (or tens-of-GB) repositories do you want to do that for when you only need a few hundred MB?
Then there are monorepos that aren’t public.
In the context of a huge monorepo, the initial load will typically come from a local checkout (otherwise there’s no point in embedding). So in practice, Dagger may never need to make that big checkout of the monorepo itself; instead, the module will be loaded from a Host.Directory. We’ll still find a caching optimization for it; we have a ton of engineering planned on improving caching across the board next year.
And what if that module uses a Dagger module higher up in my tree?
I don’t know if I’m missing something and these are silly questions 😅
Not silly at all
That is an unsettled topic 😁 cc @stuck ravine
Technically, everything is going to become a module, and services that depend on the build output of other services in a nested hierarchy currently can’t be expressed this way.
I can put together a real example of what I’d like to be able to do, if that helps
Yeah, we've discussed some of this as part of this issue: https://github.com/dagger/dagger/issues/5862
And there's a related discussion about how to actually specify dependencies in these sort of situations here: https://discord.com/channels/707636530424053791/1182712578154180648 (gonna go turn that one into an issue too after typing this)
Basically, we do want to support dependencies from one module to another within the context of a large monorepo. There's a placeholder way of getting it to work today with a root setting in dagger.json, but we're gonna refactor it soon (sometime in the next month).
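To make the "module higher up in my tree" case concrete, here is a minimal Python sketch of the kind of check a root setting implies: a module references another module by a path relative to its own directory, and that reference is only loadable if it resolves inside the configured root. Paths are treated as root-relative POSIX paths. The helper `resolve_local_dep`, the `OutsideRootError` type, and the rule itself are illustrative assumptions, not Dagger's implementation or the dagger.json schema.

```python
import posixpath


class OutsideRootError(ValueError):
    """Raised when a dependency path would escape the configured root."""


def resolve_local_dep(module_dir: str, dep: str) -> str:
    """Resolve a local module dependency (hypothetical helper).

    `module_dir` is the depending module's directory relative to the root;
    `dep` is a path relative to that module, e.g. "../../shared/build".
    Returns the dependency's directory relative to the root.
    """
    if posixpath.isabs(module_dir) or posixpath.isabs(dep):
        raise ValueError("paths must be relative to the root")
    # Collapse "." and ".." segments: "services/api/../lib" -> "services/lib".
    resolved = posixpath.normpath(posixpath.join(module_dir, dep))
    # A leading ".." means the path climbed above the root.
    if resolved == ".." or resolved.startswith("../"):
        raise OutsideRootError(f"{dep!r} resolves outside the root")
    return resolved


if __name__ == "__main__":
    for dep in ["../../shared/build", "../lib", "../../../elsewhere"]:
        try:
            print(dep, "->", resolve_local_dep("services/api", dep))
        except OutsideRootError as err:
            print(dep, "-> rejected:", err)
```

With a module at `services/api`, `../../shared/build` resolves to `shared/build` and `../lib` to `services/lib`, while `../../../elsewhere` is rejected because it climbs above the root.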
> Of course you can cache that, but how many multi GB or 10s of GB repositories do you want to do that for when you need a few 100MBs?
Truly huge repos like this will definitely call for a lot more intelligence in how we clone and cache Git repos internally. I haven't invested much research time into this yet, but newer versions of the Git protocol do appear to offer ways to clone and check out only parts of a repo, which is what we'll want so that we pull only the parts of the repo we actually need to load each module.
That sort of optimization is less likely to land in the immediate future (it probably fits better with the caching optimization work Solomon mentioned for next year), but I'll make sure it's tracked in an issue.
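The Git features alluded to above can already be combined from the command line: a partial (blobless) clone fetches commits and trees but defers file contents, and sparse checkout limits the working tree to chosen directories, so roughly only the file contents under those directories (plus top-level files) get downloaded. The Python sketch below just builds and optionally runs the two `git` invocations. It assumes a reasonably recent Git (`git sparse-checkout` and `clone --sparse` appeared in Git 2.25) and a server that supports partial clone; the helper names, URL, and paths are hypothetical, and this is not how Dagger clones repositories internally.

```python
import subprocess
from typing import List, Sequence


def sparse_clone_commands(url: str, dest: str, paths: Sequence[str]) -> List[List[str]]:
    """Build git commands that fetch only `paths` from the repository at `url`."""
    if not paths:
        raise ValueError("at least one path is required")
    return [
        # Blobless partial clone; --sparse starts with only top-level files checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Add the directories we need; their file contents are fetched on demand.
        ["git", "-C", dest, "sparse-checkout", "set", *paths],
    ]


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(list(argv), check=True)


if __name__ == "__main__":
    # Hypothetical repository URL and module directories.
    cmds = sparse_clone_commands(
        "https://github.com/example/monorepo.git",
        "monorepo",
        ["ci/dagger", "shared/build"],
    )
    for argv in cmds:
        print(" ".join(argv))
    # run_all(cmds)  # uncomment to actually clone
```

Keeping command construction separate from execution makes the argument lists easy to inspect or log before anything touches the network.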
That would be super helpful; we can figure out what works best in the immediate term and then use it as input for the improvements mentioned above that we're planning too 🙏
(issue summarizing the discord thread I linked to w/ an initial strawman proposal on one possible specific approach: https://github.com/dagger/dagger/issues/6291)