#OCI Registry Module Distribution

heady cypress
#

Well, currently you’ve optimized for teams to have their own daggerverse repository, which you can cache and that will work fine.

#

But teams that have invested in a monorepo will be storing their modules in much larger repositories, where the Dagger module code makes up anywhere from 1-10% of the code base.

#

Of course you can cache that, but how many multi-GB or tens-of-GB repositories do you want to do that for when you only need a few hundred MB?

#

Then there are monorepos that aren’t public.

crisp storm
#

In the context of a huge monorepo, typically the initial load will be from a local checkout (otherwise there's no point in embedding). So in practice that big checkout of the monorepo by dagger may not happen. Instead it will be a Host.Directory. We’ll still find a caching optimization for it; we have a ton of engineering planned next year to improve caching across the board.

heady cypress
#

For module installation?

#

How can that use Host.Directory?

crisp storm
#

Something like dagger call -m ./ci

#

(CLI does the Host.Directory call for you)

heady cypress
#

And if that uses a dagger module higher up in my tree?

#

I don’t know if I’m missing something and these are silly questions 😅

crisp storm
#

Not silly at all

heady cypress
#

Technically, everything is going to become a module, and services that depend on the build output of other services in a nested hierarchy currently can't be handled this way.

#

I can put together a real example of what I’d like to be able to do, if that helps

stuck ravine
#

Yeah we've discussed some of this as part of this issue here: https://github.com/dagger/dagger/issues/5862

And there's a related discussion about how to actually specify dependencies in these sorts of situations here: https://discord.com/channels/707636530424053791/1182712578154180648 (gonna go turn that one into an issue too after typing this)

Basically, we do want to support dependencies from one module to another within the context of a large monorepo. There's a placeholder way of getting it to work today with a root setting in dagger.json but we're gonna refactor it soon (sometime in the next month).

"Of course you can cache that, but how many multi-GB or tens-of-GB repositories do you want to do that for when you only need a few hundred MB?"

Truly huge repos like this definitely will call for a lot more intelligence in how we clone+cache git repos internally. I haven't invested a ton of research time into this yet but there do appear to be ways in newer versions of the git protocol to just clone+checkout parts of a repo, which is what we'll want in order to be able to pull only parts of the repo we actually need for loading each module.

That sort of optimization is less likely to come in the immediate future (probably better as part of the caching optimization work next year Solomon mentioned) but I'll make sure it's tracked in an issue.
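
For reference, the "clone+checkout only parts of a repo" idea can be sketched with plain git. This only illustrates git's partial clone and sparse checkout features and is not how Dagger loads modules internally. The repository layout (ci/, services/api/) and file contents are made up for the example, and it assumes git 2.25 or later.

```shell
#!/bin/sh
set -eu

# Build a throwaway "monorepo" locally so the sketch runs without network access.
# The ci/ and services/api/ paths are hypothetical.
work=$(mktemp -d)
git init -q "$work/monorepo"
cd "$work/monorepo"
mkdir -p ci services/api
echo '{"name": "ci"}' > ci/dagger.json
echo 'package main' > services/api/main.go
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Let this repo honor partial-clone filters and on-demand object requests
# when it acts as the "server" side of the clone below.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Partial clone: skip file contents (blobs) up front and start with a
# sparse working tree that contains only top-level files.
cd "$work"
git clone -q --filter=blob:none --sparse "file://$work/monorepo" checkout

# Materialize only the directory that holds the module; the blobs under
# ci/ are fetched on demand, everything else stays out of the working tree.
cd checkout
git sparse-checkout set ci

ls
```

With a real remote, only the clone URL and the path passed to `git sparse-checkout set` would change; whether the filter is honored depends on the server allowing it.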
