#Sparse copy from a megarepo


warm pelican
#

I have an unusual scenario, as I'm working with Unreal Engine. Unreal is a monorepo, and it's a huge piece of software. After checking out the UE repo, running their proprietary Git-LFS alternative, and creating a single-platform build, the workspace grows to 10GB.

I want my Dagger pipeline to support clean CI checkouts and developer workspace without copying multiple GB unnecessarily.

First I tried using Directory includes + excludes. But Dagger uploads the entire workspace to the engine before applying include/exclude.

Then I tried using a virtual directory and withDirectorying a few select folders to it. This avoided upload of .git and Intermediates and other such folders, which was good. But it included the LFS files, which was not. Ideally, Dagger only hashes the gitdeps.xml file and the actual downloading or copying of the binary files referenced by that manifest occurs in the Dockerfile build process.

Then I tried using a virtual directory and withFileing the repo contents to it using the output of git ls-tree. This avoided upload of LFS files to the Dagger dir, but took way too long as each withFile has significant overhead and there are 148K tracked files.

Finally, I tried running git daemon on the host, on the workspace's .git dir, and used git(...).commit(...).tree() instead of directory(...). This was lightning fast - less than one minute (compared to over 10 for other approaches). Success! The only issue is, I could only get it working with the host's LAN address. git("git://localhost") and git("git://host.docker.internal") did not result in a successful connection to the daemon.

So, my question ultimately is: is there a better way to network with the host than querying its LAN IP from the script? Or is there a better approach to dealing with Unreal's girth that I haven't considered? I have noticed a few questions about bind-mounts around here and the answer tends to be "don't do it, use directory".

crimson badger
#

cc @high plover

#

@warm pelican what TZ are you in? Would love to chat more with you about it to find the best possible alternative.

high plover
#

I was just going to ping you on this @crimson badger 🙂

#

Also this will be a perfect combo with the host networking feature that @young owl is cooking

#

In which you will be able to query the Dagger API for eg.

container.WithServiceBinding("gitserver", host.Service(hostname: "host.docker.internal", ports: [{frontend: 22}]))

rare glacier
#

Not sure if it is related, but it would be great if https://pkg.go.dev/dagger.io/dagger#Directory.Diff put the actual diffs in the resulting directory. As it stands, it looks like it contains the files which have changed. Though it would be hard to represent a deleted file...

we fell back to diff -R ./v0.27.0 ./v0.27.4

#

This would probably be better as a different func, as I imagine the current implementation is the Buildkit one

warm pelican
#

@crimson badger very cool, we basically had the same idea. I also noticed that I could mount a unix socket to a service, but wasn't sure if that was a viable path or not because the Git server was TCP. I'll go ahead and try out the service proxy setup that you showed off in the linked video. I'm UTC-7, feel free to DM me or @ me in the appropriate channel.

crimson badger
#

@warm pelican there's also a workaround you can use right now if you're on Linux: manually start the engine and use the --add-host flag so that it can connect to the host service. There's more info about it in this message: #1136128194710356038

warm pelican
#

To follow up, I chatted with Marcos and confirmed that his integrated git server idea would indeed be very useful for Unreal Engine. Until that becomes available, I'm making do with a local git server and some network hacks.