I have an unusual scenario, as I'm working with Unreal Engine. Unreal is a monorepo, and it's a huge piece of software. After checking out the UE repo, running their proprietary Git-LFS alternative, and creating a single-platform build, the workspace grows to 10 GB.
I want my Dagger pipeline to support both clean CI checkouts and developer workspaces without copying multiple gigabytes unnecessarily.
First I tried using Directory includes + excludes. But Dagger uploads the entire workspace to the engine before applying include/exclude.
Then I tried using a virtual directory and adding a few select folders to it with withDirectory. This avoided uploading .git, Intermediates, and other such folders, which was good. But it included the LFS files, which was not. Ideally, Dagger would only hash the gitdeps.xml file, and the actual downloading or copying of the binary files referenced by that manifest would occur in the Dockerfile build process.
Then I tried using a virtual directory and adding the repo contents to it with withFile, driven by the output of git ls-tree. This avoided uploading the LFS files to the Dagger directory, but took way too long, as each withFile call has significant overhead and there are 148K tracked files.
Finally, I tried running git daemon on the host, on the workspace's .git dir, and used git(...).commit(...).tree() instead of directory(...). This was lightning fast - less than one minute (compared to over 10 minutes for the other approaches). Success! The only issue is that I could only get it working with the host's LAN address. git("git://localhost") and git("git://host.docker.internal") did not result in a successful connection to the daemon.
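To make the git daemon setup concrete, here is a minimal sketch of the command line and URL involved. The helper names, the workspace path, the example IP, and the choice of flags are illustrative assumptions, not necessarily the exact invocation used; the flags themselves (`--base-path`, `--export-all`, `--reuseaddr`, `--port`) are standard `git daemon` options.

```typescript
// Hypothetical helpers: names, paths, and the port are illustrative assumptions.

/** Arguments for a `git daemon` that serves <workspace>/.git (fetch only by default). */
function gitDaemonArgs(workspace: string, port: number = 9418): string[] {
  return [
    "daemon",
    `--base-path=${workspace}`, // URL paths are resolved relative to this directory
    "--export-all",             // serve the repo without a git-daemon-export-ok marker file
    "--reuseaddr",              // allow quick restarts on the same port
    `--port=${port}`,           // 9418 is the default git:// port
    `${workspace}/.git`,        // whitelist: only this repository may be served
  ];
}

/** git:// URL under which the repository above would be reachable from `host`. */
function gitDaemonUrl(host: string, port: number = 9418): string {
  return `git://${host}:${port}/.git`;
}

// Example with a made-up workspace path and LAN address.
console.log(["git", ...gitDaemonArgs("/work/UnrealEngine")].join(" "));
console.log(gitDaemonUrl("192.168.1.20"));
```

The resulting URL is what would be passed to git(...) in place of the directory(...) call; the daemon process itself would be started separately on the host, for example with Node's `child_process.spawn("git", gitDaemonArgs(...))`.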
So, my question ultimately is: is there a better way to network with the host than querying its LAN IP from the script? Or is there a better approach to dealing with Unreal's girth that I haven't considered? I have noticed a few questions about bind-mounts around here and the answer tends to be "don't do it, use directory".
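For context, "querying its LAN IP from the script" could look roughly like the sketch below. This is an assumption about one way to do it in Node, not a Dagger feature: `pickLanIPv4` and `NetAddr` are hypothetical names, and the function simply returns the first non-loopback, non-link-local IPv4 address it finds.

```typescript
// Reduced shape of the entries returned by Node's os.networkInterfaces().
interface NetAddr {
  address: string;
  family: string | number; // "IPv4" on current Node; some releases reported the number 4
  internal: boolean;       // true for loopback interfaces
}

/** Returns the first external, non-link-local IPv4 address, or undefined if none exists. */
function pickLanIPv4(ifaces: Record<string, NetAddr[] | undefined>): string | undefined {
  for (const name of Object.keys(ifaces)) {
    for (const addr of ifaces[name] || []) {
      const isV4 = addr.family === "IPv4" || addr.family === 4;
      const isLinkLocal = addr.address.indexOf("169.254.") === 0;
      if (isV4 && !addr.internal && !isLinkLocal) {
        return addr.address;
      }
    }
  }
  return undefined;
}

// Made-up sample data standing in for os.networkInterfaces().
const sampleInterfaces: Record<string, NetAddr[]> = {
  lo: [{ address: "127.0.0.1", family: "IPv4", internal: true }],
  eth0: [
    { address: "fe80::1", family: "IPv6", internal: false },
    { address: "192.168.1.20", family: "IPv4", internal: false },
  ],
};
console.log(pickLanIPv4(sampleInterfaces)); // prints "192.168.1.20"
```

In a real script the input would come from `os.networkInterfaces()` (after `import * as os from "node:os"`). On machines with several adapters or VPNs, the first match is not necessarily the address the Dagger engine can reach, which is part of why this feels fragile.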