#Directory upload

leaden talon
#

Are you using dag.Host().Directory() in your client? And if so, are you using the include and exclude optional arguments?

cursive kernel
#

yes, it's a large git repo, we are including .git because we want to run git commands and LFS

#

the binary we are trying to build is almost 400MB (mostly embedded files)

#

for some additional context, the .git directory is almost 400MB before any LFS pulls happen

leaden talon
#

Mmm, so everything that gets uploaded is actually needed - it's just uploading slowly. Right?

#

One option is to clone the repo from inside Dagger. That way you'd benefit from caching at the git level, and bypass local filesystem upload altogether. I think that has worked well for others in the past.

cc @glad mesa @thorn minnow who have messed with this I believe

cursive kernel
#

We need to be able to build & test in the container, before making a commit

leaden talon
#

Is it only the first run that has a very slow upload? Are the subsequent runs faster?

thorn minnow
#

A .git of 400MB is quite a bit. I guess it has a long history; do you need all of it? You could fetch the git repo with a depth of 1 to reduce the amount of history loaded. Are you using the Go SDK? I recently did an integration for our internal monorepo, and using go-git I do a PlainClone with a depth of 1, since I don't really care about the entire history.

cursive kernel
#

This is adding a lot of work on our end that a docker buildx would not seem to need. Part of the goal of switching to Dagger Go SDK is that we could remove a lot of custom build tools.

leaden talon
#

@cursive kernel this should be exactly the same performance profile as docker build, since it's the same tech under the hood

#

If it's not, then it's fixable 🙂

cursive kernel
#

yes, but with docker buildx we would keep much of our internal tools and keep git-lfs work outside of the buildkit engine

#

we are currently including the full repo, because we could benefit from the caching across 30+ services

#

but as we try to roll things out

  1. we run dagger per service
  2. we only do the go build step of the service
leaden talon
#

You mentioned you can't clone the repo from inside the container, because the change you want to test is not yet committed - meaning this is a pipeline you want to run pre-commit on the dev machine, correct?

cursive kernel
#

This is where they see docker buildx as a better alternative, because it wouldn't have the same overhead

leaden talon
#

Well, it's not really buildx that's faster, it's just keeping lfs outside the container that is faster. So you can always keep that part, and swap out buildx for an equivalent Dagger call. Then work incrementally from there.

cursive kernel
#

An "unchanged" rerun has a 12s upload time, probably because the 400MB binary is there

#

we currently have LFS outside of dagger, I think it is actually faster inside, because of the upload phase and caching

leaden talon
#

Right. What if you only uploaded the repo without the LFS assets, then fetched LFS objects from inside the container?

#

I've been experimenting with userland caching of git objects with my supergit module (since core buildkit doesn't support lfs)

cursive kernel
#

we still have .git directories on the dev vms that are 10GB+

cursive kernel
#

doesn't matter, they already exist on the dev vms in their current state and we cannot ask them all to wipe their setups

#

they have other, non-dagger builds that are going to fill it in

#

This might come down to "why are buildkit filesystem operations so slow?"

leaden talon
#

You're uploading 10GB.. I'm sure buildkit upload can be made faster, but I don't think that's the bottleneck in this case.

#

Maybe this would work:

  • Upload the git worktree without .git. So you get the uncommitted changes.
  • Get the git objects from inside the container (bypassing direct upload). Make sure to use a cache volume to keep everything fast
  • Combine uploaded worktree with in-container .git: everything should work
glad mesa
#

Upload the git worktree without .git. So you get the uncommitted changes.

Another optimization here could be omit all git LFS files and check them out in the pipeline as well in a cache volume. That way you avoid uploading huge file(s) to the build context often
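
A rough sketch of how the LFS-managed paths could be collected on the host so they can be added to the exclude list. It assumes `git` and `git-lfs` are installed and shells out to `git lfs ls-files --name-only`; the helper names `parsePathList` and `lfsTrackedFiles` are made up for illustration and are not part of Dagger.

```go
package main

import (
	"bufio"
	"os/exec"
	"strings"
)

// parsePathList turns newline-separated command output into a slice of
// paths, dropping blank lines and any leading "./".
func parsePathList(out string) []string {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		p := strings.TrimPrefix(strings.TrimSpace(sc.Text()), "./")
		if p != "" {
			paths = append(paths, p)
		}
	}
	return paths
}

// lfsTrackedFiles asks git-lfs which files it manages in repoDir.
// Assumption: git and git-lfs are available on the host.
func lfsTrackedFiles(repoDir string) ([]string, error) {
	cmd := exec.Command("git", "lfs", "ls-files", "--name-only")
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return parsePathList(string(out)), nil
}
```

The returned paths are relative to the repository root, so they could be appended to the `Exclude` list of the host directory upload, with the LFS objects then fetched inside the container.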

cursive kernel
#

That involves more effort than normal, because it would require porting some internal build tools in python/cue. We're still at the POC phase

#

I grant that this POC is exacerbating the issues

#

but it's causing 1m builds to take more than 30m

cursive kernel
#

now, I have a second code path, that does much of what you describe here (also doing the 30+ services in a single docker run), and takes about the same time, for a fresh clone

cursive kernel
#

re: elaborate - we use python + cue to form our software catalog and CI entrypoints. The LFS patterns are in CUE and LFS is run by python

#

A docker buildx POC might very well see some of the same issues, but it is a smaller POC than rolling out more Dagger integration before we know

#

at the same time, it would be designed differently, because we cannot realize the full product build (30+ services) in a single command

#

On Jenkins, we see a 50-100% build time increase for this POC, which is currently acceptable. We go from 30-45m to 45-60m

#

The main problem is the local developer experience, where just 1 of the services takes 30m

cursive kernel
#

I'm getting confirmation on 2nd+ uploads, but they still seem to be significant

#

My timings (with a mostly clean repo)
fresh build -> 9m
no change -> 1.5m
change help string -> 5m

#

(for full run, not upload)

#

I have about 1.5GB (.git + lfs + src)

cursive kernel
#

can we get a private space going?

#

we can use the hof one

glad mesa
#

sure, happy to keep chatting there

glad mesa
# cursive kernel My timings (with a mostly clean repo) fresh build -> 9m no change -> 1.5m change...

Jumped into a quick call with Tony to go through this. We managed to reduce his 1.5m no-change build to 12s. The cause was some Exports he was doing on each build, which were triggering context re-uploads and subsequent cache invalidations.

Having said that, it'd be great to have a way to benchmark how much time Dagger is taking to checksum the build context and decide if/what should be uploaded. cc @silent oasis @torpid vigil in case you know if there's any way we could easily know this..

cc @leaden talon

silent oasis
#

(beware, many technical details)

cursive kernel
#

If I do subset := code.Directory(".", { include: ["src/libs/go", "src/services/foo"]})

Is that a copy or is something smarter going on?

silent oasis
#

ah ok sorry i should check back in when i'm a little more awake 😄

#

out of curiosity - how much of the commit history etc, is actually relevant here @cursive kernel?

cursive kernel
#

These are local builds on the developer cloud vms, so we are dealing with whatever they have there, plus we need to operate on local changes that are not in git

#

separately, they sometimes have 10s of GB that we cannot predict, one had a ./scratch directory with 30GB

#

I'm reproducing myself right now (added 6x ~5GB tar.gz files in a ./tmp dir), prelim timing indicates it takes ~20m even with the no-change that resulted in 12s above

cursive kernel
#

I'm working on some "hacks" to deal with my filesize / upload issue

#

Having a dagger option / filter like this could be useful

#

either way, I imagine whatever we come up with could go into a knowledge base

cursive kernel
#

yeah, the hard part is the size to filter on...

#

I'm not sure if we are going to have a universal value, might depend on the directory...

#

are Exclude paths relative to the dir or fullpath?

    git := dc.Dagger.Host().Directory(".git", dagger.HostDirectoryOpts{
        Exclude: []string{"lfs/"},
    })
#

should this be lfs/ or .git/lfs?
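
The working snippet further down uses `lfs/`, which suggests patterns are taken relative to the directory being uploaded. Assuming that holds, paths collected relative to the repo root need rewriting before they go into `Exclude` for a subdirectory. A small sketch with a hypothetical helper, `relativeToUpload`:

```go
package main

import (
	"path"
	"strings"
)

// relativeToUpload rewrites repo-root-relative, slash-separated paths so
// they are relative to uploadDir. Paths outside uploadDir are dropped.
// Assumption: exclude patterns are resolved against the uploaded directory.
func relativeToUpload(uploadDir string, repoPaths []string) []string {
	base := path.Clean(uploadDir)
	var out []string
	for _, p := range repoPaths {
		p = path.Clean(p)
		switch {
		case base == ".":
			out = append(out, p)
		case strings.HasPrefix(p, base+"/"):
			out = append(out, strings.TrimPrefix(p, base+"/"))
		}
	}
	return out
}
```

For example, with `.git` as the upload directory, `.git/lfs/` becomes `lfs` and `src/main.go` is dropped because it lies outside the upload.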

#

another thing that has been mentioned before is supporting .gitignore, that would probably cover almost all of my cases besides LFS stuff
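
Until there is built-in .gitignore support, git itself can list what the ignore rules match. A sketch, assuming `git` is on the PATH and the command runs inside a work tree; `gitIgnoredPaths` and `splitNUL` are hypothetical helpers.

```go
package main

import (
	"os/exec"
	"strings"
)

// splitNUL splits NUL-terminated output (as produced by `git ls-files -z`)
// into individual paths, skipping empty entries.
func splitNUL(out string) []string {
	var paths []string
	for _, p := range strings.Split(out, "\x00") {
		if p != "" {
			paths = append(paths, p)
		}
	}
	return paths
}

// gitIgnoredPaths lists untracked paths in repoDir that match the ignore
// rules (.gitignore, .git/info/exclude and the user's global excludes).
// --directory reports a fully ignored directory as a single entry.
func gitIgnoredPaths(repoDir string) ([]string, error) {
	cmd := exec.Command("git", "ls-files",
		"--others", "--ignored", "--exclude-standard", "--directory", "-z")
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return splitNUL(string(out)), nil
}
```

The paths come back relative to `repoDir`, so they could be appended to the exclude list next to the size-based entries. LFS-tracked files are committed rather than ignored, so they would still need separate handling.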

#

Here's the gist of what looks to be working so far

    // Needs "os/exec", "path/filepath", "strings" and the Dagger Go SDK ("dagger.io/dagger").
    // client and components are packages internal to our project.
    func loadSource(dc *client.Client) (*dagger.Directory, error) {
        // Upload the code in parts and then combine them, to improve caching.

        // First, build an exclude list that covers our exported files.
        excludes := []string{
            ".git",
            "src/stacks/foo/data/", // docker compose & root-owned directory
        }
        for _, c := range components.QtpComponents {
            // exclude the built binary
            excludes = append(excludes, filepath.Join(c.Dir, c.Bin))

            // other possible excludes would be LFS, ...?
        }

        // Next, add large files (Python models, other test files, ...) to the exclude list.
        // The size cutoff is hacky and probably depends on the directory;
        // (configurable) .gitignore support would be preferable here.
        // Output() keeps stderr out of the path list; -type f skips directories.
        out, err := exec.Command("find", ".", "-type", "f", "-size", "+20M").Output()
        if err != nil {
            return nil, err
        }
        for _, line := range strings.Split(string(out), "\n") {
            line = strings.TrimSpace(line)
            if line == "" {
                continue // no matches, or the trailing newline
            }
            // Skip .git, since the entire directory is excluded anyway.
            // The trailing slash avoids also matching .github or .gitattributes.
            if strings.HasPrefix(line, "./.git/") {
                continue
            }
            excludes = append(excludes, strings.TrimPrefix(line, "./"))
        }

        // Then upload the source without git & exports.
        src := dc.Dagger.Host().Directory(".", dagger.HostDirectoryOpts{
            Exclude: excludes,
        })

        // Get the git directory separately, because it is large and changes less often.
        git := dc.Dagger.Host().Directory(".git", dagger.HostDirectoryOpts{
            Exclude: []string{"lfs/"},
        })

        // Recombine: the worktree with .git mounted back in place.
        final := src.WithDirectory(".git", git)

        return final, nil
    }