#Caching subtools (pip, pants) and dagger... filesystem read seems quite slow


tough island
#

I'm trying to marshal a fairly large Python monorepo through dagger. To speed up subsequent builds/steps in a container, I would ordinarily mount the pip and pants caches into my container so that the build can reuse them.

Unfortunately, those caches can also be very big (gigabytes or more). When mounting those into dagger actions (docker.#Run: mounts:) it looks like I have to do a client:filesystem:label:read, but the read itself takes ... ages with those big caches.

Is there a better way to mount things up or handle caching like this? I'm not even convinced the caching is working to be honest... not sure what I'm missing with that.

novel field
#

in cases like this it's usually faster to set up a persistent cache mount and let the container-side warm the cache, rather than transfer the cache in from the host. there'll be an initial cost to warm the cache, but it should be much faster after that.

I'm not sure how to do this with Cue though otherwise I'd post a snippet. 😅

one possible tweak is that the new withMountedCache API actually supports configuring an initial directory to seed the cache from. @hardy aspen do you know if that's exposed in the Cue SDK?

https://github.com/dagger/dagger/blob/ab7fba6ce6611a5eb0c279f8db42bb45b1cca11f/core/schema/container.graphqls#L90

if so, you could possibly pass the host path there, and it'll (hopefully) do one last sync. with that change you wouldn't need the container-side to warm it

tough island
#

Hmm. I do see #CacheDir in core.exec, which claims to be a "best effort" persistent cache dir; if that's handled through buildkit that could be what's intended as the way to go. Let me try that.

novel field
#

yeah that's the one!

tough island
#

Hmm; maybe. The command ran, but mounting to ~/.cache/pip and ~/.cache/pants didn't seem to do the trick, I don't think (at least, repeated builds didn't act cached). I'm not sure how to inspect the buildkit cache, so if you have any pointers I'll give that a go; otherwise, thanks for the start at least!

novel field
#

shot in the dark: it might not be expanding the ~ in that path, if that's the literal string you gave. you could try /home/<user> or /root? (depending on the user the container runs as)

hardy aspen
# novel field in cases like this it's usually faster to set up a persistent cache mount and le...

#CacheDir seeds from Scratch: https://github.com/dagger/dagger/blob/d7cf919d971bf856f89977706b6e2b3037cd8fed/plan/task/exec.go#L374-L379

So... I don't think it's doing what you're describing, unfortunately, @novel field.

I don't think there is an easy way to accomplish this with the current CUE API (Europa).

tough island
#

Actually, it's interesting: the pip cache now works when mounted at /root/.cache/pip, but strangely the pants cache doesn't appear to be working quite as I expect. When I rerun tests, the little environments/pexes aren't rebuilt (so some caching is working?) and it goes straight to running the tests... but the test results aren't cached. Hmm, not sure what I'm missing yet.

But that does seem to be a definite improvement; thanks!

#

Parallelism isn't quite doing as expected (running four test jobs in parallel using dagger results in some sort of collision, maybe with the mounted filesystems which are identical in the four jobs). But even when doing so, my multicore machine doesn't light up as it would if using pants to run the tests outside dagger.

Still, unit tests run in dagger; that's a big improvement. Thanks!
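
For the collision between parallel jobs, one thing that may be worth trying (untested here) is the `concurrency` field on `core.#CacheDir`. As far as I can tell from the Europa package it accepts "shared" (the default), "private", or "locked"; treat that field and its values as an assumption to check against your dagger version. The `_cacheMounts` name is just a hypothetical helper so the four jobs can share one definition:

```cue
package main

import "dagger.io/dagger/core"

// Hypothetical helper: one mounts struct reused by every parallel test job.
// Assumption: core.#CacheDir exposes `concurrency` in your dagger version.
_cacheMounts: {
	pip: {
		dest: "/root/.cache/pip"
		contents: core.#CacheDir & {
			id: "pip"
			// "locked" should make jobs take turns on the cache;
			// "private" should give concurrent writers separate mounts.
			concurrency: "locked"
		}
	}
	pants: {
		dest: "/root/.cache/pants"
		contents: core.#CacheDir & {
			id:          "pants"
			concurrency: "locked"
		}
	}
}
```

Each job would then set `mounts: _cacheMounts` in its `docker.#Run`. Note that "locked" trades the collision for serialized access to the cache, so it won't help with the cores not lighting up.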

#

Never mind the cache misses; I was running the tests in debug mode, so they were forcing reevaluation. Looks great actually.

compact field
#

I'm facing almost the same issue, but using poetry, and can't figure out how to improve caching for poetry install

#

@tough island can you share your config? I'd love to see how you connected the cache

tough island
#

Sorry for the delay; what I'm doing is

    test: docker.#Run & {
        input:   copy.output
        mounts: {
            pip: {
                dest: "/root/.cache/pip"
                contents: core.#CacheDir & {
                    id: "pip"
                }
            }
            pants: {
                dest: "/root/.cache/pants"
                contents: core.#CacheDir & {
                    id: "pants"
                }
            }
        }
        workdir: my-workdir
        env: my-env
        command: {
            name: my-command
            args: my-args
        }
    }

This seems to work for me. I'm honestly not sure whether caching poetry's venvs would be helpful (the pip cache will certainly help, I think?), but you could try adding /root/.cache/pypoetry as a #CacheDir and see if it helps...
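
A minimal sketch of what that might look like for poetry, untested. Assumptions: the container runs as root, poetry is already installed in the base image, and the image name, the `/src` path, and the `base`/`copy`/`install` action names are all hypothetical placeholders. Only the source tree is read from the host; the caches live entirely in `#CacheDir` mounts and get warmed by the first run.

```cue
package main

import (
	"dagger.io/dagger"
	"dagger.io/dagger/core"
	"universe.dagger.io/docker"
)

dagger.#Plan & {
	// Read only the source tree from the host, not the (large) caches.
	client: filesystem: ".": read: contents: dagger.#FS

	actions: {
		// Hypothetical image that already has poetry installed.
		base: docker.#Pull & {
			source: "my-registry/python-poetry:latest"
		}

		copy: docker.#Copy & {
			input:    base.output
			contents: client.filesystem.".".read.contents
			dest:     "/src"
		}

		install: docker.#Run & {
			input:   copy.output
			workdir: "/src"
			mounts: {
				pip: {
					dest: "/root/.cache/pip"
					contents: core.#CacheDir & {
						id: "pip"
					}
				}
				// Absolute path on purpose: "~" may not be expanded.
				poetry: {
					dest: "/root/.cache/pypoetry"
					contents: core.#CacheDir & {
						id: "poetry"
					}
				}
			}
			command: {
				name: "poetry"
				args: ["install"]
			}
		}
	}
}
```

One thing to watch: by default poetry creates its virtualenvs under its cache directory, so whatever `poetry install` puts there lives in the cache mount rather than in the step's output filesystem. Later steps that need the venv would have to mount the same #CacheDir.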