`Host.directory` `filesync` taking 2m56s


torn tinsel

Howdy, I'm fairly new to Dagger; thanks in advance for your attention!

As the title indicates, the startup time for my codebase is way too long; a number of large directories are being synced (see the output below). I've been poring over the docs and GitHub issues to understand how to speed this up, and I've seen posts indicating that `.dockerignore` should be effective at keeping large folders out of a sync operation. I've also seen that the `include` and `exclude` keys in `dagger.json` should work. As shown below, I've added those large folders (the usual suspects: `.git`, `.venv`, `node_modules`) to both `.dockerignore` and `exclude`; however, the filesync logs still prominently show them:

`dagger.json`:

```json
{
  "name": "core-platform",
  "engineVersion": "v0.20.3",
  "sdk": {
    "source": "python"
  },
  "source": "dagger",
  "include": [
    "backend",
    "frontend",
    "<redacted>",
    "<also_redacted>"
  ],
  "exclude": [
    ".DS_Store",
    ".claude",
    ".git",
    ".venv",
    "bin",
    "**/.venv",
    "**/.git",
    "**/node_modules"
  ]
}
```

`.dockerignore`:

```
.venv/
.git
...
```

Console output from the Dagger function:

```
▼ ✔ parsing command line arguments 2m56s
├╴● $ address(value: "."): Address! 0.0s CACHED
╰╴▼ ✔ .directory: Directory! 2m56s
  ╰╴▼ ✔ Host.directory(path: "."): Directory! 2m56s
    ╰╴▼ ✔ filesync 2m56s
      ├╴● ✔ .DS_Store 0.0s ◆ Written Bytes: 6.1 kB
      ├╴● ✔ .claude 0.0s ◆ Written Bytes: 14 kB          <-- present in `exclude`
      ├╴● ✔ .dockerignore 0.0s ◆ Written Bytes: 754 B
      ├╴● ✔ .env.raw 0.0s ◆ Written Bytes: 2.1 kB
      ├╴● ✔ .git 8.6s ◆ Written Bytes: 680 MB            <-- present in `exclude`
      ├╴● ✔ .gitignore 0.2s ◆ Written Bytes: 1.1 kB
      ├╴● ✔ .netrc 0.2s ◆ Written Bytes: 107 B           <-- present in `.dockerignore`
      ├╴● ✔ .venv 5.2s ◆ Written Bytes: 5 B               <-- present in `exclude`
      ├╴● ✔ .vscode 0.0s ◆ Written Bytes: 1.8 kB
      ├╴● ✔ <redacted> 5.8s ◆ Written Bytes: 20 kB
      ├╴● ✔ backend 40.5s ◆ Written Bytes: 3.4 MB
      ├╴● ✔ bin 36.3s ◆ Written Bytes: 62 MB              <-- present in `exclude`
      ├╴● ✔ compose.override.yaml 35.3s ◆ Written Bytes: 110 B
      ├╴● ✔ dagger 38.7s ◆ Written Bytes: 29 kB
      ├╴● ✔ dagger-develop.log 38.6s ◆ Written Bytes: 5.1 MB
      ├╴● ✔ docker-compose.yml 38.4s ◆ Written Bytes: 159 B
      ├╴● ✔ dagger-develop.log.orig 38.8s ◆ Written Bytes: 15 MB
      ├╴● ✔ docs 55.4s ◆ Written Bytes: 220 kB
      ├╴● ✔ frontend 2m5s ◆ Written Bytes: 551 kB
      ├╴● ✔ justfile 1m58s ◆ Written Bytes: 20 kB
      ├╴● ✔ mise.toml 1m58s ◆ Written Bytes: 279 B
      ├╴● ✔ pyproject.toml 1m58s ◆ Written Bytes: 351 B
      ├╴● ✔ scripts 1m58s ◆ Written Bytes: 13 kB
      ├╴● ✔ <also_redacted> 1m58s ◆ Written Bytes: 37 B
      ├╴● ✔ uv.lock 1m58s ◆ Written Bytes: 173 kB
      ╰╴● ✔ copy 32.6s
```

I'm running on a fairly recent Mac (M3) with plenty of headroom on CPU, memory, and disk.

Questions:

  • Is `exclude` in `dagger.json` (or `.dockerignore`) still the right way to do this?
  • Do I need to do anything for Dagger to read `dagger.json`?
  • What am I missing here?
  • Assuming there is no effective way to exclude these files from the context, why does copying less than 1 GB of data take nearly 3 minutes (granted, I'm not copying to a USB 2.0 flash drive)?
  • Are these files not cached somewhere? I've re-run the command many times without altering them (aside from maybe editing `dagger.json`), and it consistently takes longer than 2m30s.
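To look at the last two questions empirically, a rough stdlib-only Python sketch (not a Dagger feature) can tally files and bytes per top-level entry of the directory passed as `--source`, on the assumption that sync cost can depend on the number of files as well as their total size. The function `context_stats` and its `skip` parameter are hypothetical names; `skip` only matches exact top-level names, not glob patterns.

```python
import os
from pathlib import Path


def context_stats(root: str, skip: tuple[str, ...] = ()) -> dict[str, tuple[int, int]]:
    """Map each top-level entry of `root` to (file_count, total_bytes).

    Entries whose name is in `skip` are left out; symlinks are skipped
    rather than followed.
    """
    stats: dict[str, tuple[int, int]] = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.name in skip or entry.is_symlink():
            continue
        if not entry.is_dir():
            # Plain top-level file: one file, its own size.
            stats[entry.name] = (1, entry.stat().st_size)
            continue
        files = size = 0
        # os.walk does not descend into symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue
                files += 1
                size += os.path.getsize(path)
        stats[entry.name] = (files, size)
    return stats
```

For example, calling `context_stats(".", skip=(".git", ".venv", "bin"))` from the repository root and sorting the result by file count would show which of the remaining entries might be worth ignoring too.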

According to the docs on working with directories in Dagger (copying, mounting, and filtering directories and files), "Dagger already uses caching to optimize file uploads." Maybe the problem is between the keyboard and the chair.

floral crow

`exclude` is deprecated; you should use `include` instead and prefix the entries you want to exclude with `!`:

```json
{
  "name": "core-platform",
  "engineVersion": "v0.20.3",
  "sdk": {
    "source": "python"
  },
  "source": "dagger",
  "include": [
    "backend",
    "frontend",
    "<redacted>",
    "<also_redacted>",
    "!.DS_Store",
    "!.claude",
    "!.git",
    "!.venv",
    "!bin",
    "!**/.venv",
    "!**/.git",
    "!**/node_modules"
  ]
}
```
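To make the `!` convention concrete, here is a simplified, stdlib-only model of how an include list with `!`-prefixed entries could be evaluated against relative file paths. It assumes gitignore-style semantics where the last matching pattern wins; the helpers `matches` and `selected` are hypothetical and are not Dagger's actual matcher, which also handles cases this sketch ignores (for example `**` in the middle of a pattern).

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Simplified gitignore-style match on '/'-separated relative paths.

    - A pattern without a leading '**/' is anchored at the root and also
      covers everything below a matching directory ("bin" matches "bin/x").
    - A leading '**/' lets the rest of the pattern match at any depth.
    """
    parts = path.split("/")
    if pattern.startswith("**/"):
        tail = pattern[3:].split("/")
        n = len(tail)
        # Slide the tail over every run of n consecutive path components.
        return any(
            all(fnmatchcase(p, t) for p, t in zip(parts[i : i + n], tail))
            for i in range(len(parts) - n + 1)
        )
    head = pattern.split("/")
    return len(parts) >= len(head) and all(
        fnmatchcase(p, h) for p, h in zip(parts, head)
    )


def selected(path: str, patterns: list[str]) -> bool:
    """Return True if `path` survives an include list with '!' exclusions.

    Patterns are evaluated in order and the last matching one wins, so a
    '!' entry placed after a broader include removes paths from it.
    """
    keep = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = (raw[1:] if negate else raw).rstrip("/")
        if matches(pattern, path):
            keep = not negate
    return keep
```

With the `include` list above, `backend/app/main.py` would be kept, while `frontend/node_modules/react/index.js` and `.git/HEAD` would not.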

But I don't think this is the issue: the `include`/`exclude` settings in `dagger.json` don't apply to the arguments you pass to a function.

torn tinsel

Roger that, I just saw this in the run output:

```
│ ├╴▼ ✔ moduleSource(refString: "."): ModuleSource! 11.0s
│ │ ├╴● ✔ parseRefString: . 0.0s
│ │ │
│ │ ├╴● $ host: Host! 0.0s CACHED
│ │ ├╴▶ ✔ .directory(
│ │ │ ┆ path: "/Users/pstiverson/Projects/core-platform-2"
│ │ │ ┆ include: ["./dagger.json", "dagger", "backend", "frontend", "alliance_server", "surveillance", "!.DS_Store", "!.claude", "!.git", "!.venv", "!bin", "!**/.venv", "!**/.git", "!**/node_modules"]
│ │ │ ┆ gitignore: true
```


(so I suspect Dagger is reading `dagger.json`)

torn tinsel

that is interesting...


So... when is it effective?

floral crow

What's the signature of the method you are calling?

torn tinsel
```python
    @function
    async def dev(self, source: dagger.Directory) -> dagger.Service:
```
floral crow

The `include` in `dagger.json` applies to the module source, i.e. the source code that makes up the module,
not to the arguments you pass to a function.
So here you have to do either pre-call or post-call filtering.
The doc you shared above is the right one.
For instance: `source: Annotated[dagger.Directory, Ignore(["*", "!**/*.py"])]`

torn tinsel

Also, I'm calling it like this: `./bin/dagger call backend dev --source=. up --ports 8000:8000`

floral crow

The filtering doesn't exist on the CLI side; it has to be done inside the module code.

torn tinsel

Sweet, I'll try that annotation.


Thank you!

torn tinsel

Amazing! `filesync` is down to 8 seconds.


I was about to suggest adding the `Annotated` line to the docs, but then I scrolled up one screen in the docs and saw that exact line... Thank you for being patient with my non-docs-reading self.