# Sometimes, functions with no changes in any of their layers don't use the cache. Why?


tawny shuttle
#

We're using the TypeScript SDK, and we can see that some steps performed while setting up the module can affect files in the repo. Is that related? Should we exclude .dagger and .dagger.json when copying the root directory?

tawny shuttle
#

> Should we care to exclude .dagger and .dagger.json when copying the root directory?

That didn't help. The cache is still randomly not used 🤷🏻‍♂️

tawny shuttle
odd crane
#

you can press `w` in the TUI while dagger is running

#

(for "web")

tawny shuttle
#

I'm using it. But it would be helpful to get a warning or something because one doesn't expect that a core behavior of the function changes depending on how the same input value is passed to the function (either via "default" or explicitly).

odd crane
#

it probably doesn't

#

chances are an actual input changed, perhaps an unrelated file that isn't filtered out

tawny shuttle
#

So maybe I'm misreading this?

> When relying on default paths in Dagger Shell, it's important to know that the source file or directory is re-evaluated on each command execution within the shell session. This differs from passing the source explicitly as an argument, where it's evaluated once and cached. This re-evaluation can lead to unintended behavior where changes to the source directory during the session (such as through exports or logs) invalidate the cache, causing the entire pipeline to re-execute.

odd crane
#

I'm not sure what that sentence is saying.

#

Either way, the most likely root cause remains unrelated files (like the exports or logs mentioned in that paragraph) that get included in your input directories. You can use pre-call filtering (e.g. the `+ignore` pragma in Go) to filter out the noise.
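A rough way to picture how such an ignore list is evaluated: the sketch below is a toy model of dockerignore-style matching in plain TypeScript, not the Dagger SDK. It assumes patterns are applied in order, the last matching pattern wins, a leading `!` re-includes a path, and a pattern that matches a parent directory also covers everything under it; the engine's real matcher may differ in edge cases. The helper names (`globToRegExp`, `isIgnored`, `applyIgnore`) and the file list are hypothetical.

```typescript
// Toy model of dockerignore-style `ignore` evaluation (assumption: ordered
// patterns, last match wins, "!" re-includes, parent-directory matches count).

// Convert a glob into an anchored RegExp:
//   "**/" -> zero or more directories, "**" -> anything, "*" / "?" -> within one segment
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        re += "(?:.*/)?";
        i += 2;
      } else {
        re += ".*";
        i += 1;
      }
    } else if (c === "*") {
      re += "[^/]*";
    } else if (c === "?") {
      re += "[^/]";
    } else {
      // Escape regex metacharacters such as "." in "go.mod"
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}

// A pattern matches a file if it matches the file path or any parent
// directory, so ".dagger/" also covers everything under .dagger.
function patternMatches(pattern: string, path: string): boolean {
  const re = globToRegExp(pattern.replace(/\/+$/, ""));
  const parts = path.split("/");
  for (let n = parts.length; n >= 1; n--) {
    if (re.test(parts.slice(0, n).join("/"))) return true;
  }
  return false;
}

// Walk every pattern; the last one that matches decides the outcome.
function isIgnored(path: string, ignore: string[]): boolean {
  let ignored = false;
  for (const raw of ignore) {
    const negated = raw.startsWith("!");
    const pattern = negated ? raw.slice(1) : raw;
    if (patternMatches(pattern, path)) ignored = !negated;
  }
  return ignored;
}

// Only files that survive the filter would reach the function.
function applyIgnore(files: string[], ignore: string[]): string[] {
  return files.filter((f) => !isIgnored(f, ignore));
}

const files = [
  "main.go",
  "go.mod",
  "go.sum",
  "README.md",
  "build.log",
  "cmd/app/main.go",
  ".dagger/main.go",
  ".golangci.yml",
];

// Allow-list style: ignore everything, re-include Go sources, then drop .dagger/
const buildIgnore = ["**", "!**/*.go", "!go.mod", "!go.sum", ".dagger/"];
const lintIgnore = ["**", "!**/*.go", "!go.mod", "!go.sum", "!.golangci.yml", ".dagger/"];

console.log(applyIgnore(files, buildIgnore));
// [ 'main.go', 'go.mod', 'go.sum', 'cmd/app/main.go' ]
console.log(applyIgnore(files, lintIgnore).includes(".golangci.yml")); // true
```

In this model, a stray `build.log` or a README edit never reaches the function, so it cannot change that function's inputs.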

tawny shuttle
#

I'll create a small repro

twin jetty
odd crane
#

Ah I see

#

I think that's unrelated to your issue, @tawny shuttle. I would focus on finding the input directory that gets invalidated (using Dagger Cloud to find non-cached operations upstream of the one being invalidated), then use pre-call filtering to remove the noise.

tawny shuttle
tawny shuttle
#

Ok, maybe there's a design flaw in my function pipeline. I'm setting up CI for a monorepo and I've defined granular CI functions per package, in order to make use of the cache. There's a base function that creates a base container (e.g. node:22.17), installs dependencies, etc. Most of the CI functions have actions that require monorepo dependencies. Here's what it looks like:

#
```typescript
import { dag, Container, Directory, argument, func, object } from "@dagger.io/dagger";

@object()
export class MyDaggerStuff {
  private source: Directory;

  constructor(
    @argument({ defaultPath: ".", ignore: [".git/**", "**/node_modules/**", "**/coverage/**", "**/dist/**"] })
    source: Directory
  ) {
    this.source = source;
  }

  // ...

  /**
   * Base container with source code and dependencies installed
   */
  @func()
  async base(): Promise<Container> {
    return dag
      .container()
      .from("node:22.17")
      .withWorkdir("/app")
      .withDirectory("/app", this.source, {
        include: [
          // minimal set of files to install dependencies across monorepo packages
        ],
      })
      .withExec(["corepack", "enable"])
      .withExec(["pnpm", "install", "--frozen-lockfile"])
      .sync();
  }

  /**
   * Function that doesn't require any monorepo dependency, only the package itself
   */
  @func()
  async formatMyPackage(): Promise<string> {
    const base = await this.base();
    const container = base
      .withDirectory("/app", this.source, { include: ["my-package/**"] })
      .withExec(["pnpm", "--filter", "@packages/my-package", "format"]);
    await container.sync();
    return "@packages/my-package: passed";
  }

  /**
   * Function that requires a monorepo dependency
   */
  @func()
  async lintMyPackage(): Promise<string> {
    const container = await this.withMyPackageDeps();
    await container.withExec(["pnpm", "--filter", "@packages/my-package", "lint"]).sync();
    return "@packages/my-package: passed";
  }

  // Copies the package and its in-repo dependency on top of the base container
  private async withMyPackageDeps(): Promise<Container> {
    const base = await this.base();
    return base
      .withDirectory("/app", this.source, { include: ["my-package/**"] })
      .withDirectory("/app", this.source, { include: ["my-package-dependency/**"] });
  }
}
```
#

So I thought the cache layers would depend on the withDirectory calls, but it seems a different hash is produced for the source whenever any file inside the `.` directory or its subdirectories changes, unless it's listed in the ignore.
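A toy model of that observation, assuming (this is not Dagger's actual cache-key algorithm) that a loaded directory is identified by a digest over the paths and contents of every file that survived its `ignore` filter. The file tree, helper names, and the predicate standing in for the constructor's ignore list are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Toy model (assumption): a directory's identity is a digest over the
// paths and contents of every file that passed its pre-call filter.
type Tree = Record<string, string>; // relative path -> file contents

function dirDigest(tree: Tree): string {
  const hash = createHash("sha256");
  const entries = Object.entries(tree).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [path, contents] of entries) {
    // NUL separators keep ("ab", "c") distinct from ("a", "bc")
    hash.update(path).update("\0").update(contents).update("\0");
  }
  return hash.digest("hex");
}

// Stand-in for the constructor's ignore list:
// [".git/**", "**/node_modules/**", "**/coverage/**", "**/dist/**"]
function constructorFilter(tree: Tree): Tree {
  const ignored = (p: string) =>
    p.startsWith(".git/") || /(^|\/)(node_modules|coverage|dist)\//.test(p);
  return Object.fromEntries(Object.entries(tree).filter(([p]) => !ignored(p)));
}

const before: Tree = {
  "pnpm-lock.yaml": "lock v1",
  "my-package/src/index.ts": "export const a = 1;",
  "my-package-dependency/src/index.ts": "export const b = 2;",
  "other-package/src/index.ts": "export const c = 3;",
  "my-package/dist/index.js": "built",
};

// Editing a package that lintMyPackage never copies...
const otherEdit: Tree = { ...before, "other-package/src/index.ts": "export const c = 4;" };
// ...versus regenerating an ignored build artifact
const distEdit: Tree = { ...before, "my-package/dist/index.js": "rebuilt" };

const baseDigest = dirDigest(constructorFilter(before));
console.log(dirDigest(constructorFilter(otherEdit)) === baseDigest); // false: `source` changed
console.log(dirDigest(constructorFilter(distEdit)) === baseDigest); // true: dist/ is ignored
```

In this model, every function receives the same broad `source`, so the edit to `other-package` gives `lintMyPackage` a new input too, even though it only copies `my-package` and `my-package-dependency` afterwards.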

#

I'm wondering whether withFiles / withoutFiles is the proper way to accomplish what I want here.

tawny shuttle
#

If `source` invalidates every action that depends on it (i.e. all the actions that need files?), does that mean the only way to get the granular caching I'm after for each function is a separate dagger call per function, each with its own specific source parameter? 🤔

odd crane
#

cc @wicked scaffold 👆

tawny shuttle
wicked scaffold
#

One pattern I use that noticeably improved the cache hit rate is putting defaultPath on every entrypoint function.

Here's an example from a CI in Go:

```go
func (d *DagbenchCi) Build(
    ctx context.Context,

    //+ignore=["**", "!**/*.go", "!go.mod", "!go.sum", ".dagger/"]
    //+defaultPath="/"
    source *dagger.Directory,

    //+optional
    platform dagger.Platform,
) (_ *dagger.File, err error) {
    if platform == "" {
        platform, err = dag.DefaultPlatform(ctx)
        if err != nil {
            return nil, err
        }
    }

    return dag.
        Go(source).
        Build(dagger.GoBuildOpts{
            Platform: platform,
        }).File("bin/dagbench.io"), nil
}

func (d *DagbenchCi) Lint(
    ctx context.Context,

    //+ignore=["**", "!**/*.go", "!go.mod", "!go.sum", "!.golangci.yml", ".dagger/"]
    //+defaultPath="/"
    source *dagger.Directory,
) (string, error) {
    return dag.
        Container().
        From("golangci/golangci-lint:v2.5-alpine").
        WithDirectory("/app", source).
        WithWorkdir("/app").
        WithExec([]string{"golangci-lint", "run"}).
        Stdout(ctx)
}
```

It involves a bit of code duplication, but it gives better cache hits because I can specify a different pre-call ignore filter for every function.
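To make the trade-off concrete, here is a toy model of per-function pre-call filtering, assuming each entrypoint's cache key depends only on the files that pass its own filter. The predicates are hand-written stand-ins for the Build and Lint ignore lists in the Go example above (not a real glob matcher), and the repo contents and helper names are made up.

```typescript
import { createHash } from "node:crypto";

// Toy model (assumption): each entrypoint's cache key depends only on the
// files that pass its own pre-call filter (per-function defaultPath + ignore).
type Tree = Record<string, string>; // relative path -> file contents

function digestOf(tree: Tree, keep: (path: string) => boolean): string {
  const hash = createHash("sha256");
  const paths = Object.keys(tree).filter(keep).sort();
  for (const p of paths) {
    hash.update(p).update("\0").update(tree[p] ?? "").update("\0");
  }
  return hash.digest("hex");
}

// Stand-ins for the Build and Lint ignore lists (hypothetical predicates)
const isGoSource = (p: string) =>
  !p.startsWith(".dagger/") && (p.endsWith(".go") || p === "go.mod" || p === "go.sum");
const buildKeep = isGoSource;
const lintKeep = (p: string) => isGoSource(p) || p === ".golangci.yml";

const repo: Tree = {
  "main.go": "package main",
  "go.mod": "module example",
  "go.sum": "",
  ".golangci.yml": "linters: {}",
  "README.md": "# docs",
};

function cacheKeys(tree: Tree) {
  return { build: digestOf(tree, buildKeep), lint: digestOf(tree, lintKeep) };
}

const before = cacheKeys(repo);
const afterDocs = cacheKeys({ ...repo, "README.md": "# new docs" });
const afterLintCfg = cacheKeys({ ...repo, ".golangci.yml": "linters: { enable: [govet] }" });

console.log(afterDocs.build === before.build, afterDocs.lint === before.lint); // true true
console.log(afterLintCfg.build === before.build, afterLintCfg.lint === before.lint); // true false
```

Here a docs edit invalidates neither entrypoint and a linter-config edit invalidates only lint, which is the granularity that one shared constructor `source` cannot give in this model.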

tawny shuttle
#

So, as of now, the best we can do is atomic dagger calls, so we can use the defaultPath / ignore options to narrow down the caching, right?

wicked scaffold
#

I think so, yes, until we complete 10367

#

But the DX should stay quite simple thanks to defaultPath; you don't need to have 10 different flags

#

You can also try to group functions by their common source

tawny shuttle
#

It's just the orchestration layer that becomes a bit complex.

odd crane
#

@wicked scaffold can you give the example in TS?

odd crane
#

Each with a specialized job

#

That way you get your cake (better cache granularity) and eat it too (less repetitive)

wicked scaffold
#
```typescript
import { Directory, File, Platform, argument, func, object } from "@dagger.io/dagger";

@object()
export class Example {
  @func()
  build(
    @argument({ defaultPath: "/", ignore: ["**", "..."] })
    source: Directory,

    platform?: Platform,
  ): File {
    // ...
  }

  @func()
  lint(
    @argument({ defaultPath: "/", ignore: ["**", "..."] })
    source: Directory,
  ): Promise<string> {
    // ...
  }
}
```
wicked scaffold
#

That would look like:

```typescript
import { Directory, argument, func, object } from "@dagger.io/dagger";

@object()
export class Example {
  repo: Directory

  project: Directory

  sourceCode: Directory

  dependencies: Directory

  constructor(
    @argument({ defaultPath: "/", ignore: [".git"] })
    repo: Directory,

    @argument({ defaultPath: "/", ignore: ["**", "!**/*.ts", "!tsconfig.json", "!package.json", "..."] })
    project: Directory,

    // ...
  ) {
    this.repo = repo
    this.project = project
    // ...
  }

  @func()
  lint(/* ... */) {
    // use this.dependencies, this.project, etc.
  }
}
```
odd crane
#

Note for later: we should really update the boilerplate in dagger init to illustrate things like that better