#Gitlab matrix jobs incorrect cache hits


tidal kettle
#

I've hit this issue a number of times and it's incredibly dangerous: jobs are succeeding on cache hits that should never happen.

  - component: gitlab.com/path/to/component/that/calls/dagger/module
    inputs:
      name: amzl-laas-21-jre
      build_args: ["JAVA_MAJOR=21"]
      dagger_progress: plain
  - component: gitlab.com/path/to/component/that/calls/dagger/module
    inputs:
      name: amzl-laas-25-jre
      build_args: ["JAVA_MAJOR=25"]
      dagger_progress: plain

Note that the version differs in both name and build_args. These inputs are fed to a Docker builder module. The Build method accepts the build args as a parameter named args and is annotated with +cache=session (and the two jobs should be two different client sessions). The Build function signature:

func (m Docker) Build(
    ...
    // A list of build arguments in the format of arg=value
    // +optional
    args []string,

  • These includes produce two separate GitLab jobs that run at the same time (image attached).
  • Those two jobs connect to the same Dagger engine (remotely hosted).
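
For reference, here is a minimal sketch of how a list of arg=value strings like the one above could be turned into the name/value pairs that dockerBuild takes. This is not the module's actual code: BuildArg and parseBuildArgs are hypothetical names, and the sketch uses plain Go with no Dagger SDK so it can be run on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// BuildArg is a hypothetical name/value pair, mirroring the
// {name: ..., value: ...} shape that dockerBuild receives.
type BuildArg struct {
	Name  string
	Value string
}

// parseBuildArgs splits each "arg=value" entry at the first "=".
// Entries without "=" or with an empty name are rejected.
func parseBuildArgs(args []string) ([]BuildArg, error) {
	out := make([]BuildArg, 0, len(args))
	for _, a := range args {
		name, value, ok := strings.Cut(a, "=")
		if !ok || name == "" {
			return nil, fmt.Errorf("invalid build arg %q, want arg=value", a)
		}
		out = append(out, BuildArg{Name: name, Value: value})
	}
	return out, nil
}

func main() {
	parsed, err := parseBuildArgs([]string{"JAVA_MAJOR=21"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", parsed)
}
```

Splitting at the first "=" only means a value that itself contains "=" survives intact.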

Result: two jobs, both producing the same build:

  1. https://dagger.cloud/mjb/traces/68112b970df043aae2864e1622810d03
  2. https://dagger.cloud/mjb/traces/2334c0e99ac98af4ab3d9b1341f2f5b9?listen=b5da4f3b50a00859

I ran dagger core engine local-cache prune on the remote runner, pointing at the engine exposed by the remote host (e.g. localhost:port), and ran the pipeline again. Same problem:

  1. https://dagger.cloud/mjb/traces/7c89ca3d87bc56112c5339468345d9fe
  2. https://dagger.cloud/mjb/traces/ffb7cca66bb45fe9b21dc045767bd6ab

From job logs:

12  : Docker.build(
12  : ┆ source: Address.directory: Directory!
12  : ┆ file: "Dockerfile"
12  : ┆ args: ["JAVA_MAJOR=25"]
12  : ┆ platform: ["linux/amd64", "linux/arm64"]
12  : ): Docker!
18  : Docker.build(
18  : ┆ source: Address.directory: Directory!
18  : ┆ file: "Dockerfile"
18  : ┆ args: ["JAVA_MAJOR=25"]
18  : ┆ platform: ["linux/amd64", "linux/arm64"]
18  : ): Docker!
#

There's a very obvious pattern in the GitLab runner job logs when this happens. One of the two jobs runs and builds normally; the other hangs without showing any output for a long time, and when it finally prints output it's a load of CACHED lines.

#

(I don't think +cache="session" matters here: the inputs are different, so this job should not be getting cache hits at all.)
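
To spell out the expectation, here is a toy model of input-addressed caching. It is an assumption-laden illustration and NOT how Dagger actually computes cache keys: cacheKey is a hypothetical helper that hashes a function name together with its arguments. Under any scheme of this kind, JAVA_MAJOR=21 and JAVA_MAJOR=25 must land on different keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey is a toy input-addressed key: a hash over the function name
// and its arguments. Hypothetical; not Dagger's real keying scheme.
func cacheKey(function string, args []string) string {
	h := sha256.New()
	h.Write([]byte(function))
	for _, a := range args {
		h.Write([]byte{0}) // separator so ["ab","c"] differs from ["a","bc"]
		h.Write([]byte(a))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	k21 := cacheKey("Docker.build", []string{"JAVA_MAJOR=21"})
	k25 := cacheKey("Docker.build", []string{"JAVA_MAJOR=25"})
	// Different inputs should never share a key, so no cache hit is expected.
	fmt.Println("keys equal:", k21 == k25)
}
```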

#

@bronze rampart You've helped with caching issues in the past, is this something you're familiar with?

tidal kettle
#

Testing this further:

  1. Zero instances of this behaviour locally. Running Java 21, then Java 25, then Java 21 again, all jobs are correct and the Dagger output is correct.
  2. Added a third GitLab component using this module with version 22: same issue.
  3. Ran a new pipeline, cancelled all jobs, then ran them one at a time. Result: the Java 21 job has Java 22 in it (attached image) 😕
  4. +cache=never doesn't change anything, same issue. The functions themselves run whether the setting is never or session, but something is going on with Directory.DockerBuild.
  5. Added resource_group (https://docs.gitlab.com/ci/yaml/#resource_group) to make all jobs run one by one: same issue.
#

Local outputs:

host: Host!
.directory(path: "."): Directory!
.dockerBuild(dockerfile: "Dockerfile", platform: "linux/arm64", buildArgs: [{name: "JAVA_MAJOR", value: "21"}]): Container!

dagger / $ java --version
openjdk 21.0.10 2026-01-20 LTS
OpenJDK Runtime Environment Corretto-21.0.10.7.1 (build 21.0.10+7-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.10.7.1 (build 21.0.10+7-LTS, mixed mode, sharing)
dagger / $ arch
aarch64

host: Host!
.directory(path: "."): Directory!
.dockerBuild(dockerfile: "Dockerfile", platform: "linux/amd64", buildArgs: [{name: "JAVA_MAJOR", value: "21"}]): Container!

dagger / $ java --version
openjdk 21.0.10 2026-01-20 LTS
OpenJDK Runtime Environment Corretto-21.0.10.7.1 (build 21.0.10+7-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.10.7.1 (build 21.0.10+7-LTS, mixed mode, sharing)
dagger / $ arch
x86_64

host: Host!
.directory(path: "."): Directory!
.dockerBuild(dockerfile: "Dockerfile", platform: "linux/amd64", buildArgs: [{name: "JAVA_MAJOR", value: "25"}]): Container!

dagger / $ java --version
openjdk 25.0.2 2026-01-20 LTS
OpenJDK Runtime Environment Corretto-25.0.2.10.1 (build 25.0.2+10-LTS)
OpenJDK 64-Bit Server VM Corretto-25.0.2.10.1 (build 25.0.2+10-LTS, mixed mode, sharing)
dagger / $ arch
x86_64

host: Host!
.directory(path: "."): Directory!
.dockerBuild(dockerfile: "Dockerfile", platform: "linux/arm64", buildArgs: [{name: "JAVA_MAJOR", value: "25"}]): Container!

dagger / $ java --version
openjdk 25.0.2 2026-01-20 LTS
OpenJDK Runtime Environment Corretto-25.0.2.10.1 (build 25.0.2+10-LTS)
OpenJDK 64-Bit Server VM Corretto-25.0.2.10.1 (build 25.0.2+10-LTS, mixed mode, sharing)
dagger / $ arch
aarch64
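
Since the bad jobs currently succeed silently, a guard along these lines could at least make them fail. This is only a sketch: javaMajor and checkJavaMajor are hypothetical helpers, and it assumes the first line of java --version looks like the output above ("openjdk <version> ..."). Wiring it into the module or pipeline is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// javaMajor extracts the major version from the first line of
// `java --version` output, e.g. "openjdk 21.0.10 2026-01-20 LTS" -> "21".
func javaMajor(versionOutput string) (string, error) {
	firstLine, _, _ := strings.Cut(versionOutput, "\n")
	fields := strings.Fields(firstLine)
	if len(fields) < 2 {
		return "", fmt.Errorf("unexpected java --version output: %q", firstLine)
	}
	major, _, _ := strings.Cut(fields[1], ".")
	return major, nil
}

// checkJavaMajor returns an error when the image's Java major version
// does not match the JAVA_MAJOR value the job asked for.
func checkJavaMajor(versionOutput, want string) error {
	got, err := javaMajor(versionOutput)
	if err != nil {
		return err
	}
	if got != want {
		return fmt.Errorf("built image has Java %s, want %s", got, want)
	}
	return nil
}

func main() {
	out := "openjdk 21.0.10 2026-01-20 LTS\n" +
		"OpenJDK Runtime Environment Corretto-21.0.10.7.1 (build 21.0.10+7-LTS)"
	if err := checkJavaMajor(out, "21"); err != nil {
		panic(err)
	}
	// A mismatch is reported as an error instead of passing silently.
	fmt.Println(checkJavaMajor(out, "25"))
}
```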
#

So, in summary:

  1. Sending jobs one by one to a local engine: no issue
  2. Sending jobs one by one to the remote engine: reproducible every time