#Caching issues when running local tests

primal egret
#
  1. I was facing some caching issues when working on this ticket: https://github.com/dagger/dagger/pull/5149
  2. TLDR: At some point I ran the TestContainerWithMountedSecretOwner test with a broken implementation, and the output it produced was cached.
  3. After this I ran ./hack dev bash multiple times, switched branches, and made a lot of changes, but the test kept failing because of the cached output from the broken implementation mentioned above.
  4. I renamed the subtest TestContainerWithMountedSecretOwner/userid to TestContainerWithMountedSecretOwner/userid-123 and test userid-123 ran fine!
  5. When contributing to Dagger how do I ensure that the above doesn't happen when running local tests?

To reproduce this try the following:

  1. Pull latest main.
  2. Run ./hack dev bash
  3. Run go test -v -count=1 -timeout 30s -run ^TestContainerWithMountedSecretOwner/userid$ github.com/dagger/dagger/core/integration. Test should pass.
  4. Introduce some breaking behavior, for example a panic in the main function of cmd/shim/main.go.
  5. Re-run ./hack dev bash
  6. Run the test again: go test -v -count=1 -timeout 30s -run ^TestContainerWithMountedSecretOwner/userid$ github.com/dagger/dagger/core/integration. Test will still pass.
  7. Rename the test to userid-111.
  8. Run the renamed test: go test -v -count=1 -timeout 30s -run ^TestContainerWithMountedSecretOwner/userid-111$ github.com/dagger/dagger/core/integration. Renamed test will fail due to the panic introduced in step 4.

I would expect the test to fail in step #6, any feedback?

urban thunder
#

Step 6 passes because Buildkit doesn't factor the shim executable into its caching. Dagger configures Buildkit with the shim as its OCI runner, and the shim OCI runner then injects itself into the OCI bundle, so it doubles as the container's init process. All of this is invisible to Buildkit's caching because it happens at a lower layer outside of LLB.
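
A toy model of the explanation above, not Buildkit's actual implementation: if the cache key is derived only from what the exec definition contains, then rebuilding the shim leaves the key unchanged and the old result is reused. The `execOp` type and `cacheKey` function are hypothetical names made up for this sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// execOp is a hypothetical stand-in for the inputs the cache can see.
// Note that nothing in it describes the shim binary.
type execOp struct {
	Image string
	Args  []string
}

// cacheKey hashes only the fields of the op.
func cacheKey(op execOp) string {
	sum := sha256.Sum256([]byte(op.Image + "\x00" + strings.Join(op.Args, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	op := execOp{Image: "alpine", Args: []string{"cat", "/secret"}}

	before := cacheKey(op) // key computed while the shim works
	// ... the shim is rebuilt with a panic in main; op itself is unchanged ...
	after := cacheKey(op)

	fmt.Println(before == after) // true: the stale result is still a cache hit
}
```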

I'm not sure why renaming a test (steps 7-8) will cause it to bust the cache though, unless the test name is passed in to Dagger or something.

primal egret
#
GitHub

A programmable CI/CD engine that runs your pipelines in containers - dagger/container_test.go at main · dagger/dagger

urban thunder
#

oh, in that case the name actually does get passed to Dagger, so that makes sense

#

it's not just test name, it's also used as a filename
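
A rough sketch of why the rename busts the cache, assuming the subtest name ends up inside the command that gets cached (here as part of a file path). The `/mnt/` path and the `keyFor` function are invented for illustration and are not taken from container_test.go.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// keyFor hashes a command that touches a file named after the subtest.
// The path layout is hypothetical.
func keyFor(subtest string) string {
	cmd := "stat /mnt/" + subtest
	return fmt.Sprintf("%x", sha256.Sum256([]byte(cmd)))
}

func main() {
	// Same name: same key, so the cached result is reused.
	fmt.Println(keyFor("userid") == keyFor("userid")) // true

	// Renamed subtest: new key, so the exec actually runs again.
	fmt.Println(keyFor("userid") == keyFor("userid-123")) // false
}
```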

primal egret
#

right!

#

do you think this is an isolated case, or would it be nice to factor shim changes into the cache for some tests?

urban thunder
#

hm I would put this on the "isolated case" side of the spectrum - we likely won't be changing the shim all that often. if you can find a cheap and easy way to bust the cache when the shim changes, by all means, but it might be fine to just document it in CONTRIBUTING.md or something

primal egret
#

@latent hill the steps above should reproduce a similar issue: run a successful test, then break the shim, then rerun the test, etc. In my case the test cached the wrong result, so it failed every time

latent hill
#

Is this behaviour limited to the shim? That was my understanding from what Alex wrote.

primal egret
#

yeah, the secret scrub functionality is part of the shim. also container.go passes secrets parameters to it.

latent hill
#

OK, so the caching issue is only in the case of the shim. A shim gotcha for sure, but not something that many contributors would hit.

#

By the way, I started creating issues for other intermittent test failures while looking for yours. There is another secrets-related one that I have hit with an empty cache on Linux - will link to it when it's live (just about to jump on a call)

primal egret
#

@latent hill that’s right!

woven bobcat
#

Yeah right now the shim issue is annoying but obscure, however I just realized that when running our CI on engines w/ persistent cache it will become a bigger issue. If a PR makes shim-only changes, many or most of the tests could end up cached when they should actually re-run.

Made an issue w/ an idea on a fix here: https://github.com/dagger/dagger/issues/5307

GitHub

Right now, changes to the shim do not result in a cache invalidation of the exec, which is extremely confusing when developing. It also could result in tests incorrectly being skipped if run on eng...