#Test attributes
🧵
<@&946480760016207902> for visibility
@thorn wadi would we guarantee that 1 span = 1 test? I guess yes
My request would be: make the path an array. It buys us some options for portability across toolchains
yeah, though it may have sub-tests with their own spans, depends on how fine grained you want to re-run
["TestFoo", "bar", "baz"]
(so maybe my answer is actually 'no')
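A minimal sketch of the "path as an array" idea, assuming a Go-style test name where "/" separates a test from its sub-tests. The helper names (`SplitTestPath`, `JoinTestPath`, `IsUnder`) are hypothetical and not part of any Dagger API; the point is only that an array can be re-rendered with whatever separator a given toolchain expects, and that prefix matching covers the "how fine-grained do you want to re-run" question.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitTestPath turns a Go-style test name such as "TestFoo/bar/baz"
// into path segments. Assumption: segments never contain "/" themselves.
func SplitTestPath(name string) []string {
	if name == "" {
		return nil
	}
	return strings.Split(name, "/")
}

// JoinTestPath renders the segments with the separator a toolchain expects.
func JoinTestPath(path []string, sep string) string {
	return strings.Join(path, sep)
}

// IsUnder reports whether path equals prefix or is nested beneath it.
func IsUnder(path, prefix []string) bool {
	if len(prefix) > len(path) {
		return false
	}
	for i := range prefix {
		if path[i] != prefix[i] {
			return false
		}
	}
	return true
}

func main() {
	path := SplitTestPath("TestFoo/bar/baz")
	fmt.Println(path)
	fmt.Println(JoinTestPath(path, "/"))
	fmt.Println(IsUnder(path, []string{"TestFoo", "bar"}))
}
```

With this shape, a span for a sub-test and a span for its parent can both carry a path, and a re-run can target either level by choosing how many segments to keep.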
That's where the best practices diverge between 1) test splitting, and 2) human/LLM devloop
- Test splitting -> less granular. Select package or file
- Devloop -> more granular. Select single test, or a few tests, or a weird regexp...
What if we could automate --fail-fast? Once we see a failed test span, we just kill the whole function?
That would be pretty sweet. Then we don't have to pass go test -failfast, or another test runner's equivalent, if one even exists
Is dagger test a subset of dagger check or a divergence?
I think it's fair to assume tests are always beneath a check. I split it out because it seems like we'd need a way to run only certain tests beneath a check, and dagger check doesn't let you do that easily: it's more about running a bunch of checks at once (go:*, or dagger check foo:bar fizz:buzz). So it's confusing how to express 'run all these checks but only these tests beneath these checks' (which checks get the filter?)
but this is totally made up on the spot
Yeah it should stay under check IMO
right just making sure I'm thinking what you're thinking. Makes perfect sense
We will still need a way to bridge with the check interface
Otherwise it will get messy and fragmented (like a shadow checks API)
yeah, i don't like having test and check but it seems hard to do it all in one command, with how check is 'many checks'
'run all these checks but only these tests beneath these checks' (which checks get the filter?)
We could just continue the path.
go:foo:bar:baz
unless we just pass the filter to all the checks and whatever happens, happens 
It wouldn't be a separate dimension of filtering
That would cause lots of problems IMO
so test-split:test-telemetry:TestTelemetry/TestGolden/pending? seems reasonable
i do think it's imperative that we don't butcher that into test-split:test-telemetry:TestTelemetry:TestGolden:pending
if i have to do a translate after copy-pasting from CI it'll be annoying
i guess counter-argument is maybe i can copy-paste strings that we generate, instead of go test output, but still seems like swimming against the current
I'm skeptical of how imperative that is
well, like. i ran tests, they failed, now i'm angry, now i have to do string manipulation to re-run it locally, now i'm more angry
But we're missing a major piece of the puzzle which is dynamic checks
what if dagger check test-split:test-telemetry:TestTelemetry:TestGolden:pending worked the same as dagger check test-split:test-telemetry --filter TestTelemetry/TestGolden/pending
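One way to picture the equivalence being proposed here: treat everything after the longest known check path as the test filter, and accept either spelling of it. This is a sketch under loud assumptions: the set of known checks is available up front, test names contain no ":" of their own, and `SplitRef` is a hypothetical helper rather than anything dagger check does today.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitRef splits a reference such as
// "test-split:test-telemetry:TestTelemetry/TestGolden/pending"
// into a check path and a test filter, using the longest known check
// that prefixes the reference. ok is false if no known check matches.
func SplitRef(ref string, knownChecks []string) (check, filter string, ok bool) {
	for _, c := range knownChecks {
		if c == "" || len(c) <= len(check) {
			continue // keep only the longest match
		}
		if ref == c {
			check, filter, ok = c, "", true
		} else if strings.HasPrefix(ref, c+":") {
			check, filter, ok = c, strings.TrimPrefix(ref, c+":"), true
		}
	}
	// Assumption: test names contain no ":", so the all-colon spelling can
	// be normalised to the "/" form that go test prints.
	filter = strings.ReplaceAll(filter, ":", "/")
	return check, filter, ok
}

func main() {
	known := []string{"test-split", "test-split:test-telemetry"}
	refs := []string{
		"test-split:test-telemetry:TestTelemetry/TestGolden/pending",
		"test-split:test-telemetry:TestTelemetry:TestGolden:pending",
		"test-split:test-telemetry",
	}
	for _, ref := range refs {
		check, filter, ok := SplitRef(ref, known)
		fmt.Printf("%q -> check=%q filter=%q ok=%v\n", ref, check, filter, ok)
	}
}
```

Under these assumptions both spellings resolve to the same check and the same filter, so a name copy-pasted from go test output would work without translation.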
But how likely are you to re-run them with Dagger rather than go test?
And how much are we willing to complicate things to allow you to do so with a copy-paste?
well, it's 100% right now, until the testcontainers stuff lands 😛
Yeah but that's us, we're a special case
I meant the other "you"
We already have an escape hatch dagger call, do we really need to taint check with basically the same escape hatch behavior?
The whole point of check is to be a more opinionated interface so we can build more things on top of check that we can't on top of the less standardized call. If we turn check into call, might as well not have check at all (playing devil's advocate here)
I was thinking more like what is the minimum that check needs to not rely on the call escape hatch
i'm confused how treating the fully qualified test name as a black box is more complicated than splitting it on boundaries, turning it into a hierarchy, and dealing with that in the various test frameworks 
TBH what you call to run a test isn't that important, but what is important is being able to tell the user/LLM "hey here's the command you run to re-run these individual tests"
it adds a new dimension to checks, which we have to reconcile with the other dimension (check hierarchy vs. test filters in checks), and the new dimension is module-specific, so we also have to manage inconsistencies between the test filters depending on the check.
I get why we need that, I know check is too limiting at the moment. But this feels like an escape hatch, and we already have call for that
For flake management & test splitting, we need to make sure tests can be filtered / bucketed automatically by our own tooling - without participation by the end user or their native tooling. I'm a little paranoid about accidentally crossing a door that makes that harder
For those features ☝️ a go-specific --filter flag is basically a black box. It's why we can't just rely on dagger call for those features. I don't want an escape hatch in check to create the same problem there too
this is totally based on vibes, but if i'm told to use dagger check to run tests, but then i have to reach for an 'escape hatch' for the case where a test fails and i want to re-run it, it makes me question the value of check a bit, to the point where the escape hatch will just be the thing i use all the time, only harder to discover now
My fear/paranoia is exacerbated by the incomplete information, eg. dynamic checks not being implemented yet, workspace API only partially dogfooded so far, so there's additional uncertainty on how exactly dagger check might look
Yeah I agree with that.
i think there's a reasonable subset of cross-cutting functionality that could apply to a lot of checks, and filtering + fail-fast covers a lot of it, the rest being '--parallel' and MAYBE '--count=1000' - there's value to me in unifying that across test runners
pretty low conviction on the last part, though, those are getting more bespoke for sure
But it doesn't help us ship test splitting or flake management, and possibly makes it harder (but hard to know for sure, just adds design constraints in advance basically, and commits us to a direction before we know what those features need exactly)
I have several attempts in flight to fit checks with filtering & splitting. Currently paused while we move the first parts of workspace, to get more clarity. This would basically ignore & deprecate all that. Actually doesn't deprecate, because it doesn't solve all the same problems. Deprecating would be fine actually.
why wouldn't it help with flake management? there'd be a unified way to identify which things failed and re-run them with a tiny delta from your original call, which you don't have if call is your only escape hatch
for test splitting, if there was a way to detect filters ahead of time without running tests, you would just need something that lists those same filters, but maybe broader (just TestContainer instead of TestContainer/...), and passes them in the same way (high chance i'm oversimplifying, but it makes sense in my head)
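A sketch of the splitting idea described above, assuming the full list of test names can be obtained without running them. `Broaden` keeps only the top-level name (TestContainer instead of TestContainer/...), and `Bucket` spreads the resulting filters round-robin across workers. Both names are hypothetical, and round-robin is just the simplest placeholder; a real splitter would likely weigh tests by duration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Broaden reduces fully qualified test names to their top-level test,
// deduplicated and sorted so bucketing is deterministic.
func Broaden(names []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, n := range names {
		top, _, _ := strings.Cut(n, "/")
		if !seen[top] {
			seen[top] = true
			out = append(out, top)
		}
	}
	sort.Strings(out)
	return out
}

// Bucket distributes filters round-robin across n buckets.
// Values of n below 1 are treated as 1.
func Bucket(filters []string, n int) [][]string {
	if n < 1 {
		n = 1
	}
	buckets := make([][]string, n)
	for i, f := range filters {
		buckets[i%n] = append(buckets[i%n], f)
	}
	return buckets
}

func main() {
	names := []string{
		"TestContainer/withExec",
		"TestContainer/from",
		"TestTelemetry/TestGolden/pending",
		"TestModule",
	}
	filters := Broaden(names)
	for i, b := range Bucket(filters, 2) {
		fmt.Printf("bucket %d: %v\n", i, b)
	}
}
```

Each bucket is then just a list of filters, passed in the same way a single re-run filter would be.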
How would filters be passed? An argument to all functions?
I would feel better if you at least read the various attempts I already made at answering that question within the check system
did i not?
i remember having various convos about tests/checks and agreeing that they're different dimensions at one point, but don't know if that ended up in a doc
feels like the dumb answer is 'agree on an arg name', and the smart answer is 'figure out an arg name with extra steps'
but you need to map which args go to which check. Or, just blast the arg to all checks which feels wrong to me
Trying to remember where the most up-to-date notes are... Found this https://gist.github.com/shykes/5d70658b90cdfb00e426d354f6ab6d8d
Collaborative Parallel Execution in Dagger - Design Discussion - dagger-parallel-execution-design.md
Didn't expect this part of the design to be pushed through so early, so I'm kind of scrambling
i agree passing it to all checks is wrong, which is why i started with dagger test <one-check> [filters...] as a strawman, and then proposed some:check-name:TheRest/GetsPassedTo/SomeCheckName
(I had missed the dagger test proposal)
re-reading now in the context of workspaces, running tests for a directory seems like a good fit for artifacts right? that would move one dimension of filtering elsewhere at least
Maybe I should focus on moving dynamic checks & provenance along (workspace part 3) since it impacts the check syntax, so the sooner we have visibility into that, the sooner it can inform your exploration of tests
For example it kind of deprecates the foo:bar check notation (although I was hoping to keep backwards compat with it)
--> why bother filtering by function path when you can filter by actual file path + possibly module name
yeah, seems like we need to defer this a bit more. would be nice to get the UX win from having the span attribute though
eg. dagger artifacts --path=./myapp --type=go.TestPackage
yeah I have zero objection to shipping that part asap 🙂