#@gerhard @Erik Sipsma @jedevc @matipan
While I do not think that we should be focusing exclusively on "engine stability", our CI reliability is not currently at a point where we can be confident in the test results. 33% of the time they fail for reasons unrelated to the changes being made.
Here is a bird's eye view of the checks on main - many commits have failing checks (see screenshot).
Since our CI switched to v0.11.8, 10 out of 15 runs have passed, a 66% success rate. The first of the 5 failures was a genuine mistake, and the remaining 4 are known issues (most recent failure first):
- https://dagger.cloud/dagger/traces/d532cbf0d4596a09abb5bff419645da4?span=68d39b611d888608
- https://dagger.cloud/dagger/traces/cf01292eeec947f662dc6e2931bfd809?span=b7228e0d9c3a046a
- https://dagger.cloud/dagger/traces/216b6b240c3d70cd4c5cd8239828a924#beea3f76ed7711fe:L143 (fixed by https://github.com/dagger/dagger/pull/7717)
We are also seeing genuine intermittent failures in non-Engine test suites, such as https://dagger.cloud/dagger/traces/6ee45731fe7aa860bcf48883866eac21?span=8cea5c72281296b2
The truth is that the team is shifting focus to the v0.12.0 release, and the current state of main no longer allows us to ship a v0.11 patch release. We could create a new release branch, start backporting from main and then release a new patch, but this seems like a lot of work. Most likely, we will end up releasing the latest fixes - such as https://github.com/dagger/dagger/pull/7717 - in v0.12.0.
As a change of scenery, we can resume packaging & prod architecture work, and accept that our CI is likely to remain at the current stage - 66% reliable with a run duration of ~20 mins
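To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes that runs fail independently of each other and that a failed run is simply retried until it goes green; both are simplifications, and the function names are made up for illustration.

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of CI runs that passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def expected_minutes_to_green(passed: int, total: int, run_minutes: float) -> float:
    """Expected wall-clock minutes until the first green run.

    Assumes independent runs retried until one passes, so the number of
    attempts is geometric with mean 1 / pass_rate = total / passed.
    """
    if passed <= 0:
        raise ValueError("need at least one passing run to estimate")
    expected_attempts = total / passed
    return expected_attempts * run_minutes


# Figures quoted in this message: 10 of 15 runs passed, ~20 min per run.
print(f"pass rate: {pass_rate(10, 15):.1%}")                                  # pass rate: 66.7%
print(f"expected time to green: {expected_minutes_to_green(10, 15, 20):.0f} min")  # 30 min
```

Under those assumptions, a change takes about 1.5 runs on average to go green, so roughly 30 minutes of CI time rather than 20.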