# OpenTelemetry exporter failing when a Dagger command fails


compact crystal

Hello,

We are running into the same error as the one opened yesterday on GitHub, on a demo project:
https://github.com/dagger/dagger/issues/13008

To add a bit more context:
When a Dagger command fails (a normal failure: the command runs tests, and the tests fail), we end up with this weird OpenTelemetry failure.

This happens regardless of the Dagger version (I tried downgrading from 0.20.6 to 0.19.6, and the OpenTelemetry issue persists).
This is weird, because we run Dagger 0.19.6 in prod, and we don't get this error there.

You can easily reproduce this by cloning our conference repo: https://github.com/vmaleze/dagger-devoxx-2026

Uncomment `kotlin-app/src/test/kotlin/com/example/FailingTest.kt` and run:

    dagger --progress dots -m dagger-kotlin call test --source ./kotlin-app export --path ./build/test-results

You will see the OpenTelemetry logs once the Gradle test command fails.

If you have any input: our conference is on Wednesday :p It would be nice not to have this behavior in our demo :p

GitHub issue preview: "What happened? What did you expect to happen? My Build is showing this: WARNING [opentelemetry.exporter.otlp.proto.http.metric_exporter]: Transient error Internal Server Error encountered while exp..."

GitHub repo preview: "[Devoxx-2026] Dagger at scale: how to avoid stabbing yourself with your pipelines" (vmaleze/dagger-devoxx-2026)

compact crystal

For what it's worth, here's an analysis made by Claude:

Here's the precise sequence:

  • `with_exec(["./gradlew", "test"])` is lazy: no error yet
  • `_extract_test_report` calls `container.directory("/app").glob(...)`, which forces Dagger to actually execute the container
  • Since `./gradlew test` exited with code 1, Dagger raises `dagger.ExecError`
  • The Python module process crashes with an unhandled exception
  • The OTel exporter (embedded in the Dagger Python SDK) never gets a chance to flush its buffer cleanly → retry/timeout warnings
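The sequence above hinges on lazy evaluation. Here is a toy model in plain Python, not the Dagger SDK: `LazyContainer`, `ExecError`, `extract_test_report` and the `simulated_exit_code` parameter are hypothetical names made up for illustration. It only shows why building the pipeline succeeds and the failure surfaces later, at the call that forces execution.

```python
class ExecError(Exception):
    """Hypothetical stand-in for dagger.ExecError."""

    def __init__(self, cmd: list[str], exit_code: int) -> None:
        super().__init__(f"{' '.join(cmd)} exited with code {exit_code}")
        self.exit_code = exit_code


class LazyContainer:
    """Toy container: with_exec only records steps; nothing runs until evaluate()."""

    def __init__(self, steps: tuple = ()) -> None:
        self.steps = steps

    def with_exec(self, cmd: list[str], simulated_exit_code: int = 0) -> "LazyContainer":
        # Building the pipeline just appends a step, so it cannot fail here.
        return LazyContainer(self.steps + ((cmd, simulated_exit_code),))

    def evaluate(self) -> None:
        # Forcing evaluation "runs" every recorded step; a non-zero exit raises.
        for cmd, exit_code in self.steps:
            if exit_code != 0:
                raise ExecError(cmd, exit_code)


def extract_test_report(container: LazyContainer) -> str:
    # Like directory(...).glob(...), reading results forces the pipeline to run.
    container.evaluate()
    return "reports"


if __name__ == "__main__":
    pipeline = LazyContainer().with_exec(["./gradlew", "test"], simulated_exit_code=1)
    print("pipeline built, no error yet")
    try:
        extract_test_report(pipeline)
    except ExecError as err:
        # Per the analysis above, leaving this unhandled is what ends the module process.
        print(f"error surfaced only at extraction: {err}")
```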
ancient sage

I'll have a look 👍
(sadly I'm not at Devoxx France this year; I would have liked to attend your talk 🙂)


@compact crystal What is the expected behavior when a test fails? Do you still want to extract the test reports?

    @function
    async def test(
        self,
        source: Annotated[dagger.Directory, Ignore(SOURCE_IGNORE), Doc("Source directory of the kotlin app")],
    ) -> dagger.Directory:
        """Run the test suite and return JUnit XML reports."""
        container = self._gradle(source).with_exec(["./gradlew", "test"])
        return await _extract_test_report(container)

If `with_exec(["./gradlew", "test"])` fails, the function fails at that point, and the test reports are never extracted.
If you still want to retrieve and extract the test reports (I assume that's the case, since you run `test ... export --path ./build/test-results`), the function must not fail when the `with_exec` fails.
You can do that by adding `expect=dagger.ReturnType.ANY`, meaning any exit code is accepted, not only success.
The function then completes normally, generates the test reports, and exports them to the host.
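To make the `expect` semantics concrete, here is a minimal model of how the expected return type decides whether a finished command counts as an error. It assumes `SUCCESS` (the default) accepts only exit code 0, `FAILURE` only non-zero codes, and `ANY` every code; the enum below is a hypothetical mirror of the names used in the thread, not the SDK's implementation.

```python
from enum import Enum


class ReturnType(Enum):
    """Hypothetical mirror of dagger.ReturnType, for illustration only."""

    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    ANY = "ANY"


def exit_code_allowed(exit_code: int, expect: ReturnType = ReturnType.SUCCESS) -> bool:
    """Return True if a command ending with exit_code should not be treated as an error."""
    if expect is ReturnType.ANY:
        return True
    if expect is ReturnType.FAILURE:
        return exit_code != 0
    return exit_code == 0


if __name__ == "__main__":
    # A failing `./gradlew test` exits non-zero (1 in the thread).
    for expect in ReturnType:
        print(expect.name, exit_code_allowed(1, expect))
```

With the default, the failing test run is an error and the pipeline stops; with `ANY`, the same exit code is accepted and later steps (like extracting reports) can still run.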

    @function
    async def test(
        self,
        source: Annotated[
            dagger.Directory,
            Ignore(SOURCE_IGNORE),
            Doc("Source directory of the kotlin app"),
        ],
    ) -> dagger.Directory:
        """Run the test suite and return JUnit XML reports."""
        container = self._gradle(source).with_exec(
            ["./gradlew", "test"], expect=dagger.ReturnType.ANY
        )
        return await _extract_test_report(container)

    $ dagger --progress=dots -m dagger-kotlin call test --source ./kotlin-app export --path ./build/test-results
    ▶ ...........

    .....

    > Task :checkKotlinGradlePluginConfigurationErrors SKIPPED
    > Task :processResources NO-SOURCE
    > Task :processTestResources NO-SOURCE
    > Task :compileKotlin
    > Task :compileJava NO-SOURCE
    > Task :classes UP-TO-DATE
    > Task :jar
    > Task :compileTestKotlin
    > Task :compileTestJava NO-SOURCE
    > Task :testClasses UP-TO-DATE

    2 tests completed, 1 failed
    > Task :test FAILED

    FailingTest > this test intentionally fails() FAILED
        org.opentest4j.AssertionFailedError at FailingTest.kt:10
    4 actionable tasks: 4 executed

    FAILURE: Build failed with an exception.

    * What went wrong:
    Execution failed for task ':test'.
    > There were failing tests. See the report at: file:///app/build/reports/tests/test/index.html

    * Try:
    > Run with --scan to get full insights from a Build Scan (powered by Develocity).

    BUILD FAILED in 30s

    ▶ .......................................✔ connect 0.2s
    ✔ load workspace: . 2.5s
    ✔ parsing command line arguments 0.0s

    ✔ daggerKotlin: DaggerKotlin! 0.8s
    ✔ .test(
      ┆ source: Address.directory(exclude: ["**/build", ".gradle", ".github", ".idea", ".vscode"]): Directory!
      ): Directory! 32.2s
    ✔ .export(path: "./build/test-results"): String! 0.0s

There are no more OTel issues, and the files are exported:

    $ ls build/test-results
    TEST-com.example.FailingTest.xml  TEST-com.example.HelloWorldTest.xml

When running with `container = self._gradle(source).with_exec(["./gradlew", "test"], expect=dagger.ReturnType.ANY)`, you can then retrieve the exit code with `exit_code = await container.exit_code()` and take different actions depending on whether the run succeeded or not.
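A sketch of that branching under stated assumptions: `FakeContainer`, its `report_files()` method, and this `TestResult` dataclass are hypothetical stand-ins (the demo repo has its own `TestResult` class, which may look different); only the `await container.exit_code()` call mirrors the message above. Reports are collected either way, and the exit code decides what happens next.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TestResult:
    """Hypothetical result type (the demo repo's own TestResult may differ)."""

    exit_code: int
    report_files: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return self.exit_code == 0


class FakeContainer:
    """Stand-in for a container built with expect=dagger.ReturnType.ANY."""

    def __init__(self, code: int, reports: list[str]) -> None:
        self._code = code
        self._reports = reports

    async def exit_code(self) -> int:
        # Mirrors `await container.exit_code()` from the thread.
        return self._code

    async def report_files(self) -> list[str]:
        # Hypothetical: in the real module this would read the JUnit XML directory.
        return list(self._reports)


async def run_tests(container: FakeContainer) -> TestResult:
    # Because the exec accepted any exit code, both awaits succeed even if tests failed.
    code = await container.exit_code()
    reports = await container.report_files()
    result = TestResult(exit_code=code, report_files=reports)
    if result.passed:
        print(f"tests passed; {len(reports)} report(s) collected")
    else:
        # Branch point: reports are already in hand, so the caller can export
        # them first and only then decide whether to fail the pipeline.
        print(f"tests failed (exit code {code}); {len(reports)} report(s) collected")
    return result


if __name__ == "__main__":
    container = FakeContainer(
        1, ["TEST-com.example.FailingTest.xml", "TEST-com.example.HelloWorldTest.xml"]
    )
    print(asyncio.run(run_tests(container)).passed)
```

Keeping the result as data, rather than raising immediately, is what lets the reports reach the host before any failure is reported.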

compact crystal

Yes! We know, that's part of our demo 😛 (you can see that we have the TestResult class, and even the test function commented out, to fix this problem)
The goal was to show that when a test fails, you need to use this ReturnType to get the reports.

But on the first run, we show the Gradle test failing, and we end up with no test reports.

So yeah, in the end it will work, but I don't get why we have this OpenTelemetry issue. I don't know the use case behind the GitHub issue, but it feels weird.

ancient sage

ok, I haven't looked at the different steps of the demo 😄
let me investigate a bit deeper then

compact crystal
ancient sage

OK, I'm able to reproduce it (outside of your example) and I'm working on a fix

ancient sage

@compact crystal That said, I have a quick fix:
If you set these dependencies in your pyproject.toml:

    dependencies = [
      "dagger-io",
      "opentelemetry-sdk<1.40",
    ]

and run `uv lock`, the problem should disappear, at least until the proper fix is released.
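To check which opentelemetry-sdk version the Python environment you run it in actually resolves, here is a small hypothetical helper. It assumes, as the workaround implies, that 1.40 and later are the versions to avoid, and it only handles plain `major.minor[.patch]` version strings; the function names are made up for illustration.

```python
from importlib import metadata


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a plain version string like '1.39.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def within_workaround_pin(version: str, upper: tuple[int, int] = (1, 40)) -> bool:
    """True if the version satisfies the '<1.40' pin suggested in the thread."""
    return parse_major_minor(version) < upper


def check_installed() -> str:
    """Report whether the installed opentelemetry-sdk respects the pin."""
    try:
        installed = metadata.version("opentelemetry-sdk")
    except metadata.PackageNotFoundError:
        return "opentelemetry-sdk is not installed in this environment"
    if within_workaround_pin(installed):
        return f"opentelemetry-sdk {installed} satisfies the <1.40 pin"
    return f"opentelemetry-sdk {installed} is outside the <1.40 pin"


if __name__ == "__main__":
    print(check_installed())
```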

compact crystal

Thanks!

ancient sage

np
and if you have any questions or need help before the talk, don't hesitate to reach out

ancient sage

@compact crystal I merged the PR with the fix.
If you want, you can install Dagger from main with (add BIN_DIR if you need to):

    curl -fsSL https://dl.dagger.io/dagger/install.sh | DAGGER_COMMIT=head sh

That way you are just running main, with no specific code, dependency, etc. to change.
But be aware that main contains a big change compared to v0.20.6: the removal of BuildKit. So if you want to go that way, I strongly encourage you to run your full demo against this version beforehand, both to make sure there are no regressions and to get familiar with the new version (for instance, in case some output has changed).

compact crystal

Not sure I'm going to do that just 1 day before the demo.
I like to live dangerously, but not that much 😛