#k it passed So definitely some weird

1 messages · Page 1 of 1 (latest)

proven crane
#

After a long time I got a single occurrence locally when running go test -v -count=1000 -timeout=1h -run=TestContainerExecError/includes_output_of_failed_exec_in_error_when_redirects_are_enabled ./core/integration/

I had modified the test so that it pauses when that error gets hit. It looks like the session subprocess is dead. The question is why there's not any output, if it was panicking it should in theory send that to its stderr which gets forwarded to the sdk and printed, but double checking everything

atomic walrus
#

same:

#0 0.102 THIS SHOULD GO TO STDOUTTHIS SHOULD GO TO STDERR
------                                                                                                     
    container_test.go:2795:                                                                                                Error Trace:    /home/marcos/Projects/dagger/core/integration/container_test.go:2795
                Error:          "Post \"http://dagger/query\": EOF\nPlease visit https://dagger.io/help#go 
for troubleshooting guidance." does not contain "THIS SHOULD GO TO STDOUT"
                Test:           TestContainerExecError/includes_output_of_failed_exec_in_error_when_redirec
ts_are_enabled
--- FAIL: TestContainerExecError (1.27s)
    --- PASS: TestContainerExecError/includes_output_of_failed_exec_in_error (0.31s)
    --- FAIL: TestCon```
atomic walrus
#

@proven crane another input: can't seem to repro if I start the session binary manually and run tests with DAGGER_SESSION* vars

#

session binary never exists

#

also, BTW:
running tests with the session binary started manually and avoiding to fork/exec it every time it's considerably faster. 6s vs with DAGGER_SESSION* vars vs 16s (+60% improve) without them for the example that I'm trying

cc @fleet elbow @ember void

proven crane
#

I added debug prints to the session binary so it says when its shutting down due to a signal and when its shutting down due to stdin being closed.

The test case doesn't include a defer to close the client, so when its done running the child process subprocess keeps running for bit.

So with -count=1000 what I see is that every once in a while you all of a sudden get a bunch (~10-20) of lines saying that a session process is shutting down due to the stdin being closed. This makes sense because the Go GC is collecting the stdin pipe object and closing it as part of the finalizer.

What's interesting is that

  1. when I do get a failure it seems to coincide w/ one of those GC events
  2. When I include a defer that closes the client in the test case, I can't reproduce it anymore
#

Not sure what to make of that yet, I am trying to see if it's possible for the stdin pipe object to be available for GC before the client is closed, but that doesn't seem like it should be possible so far

atomic walrus
proven crane
#

Yeah it seems somehow related, but doesn't make sense fully. Like it's either we are incorrectly leaving the stdin pipe open for GC or the go runtime is incorrectly collecting a pipe that should still be accessible... Hesitate to blame the go runtime at this point but I guess it's possible

atomic walrus
#

trying to repro with GOGC=off now 😄

#

aaaaaaaaand... computer started to swap 😄

proven crane
#

Have a hunch, I can see a plausible path where it may actually be true that the pipe object becomes unreachable after making a call like c.Container().From(...)... but before the http request finishes. It would rely on not closing the client; basically, that pipeline would have to be your last reference to c. If gc triggers at the exact right time it could happen. Gonna test a fix, will push and/or update the issue