#are you able to run `dagger call sdk go
Hmmm
sipsma@dagger_dev:~/repo/github.com/sipsma/dagger$ dagger call sdk go generate
initialize 4.9s
! input: moduleSource.withContextDirectory.asModule resolve: failed to create module: select: failed to update module dependencies: failed to initialize dependency modules: failed to initialize dependency module: select: failed to create module: select: failed to update codegen and runtime: failed to generate code: failed to get modified source directory for go module sdk codegen: select: process "/usr/local/bin/codegen --output /src --module-context-path /src/docusaurus/dagger --module-name docusaurus --introspection-json-path /schema.json" did not complete successfully: exit code: 1
Stderr:
exec /.init: exec format error
ModuleSource.asModule: Module! 3.7s
! failed to create module: select: failed to update module dependencies: failed to initialize dependency modules: failed to initialize dependency module: select: failed to create module: select: failed to update codegen and runtime: failed to generate code: failed to get modified source directory for go module sdk codegen: select: process "/usr/local/bin/codegen --output /src --module-context-path /src/docusaurus/dagger --module-name docusaurus --introspection-json-path /schema.json" did not complete successfully: exit code: 1
Error: input: moduleSource.withContextDirectory.asModule resolve: failed to create module: select: failed to update module dependencies: failed to initialize dependency modules: failed to initialize dependency module: select: failed to create module: select: failed to update codegen and runtime: failed to generate code: failed to get modified source directory for go module sdk codegen: select: process "/usr/local/bin/codegen --output /src --module-context-path /src/docusaurus/dagger --module-name docusaurus --introspection-json-path /schema.json" did not complete successfully: exit code: 1
Stderr:
exec /.init: exec format error
Not good
I wonder if dumb-init is only being built for the host platform....
Okay well that's issue 1
I can enable the qemu emulators as a workaround for a sec
to try reproing the dns issue
huh, we might want to see about getting arm ci runners
actually idk if the emulators will kick in
if it's in binfmt on your host, i think they should
yeah I was gonna say, either that or somehow add tests to all the binaries we build to make sure they are the right platform
yup yup, 100%
ohh
so if you connect to the service binding name it works
but not if you connect to the endpoint
so if i change the HOST to dagger-engine in ci/sdk.go's installer
Right I actually was more curious how the hell this worked previously after seeing the error message you sent lol
my guess is somehow we now miss adding the endpoint to the hosts file or dns
Still want to understand that since I suppose this was a breaking change in some way, but it may be something incredibly obscure that just happened to work before
oh were we not ever supposed to connect in this way?
idk yet, the nesting is making it hard to think about, it depends on which engine the go sdk codegen is trying to run in
but it appears I can't repro the dns issue:
sipsma@dagger_dev:~/repo/github.com/sipsma/dagger$ dagger call --source . sdk go generate entries
sdk
sipsma@dagger_dev:~/repo/github.com/sipsma/dagger$ dagger version
dagger v0.11.5 (registry.dagger.io/engine) linux/arm64
well i guess that's nice - seems to be somewhat machine dependent, so at least you can repro what ci is seeing
not quite sure why i'm special though
Does any DNS work in dagger? Like a withExec that curls google.com?
I've gotten into situations where DNS inside docker breaks, which breaks DNS inside dagger
Usually involving tailscale in my particular case
I'm gonna fix the dumb-init problem now either way, that's probably worse overall
good morning, sorry to begin your day with regressions
okay have some pings from inside the container:
67 : [0.4s] | PING dagger-engine (10.87.0.24): 56 data bytes
67 : [0.4s] | 64 bytes from 10.87.0.24: seq=0 ttl=64 time=0.160 ms
67 : [1.4s] | 64 bytes from 10.87.0.24: seq=1 ttl=64 time=0.089 ms
67 : [2.4s] | 64 bytes from 10.87.0.24: seq=2 ttl=64 time=0.087 ms
67 : [3.4s] | 64 bytes from 10.87.0.24: seq=3 ttl=64 time=0.084 ms
67 : [3.4s] |
67 : [3.4s] | --- dagger-engine ping statistics ---
67 : [3.4s] | 4 packets transmitted, 4 packets received, 0% packet loss
67 : [3.4s] | round-trip min/avg/max = 0.084/0.105/0.160 ms
67 : [3.4s] | Server: 10.87.0.1
67 : [3.4s] | Address: 10.87.0.1:53
67 : [3.4s] |
67 : [3.4s] | ** server can't find dagger-engine: NXDOMAIN
67 : [3.4s] |
67 : [3.4s] | ** server can't find dagger-engine: NXDOMAIN
67 : [3.4s] |
67 : [3.4s] | PING google.com (142.250.200.46): 56 data bytes
67 : [3.5s] | 64 bytes from 142.250.200.46: seq=0 ttl=114 time=23.650 ms
67 : [4.5s] | 64 bytes from 142.250.200.46: seq=1 ttl=114 time=20.535 ms
67 : [5.5s] | 64 bytes from 142.250.200.46: seq=2 ttl=114 time=20.705 ms
67 : [6.5s] | 64 bytes from 142.250.200.46: seq=3 ttl=114 time=17.030 ms
67 : [6.5s] |
67 : [6.5s] | --- google.com ping statistics ---
67 : [6.5s] | 4 packets transmitted, 4 packets received, 0% packet loss
67 : [6.5s] | round-trip min/avg/max = 17.030/20.480/23.650 ms
67 : [6.5s] | Server: 10.87.0.1
67 : [6.5s] | Address: 10.87.0.1:53
67 : [6.5s] |
67 : [6.5s] | Non-authoritative answer:
67 : [6.5s] | Name: google.com
67 : [6.5s] | Address: 142.250.200.46
67 : [6.5s] |
67 : [6.5s] | Non-authoritative answer:
67 : [6.5s] | Name: google.com
67 : [6.5s] | Address: 2a00:1450:4009:81d::200e
67 : [6.5s] |
67 : [6.5s] | ping: bad address 'maqhuug9fbhak'
67 : [6.5s] | Server: 10.87.0.1
67 : [6.5s] | Address: 10.87.0.1:53
67 : [6.5s] |
67 : [6.5s] | ** server can't find maqhuug9fbhak: NXDOMAIN
67 : [6.5s] |
67 : [6.5s] | ** server can't find maqhuug9fbhak: NXDOMAIN
i can reach dagger-engine and google.com, but not the endpoint
the NXDOMAIN is from nslookup but that doesn't seem to work on either of the ways to reach a service (i think expected)
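One likely reason `ping dagger-engine` works while `nslookup dagger-engine` reports NXDOMAIN: `nslookup` queries the DNS server directly and does not consult `/etc/hosts`, whereas `ping` goes through the normal resolver path, which does. The sketch below shows the hosts-file half of that path, assuming a plain whitespace-separated hosts format. `lookupHosts` is a hypothetical helper, and the sample body is modeled on the `/etc/hosts` dump that appears later in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// lookupHosts returns the first address that a hosts-file body maps name to.
func lookupHosts(hosts, name string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(hosts))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop trailing comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // blank line or address without names
		}
		for _, h := range fields[1:] {
			if strings.EqualFold(h, name) {
				return fields[0], true
			}
		}
	}
	return "", false
}

func main() {
	hosts := "127.0.0.1 localhost buildkitsandbox\n" +
		"::1 localhost ip6-localhost ip6-loopback\n" +
		"\n" +
		"10.87.0.27 dagger-engine\n"
	for _, name := range []string{"dagger-engine", "maqhuug9fbhak"} {
		addr, ok := lookupHosts(hosts, name)
		fmt.Printf("%s: found=%v addr=%q\n", name, ok, addr)
	}
}
```

The service binding name is present in the hosts file, so it never needs DNS. The endpoint hostname is not, so it depends entirely on what the nameserver and `resolv.conf` can do with it.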
Well the init is 100% my fault, and the likelihood of whatever this DNS thing is being my fault is at about 98%, so I only have myself to thank
potentially this could maybe be related to a vm-thingy - e.g. if you're using docker desktop, vs running docker on your host
i wouldn't be surprised if there's some magic dns stuff happening deep in DD
If it's easy, can you cat /etc/hosts and /etc/resolv.conf from that exec where it can't resolve the service name?
eh, should have caught it in code review tbh
yup will do, sorry, i'm heading off for the day soon (and then am on pto till tues)
so scrambling a little bit
no worries! I can take things from here. I think the init problem is worth a patch release which I will do once fixed
yup, sgtm - i should be reachable for anything super urgent fyi
okay:
68 : [0.7s] | OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://127.0.0.1:44655
68 : [0.7s] | OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
68 : [0.7s] | CGO_ENABLED=0
68 : [0.7s] | _EXPERIMENTAL_DAGGER_CLI_BIN=/.dagger-cli
68 : [0.7s] | OTEL_EXPORTER_OTLP_PROTOCOL=grpc
68 : [0.7s] | GOOS=linux
68 : [0.7s] | SHLVL=1
68 : [0.7s] | HOME=/root
68 : [0.7s] | GOTOOLCHAIN=local
68 : [0.7s] | OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=grpc
68 : [0.7s] | _EXPERIMENTAL_DAGGER_RUNNER_HOST=tcp://kln5058ul88ps:1234
68 : [0.7s] | GOROOT=/usr/local/go
68 : [0.7s] | GOARCH=amd64
68 : [0.7s] | OTEL_TRACE_PARENT=00-06ece313899cce68eb27875fbb3b0bb8-2795c9c1ea532b0b-01
68 : [0.7s] | DOLLAR=$
68 : [0.7s] | GOFILE=generate.go
68 : [0.7s] | OTEL_EXPORTER_OTLP_LOGS_PROTOCOL=grpc
68 : [0.7s] | PATH=/usr/local/go/bin:/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
68 : [0.7s] | GOLINE=3
68 : [0.7s] | GOPACKAGE=dagger
68 : [0.7s] | TRACEPARENT=00-06ece313899cce68eb27875fbb3b0bb8-2795c9c1ea532b0b-01
68 : [0.7s] | OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:44655
68 : [0.7s] | OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:44655
68 : [0.7s] | GOPATH=/go
68 : [0.7s] | OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://127.0.0.1:44655
68 : [0.7s] | PWD=/app/sdk/go
68 : [0.7s] | GOLANG_VERSION=1.22.3
68 : [0.7s] | OTEL_TRACES_EXPORTER=otlp
68 : [0.7s] | ---- /etc/hosts
68 : [0.7s] | 127.0.0.1 localhost buildkitsandbox
68 : [0.7s] | ::1 localhost ip6-localhost ip6-loopback
68 : [0.7s] |
68 : [0.7s] | 10.87.0.27 dagger-engine
68 : [0.7s] | ---- /etc/resolv.conf
68 : [0.7s] | nameserver 10.87.0.1
68 : [0.7s] | ---- pings
also got an env dump
also here's a full trace: https://dagger.cloud/jedevc/traces/06ece313899cce68eb27875fbb3b0bb8
That does look correct, theoretically the service dns name (not alias) should be resolved via nameserver 10.87.0.1, but for some reason your engine is saying it doesn't know who that service is
gonna grab the equivalent from v0.11.4 as well
just to check to see there's no unexpected differences
here's from v0.11.4
there's a diff in resolv.conf - the new version is missing a search
Oooo interesting! I'll look into that in a sec
ahh we used to have this line: https://github.com/dagger/dagger/blob/v0.11.4/cmd/shim/main.go#L823-L825
it looks like there's no equivalent in the new worker
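To make the `resolv.conf` difference concrete, here is a small sketch that parses a `resolv.conf` body and reports its nameservers and search list. `parseResolvConf` and the `resolvConf` type are hypothetical names. The body without `search` matches the dump above, while the one with `search example.internal` is an invented stand-in, since the v0.11.4 dump is not included in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// resolvConf holds the two directives this sketch cares about.
type resolvConf struct {
	Nameservers []string
	Search      []string
}

// parseResolvConf extracts nameserver and search entries from a resolv.conf body.
func parseResolvConf(body string) resolvConf {
	var rc resolvConf
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "nameserver":
			rc.Nameservers = append(rc.Nameservers, fields[1])
		case "search":
			rc.Search = fields[1:] // only the last search line is used
		}
	}
	return rc
}

func main() {
	withoutSearch := "nameserver 10.87.0.1\n"
	// Hypothetical: the real search domain is not shown in this thread.
	withSearch := "nameserver 10.87.0.1\nsearch example.internal\n"

	fmt.Printf("without search: %+v\n", parseResolvConf(withoutSearch))
	fmt.Printf("with search:    %+v\n", parseResolvConf(withSearch))
}
```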
https://github.com/dagger/dagger/pull/7497 fixes the dumb-init platform and adds checks for all our binaries (pretty sure this same situation happened at least once more in the past, so seems worth it at this point)
Ah yeah that might be it, pivoting to that now
though still mystified by why this was only happening to you so far
@snow sand apparently also hit it
The reason it matters is that I want to actually test the fix e2e somehow, but that requires knowing how to hit it
I'm poking around on my host to see if I have a search in my resolv.conf that gets propagated to the engine through docker and thus never needs the !resolved codepath that got missed in the shim transplant
Just my best guess so far
I can test your fix in a few moments on my apparently cursed machine
If that helps
Sorry doing things in parallel but here's the diff if you have time to try
Also curious what your host's /etc/resolv.conf looks like if you have a sec
Forgot to actually send it... https://github.com/dagger/dagger/pull/7501
Yeah I'll be like half an hour - currently on the bus home after doing some last-minute shopping
My host does indeed have a search domain configured, which seems to be propagated by docker to the search domain of the dagger-engine.dev's resolv.conf, so if that's also propagated to each exec container, that may explain this all
No worries, I will try testing this locally by editing my host's resolv.conf, if I can repro the bug you hit then I should be good to go, will update in a sec
Okay yeah removing search from my host's resolv.conf and then reprovisioning everything allows me to repro it, so that is all adding up. No need for you to try
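A rough sketch of why the presence of a `search` line decides whether a short hostname resolves. It builds the list of fully qualified names a glibc-style resolver would try for a given name, search list, and `ndots` value. This is a simplification (musl, for instance, skips the search list once the name has at least `ndots` dots). `candidates` is a hypothetical helper and `example.internal` is an invented domain, not the one the engine actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates lists the fully qualified names a resolver would try for name,
// in order, given a search list and an ndots threshold.
func candidates(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // already absolute, search list is not applied
	}
	var out []string
	absFirst := strings.Count(name, ".") >= ndots
	if absFirst {
		out = append(out, name+".")
	}
	for _, d := range search {
		out = append(out, name+"."+strings.TrimSuffix(d, ".")+".")
	}
	if !absFirst {
		out = append(out, name+".")
	}
	return out
}

func main() {
	// Short endpoint-style hostname taken from the ping output in this thread.
	name := "maqhuug9fbhak"
	fmt.Println("no search:  ", candidates(name, nil, 1))
	fmt.Println("with search:", candidates(name, []string{"example.internal"}, 1))
}
```

With an empty search list the only query is for the bare name, which the nameserver answers with NXDOMAIN. With a search domain, from the host via docker or written by the engine, the qualified name is tried first, which is consistent with the bug only reproducing on hosts without a `search` entry.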