#network troubleshooting
wget http://repo.huaweicloud.com
Connecting to repo.huaweicloud.com (123.125.16.225:80)
Connecting to mirrors.huaweicloud.com (124.70.125.167:443)
It can fetch from inside the pod
but it fails in buildkit
hmm, trying to think of ideas but it's getting late here, it's hard to guess what might be going wrong. would need to know more about the infrastructure/network, but ideally this could be narrowed down more 🤔
it doesn't seem DNS or resolv.conf related at least, since it was able to resolve to an IP
i'm not sure what exact error code 'connection failed' corresponds to, knowing that could help (i.e. ECONNREFUSED?)
sometimes this sort of thing happens with overlapping CIDR ranges, so ours is 10.87.0.0/16 if that helps
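for anyone wanting to rule out the overlap theory quickly, here's a minimal sketch using Python's standard `ipaddress` module. Only 10.87.0.0/16 comes from this thread; the cluster ranges in the list are hypothetical placeholders to swap for your own pod, service, and node CIDRs.

```python
import ipaddress

DAGGER_SUBNET = "10.87.0.0/16"  # the engine's bridge subnet mentioned above


def overlapping(candidate, others):
    """Return the entries of `others` that overlap with `candidate`."""
    net = ipaddress.ip_network(candidate)
    return [o for o in others if net.overlaps(ipaddress.ip_network(o))]


# Hypothetical cluster ranges; replace with the real ones.
cluster_ranges = ["10.42.0.0/16", "10.43.0.0/16", "10.87.5.0/24"]
print(overlapping(DAGGER_SUBNET, cluster_ranges))
```

an empty list means none of the listed ranges collide with the bridge subnet.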
@wise grove are you having issues with that specific ip only? or does no connection work at all?
I tested with another host.
It seems the network hangs when requesting https
do you have access to the VM where your pod is currently running to troubleshoot? Since Dagger services make use of bridge networks, I'm wondering if ip forwarding is set correctly since it's required for that to work.
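if it helps, a rough sketch for checking that flag from Python on the node. It assumes a Linux host where the standard procfs path `/proc/sys/net/ipv4/ip_forward` is readable; the function name is made up for illustration.

```python
from pathlib import Path


def ip_forwarding_enabled(path="/proc/sys/net/ipv4/ip_forward"):
    """Return True/False for the IPv4 forwarding flag, or None if unreadable."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # Not Linux, or procfs is not mounted/readable here.
        return None


print("ipv4 forwarding:", ip_forwarding_enabled())
```

`False` would mean packets are not being forwarded between the bridge and the pod's interface.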
Everything works well in Pod.
DNS works well.
Now the network just hangs when requesting https
what CNI is your k8s cluster using?
coredns
or which k8s hosted service is that? At this point we need to replicate your setup so we can troubleshoot, since k8s CNIs operate differently
just k3s v1.23.5+k3s1 default settings
(sorry to pile on here, but this rang a bell) some k8s networks set MTUs lower than the default, but if our bridge has a higher value you can get ip fragmentation. This looks very similar for instance: https://github.com/jupyterhub/binderhub/issues/1228#issuecomment-743977017
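to make the size argument concrete, here's a back-of-the-envelope sketch. It assumes plain IPv4 + TCP with 20-byte headers and no options, and the two MTU values are illustrative (1500 is the usual Ethernet default, 1450 stands in for a lower overlay MTU). It treats "does not fit" as the problem case; in practice such packets may be fragmented or dropped depending on the path.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_tcp_payload(mtu):
    """Largest TCP payload that fits in one packet for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def largest_packet(payload_len, endpoint_mtu):
    """Size of the largest IP packet produced when segments are sized for endpoint_mtu."""
    segment = min(payload_len, max_tcp_payload(endpoint_mtu))
    return segment + IPV4_HEADER + TCP_HEADER


def fits_path(payload_len, endpoint_mtu, path_mtu):
    """True if every packet for this payload fits through the smaller path MTU."""
    return largest_packet(payload_len, endpoint_mtu) <= path_mtu


BRIDGE_MTU, OVERLAY_MTU = 1500, 1450  # illustrative values
for size in (300, 1410, 1411, 5000):
    print(size, fits_path(size, BRIDGE_MTU, OVERLAY_MTU))
```

small responses fit either way, while anything that fills a full 1500-byte packet no longer fits through the 1450-byte link, which would look like a connection that opens and then hangs.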
Is it possible to run ip addr in the pod @wise grove ? If installed that might show the mtu of the interface being used in there
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if1077: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 22:89:6b:2d:38:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.58.5.91/24 brd 10.58.5.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2089:6bff:fe2d:38a4/64 scope link
       valid_lft forever preferred_lft forever
3: dagger0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
    link/ether ca:59:36:b6:79:3b brd ff:ff:ff:ff:ff:ff
    inet 10.87.0.1/16 brd 10.87.255.255 scope global dagger0
       valid_lft forever preferred_lft forever
    inet6 fe80::c859:36ff:feb6:793b/64 scope link
       valid_lft forever preferred_lft forever
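the mismatch in that output can also be picked out mechanically. A small sketch that parses the interface header lines of `ip addr` and flags anything with a larger MTU than the uplink. Treating `eth0` as the uplink is an assumption, and the helper names are made up for illustration.

```python
import re

# Matches header lines such as "2: eth0@if1077: <...> mtu 1450 ..."
IFACE_LINE = re.compile(
    r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:\s+<[^>]*>.*?\bmtu\s+(\d+)", re.MULTILINE
)


def parse_mtus(ip_addr_output):
    """Map interface name -> MTU from `ip addr` text."""
    return {name: int(mtu) for name, mtu in IFACE_LINE.findall(ip_addr_output)}


def larger_than_uplink(mtus, uplink="eth0"):
    """Interfaces (excluding loopback) whose MTU exceeds the uplink's."""
    limit = mtus[uplink]
    return {n: m for n, m in mtus.items() if n not in (uplink, "lo") and m > limit}


sample = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
2: eth0@if1077: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
3: dagger0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
"""
print(larger_than_uplink(parse_mtus(sample)))
```

for the output pasted above this reports `dagger0` at 1500 against `eth0` at 1450.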
oh fun
I think you can adjust the CNI bridge's mtu: https://www.cni.dev/plugins/current/meta/tuning/#interface-attribute-operation
yep, the bridge plugin has a config for it: https://www.cni.dev/plugins/current/main/bridge/#network-configuration-reference
@wise grove this file should be at /etc/dagger/cni.conflist if you want to try setting it
Could I update it directly ?
yeah, I think it should take effect in new containers
hm, actually not sure, since the interface already exists
i can just build an image that hardcodes it, lol
@wise grove vito/dagger-engine:dev3
@molten perch if you start the engine pod with cni.conflist mounted at that path it should override it, right?
yeah, since it's a regular file we just add in the build process
so you could do a ConfigMap
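a rough sketch of what that ConfigMap could look like, generated from Python as a JSON manifest. The ConfigMap name and namespace are hypothetical, and the conflist content here is only a placeholder; the one part taken from this thread is the file name `cni.conflist`, meant to end up at /etc/dagger/cni.conflist.

```python
import json


def conflist_configmap(conflist_text, name="dagger-cni", namespace="default"):
    """Build a ConfigMap manifest holding the conflist under the key cni.conflist.

    `name` and `namespace` are hypothetical defaults for illustration.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"cni.conflist": conflist_text},
    }


# Placeholder content only; use the full conflist the engine should load.
conflist_text = json.dumps({"name": "dagger", "plugins": []}, indent=2)
print(json.dumps(conflist_configmap(conflist_text), indent=2))
```

the printed JSON can be piped to `kubectl apply -f -`, and the engine pod would then mount that key at /etc/dagger/cni.conflist, for example with a `subPath: cni.conflist` volume mount.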
It works again.
😮 that fixed it?
Yes
@autumn aurora from the top rope!
lol I wrote a CNI plugin for firecracker years ago and am pretty sure I hit this exact thing, it triggered some deep memory
Maybe it could be generated instead. I don't want to hard-code it in a ConfigMap.
{
  "cniVersion": "0.4.0",
  "name": "dagger",
  "plugins": [
    {
      "bridge": "dagger0",
      "hairpinMode": true,
      "ipMasq": true,
      "ipam": {
        "ranges": [
          [
            {
              "subnet": "10.87.0.0/16"
            }
          ]
        ],
        "type": "host-local"
      },
      "isDefaultGateway": true,
      "mtu": 1450,
      "type": "bridge"
    },
    {
      "type": "firewall"
    },
    {
      "capabilities": {
        "aliases": true
      },
      "domainName": "dns.dagger",
      "type": "dnsname"
    }
  ]
}
lol yeah definitely, I think ideally we'd find some way to detect the right value
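one possible shape for that detection, as a sketch only and not how the engine does it: read the uplink's MTU from sysfs and copy it into the bridge plugin entry before the conflist is written out. The `eth0` default and both function names are assumptions for illustration.

```python
import json
from pathlib import Path


def read_mtu(iface="eth0", sysfs="/sys/class/net"):
    """Read an interface's MTU from Linux sysfs."""
    return int((Path(sysfs) / iface / "mtu").read_text().strip())


def with_bridge_mtu(conflist, mtu):
    """Return a copy of the conflist with `mtu` set on every bridge plugin."""
    patched = json.loads(json.dumps(conflist))  # cheap deep copy of JSON data
    for plugin in patched.get("plugins", []):
        if plugin.get("type") == "bridge":
            plugin["mtu"] = mtu
    return patched


# Trimmed-down conflist for the demo; on a real host the value would come
# from read_mtu("eth0") instead of the literal 1450.
conflist = {
    "name": "dagger",
    "plugins": [{"type": "bridge", "bridge": "dagger0"}, {"type": "firewall"}],
}
print(json.dumps(with_bridge_mtu(conflist, 1450), indent=2))
```

this would have to run before the bridge interface is created, since an existing interface keeps its old MTU.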
nice, we hit something like this back in Cloud Foundry, I think our case was slightly different, and it was horrifying tracking it down to MTU
@scenic vessel @autumn aurora thanks for your help!
I just have one question: why does http pass but https not?
could just be down to packet size difference
actually it seemed like https worked, but not http
the baidu request showed it following the https redirect to http
so the redirect was probably a smaller packet, and then it couldn't receive the full response
oh that's a screenshot from when it worked, nvm
@molten perch everything works well 🥳
with registry.dagger.io/engine@sha256:f19205159d0b0ee0e5ffef4c1a374b9681a955d208b180e07e979e5630d77a5c
https://github.com/dagger/dagger/issues/4652#issuecomment-1449219021