# Local dagger module dependencies


rigid lake
#

Apologies in advance for the wall of text.

[1/3] I've run into a bit of an issue when it comes to managing dagger dependencies. I need to pull dagger modules from our private git hosting, which is available over a tailscale network. The host machines (local dev machines or CI runners) are joined to the tailnet and can access the resources just fine. However, if I run dagger install git.tailnet-host.com/my-mod, the runtime container attempts to resolve and connect to git.tailnet-host.com, and can do neither because it's not on the tailnet.

For code running within the dagger pipeline, this isn't much of an issue, because the CI code can bring up proxy services which permit resolution/access. Indeed, you don't want your CI pipeline code to depend on the network config of some random dev's machine, so we want isolation here.

However, if I'm just doing some development work on my dev machine, I have no way to control the network or proxy environment the dagger toolchain runs in. I think it would be possible to provision a custom runner and set proxy variables there to grant access, but then that proxy setup gets passed on to every container the pipeline runs, which will definitely cause problems and breaks isolation. This is also a pretty bad developer experience. Everyone needs to a) reconfigure their tailscale daemon to run a local proxy, b) provision a local runner with the right ALL_PROXY etc., and c) make sure to always use that runner.

#

[2/3] Now, you might say that our remote work setup using tailscale is kind of niche, but I think the same problem would pop up with any corporate intranet/proxy/VPN/MDM that authenticates devices and/or device posture. Essentially, dagger has conflated the class of operations used to set up the pipeline environment (relying on privileged network/fs/dns operations) with those used to run the pipeline. It seems to me that we want full isolation for the latter, but not the former. There is always going to be some privileged access required to set up the environment, whether it happens within the CI host, CI runner or dagger runner. This is somewhat acknowledged by the existence of dagger APIs for e.g. SSH forwarding; it's just that authentication is only the final step in accessing controlled resources, and the dns/routing/etc. is left out. It might not be an issue if you're living in cloud land where everything is on github.com, but that's far from universal. Indeed, I am evaluating dagger for my team specifically to allow us to more easily vendor and control all our CI functions internally. This is very top-of-mind in the age of major npm supply-chain attacks every other week!

A partial solution I've been using is the following:

  • Maintain private dagger modules on the tailscale host
  • The CI workflow or developer runs a script to check out all the modules into the dagger module source tree before running the pipeline
  • dagger.json then references those local modules
  • Modules which depend on other private modules assume said modules will be available in their local source tree
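
The checkout step in the list above could look roughly like the following minimal Python sketch. It assumes a hypothetical, team-maintained list of `ModulePin` entries (directory name, clone URL, ref), a fresh `deps/` directory, and that it runs on the host, which can already reach the tailnet. The `https://.../my-mod.git` URL and the `v1.2.0` ref are placeholders, not real values.

```python
# Hypothetical host-side checkout script for private dagger modules.
# It runs outside dagger, so the engine never has to resolve the private host.
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModulePin:
    name: str  # directory name inside the module source tree
    url: str   # clone URL on the private git host (placeholder)
    ref: str   # branch, tag or commit SHA to check out


def plan_checkout(pins: list[ModulePin], dest: Path) -> list[list[str]]:
    """Return the git commands that vendor each pinned module under dest."""
    cmds: list[list[str]] = []
    for pin in pins:
        target = dest / pin.name
        cmds.append(["git", "clone", "--quiet", pin.url, str(target)])
        # `git checkout` accepts branches, tags and commit SHAs alike.
        cmds.append(["git", "-C", str(target), "checkout", "--quiet", pin.ref])
    return cmds


def run_checkout(pins: list[ModulePin], dest: Path) -> None:
    """Execute the plan; assumes dest does not already contain the modules."""
    dest.mkdir(parents=True, exist_ok=True)
    for cmd in plan_checkout(pins, dest):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    demo = [ModulePin("my-mod", "https://git.tailnet-host.com/my-mod.git", "v1.2.0")]
    for cmd in plan_checkout(demo, Path("deps")):  # dry run: print only
        print(" ".join(cmd))
```

Pinning one ref per top-level module is the easy part; this does nothing for refs that private modules require of each other.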
#

[3/3] This works somewhat, but it's definitely a fragile hack. When using local git modules, dagger attempts to remotely fetch refs and so forth, so it fails just like it would with the remote git ref. This means dagger.json can't specify versions for local dependencies. At the top level this could be solved by telling the checkout script to get a particular version, but this isn't so simple for inter-module dependencies. And then instead of writing code, we're writing a meta-package manager for our CI's package manager so that we can run CI so that we can write code. Back to yaml... 😕
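
To make the inter-module versioning problem concrete, here is a hypothetical sketch of what the checkout script would have to grow into: walking each private module's declared dependencies and flagging any module that two dependents want at different refs. The graph is made-up illustration data, and for simplicity dependencies are keyed by module name only, ignoring that a module's own dependencies can differ between refs.

```python
# Hypothetical sketch: detect conflicting refs across inter-module dependencies.
# A real script would read each checked-out module's metadata instead of EXAMPLE.
from collections import defaultdict, deque

# module -> {dependency name: ref it wants}
Graph = dict[str, dict[str, str]]


def collect_pins(graph: Graph, root: str) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Breadth-first walk from root.

    Returns (pins, conflicts): the first ref requested for each dependency,
    and every dependency that different modules want at different refs.
    """
    wanted: dict[str, set[str]] = defaultdict(set)
    pins: dict[str, str] = {}
    seen = {root}
    queue = deque([root])
    while queue:
        mod = queue.popleft()
        for dep, ref in graph.get(mod, {}).items():
            wanted[dep].add(ref)
            pins.setdefault(dep, ref)  # first request wins, silently
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    conflicts = {dep: refs for dep, refs in wanted.items() if len(refs) > 1}
    return pins, conflicts


EXAMPLE: Graph = {
    "ci": {"build-mod": "v1.0.0", "lint-mod": "v2.1.0"},
    "build-mod": {"shared-utils": "v0.3.0"},
    "lint-mod": {"shared-utils": "v0.4.0"},
}

if __name__ == "__main__":
    pins, conflicts = collect_pins(EXAMPLE, "ci")
    print("pins:", pins)
    print("conflicts:", conflicts)
```

Deciding what to do with a conflict (pick one ref, fail, or check out both) is exactly the package-manager logic the message complains about having to write.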

It seems like the 'right' solution could be some combination of:

  1. Allow host-bridged networking for some part of the dagger CLI/API.
  2. Allow specifying dependencies with local git repos, using only locally fetched tree/refs. Then the privileged environment could set things up before dagger is invoked.
  3. Add CLI support to directly vendor dependencies, in userspace, similar to cargo vendor for Rust.

Anyhow, maybe I'm holding it wrong. I'd be happy to hear of a good way to side-step these issues.

untold narwhal
#

(args of type Service)

#

mmm but you can't do this for a service that the engine itself depends on...

#

we have a similar issue for pulling containers: Container.from() can't get arbitrary service bindings

#

am I reading the situation correctly?

rigid lake
#

Yes, I've run into the .from() and .publish() issues as well. I handle those by passing in services/endpoints like you're suggesting, but that's only possible because I'm already in the dagger pipeline running code I control.

The issue here is that I need to resolve all the code dependencies for the dagger modules that make up the pipeline(s). Dagger attempts to do this inside the engine's containers, but those don't allow passing in services.

However, isn't this kind of an architectural problem?
a) all dagger environments are completely isolated
b) dagger tries to use those isolated dagger environments to bootstrap the dagger source code we want to execute

It seems like it can't be turtles all the way down. If you want dagger to be able to operate in a generic environment, don't you at some point need to say "dagger, please resolve your dependencies and prepare the source code for execution using the current environment", where the current environment is axiomatically assumed to have access to the required resources? But that would require some part of the dependency resolution/fetching to run in the host environment, until the executable dagger module can be "handed off" to the engine, which will run it in isolation.

plain kettle
#

> However if I dagger install git.tailnet-host.com/my-mod the runtime container attempts to resolve and connect to git.tailnet-host.com, and can do neither because it's not on the tailnet.

@rigid lake is this because of a particular configuration on your tailnet or something? Once I connect my host machine to the tailnet and restart the dagger engine (so the network configuration gets updated), I can reach my tailnet hosts without any issues

#

basically, all your dagger containers' traffic should be routed through your tailscale-connected host, and it should be able to reach the same destinations as your host, as long as there are no custom access rules that you've configured on your side

#

I've just tested this in a default tailscale account and I was able to access the tailscale hosts without issues.

rigid lake
# plain kettle

Wait... this is news to me. I couldn't really find docs on how the dagger networking works, but I assumed everything was running in an isolated 10.87.0.0/24 network, similar to a docker network. So I guess when the dagger command is invoked from my shell (like your example) I get some kind of host-bridged networking, but only for the first container? As far as I can tell, if I run stuff from within a dagger function in my CI pipeline, it can't access anything on the tailnet without a proxy configured.

In any case, it must be, because if I run dagger -c 'container | from alpine/git | with-exec git,clone,http://<the git host tailnet IP>:<port>/<dagger module>' it successfully fetches the repo.
However, if I run dagger install <tailnet git host IP>:<port>/the-module.git, it complains about an HTTPS error, and if I use any tailscale DNS name, it fails to resolve it via 10.87.0.1:53.

So then this is really an issue with the dagger engine resolver vs host resolver?

plain kettle
#

> Wait... this is news to me. I couldn't really find docs on how the dagger networking works, but I assumed everything was running in an isolated 10.87.0.0/24 network, similar to a docker network. So I guess when the dagger command is invoked from my shell (like your example) I get some kind of host-bridged networking, but only for the first container?

that's not quite what's happening. When you run things from the shell, they still run in containers within their own network. The reason my example above works is that, as I mentioned in my last reply, all the traffic from within the dagger containers gets routed through your host. So, if your host is connected to the tailnet, whatever Dagger does should be able to reach it

#

> In any case, it must be, because if I run dagger -c 'container | from alpine/git | with-exec git,clone,http://<the git host tailnet IP>:<port>/<dagger module>' it successfully fetches the repo.

does this work with git clone https? because you used http here

#

mind sharing what https error you're getting?

plain kettle
#

I think the error you're getting might be related to missing TLS certificates?

rigid lake
# plain kettle I think the error you're getting might probably be related to some TLS certifica...

The https error is not a dagger problem. It's because the service TLS is handled by a reverse proxy, but I can't initiate a TLS connection through the reverse proxy with the raw IP because the certificate is issued for the tailnet name. Just to see if it would work, I made that call from a host that can get at it directly, but that's not normally possible. It's a live host so I can't reconfigure the TLS setup.

I'm somewhat confused as to why I was previously unable to access tailscale stuff from inside dagger containers without a proxy. It did not seem like it was routing through the host. But given that it works here, it seems like if the DNS can be resolved inside whatever containers dagger install uses, the dependency fetching problem would be solved. So based on my other experience running tailscale DNS in dagger, it seems like there will be two options:

  1. Somehow make the dagger CLI use the host's resolvers, which will use tailscaled to resolve the dns names.
  2. Somehow make the dagger CLI use a tailscale proxy, which will use the host tailscaled to resolve dns names.

We can't realistically use the raw tailnet IPs, because if it ever changed it would break every single dagger module until they were manually updated.

plain kettle
#

If you check the screenshot I posted above, I can actually reach my tailnet hosts by using the fqdn

rigid lake
# plain kettle If you check the screenshot I posted above, I can actually reach my tailnet hos...

I'm not using the raw IP. I was just confirming that it does in fact work with the raw IP.

I need to be able to use e.g. git.my-company.com, which is an FQDN resolved by a nameserver on the tailnet. On a normal host, tailscaled will configure a local resolver to send *.my-company.com lookups to 100.100.100.100 (aka "split DNS"), which is a DNS server available to all tailnets. That DNS server will tell the client "you can find that my-company.com name on <some my-tailnet IP>", which is just a regular DNS server on my-tailnet.ts.net that will resolve the name to its raw IP. Then the host networking will route the request as needed using that IP.

If you're saying that routing+resolvers+dns should be working in dagger containers exactly the same as in the host, then that doesn't seem to be the case. Resolver stub from the container:

$ dagger -c 'container | from alpine | with-exec cat /etc/resolv.conf | stdout'
▶ connect 0.2s
▶ detect module: . 0.8s
▶ load module: /home/kyzyl/src/dagger-test 2.5s

$ Container.from(address: "alpine"): Container! 0.7s CACHED
$ .withExec(args: ["cat", "/etc/resolv.conf"]): Container! 0.0s CACHED
▶ .stdout: String! 0.0s

nameserver 10.87.0.1
search my-tailnet.ts.net cpj0vulq9o4ba.dagger.local

And from the host:

$ cat /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
search my-tailnet.ts.net

Notably, they are both using a local nameserver and adding my-tailnet as a short search name. However, dig @10.87.0.1 git.my-tailnet.com and dig @10.87.0.1 git-host.my-tailnet.ts.net both fail to resolve, while dig @127.0.0.53 ... on the host will happily resolve those names. I'm guessing that the DNS server at 10.87.0.1 does not have 100.100.100.100 as an upstream nameserver. If it's directly proxying to the host, then I don't know why it doesn't work.
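
The split DNS behaviour described in this message boils down to suffix-based routing of queries. As a toy model only (not tailscaled's or systemd-resolved's actual code), the host-side stub sends names under a tailnet-routed domain to 100.100.100.100 and everything else to the LAN resolver; a resolver that only knows the default upstream (10.0.0.1, as the engine turns out to have later in this thread) sends every name to the LAN. The domains and addresses are the placeholders used in this thread.

```python
# Toy model of split DNS routing (not tailscaled/systemd-resolved source code).
# Routes map a domain suffix to the resolver that should answer for it.


def pick_upstream(name: str, routes: dict[str, str], default: str) -> str:
    """Return the resolver for `name`, preferring the most specific suffix."""
    labels = name.rstrip(".").lower().split(".")
    # For git.my-company.com: try "git.my-company.com", "my-company.com", "com".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return default


# Host-side stub: tailnet domains go to tailscale's resolver (placeholders).
HOST_ROUTES = {
    "my-company.com": "100.100.100.100",
    "my-tailnet.ts.net": "100.100.100.100",
}
LAN_DNS = "10.0.0.1"

if __name__ == "__main__":
    for q in ["git.my-company.com", "git-host.my-tailnet.ts.net", "github.com"]:
        host = pick_upstream(q, HOST_ROUTES, LAN_DNS)
        engine = pick_upstream(q, {}, LAN_DNS)  # no split routes configured
        print(f"{q}: host -> {host}, engine -> {engine}")
```

The design point is that the routing table lives in the host's stub resolver; copying only the stub's default upstream into a container drops the per-domain routes.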

plain kettle
# plain kettle <@341797950206509059> it also works with a instance within my private tailnet

the way this works is that if you start your dagger engine after connecting to your tailnet, it'll use the same upstream DNS servers as your host machine, which, when connected to tailscale, should be 100.100.100.100. You can verify this by running docker exec $(docker ps --filter name="dagger-engine-*" -q) cat /etc/dnsmasq-resolv.conf, which should show 100.100.100.100 if you started the engine after connecting to the tailnet.

Once the engine has the correct DNS server, whatever container runs, whether as part of a pipeline or as a runtime SDK container, should be able to access all your tailnet endpoints
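
A small helper can check the engine's upstream list printed by the docker exec command in this message. This sketch assumes /etc/dnsmasq-resolv.conf uses the usual resolv.conf(5) `nameserver <addr>` lines; `nameservers`, `sees_tailnet` and `TAILSCALE_DNS` are illustrative names, not part of dagger.

```python
# Sketch: list the upstream nameservers in a resolv.conf-style file.
# Assumes the resolv.conf(5) format: "nameserver <addr>" lines, with
# comments starting with '#' or ';'.

TAILSCALE_DNS = "100.100.100.100"


def nameservers(text: str) -> list[str]:
    """Return nameserver addresses in the order they appear."""
    servers: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def sees_tailnet(text: str) -> bool:
    """True if tailscale's resolver is among the listed upstreams."""
    return TAILSCALE_DNS in nameservers(text)
```

For example, pipe the docker exec output into a short script that calls `sees_tailnet(sys.stdin.read())`; a `False` means the engine was started with upstreams that cannot resolve tailnet names.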

rigid lake
# plain kettle the way this works is because if **after** connecting to your tailnet you start ...

Does dagger perhaps take only the first nameserver it sees? My machine has a bunch of links, two of which are scoped for DNS. The first is the main uplink enp8s0 with DNS set to 10.0.0.1 (the LAN gateway, not on the tailnet), and the second is tailscale0, with DNS set to 100.100.100.100. The dagger engine only shows the 10.0.0.1 nameserver, so it's no wonder it can't resolve anything. I've stopped/restarted/removed dagger-engine many times, and cycled the tailscale interface.

plain kettle
#
130|marcos:Projects/dagger (cli-cloud-tui-indicator) (⎈ |N/A)$ cat /etc/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 1.1.1.1
nameserver 192.168.1.1
search .
130|marcos:Projects/dagger (cli-cloud-tui-indicator) (⎈ |N/A)$ sudo tailscale up
marcos:Projects/dagger (cli-cloud-tui-indicator) (⎈ |N/A)$ cat /etc/resolv.conf
# resolv.conf(5) file generated by tailscale
# For more info, see https://tailscale.com/s/resolvconf-overwrite
# DO NOT EDIT THIS FILE BY HAND -- CHANGES WILL BE OVERWRITTEN

nameserver 100.100.100.100
search tail450d.ts.net

^

#

AFAIK it needs to be this way, since tailscale's split DNS is what knows how to route your lookups to your tailnet vs. everywhere else

rigid lake
#
╭╴♥ 14:48 | kyzyl | …/bws-run-populator |  main
╰─ cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search my-tailnet.ts.net
rigid lake
#

It's on the tailnet atm.

#

It doesn't change if I tailscale up/down. It always points to the local DNS server. If I take tailscale down, I can see the tailscale DNS nameserver disappear from resolvectl status, though.

plain kettle
#

ok, strange... in my case it seems like systemd-resolved is not starting a DNS server locally but is using the one that gets set via DHCP, I'd assume

#

so when I do tailscale up, it replaces my primary dns with quad 100

rigid lake
#

The thing is, it isn't putting in 127.0.0.53, it's putting in 10.0.0.1, which means it is following the chain to get the actual local nameserver, just not the second one.

#

I also just tried upgrading tailscale to make sure it's not a new behavior. No difference.

plain kettle
#

if you run docker run --rm alpine cat /etc/resolv.conf you should see that 10.0.0.1 thing

#

which is the main DNS the dagger engine is currently getting since it's started as a docker container

#

it makes sense for docker not to use 127.0.0.53, since that wouldn't be accessible from your containers

#

so it's very likely replacing that by your bridge gateway interface IP or something

#

I'm trying to see why I can't get systemd-resolved to actually start a dns server for me. I recall it did that in the past but it doesn't seem to be doing it anymore for some reason

#

I guess the strange part is that when I start tailscale, it actually replaces my main DNS server with 100.100.100.100, which becomes the primary resolver for me

rigid lake
# plain kettle if you run `docker run --rm alpine cat /etc/resolv.conf` you should see that 10....

So your network interface has its DNS configured automatically? I see that if mine is configured automatically, it gets 10.0.0.1, and that's all that gets entered into the docker resolv.conf. But if I manually specify 10.0.0.1, 100.100.100.100, it will add two nameserver entries into resolv.conf. This isn't all that useful though, because the host machine still needs non-tailscale DNS, and only one server's answer from resolv.conf will be used. I think that's why on mine it just points to a local server that has all the nameservers.
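
Why listing both servers in resolv.conf doesn't help can be shown with a toy model. It assumes glibc-style behaviour: nameservers are tried in file order, the next one only when the current one gives no reply, and an NXDOMAIN answer is final (musl, e.g. in alpine images, instead queries all servers at once and takes the first reply, but either way one server's answer wins). The zone contents and the 100.64.0.5 / 203.0.113.10 addresses are made-up placeholders; the model assumes, as in plain kettle's setup, that 100.100.100.100 also answers public names.

```python
# Toy model of a glibc-style stub resolver walking resolv.conf in order.
# Assumption: the next nameserver is tried only when the current one gives
# no reply (modelled as "unreachable"); an NXDOMAIN answer is final.
from typing import Optional

# nameserver IP -> zone (name -> address); None means the server is unreachable.
Servers = dict[str, Optional[dict[str, str]]]


def resolve(name: str, order: list[str], servers: Servers) -> Optional[str]:
    for ns in order:
        zone = servers.get(ns)
        if zone is None:
            continue  # no reply (e.g. timeout): fall through to the next entry
        return zone.get(name)  # an answer, or NXDOMAIN (None): stop either way
    return None  # nobody answered


LAN = {"github.com": "203.0.113.10"}  # 10.0.0.1 only knows public names
QUAD100 = {"git-host.my-tailnet.ts.net": "100.64.0.5", "github.com": "203.0.113.10"}

if __name__ == "__main__":
    up = {"10.0.0.1": LAN, "100.100.100.100": QUAD100}
    # LAN resolver listed first: the tailnet name gets NXDOMAIN and stops there.
    print(resolve("git-host.my-tailnet.ts.net", ["10.0.0.1", "100.100.100.100"], up))
```

Under these assumptions, the order in resolv.conf decides which names work, which is why a local stub that routes per domain behaves better than a flat list.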

plain kettle
# rigid lake So your network interface has its DNS configured automatically? I see that if mi...

when I connect to the tailnet my main dns becomes 100.100.100.100 and then tailscale takes care of resolving things for me

docker run --rm alpine cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 100.100.100.100
search tail450d.ts.net

# Based on host file: '/etc/resolv.conf' (legacy)
# Overrides: []
#

I think systemd-resolved is complicating things here since it uses that local DNS server for caching

plain kettle
#

edit /etc/systemd/resolved.conf, set DNSStubListener=no

#

and then sudo systemctl restart systemd-resolved

#

I'm curious about what /etc/resolv.conf shows after that

plain kettle
#

now I get the same behavior you're getting with tailscale

plain kettle
#

after setting DNSStubListener=no and restarting the tailscale daemon (systemctl restart tailscaled), I now see that tailscale correctly replaces my /etc/resolv.conf when I connect to/disconnect from the tailnet

#

mind trying that really quick if you have time?

#

remember that you should restart systemd-resolved after changing resolved.conf

rigid lake
#

Yeah, so without the stub listener I get 10.0.0.1 and 100.100.100.100, which means if I tailscale down, it breaks DNS entirely. If I swap the nameserver order, then public DNS works but tailscale DNS is broken, regardless of tailscale up/down. This is because only one entry effectively gets used. Docker seems to just grab whatever is in /run/systemd/resolve/resolv.conf, so the situation is either:

  1. Listener stub => dns server runs => both public and tailscale dns can resolve regardless of whether dns is up.
  2. No listener stub => one or the other is broken
  3. No listener stub + tailscale rewrites resolv.conf automatically => everything works

In all cases but (3), docker will get a resolv.conf that isn't functional, and so will dagger.

rigid lake
#

Correction: it doesn't rewrite the nameservers, it does rewrite the search path. But it did that with the stub listener also.

plain kettle
#

have you validated your stub dns server is not running?

#

does sudo fuser -v -n tcp 53 -n udp 53 show something?

#

in any case... it seems like all this dancing is mostly a systemd-resolved + docker issue.

if you find a way to make the tailscale resolv.conf rewrite work, that should fix all your issues

#

getting systemd-resolved to work out of the box is a bit trickier, since it probably requires changes to how docker sets up DNS for the containers it runs; Dagger inherits that by default when provisioned by docker

#

gotta run now. Happy to keep chatting about this async to see what options we have here 🙌