Basically run (container A + service). | Dagger

fair wave Mar 26, 2024, 6:44 PM

#

This sounds related to https://github.com/dagger/dagger/issues/6493 to me

GitHub

Networking errors after calling `Sync()` on Containers that have de...

I have a pipeline with multiple Services and Containers being passed around. If a Container that has a dependent service is Sync()'ed, future calls to the Service will fail with something like:...

surreal skiff Mar 26, 2024, 6:55 PM

#

Sounds like it was fixed though? I am getting this on 0.10.2 and 0.10.3

fair wave Mar 26, 2024, 6:56 PM

#

afaik some cases were fixed with a workaround but there's still some issue @olive dagger

surreal skiff Mar 26, 2024, 7:01 PM

#

hmm, I am not doing a sync anywhere. I am however, pulling a file with container.Directory(...).File(...)

fair wave Mar 26, 2024, 7:12 PM

#

Yeah the issue turned out to be more about service lifecycles and what happens after they decide to stop

#

(but it could be different from what you're seeing, I'm not sure)

surreal skiff Mar 26, 2024, 7:28 PM

#

In my case in the TUI it's showing that the service is still running. Why it's attached to a brand new container is what's problematic. As a workaround I may have to pass around the service container too and attach it to all subsequent containers in the same chain. I haven't tried that yet but sounds like the only workaround right now

fair wave Mar 26, 2024, 7:50 PM

#

I actually might be running into your issue with something else I'm working on. Do you have the same issue on 0.10.1?

surreal skiff Mar 26, 2024, 8:32 PM

#

I haven't checked 0.10.1. I can test it

#

whoa! @fair wave, I am not seeing it in v0.10.1! So there is a regression in 0.10.2. Nice catch

fair wave Mar 26, 2024, 8:46 PM

#

Perfect 😅 I'll see if I can track it down further

fair wave Mar 27, 2024, 12:29 AM

#

trying to repro with

func (r *Repro) Run(ctx context.Context) (string, error) {
    redis := dag.Container().
        From("redis").
        WithExposedPort(6379).
        AsService()
    cli := dag.Container().
        From("redis").
        WithoutEntrypoint().
        WithServiceBinding("redis", redis)

    ctrA := cli.WithExec([]string{"sh", "-c", "redis-cli -h redis info >> /tmp/out.txt"})

    file := ctrA.Directory("/tmp").File("/out.txt")

    ctrB := dag.Container().
        From("alpine").
        WithFile("/out.txt", file)

    return ctrB.WithExec([]string{"cat", "/out.txt"}).Stdout(ctx)
}

and it's not working :thinkies:

#

or rather, it is working and not breaking

#

my other thing repros fine but it's more complex and I was trying to come up with a simple repro. So the important bit must be missing here

surreal skiff Mar 27, 2024, 1:52 AM

#

Huh.. strange. I'll play around with this tomorrow and also try to repro it on my end.

surreal skiff Mar 27, 2024, 3:18 AM

#

I couldn't wait :).. I think I found the repro. If you don't call Stdout(ctx) on ctrB (last line), it errors the same way

#

so basically returning a *Container from the function. I tried doing the Stdout(ctx) from the CLI and it worked. Weird thing is, I tried the same on my failing project and it didn't work 😦 So there may be some other condition also.

fair wave Mar 27, 2024, 1:08 PM

#

Nice! I'm seeing the same

olive dagger Mar 27, 2024, 2:08 PM

#

will bisect ❤️ thanks for the full repro ❤️

olive dagger Mar 27, 2024, 2:31 PM

#

yeah ok, it's definitely https://github.com/dagger/dagger/pull/6806

GitHub

engine: isolate buildkit client+session to each client by sipsma · ...

This is a spin-off of work on #6747, which has proven itself to be a fountain side-quests all over the codebase. Splitting that effort up into separate PRs for everyone's sanity. There will be ...

#

it's possibly related to my comment here: https://github.com/dagger/dagger/pull/6806#discussion_r1511158437

fair wave Mar 27, 2024, 2:42 PM

#

Yeah that sounds like a winner based on that discussion

surreal skiff Mar 27, 2024, 2:46 PM

#

paging @icy isle 🙂

olive dagger Mar 27, 2024, 2:56 PM

#

logs from the engine look quite sus as well:

time="2024-03-27T14:54:17Z" level=debug msg="removing server" client_call_digest= client_hostname=daggerdoer client_id=3w59c7g7b0x1ksxwmaun1wb18 server_id=c0ndgmejermst4edr1f9uzhjn spanID=6833802633a932ed traceID=59d67e65dd0d958679ce82e798683898
time="2024-03-27T14:54:17Z" level=debug msg="shutting down service i3f462iju6h70.roof2j4hq2i5q.dagger.local"
time="2024-03-27T14:54:17Z" level=debug msg="sending sigkill to process in container 1vclaw6ipy2i6lqm0vd43p276"
time="2024-03-27T14:54:17Z" level=debug msg="releasing cni network namespace vl5wnvo9krky14nne32bpjnpe"
dnsmasq[30]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
time="2024-03-27T14:54:18Z" level=error msg="failed to release containers" error="exit code: 137"
time="2024-03-27T14:54:18Z" level=error msg="failed to close server" client_call_digest= client_hostname=daggerdoer client_id=3w59c7g7b0x1ksxwmaun1wb18 error="rpc error: code = Unavailable desc = error reading from server: EOF\nclose unix /run/dagger/server-progrock-6ascje1al2g1nqlele4vuq3x6.sock: use of closed network connection" server_id=c0ndgmejermst4edr1f9uzhjn spanID=6833802633a932ed traceID=59d67e65dd0d958679ce82e798683898

olive dagger Mar 27, 2024, 3:26 PM

#

ohhh i think i know what's going on.
because of this pr, the cli session and the module session are separate buildkit sessions - the service gets started in the module session, but then the module terminates, so buildkit tears down the service
but we expect the service to still be running (a fair assumption before this pr)

#

i think the trick is that we need to attach services to the top-level session? which is by definition the longest running one

fair wave Mar 27, 2024, 3:28 PM

#

Interesting, yeah that makes sense to me. I often have a service from one module that I need to re-use throughout my pipeline

olive dagger Mar 27, 2024, 3:43 PM

#


actually hm, i'm not sure about that idea - while this is possible, the container target needs to be evaluated in the module context (so we get the secrets), but the gateway container should be run at the top-level

#

another option could just be to not tidy up any sessions that have running containers associated with them (we could try keeping those sessions alive until they all stop)
EDIT: this is actually what we're already doing :facepalm:

#

hm, no actually it's not to do with lifetimes, it just looks like it's something to do with the search domains:

      ✔ check 03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local 6379/tcp 0.2s
      ┃ polling for port 03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local:6379
      ┃ port not ready: dial tcp 10.87.0.19:6379: connect: connection refused; elapsed: 385.233µs
      ┃ port is up at 10.87.0.19:6379

  ✘ exec sh -c cat /etc/hosts && cat /out.txt 0.2s
  ┃ host alias: lookup 03ptb0jbfm7kq on 10.87.0.1:53: no such host
  ┃ lookup 03ptb0jbfm7kq.88lofm05gl3hc.dagger.local on 10.87.0.1:53: no such host

#

`03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local` != `03ptb0jbfm7kq.88lofm05gl3hc.dagger.local`

fair wave Mar 27, 2024, 4:02 PM

#

oh 🙃 how does that happen??

surreal skiff Mar 27, 2024, 4:11 PM

#

huh. I don't understand why the container which has no attached service is trying to look up the attached service from another container.

olive dagger Mar 27, 2024, 4:24 PM

#

yeah, something isn't making a ton of sense here - now i'm suddenly not sure about anything here, i don't really understand how this is supposed to work.
with the parent-id chaining thing we do for services, why is a parent able to access a service started in a child at all? why was this possible even before this refactor? (since the parent shouldn't have the child in the search domain at all)

#

cc @hexed falcon did we explicitly handle this case before?

#

will come back to this tomorrow 🙂

hexed falcon Mar 27, 2024, 4:26 PM

#

GitHub

namespace services by server, not by client by vito · Pull Request ...

Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even thoug...

#

tl;dr this used to work. We changed services to be scoped per server, but didn't change the DNS to be scoped per server at the same time. That PR fixes it, but it's dependent on other work that Erik has in flight
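
(Editor's sketch, not from the thread.) A toy model of why the scoping key matters. It assumes a search domain is derived from either a client ID or a server ID; the thread does not show how Dagger actually derives its domains, so the `caller` type, the ID values, and the `scope + ".dagger.local"` format are all hypothetical.

```go
package main

import "fmt"

// caller identifies who is making an API call. Both fields are hypothetical
// stand-ins for the client_id and server_id seen in the engine logs.
type caller struct {
	clientID string
	serverID string
}

// scopeFn picks the identifier that a DNS search domain is derived from.
type scopeFn func(caller) string

func byClient(c caller) string { return c.clientID }
func byServer(c caller) string { return c.serverID }

// searchDomain builds a domain from the chosen scope (format is an assumption).
func searchDomain(c caller, scope scopeFn) string {
	return scope(c) + ".dagger.local"
}

// canResolve reports whether a service started by starter is reachable by
// short name from user, i.e. whether both land in the same search domain.
func canResolve(starter, user caller, scope scopeFn) bool {
	return searchDomain(starter, scope) == searchDomain(user, scope)
}

func main() {
	// Two different clients (a module call and the CLI) sharing one server.
	module := caller{clientID: "module-call", serverID: "server-1"}
	cli := caller{clientID: "cli", serverID: "server-1"}

	fmt.Println("DNS scoped by client:", canResolve(module, cli, byClient))
	fmt.Println("DNS scoped by server:", canResolve(module, cli, byServer))
}
```

With DNS scoped by client, the CLI cannot resolve a service the module call started. With DNS scoped by server, both callers share a domain and the lookup can succeed.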

olive dagger Mar 27, 2024, 4:30 PM

#

:facepalm: definitely related
