#Basically run (container A + service).
This sounds related to https://github.com/dagger/dagger/issues/6493 to me
Sounds like it was fixed though? I am getting this on 0.10.2 and 0.10.3
afaik some cases were fixed with a workaround but there's still some issue @olive dagger
hmm, I am not doing a sync anywhere. I am, however, pulling a file with container.Directory(...).File(...)
Yeah the issue turned out to be more about service lifecycles and what happens after they decide to stop
(but it could be different from what you're seeing, I'm not sure)
In my case in the TUI it's showing that the service is still running. Why it's attached to a brand new container is what's problematic. As a workaround I may have to pass around the service container too and attach it to all subsequent containers in the same chain. I haven't tried that yet but sounds like the only workaround right now
I actually might be running into your issue with something else I'm working on. Do you have the same issue on 0.10.1?
I haven't checked 0.10.1. I can test it
whoa! @fair wave, I am not seeing it in v0.10.1! So there is a regression in 0.10.2. Nice catch
Perfect, I'll see if I can track it down further
trying to repro with
func (r *Repro) Run(ctx context.Context) (string, error) {
	redis := dag.Container().
		From("redis").
		WithExposedPort(6379).
		AsService()
	cli := dag.Container().
		From("redis").
		WithoutEntrypoint().
		WithServiceBinding("redis", redis)
	ctrA := cli.WithExec([]string{"sh", "-c", "redis-cli -h redis info >> /tmp/out.txt"})
	file := ctrA.Directory("/tmp").File("/out.txt")
	ctrB := dag.Container().
		From("alpine").
		WithFile("/out.txt", file)
	return ctrB.WithExec([]string{"cat", "/out.txt"}).Stdout(ctx)
}
and it's not working 
or rather, it is working and not breaking
my other thing repros fine but it's more complex and I was trying to come up with a simple repro. So the important bit must be missing here
Huh.. strange. I'll play around with this tomorrow and also try to repro it on my end.
I couldn't wait :).. I think I found the repro. If you don't do the Stdout(ctx) from ctrB (last line), it errors the same way
so basically returning a *Container from the function. I tried doing the Stdout(ctx) from the CLI and it worked. Weird thing is, I tried the same on my failing project and it didn't work. So there may be some other condition also.
Nice! I'm seeing the same
will bisect ❤️ thanks for the full repro ❤️
yeah ok, it's definitely https://github.com/dagger/dagger/pull/6806
This is a spin-off of work on #6747, which has proven itself to be a fountain side-quests all over the codebase. Splitting that effort up into separate PRs for everyone's sanity. There will be ...
it's possibly related to my comment here: https://github.com/dagger/dagger/pull/6806#discussion_r1511158437
Yeah that sounds like a winner based on that discussion
paging @icy isle
logs from the engine look quite sus as well:
time="2024-03-27T14:54:17Z" level=debug msg="removing server" client_call_digest= client_hostname=daggerdoer client_id=3w59c7g7b0x1ksxwmaun1wb18 server_id=c0ndgmejermst4edr1f9uzhjn spanID=6833802633a932ed traceID=59d67e65dd0d958679ce82e798683898
time="2024-03-27T14:54:17Z" level=debug msg="shutting down service i3f462iju6h70.roof2j4hq2i5q.dagger.local"
time="2024-03-27T14:54:17Z" level=debug msg="sending sigkill to process in container 1vclaw6ipy2i6lqm0vd43p276"
time="2024-03-27T14:54:17Z" level=debug msg="releasing cni network namespace vl5wnvo9krky14nne32bpjnpe"
dnsmasq[30]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 0 names
time="2024-03-27T14:54:18Z" level=error msg="failed to release containers" error="exit code: 137"
time="2024-03-27T14:54:18Z" level=error msg="failed to close server" client_call_digest= client_hostname=daggerdoer client_id=3w59c7g7b0x1ksxwmaun1wb18 error="rpc error: code = Unavailable desc = error reading from server: EOF\nclose unix /run/dagger/server-progrock-6ascje1al2g1nqlele4vuq3x6.sock: use of closed network connection" server_id=c0ndgmejermst4edr1f9uzhjn spanID=6833802633a932ed traceID=59d67e65dd0d958679ce82e798683898
ohhh i think i know what's going on.
because of this pr, the cli session and the module session are separate buildkit sessions - the service gets started in the module session, but then the module terminates, so buildkit tears down the service
but we expect the service to still be running (a fair assumption before this pr)
i think the trick is that we need to attach services to the top-level session? which is by definition the longest running one
Interesting, yeah that makes sense to me. I often have a service from one module that I need to re-use throughout my pipeline
actually hm, i'm not sure about that idea - while this is possible, the container target needs to be evaluated in the module context (so we get the secrets), but the gateway container should be run at the top-level
another option could just be to not tidy up any sessions that have running containers associated with them (we could try keeping those sessions alive until they all stop)
EDIT: this is actually what we're already doing 
hm, no actually it's not to do with lifetimes, it just looks like it's something to do with the search domains:
check 03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local 6379/tcp 0.2s
polling for port 03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local:6379
port not ready: dial tcp 10.87.0.19:6379: connect: connection refused; elapsed: 385.233µs
port is up at 10.87.0.19:6379
exec sh -c cat /etc/hosts && cat /out.txt 0.2s
host alias: lookup 03ptb0jbfm7kq on 10.87.0.1:53: no such host
lookup 03ptb0jbfm7kq.88lofm05gl3hc.dagger.local on 10.87.0.1:53: no such host
`03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local` != `03ptb0jbfm7kq.88lofm05gl3hc.dagger.local`
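To make the mismatch easier to spot, here is a small standalone Go sketch that splits the two hostnames from the log above and compares their labels. The helper `parseServiceHost` and the field names `Service` and `Scope` are hypothetical and not part of Dagger. Reading the middle label as some per-session or per-server scope is an assumption drawn from this thread, not from Dagger's documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// suffix is the domain suffix seen in the hostnames from the log.
const suffix = ".dagger.local"

// serviceHost is a hypothetical type for illustration only.
// Assumption: hostnames look like "<service>.<scope>.dagger.local".
type serviceHost struct {
	Service string // first label, e.g. "03ptb0jbfm7kq"
	Scope   string // second label, e.g. "8jkh3ttqhjhvk"
}

// parseServiceHost splits a hostname into its service and scope labels.
func parseServiceHost(host string) (serviceHost, error) {
	if !strings.HasSuffix(host, suffix) {
		return serviceHost{}, fmt.Errorf("%q does not end in %q", host, suffix)
	}
	trimmed := strings.TrimSuffix(host, suffix)
	service, scope, ok := strings.Cut(trimmed, ".")
	if !ok || service == "" || scope == "" || strings.Contains(scope, ".") {
		return serviceHost{}, fmt.Errorf("%q is not of the form <service>.<scope>%s", host, suffix)
	}
	return serviceHost{Service: service, Scope: scope}, nil
}

func main() {
	// The name the health check polled, and the name the later exec looked up.
	started, err := parseServiceHost("03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local")
	if err != nil {
		panic(err)
	}
	lookedUp, err := parseServiceHost("03ptb0jbfm7kq.88lofm05gl3hc.dagger.local")
	if err != nil {
		panic(err)
	}
	fmt.Println("same service label:", started.Service == lookedUp.Service)
	fmt.Println("same scope label:", started.Scope == lookedUp.Scope)
}
```

The first label matches and the second does not, so the lookup is for the right service under a different domain.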
oh, how does that happen??
huh. I don't understand why the container which has no attached service is trying to look up the attached service from another container.
yeah, something isn't making a ton of sense here - now i'm suddenly not sure about anything here, i don't really understand how this is supposed to work.
with the parent-id chaining thing we do for services, why is a parent able to access a service started in a child at all? why was this possible even before this refactor? (since the parent shouldn't have the child in the search domain at all)
cc @hexed falcon did we explicitly handle this case before?
will come back to this tomorrow
probably related: https://github.com/dagger/dagger/pull/6914
tl;dr this used to work, but we changed services to be scoped per server without changing the DNS to be scoped per server at the same time. That PR fixes it, but it's dependent on other work that Erik has in flight
definitely related
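A toy model of the tl;dr, for anyone skimming the thread later. This is not Dagger's DNS implementation: `dnsTable` and `resolve` are made-up names, and the idea that a short alias gets qualified with the caller's search domain is a simplification of what the lookup errors above suggest. The hostnames and IP are the ones from the logs.

```go
package main

import "fmt"

// dnsTable is a hypothetical stand-in for the DNS records: FQDN -> IP.
type dnsTable map[string]string

// resolve qualifies a short alias with each search domain in turn and
// returns the first record found. Simplified model, for illustration only.
func resolve(table dnsTable, alias string, searchDomains []string) (string, bool) {
	for _, domain := range searchDomains {
		if ip, ok := table[alias+"."+domain]; ok {
			return ip, true
		}
	}
	return "", false
}

func main() {
	// The service was registered under the domain it was started in.
	table := dnsTable{
		"03ptb0jbfm7kq.8jkh3ttqhjhvk.dagger.local": "10.87.0.19",
	}

	// A caller searching the same domain finds the service.
	ip, ok := resolve(table, "03ptb0jbfm7kq", []string{"8jkh3ttqhjhvk.dagger.local"})
	fmt.Println(ip, ok)

	// A caller searching a different domain gets "no such host".
	_, ok = resolve(table, "03ptb0jbfm7kq", []string{"88lofm05gl3hc.dagger.local"})
	fmt.Println(ok)
}
```

In this model the service is up and reachable by IP, but a caller whose search domain differs never builds the name the record is stored under.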