#intermittent "no such host" with AsService


warm pagoda
#

My pipeline initializes a database (elasticsearch) and has several different containers import data into that db. Once the importers finish, I can export from the cache.

This works great! 1/2 the time...

The other 1/2 the time I hit a "no such host" error from one of the import containers. I've just been re-running the job until it succeeds, but some of the individual importers run for many hours, so it's a painful debug loop.

Is there something more I need to do to make the service reliably accessible?

The full code is here: https://github.com/headwaymaps/headway/blob/fc9e62b5bdbd75c814901b49c4b37b0d79baae66/dagger/main.go#L596

But the synopsis is here:

// One shared Elasticsearch service backed by a cache volume.
elasticsearchService := dag.Container().
  From("pelias/elasticsearch:8.12.2-beta").
  WithMountedCache("/usr/share/elasticsearch/data", elasticsearchCache, opts).
  WithExposedPort(9200).
  AsService()

// Each importer binds the same service under the same hostname.
importerContainer1.WithServiceBinding("pelias-elasticsearch", elasticsearchService).
  WithExec([]string{"import-some-stuff"}).
  Sync(ctx)

importerContainer2.WithServiceBinding("pelias-elasticsearch", elasticsearchService).
  WithExec([]string{"import-some-other-stuff"}).
  Sync(ctx)
...
lunar ether
warm pagoda
#

bin/build builds/seattle is my entry point.

But if you don't want to run arbitrary scripts I think this should hit it:
dagger -c "new | with-area Seattle | pelias | elasticsearch-data"

lunar ether
#

I see your Headway type has a New method, but it's a method, not a function. If you remove the method receiver, that will make it a proper Dagger constructor.

warm pagoda
#

Thanks for those tips! I’m learning golang only in as much as I use it for dagger. 🙈

lunar ether
warm pagoda
#

hmmm… I’m not sure about that. Could it be a runtime thing? I’m using orbstack

lunar ether
#
 --------------
│ ┃  create index
│ ┃ --------------
│ ┃
│ ┃ [resource_already_exists_exception] index [pelias/KIavaoNLTTuVVe40X76m3g] already exists, with { index_uuid="KIavaoNLTTuVVe40X76m3g" & index="pelias" }

Any chance I have to make the command idempotent?

warm pagoda
#

Your best bet would be to rev the CacheKey
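
Revving the cache key gives the Elasticsearch service a fresh, empty data volume, so the index-creation step no longer collides with an index left over from an earlier run. A minimal sketch, assuming the volume name is built from a base string plus a revision number. The helper name cacheKey and the base name are hypothetical; in a Dagger module the resulting string would be what gets passed to dag.CacheVolume.

```go
package main

import "fmt"

// cacheKey builds a versioned cache-volume name (hypothetical helper).
// Bumping rev produces a new name, and a new name means a new, empty volume.
func cacheKey(base string, rev int) string {
	return fmt.Sprintf("%s-v%d", base, rev)
}

func main() {
	// The base name here is an assumption for illustration only.
	fmt.Println(cacheKey("pelias-elasticsearch", 1))
	fmt.Println(cacheKey("pelias-elasticsearch", 2)) // after revving
}
```

The old volume is not deleted by this; it is simply no longer referenced, so data written under the previous key stays out of the way.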

lunar ether
warm pagoda
#

sorry that it’s janky :/

lunar ether
#

np 🙏

#

@warm pagoda not sure what orbstack is doing but running ./bin/download against the source openaddresses image fails also with docker

2025-09-17T19:44:17.546Z - info: [openaddresses-download] Attempting to download all data
2025-09-17T19:44:17.548Z - error: [openaddresses-download] error making directory /mnt/pelias/openaddresses message=EACCES: permission denied, mkdir '/mnt/pelias', stack=Error: EACCES: permission denied, mkdir '/mnt/pelias', errno=-13, code=EACCES, syscall=mkdir, path=/mnt/pelias
2025-09-17T19:44:17.548Z - error: [openaddresses-download] Failed to download data message=EACCES: permission denied, mkdir '/mnt/pelias', stack=Error: EACCES: permission denied, mkdir '/mnt/pelias', errno=-13, code=EACCES, syscall=mkdir, path=/mnt/pelias
lunar ether
warm pagoda
#

I'm blowing away my docker data to see if I can reproduce...

#

But you're right... the /mnt/pelias perms seem weird/wrong

#

Hmmm... I'm still mid-way, but I re-ran: dagger -c "new | with-area Seattle | pelias | elasticsearch-data"

...and got past the openaddresses download part without issue. I wonder what's different about our setups. This is on debian+docker

docker --version
Docker version 28.4.0, build d8eb465
├─● Container.from(address: "pelias/openaddresses:master"): Container! 1.3s
├─● .withMountedDirectory(
│   ┆ path: "/pelias-service"
│   ┆ source: Directory.directory(path: "pelias"): Directory!
│   ): Container! 0.0s
├─● .withFile(
│   ┆ path: "/code/pelias.json"
│   ┆ source: Container.file(path: "pelias.json"): File!
│   ): Container! 0.2s
├─▶ .withExec(args: ["./bin/download"]): Container! 5.7s
├─● .directory(path: "/data/openaddresses"): Directory! 18.3s
#

docker run --rm -ti pelias/openaddresses:master ./bin/download

I think part of the issue with this permission-denied error is that, in practice, the authors of the pelias/openaddresses container expect a config at /code/pelias.json (like I've done), which might affect the output directory. Kind of frustrating that there aren't more useful defaults in their container, but 🤷‍♂️

Let me see if I can put together a less complicated example to reproduce the problem.


lunar ether
#

you should be able to repro the same error I got by running the ad-hoc docker run command

warm pagoda
#

Yes I did repro the error. That error isn't interesting to me though. I'm currently working on a simpler example to see if I can reproduce. If I can, I'll come back to you so that you don't have to wade through all my incidental complexity.

lunar ether
warm pagoda
#

By jove, I think I'm onto something...

new branch: https://github.com/headwaymaps/headway/tree/mkirk/dagger-service-err-repro
new command: dagger --progress=plain -c "new | with-area Seattle | pelias | test-service"

It took me a while to get it to repro!

This is mostly speculation, but it seems like part of getting it to repro was having a long enough pause where my pipeline has no containers using the service for a while. As evidence, I noticed these lines in the output during one of these pauses:

43 : ┆ Container.asService DONE [1m31s]
43 : ┆ [1m31s] | {"@timestamp":"2025-09-17T21:05:05.340Z", "log.level": "INFO", "message":"stopping ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch-shutdown","log.logger":"org.elasticsearch.node.Node","elasticsearch.cluster.uuid":"rWSe5E5bRq2B8SFG0MjN3g","elasticsearch.node.id":"fddN3tR9Rj-1fmGjsur1VA","elasticsearch.node.name":"0aaqtju2fs8o6.9vhf36081m9ua.dagger.local","elasticsearch.cluster.name":"pelias-dev"}

And then when we get to the next container using this service I get a similar error:

lookup 0aaqtju2fs8o6.hio1motrfspe2.9vhf36081m9ua.dagger.local on 10.87.0.1:53: no such host

#

Maybe there's some kind of logic where a service automatically starts when it's bound, but then shuts itself down after N seconds of being unbound

But then what happens when it gets bound again after such a shutdown? I'm going to tinker around with manually starting the service to see if that does anything for me.

lunar ether
#

This was my initial suspicion when you first brought the problem, just wanted to make sure something else wasn't happening

#

The service should be brought up automatically when used afterwards. Maybe we have an edge case there. Now that I know where to look, I'll give it a try tomorrow

warm pagoda
#

With my test case, when I manually start the service before running any of the "import" containers, the problem goes away.

Similarly, I've added manual service Start/Stop to my actual code. I tried once just now and it succeeded. 👍

Next I'm going to try it with a much larger job... I'll let you know in a couple days if I have trouble 😅

I think the AsService automatic shutdown seems plausible as the source of my trouble. There are pauses between the various client/server steps in the pipeline where other work is done (like downloading the openaddresses stuff), before returning to more client/server work. This non-server work could take a while depending on network speeds, or could be very fast if it had been cached, which is why it would sometimes succeed.

Whether this indicates some problem with automatic server restarts is another question. Obviously I was expecting it to magically work how I had it, but I don't know enough to know if that's my problem or Dagger's.

lunar ether
#

will keep you posted about the findings

lunar ether
#

ok, was able to repro

39  : HeadwayPelias.testService ERROR [2m45s]
39  : ! process "/runtime" did not complete successfully: exit code: 2
39  : [2m45s] | panic: failed to import 2: input: container.from.withMountedDirectory.withFile.withExec.withServiceBinding.withExec.withExec.stdout process "sh -c sleep 65 && ping -c 1 pelias-elasticsearch && echo importer-2" did not complete successfully: lookup dpk5q0evfkofa for hosts file: lookup dpk5q0evfkofa on 10.87.0.1:53: no such host
39  : [2m45s] |         lookup dpk5q0evfkofa.nhb62n08ibrga.hbrpkc2n7h09c.dagger.local on 10.87.0.1:53: no such host
39  : [2m45s] |         lookup dpk5q0evfkofa.hbrpkc2n7h09c.dagger.local on 10.87.0.1:53: no such host
39  : [2m45s] |
39  : [2m45s] | goroutine 1 [running]:
39  : [2m45s] | main.(*Pelias).TestService(0xc000115900, {0xdbf2a0, 0xc0001dcd80})
#

investigating now

lunar ether
#

ok.. managed to get a simpler repro. Fixing now

lunar ether
lunar ether
warm pagoda
#

Thank you for investigating! I don't really understand the change (but that's OK, I'm new).

My manual start/stop has been working for me in the meanwhile, but I can remove the manual start/stop and retest once the next release is out.

prime igloo
#

Great that this is fixed. Do you know which versions are affected @lunar ether?

lunar ether
prime igloo
#

I am seeing this

! process "go mod download" did not complete successfully: lookup sh7daoi6dsoqk for hosts file: lookup sh7daoi6dsoqk on 10.87.0.1:53: no such host
    lookup sh7daoi6dsoqk.ckbfbrh6as3am.ha3magi3r0ube.dagger.local on 10.87.0.1:53: no such host
    lookup sh7daoi6dsoqk.ha3magi3r0ube.dagger.local on 10.87.0.1:53: no such host

Engine: v0.19.0
cli: v0.19.0

Is this a different issue? It does not happen consistently.

lunar ether