#Using WithServiceBinding with long build times with exposed ports causes health-check to fail


patent slate
#

My Dagger CI pipeline runs fine on my local machine but fails on the CI runner, which uses the same Docker version (24).
I've already enabled debug logging, but it does not add much information to the runner log, maybe because of the limited TTY provided.

Each run also creates a new session: Creating new Engine session... OK!

when I execute the steps using .sync, to just build parts of it to later use in serviceBinding:

health check errored: exit code: 1

without prior sync and straight bind:

service exited before healthcheck

When using bind, I make sure to have an exec(nil). It shouldn't be needed, but I keep it just to be safe until it works.

Are there any timeouts that might need adjusting if a single WithExec Command takes too long? ContainerWithExecOpts does not seem to have anything along that line.

Thanks for all the help!

dapper trail
#

👋 @patent slate what CI runner are you using?

#

Are there any timeouts that might need adjusting if a single WithExec Command takes too long? ContainerWithExecOpts does not seem to have anything along that line.

no, there aren't.

seems like you're having issues with the service healthchecks somehow 🤔

patent slate
#

I am building it from Dockerfiles and am not setting any healthchecks during CI.

#

Runner is gitlab-runner 16.1.0
docker executor with 24 dind

I'd have to examine the environment, as I am using shared runners on our infrastructure.

dapper trail
#

Healthchecks are a native Dagger internal thing. When you use WithServiceBinding, Dagger attempts to healthcheck your service against the ports declared with WithExposedPort to validate that the service is running.
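To make the idea concrete, here is a minimal Go sketch of what a port-based healthcheck conceptually does: it keeps trying to open a TCP connection until one succeeds or a deadline passes. This is not Dagger's actual implementation; `waitForPort` and `probeLocal` are hypothetical helpers that use only the standard library, and the timings are arbitrary.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForPort polls addr until a TCP connection succeeds or timeout elapses.
// Hypothetical helper for illustration only; not Dagger's healthcheck code.
func waitForPort(addr string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, interval)
		if err == nil {
			conn.Close()
			return nil // something accepted the connection
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not reachable after %s: %w", addr, timeout, err)
		}
		time.Sleep(interval)
	}
}

// probeLocal binds a free local port and reports whether the probe succeeds.
// With keepOpen=false the listener is closed first, which mimics a service
// that declares a port but never listens on it.
func probeLocal(keepOpen bool) bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false
	}
	addr := ln.Addr().String()
	if keepOpen {
		defer ln.Close()
	} else {
		ln.Close()
	}
	return waitForPort(addr, 300*time.Millisecond, 50*time.Millisecond) == nil
}

func main() {
	fmt.Println("port open:  ", probeLocal(true))
	fmt.Println("port closed:", probeLocal(false))
}
```

The second case is the interesting one for this thread: the port is "known" to the caller, but nothing is listening on it, so the probe can only time out.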

patent slate
#

ah yes, i saw that while browsing the sourcecode

#

What I am sure of is that our shared runners are under higher load than my machine. 🙂

#

What I did attempt, though, was building the Container using .sync so it is not yet bound to another container, but it also failed.

#

when I execute the steps using .sync, to just build parts of it to later use in serviceBinding:
health check errored: exit code: 1

#

but - i already have other services bound to it, which I need during that step

#

I'll try again with the exposed port moved all the way to the end, maybe that helps 🤞

dapper trail
#

The message "service exited before healthcheck" generally appears when you use WithServiceBinding. It seems that, for some reason, the service container is not able to start successfully.

#

could you run your pipeline with dagger run --debug to see if that shows a bit more info?

patent slate
#

i tried running the pipeline with

export DAGGER_LOG_FORMAT='plain'
export DAGGER_LOG_LEVEL='debug'

but it won't show up in the output

#

sadly i do not have ssh-access on those runners

#

locally i can not reproduce it

#

i could install the dagger cli to execute the pipeline

#

done - will report back once the debug run is done

patent slate
#

this is the end of the log, after it failed, with debug.

#

2 things i am checking

  • if the mail service port 80 works
  • split the build of the container image from the db initialization

this should bring it down to under ~14 min

patent slate
#

looks good now

dapper trail
#

@patent slate seems like your maildev initialization was failing?

154: > in service tfln0u7ftr2oa.ff0a4sk5ooieo.dagger.local > service emd9qlu317jj4.ff0a4sk5ooieo.dagger.local > service mm9enik7h6jfg.ff0a4sk5ooieo.dagger.local
exit status 1
patent slate
#

I could revert the other changes if you want to know

#

i actually will revert back some of the "attempts", but I assume that was the issue

#

Yup, it was just the maildev service that was not starting its service on the correct port.

I learned a lot on how to debug this, thanks for the assistance again!

dapper trail
#

happy you got unblocked

#

I'm wondering how come it was working locally but not in your CI

patent slate
#

i've created a simple test to reproduce the behavior.

My expectation would be that building the container has unlimited time unless configured otherwise, but the "building" service (in the code here, it just sleeps) gets killed and the pipeline fails.

package main

import (
    "context"
    "log"
    "os"

    "dagger.io/dagger"
)

func main() {

    ctx := context.Background()

    client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
    if err != nil {
        log.Fatalln(err)
    }
    defer func(client *dagger.Client) {
        err := client.Close()
        if err != nil {
            log.Println(err)
        }
    }(client)

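    // The service runs only "sleep", so nothing ever listens on port 80,
    // yet WithExposedPort declares port 80 as a port to be healthchecked.
    cntSleeper := client.Container().From("alpine").
        WithExec([]string{"sleep", "1d"}).
        WithExposedPort(80)

    // Binding the service triggers the healthcheck before this exec runs.
    _, _ = client.Container().From("alpine").
        WithServiceBinding("sleeper", cntSleeper).
        WithExec([]string{"sleep", "1d"}).
        Stdout(ctx)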
}
restive dock
#

That's expected because you're not opening port 80 on cntSleeper. In other words, you're declaring there's an open port 80 that should be healthchecked, but you're not actually opening it.

dapper trail
#

^ this. It's strange that this was causing issues in your CI

#

especially since the healthcheck timeout is quite generous

patent slate
#

I am aware, but it would still fail if I opened the port only after the sleep within the service container.

We are building a lot of PHP dependencies that we need at specific versions and then seeding the database, which makes it hit the limit on slower, busier runners.

I will have to split out the building of the container image so that it produces a separate artifact for use during CI. During seeding, I will run it with sync rather than as a service, and only then bind it.

Workable, but not what I expected, since the expose comes after the execs. That was my misunderstanding.
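The timing argument here can be sketched as a small simulation. It does not use Dagger at all: `startService`, `healthcheck`, and `demo` are hypothetical names, and the durations are made up and scaled down (a 300 ms "build" against a 150 ms "healthcheck window"). It only illustrates the behavior described in this thread: when the slow step is part of the service, it eats into the window, whereas finishing it beforehand leaves the full window for the port to come up.

```go
package main

import (
	"fmt"
	"time"
)

// startService simulates a service that needs `setup` before it listens.
// The returned channel is closed once the simulated port is open.
func startService(setup time.Duration) <-chan struct{} {
	ready := make(chan struct{})
	go func() {
		time.Sleep(setup)
		close(ready)
	}()
	return ready
}

// healthcheck waits until the service is ready or the window expires.
func healthcheck(ready <-chan struct{}, window time.Duration) error {
	select {
	case <-ready:
		return nil
	case <-time.After(window):
		return fmt.Errorf("service not ready within %s", window)
	}
}

// demo compares both orderings with scaled-down, made-up durations.
func demo() (insideErr, beforeErr error) {
	const setup = 300 * time.Millisecond  // stands in for the slow build/seed
	const window = 150 * time.Millisecond // stands in for the healthcheck limit

	// Slow step is part of the service: it eats into the window.
	insideErr = healthcheck(startService(setup), window)

	// Slow step runs to completion first, then the service starts at once.
	time.Sleep(setup)
	beforeErr = healthcheck(startService(0), window)
	return insideErr, beforeErr
}

func main() {
	insideErr, beforeErr := demo()
	fmt.Println("setup inside service:", insideErr)
	fmt.Println("setup before service:", beforeErr)
}
```

In Dagger terms, the second ordering corresponds to forcing the slow steps to finish with a sync and only then exposing the port and binding the service.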

patent slate
#

Using WithServiceBinding with long build times with exposed ports causes health-check to fail

dapper trail
#

@patent slate so IIUC, your service container takes longer than the default maximum service healthcheck timeout (15m), so it fails?

#

So, yes, the workaround is to split the service container setup so that the time-consuming step runs before you call WithExposedPort. Something like this:

package main

import (
    "context"
    "log"
    "os"

    "dagger.io/dagger"
)

func main() {
    ctx := context.Background()

    client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stderr))
    if err != nil {
        log.Fatalln(err)
    }
    defer func(client *dagger.Client) {
        err := client.Close()
        if err != nil {
            log.Println(err)
        }
    }(client)

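    // Prepare the service container here: Sync forces the slow step to run
    // to completion now, before the container is used as a service.
    svcPrep, err := client.Container().From("alpine").
        WithExec([]string{"sleep", "16m"}).Sync(ctx)
    if err != nil {
        panic(err)
    }

    // Only the quick final step and the exposed port belong to the service.
    // Placeholder: a real service must actually listen on port 80.
    svc := svcPrep.WithExec([]string{"sleep", "1"}).
        WithExposedPort(80)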

    _, _ = client.Container().From("alpine").
        WithServiceBinding("sleeper", svc).
        WithExec([]string{"sleep", "1d"}).
        Stdout(ctx)
}


patent slate
#

Yep, that will be my solution.