#works in local breaks in CI


rotund fjord
#

I am having a classic "works in local, breaks in CI" situation. I am trying to mount the docker daemon to a container. I created a repro.

package main

import (
    "dagger/testdaemon/internal/dagger"
)

type Testdaemon struct{}

// Daemon starts dockerd as a service and returns a docker CLI container that runs against it
func (m *Testdaemon) Daemon() *dagger.Container {
    dockerdArgs := []string{
        "dockerd",
        "--log-level=error",
        "--host=tcp://0.0.0.0:2375",
        "--tls=false",
    }
    dockerPort := 2375
    daemon := dag.Container().From("docker:28-dind").
        WithExposedPort(dockerPort).
        WithEnvVariable("TINI_SUBREAPER", "true").
        AsService(dagger.ContainerAsServiceOpts{
            Args:                     dockerdArgs,
            InsecureRootCapabilities: true,
        })

    cli := dag.Container().From("docker:cli").
        WithServiceBinding("dockerd", daemon).
        WithEnvVariable("DOCKER_HOST", "dockerd:2375").
        WithExec([]string{"docker", "version"}).
        WithExec([]string{"docker", "ps"})

    return cli
}
#

Here's what I'm seeing in Jenkins

#

I'm guessing that < is because the API is returning a <nil> or something similar?

#

It's working totally fine on my local machine. This is dagger v0.16.2. I've previously been able to mount the daemon, and I'm out of ideas on how to debug it :/

rotund fjord
#

that error about the fuse-overlayfs is suspect.

#

I'm trying again with apk add fuse-overlayfs

deft pollen
#

Yeah, I'm not sure about that error w/ <. One possibility is that new images got pushed for the tags you're using and something broke?

We successfully run docker-in-dagger for some tests: https://github.com/sipsma/dagger/blob/6d97b8d567a8c3dabf3933d03d8912765815bb9c/core/integration/provision_test.go#L177-L177

A couple of random things to try:

  1. You want a volume at the point where dockerd stores data, otherwise it can't use overlay and is extremely slow (maybe it even errors out in some cases, like you're seeing?). It would be like:
daemon := dag.Container().From("docker:28-dind").
        WithExposedPort(dockerPort).
        WithEnvVariable("TINI_SUBREAPER", "true").
        WithMountedCache("/var/lib/docker", dag.CacheVolume("dockerd"), dagger.ContainerWithMountedCacheOpts{Sharing: dagger.CacheSharingModePrivate}).
        AsService(dagger.ContainerAsServiceOpts{
            Args:                     dockerdArgs,
            InsecureRootCapabilities: true,
        })
  2. Could set the log level in the daemon to debug instead of just error

  3. Could try to use a tag for the docker cli that's closer to 28 (it looks like it's much newer and downgrading to a way older api)

rotund fjord
#

good suggestions! adding fuse-overlayfs got me further

#

let me try your suggestions.

#

I actually have that mounted cache at /var/lib/docker in my code; I just didn't add it to the repro. Let's see, I switched to docker:28-cli.

#

hmm still getting the same. I am back to getting that weird < error

func (m *Testdaemon) Daemon(ctx context.Context) (*dagger.Container, error) {
    dockerdArgs := []string{
        "dockerd",
        "--log-level=info",
        "--host=tcp://0.0.0.0:2375",
        "--tls=false",
    }
    dockerPort := 2375
    daemon := dag.Container().From("fcr.fmr.com/docker:28-dind").
        WithExposedPort(dockerPort).
        WithEnvVariable("TINI_SUBREAPER", "true").
        WithMountedCache("/var/lib/docker", dag.CacheVolume("docker-lib"),
            dagger.ContainerWithMountedCacheOpts{
                Sharing: dagger.CacheSharingModePrivate,
            }).
        AsService(dagger.ContainerAsServiceOpts{
            Args:                     dockerdArgs,
            InsecureRootCapabilities: true,
        })

    endpoint, err := daemon.Endpoint(ctx, dagger.ServiceEndpointOpts{
        Scheme: "tcp",
    })
    if err != nil {
        return nil, err
    }

    cli := dag.Container().From("fcr.fmr.com/docker:28-cli").
        WithServiceBinding("dockerd", daemon).
        WithEnvVariable("DOCKER_HOST", endpoint).
        WithExec([]string{"docker", "version"}).
        WithExec([]string{"docker", "ps"})

    return cli, nil
}

Updated code.

#

I even switched to grabbing the endpoint from the service, since the error seems to indicate that DOCKER_HOST is malformed?

#

what's puzzling is, this works fine on my local Mac

deft pollen
# rotund fjord what's puzzling is, this works fine on my local Mac

There are host-specific things that can break nested docker. For instance, in our CI we have to run it on a host that supports cgroups v2. But this worked in your infra before, right? Is it possible something changed there, like an OS upgrade or similar? Even so, I would expect dockerd itself to report more errors then.

rotund fjord
#

yeah that's the thing. dockerd is up, dagger also reports that the port is healthy and listening.

#

let me try switching the client image

rotund fjord
#

After meddling with this all night I finally figured it out. For some reason, the docker client tries to go out to the internet to reach DOCKER_HOST even though it's local, so our proxy just blocks it. I was using service.Endpoint to get the endpoint, and that returns a random hostname. The docker client gets blocked when it tries to reach an address like http://akv1ku3jcke0o:2378/v1.47/info. The strange thing is, when I intentionally configured a wrong proxy, the client gave me a definite timeout because of a proxy error, but I didn't see that in my testing without the proxies.

I had to use a defined alias in WithServiceBinding and then add that to my NO_PROXY to get it to work. I think this needs to be documented (maybe in the cookbooks?) so that others don't have to spend the time I did. I can open an issue.
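
For anyone landing here later, the workaround can be sketched roughly as follows. This is a standalone illustration, not the module code: the helper names dockerHost and appendNoProxy and the proxy address proxy.example:3128 are made up, and it assumes the client follows Go's standard proxy environment rules (net/http's ProxyFromEnvironment). It builds the two environment values and then checks which of the two hostnames from this thread would be sent to the proxy.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// dockerHost builds a DOCKER_HOST value for a daemon reachable under a fixed alias.
// Hypothetical helper for illustration.
func dockerHost(alias string, port int) string {
	return fmt.Sprintf("tcp://%s:%d", alias, port)
}

// appendNoProxy adds host to a comma-separated NO_PROXY list unless it is already present.
// Hypothetical helper for illustration.
func appendNoProxy(existing, host string) string {
	var entries []string
	found := false
	for _, e := range strings.Split(existing, ",") {
		e = strings.TrimSpace(e)
		if e == "" {
			continue
		}
		if strings.EqualFold(e, host) {
			found = true
		}
		entries = append(entries, e)
	}
	if !found {
		entries = append(entries, host)
	}
	return strings.Join(entries, ",")
}

func main() {
	alias := "dockerd" // the alias passed to WithServiceBinding
	fmt.Println("DOCKER_HOST =", dockerHost(alias, 2375))

	noProxy := appendNoProxy(os.Getenv("NO_PROXY"), alias)
	fmt.Println("NO_PROXY    =", noProxy)

	// Assumption: a corporate proxy is configured through HTTP_PROXY.
	os.Setenv("HTTP_PROXY", "http://proxy.example:3128")
	os.Setenv("NO_PROXY", noProxy)

	// Compare the random endpoint hostname with the fixed alias.
	for _, raw := range []string{
		"http://akv1ku3jcke0o:2378/v1.47/info",
		"http://dockerd:2375/v1.47/info",
	} {
		req, err := http.NewRequest(http.MethodGet, raw, nil)
		if err != nil {
			fmt.Println("bad URL:", err)
			continue
		}
		proxyURL, err := http.ProxyFromEnvironment(req)
		if err != nil {
			fmt.Println("proxy lookup failed:", err)
			continue
		}
		if proxyURL == nil {
			fmt.Println(raw, "-> direct connection")
		} else {
			fmt.Println(raw, "-> via proxy", proxyURL)
		}
	}
}
```

In the module itself, the same two values would go onto the client container with WithEnvVariable("DOCKER_HOST", ...) and WithEnvVariable("NO_PROXY", ...), next to WithServiceBinding("dockerd", daemon). Whether a lowercase no_proxy is also needed depends on the tools in the image, so treat that part as untested.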

#

it's probably worth a cookbook entry anyway, since it also kinda needs that /var/lib/docker mount to be effective

rotund fjord
#

I'll create a PR if I get time over the weekend or next week 🙂