#Dagger stopped working, just hangs


orchid nymph
#

I was running dagger yesterday, going through the examples and using the Node.js SDK. At one point I realized I was about to push a private container to the public repo (I had run parts of the example on my private project), so I pressed Ctrl-C to get out of it. From that point onwards, dagger became unresponsive, and after a while this error would pop up:

Error: buildkit failed to respond: failed to list workers: Unavailable: connection error: desc = "error reading server preface: command [docker exec -i dagger-engine-23e74c13b0f5a5d1 buildctl dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=Error: can only create exec sessions on running containers: container state improper"

I am running this on a Fedora 38 Silverblue install and am using podman instead of docker. I imagine there is some cache/state somewhere that I should clear, but I don't know where. Rebooting the system didn't help.

abstract stirrup
#

Deleting the dagger-engine-<HASH> container from Podman, clearing all volumes and re-running your pipeline should fix this.
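
For reference, the cleanup described above could be sketched roughly as follows. This is a dry-run helper, not an official Dagger tool: `cleanupCommands` is a hypothetical name, and the container name is the one from the error message earlier in the thread. It only prints the Podman commands. Note that `podman volume prune` removes every unused volume on the machine, not only Dagger's, so review the output before running anything.

```javascript
// Hypothetical dry-run helper: builds the Podman commands for the cleanup
// but does not execute them.
function cleanupCommands(engineName) {
  return [
    // Force-remove the stuck engine container.
    ["podman", "rm", "--force", engineName],
    // Remove volumes that are no longer used by any container.
    // Caution: this is not limited to volumes created by Dagger.
    ["podman", "volume", "prune", "--force"],
  ];
}

// Container name taken from the error message earlier in this thread.
for (const cmd of cleanupCommands("dagger-engine-23e74c13b0f5a5d1")) {
  console.log(cmd.join(" "));
}
```

After the container and volumes are gone, re-running the pipeline should start a fresh engine.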

#

When we have confirmation that this works as expected, I think that this would be a good one for the docs @bright ember

orchid nymph
#

running podman system prune did help things move along, but it isn't much better: instead of hanging forever, the process now exits immediately and doesn't seem to do anything

#

after sprinkling in some console.log statements, their output does appear on stdout, but it looks like nothing container-related gets executed

abstract stirrup
#

Which Node.js SDK version is this? Which example are you trying to run?

orchid nymph
#

Got a new error this time:

❯ Error: buildkit failed to respond: failed to list workers: Unavailable: connection error: desc = "error reading server preface: command [docker exec -i dagger-engine-23e74c13b0f5a5d1 buildctl dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=Error: can only create exec sessions on running containers: container state improper"

#

actually, looks to be the same

#

this is the code I'm trying:

import { connect } from "@dagger.io/dagger"

const NODE_IMAGE_TAG = "node:16.16.0"

connect(
  async (client) => {
    const nodeCache = client.cacheVolume("node")
    const source = client
      .container()
      .from(NODE_IMAGE_TAG)
      .withMountedDirectory(
        "/src",
        client.host().directory(".", {
          exclude: ["node_modules/", "scripts/", "/dist", "/releases"],
        }),
      )
      .withMountedCache("/src/node_modules", nodeCache)

    console.log("running npm install...")
    const runner = source.withWorkdir("/src").withExec(["npm", "install"])

    console.log("runner: ", runner)

    console.log("running tests")
    const test = runner
      .withEnvVariable("DAGGER_CI", "1")
      .withExec(["npm", "run", "dagger"])

    console.log("running build")
    const buildDir = source
      .withExec(["npm", "run", "build"])
      .directory("./build")
  },
  { LogOutput: process.stdout },
)
#

running the above pretty quickly outputs the following:

❯ node scripts/ci/test.mjs 
running npm install...
runner:  Container {
  _queryTree: [
    { operation: 'container', args: {} },
    { operation: 'from', args: [Object] },
    { operation: 'withMountedDirectory', args: [Object] },
    { operation: 'withMountedCache', args: [Object] },
    { operation: 'withWorkdir', args: [Object] },
    { operation: 'withExec', args: [Object] }
  ],
  clientHost: '127.0.0.1:38911',
  sessionToken: '348d6619-71cb-4330-8946-24bef2d9e762',
  client: GraphQLClient {
    url: 'http://127.0.0.1:38911/query',
    requestConfig: { headers: [Object] },
    rawRequest: [AsyncFunction (anonymous)]
  }
}
running tests
running build
#

there don't seem to be any actions performed inside containers; it's as if the API calls to dagger-engine resolve immediately

bright ember
#

maybe because your code isn't requesting any data at the end, so the query never resolves?

#

I tried your code with a minor modification at the end and it worked for me:

#

import { connect } from "@dagger.io/dagger"

const NODE_IMAGE_TAG = "node:16.16.0"

connect(
  async (client) => {
    const nodeCache = client.cacheVolume("node")
    const source = client
      .container()
      .from(NODE_IMAGE_TAG)
      .withMountedDirectory(
        "/src",
        client.host().directory(".", {
          exclude: ["node_modules/", "scripts/", "/dist", "/releases"],
        }),
      )
      .withMountedCache("/src/node_modules", nodeCache)

    console.log("running npm install...")
    const runner = source.withWorkdir("/src").withExec(["npm", "install"])

    console.log("runner: ", runner)

    console.log("running tests")
    const test = runner
      .withEnvVariable("DAGGER_CI", "1")
      .withExec(["npm", "run", "dagger"])

    console.log("running build")
    const buildDir = await source
      .withExec(["npm", "run", "build"])
      .directory("./build")
      .id()
  },
  { LogOutput: process.stdout },
)
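
The reason the added `await ... .id()` matters is that the chained calls only build up a query (the `_queryTree` visible in the log output above), and nothing is sent until a value is actually requested. The toy class below is not the Dagger SDK, just a minimal sketch of that pattern under simplified assumptions: `LazyQuery`, `step`, and `resolve` are made-up names, and `resolve` is synchronous here to keep the example short.

```javascript
// Toy illustration of a lazy, chainable query builder (NOT the Dagger SDK).
class LazyQuery {
  constructor(steps = [], executed = []) {
    this.steps = steps;       // operations queued so far
    this.executed = executed; // shared record of operations actually run
  }

  // Chaining only records the operation and returns a new builder.
  step(name) {
    return new LazyQuery([...this.steps, name], this.executed);
  }

  // A leaf call is what "sends" the queued operations. In the real SDK this
  // kind of call returns a promise, which is why it is awaited.
  resolve() {
    this.executed.push(...this.steps);
    return this.steps.length;
  }
}

const executed = [];
const source = new LazyQuery([], executed).step("from").step("withExec");

// Building the chain alone runs nothing.
const buildDir = source.step("directory");
console.log(executed); // []

// Requesting a value triggers the whole chain.
buildDir.resolve();
console.log(executed); // [ 'from', 'withExec', 'directory' ]
```

By the same logic, only chains that end in such a leaf call would be expected to run, so a chain that is built but never resolved (like `test` in the snippet above) would stay unexecuted.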
visual stratus
#

👋 How are you running the Dagger Engine in podman?

One approach is a symlink, so that when Dagger tries to invoke docker (the default) it "just works": #1100004042207416404 message

ln -s "$(which podman)" /usr/local/bin/docker
# modprobe iptable_nat # was needed by a RHEL 8 user

There is a successful way of doing this on Fedora Silverblue shown here:
https://gist.github.com/jpadams/789b259cb0cf7d2a166dc4f2fa588cc5

Gist

Thanks to @busla for sharing the original script, which I've modified a bit to use our registry alias - podman_dagger_engine.sh

kindred olive
#

that is, podman runs it as a rootless container and not as you

kindred olive
#

great, I deleted the first part of the message 😆

visual stratus
#

I saw it! It was there!

kindred olive
#

dagger (when run with docker) mounts /var/lib/dagger from the host into the container as your user, but podman always runs the container rootless. So for this to work with podman, you need to create a volume and mount it at /var/lib/dagger in the container.

Edit: this is not correct, see #1101085612750143568 message

#

you could probably map the same UID from the host to the container so dagger can write to the dir

#

@visual stratus I am curious what happens on a mac, are you able to run dagger with podman without mounting a volume?

orchid nymph
#

Yes, I have a symlink like that set up (and I must have done that earlier because I don't even remember; I definitely didn't do anything for dagger specifically)

#
❯ docker version
Client:       Podman Engine
Version:      4.5.0
API Version:  4.5.0
Go Version:   go1.20.2
Built:        Fri Apr 14 17:42:22 2023
OS/Arch:      linux/amd64
#

looking into that gist now

#

so does this mean dagger only supports docker officially right now?

kindred olive
#

although it would be more flexible if we could customize the mount dir, see this thread

#general message

bright ember
#

Will wait to add more steps until we confirm if an additional mount is explicitly needed

kindred olive
#

ok, then it's probably an ostree immutable thing. I will try mounting the dir in my home dir to test

visual stratus
#

On a MacBook Pro M1 (16 GB RAM) with no Docker Desktop (uninstalled by dragging the icon to the trash), this works:

# ensure Docker really gone
sudo rm -f /usr/local/bin/docker
rm -rf ~/.docker/ # this needs to be gone, otherwise you get errors about docker-credential-desktop

brew install dagger && dagger version # I'm at v0.5.1
brew install go && go version # I'm at v1.20.3
brew install podman # I'm at v4.5.0
podman machine init
podman machine start # MacOS 12.3.1 kernel panic, 12.5.1+ ok
podman version # I'm at v4.5.0
sudo ln -s `which podman` /usr/local/bin/docker


git clone https://github.com/dagger/examples
cd examples/go/db-service
go run main.go
kindred olive
#

ok, here are the debug logs from running the following (I am including only the suspicious lines):

unset _EXPERIMENTAL_DAGGER_RUNNER_HOST
dagger run go run main.go

...
time="2023-04-30T18:51:03Z" level=debug msg="could not read \"/var/lib/dagger/net/cni\" for cleanup: open /var/lib/dagger/net/cni: no such file or directory"
...
time="2023-04-30T18:51:05Z" level=debug msg="unsupported cache type , defaulting export off" spanID=25efbe540ad1db4f traceID=de138bc3b1310af1356226dfeab36f9f

#

dagger is able to mount /var/lib/dagger

✖  lt /var/lib/dagger
/var/lib/dagger
├── buildkitd.lock
├── cache.db
├── history.db
├── net
│  └── cni
├── otel-grpc.sock
└── runc-overlayfs
   ├── cachemounts
   ├── containerdmeta.db
   ├── content
   ├── executor
   ├── metadata_v2.db
   ├── snapshots
   └── workerid
#

this is good news!

#
podman inspect $(podman ps -q --filter name=dagger) | jq ".[].Config"

{
  "Hostname": "b5c207fe5820",
  "Domainname": "",
  "User": "",
  "AttachStdin": false,
  "AttachStdout": false,
  "AttachStderr": false,
  "Tty": false,
  "OpenStdin": false,
  "StdinOnce": false,
  "Env": [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "TERM=xterm",
    "container=podman",
    "HOME=/root",
    "HOSTNAME=b5c207fe5820"
  ],
  "Cmd": [
    "--debug"
  ],
  "Image": "registry.dagger.io/engine:v0.5.0",
  "Volumes": null,
  "WorkingDir": "/",
  "Entrypoint": "dagger-entrypoint.sh",
  "OnBuild": null,
  "Labels": null,
  "Annotations": {
    "io.container.manager": "libpod",
    "io.podman.annotations.privileged": "TRUE",
    "org.opencontainers.image.stopSignal": "15"
  },
  "StopSignal": 15,
  "HealthcheckOnFailureAction": "none",
  "CreateCommand": [
    "docker",
    "run",
    "--name",
    "dagger-engine-23e74c13b0f5a5d1",
    "-d",
    "--restart",
    "always",
    "-e",
    "_EXPERIMENTAL_DAGGER_CACHE_CONFIG",
    "-e",
    "_EXPERIMENTAL_DAGGER_SERVICES_DNS",
    "-v",
    "/var/lib/dagger",
    "--privileged",
    "registry.dagger.io/engine:v0.5.0",
    "--debug"
  ],
  "Umask": "0022",
  "Timeout": 0,
  "StopTimeout": 10,
  "Passwd": true,
  "sdNotifyMode": "container"
}
#

hmm wait, these are old 🤷‍♂️

ls -al /var/lib/dagger/
drwxr-xr-x@   - levy 25 Mar 21:39 .
drwxr-xr-x@   - root 25 Mar 20:44 ..
.rw-------@   0 levy 25 Mar 21:39 buildkitd.lock
.rw-------@ 33k levy 25 Mar 21:39 cache.db
.rw-------@ 33k levy 25 Mar 21:46 history.db
drwx------@   - levy 25 Mar 21:39 net
srw-rw-rw-@   0 levy 25 Mar 21:39 otel-grpc.sock
drwx------@   - levy 25 Mar 21:39 runc-overlayfs
#
rm -rf /var/lib/dagger
dagger run go run main.go

runs fine, but doesn't create the directory

ls -al /var/lib/dagger
"/var/lib/dagger": No such file or directory (os error 2)
kindred olive
#

I will do some more experiments tonight, but curious why /var/lib/dagger is empty

north trench
#

👋 @kindred olive! It doesn't create a directory on the host because it isn't actually bind mounting anything. If you check the -v flag, it is given only a single path (no colon character). So what -v /var/lib/dagger effectively does is create an anonymous volume for the engine container, so that things written there are not stored in the container's CoW layers.
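
To make the colon distinction concrete, here is a small sketch of how a `-v` value can be read. `classifyVolumeArg` is a hypothetical helper written only for this explanation. It is not part of Docker, Podman, or Dagger, and it handles only the simple Linux-style forms.

```javascript
// Hypothetical helper: roughly how a `-v` value is interpreted.
function classifyVolumeArg(arg) {
  const parts = arg.split(":");
  if (parts.length === 1) {
    // No colon: only a container path, so an anonymous volume is created.
    return { kind: "anonymous volume", containerPath: parts[0] };
  }
  const [source, containerPath] = parts;
  // An absolute source path means a bind mount of a host directory;
  // anything else is treated here as the name of a volume.
  const kind = source.startsWith("/") ? "bind mount" : "named volume";
  return { kind, source, containerPath };
}

console.log(classifyVolumeArg("/var/lib/dagger").kind);                 // anonymous volume
console.log(classifyVolumeArg("/var/lib/dagger:/var/lib/dagger").kind); // bind mount
console.log(classifyVolumeArg("dagger-cache:/var/lib/dagger").kind);    // named volume
```

Only the second form would make files show up under /var/lib/dagger on the host; the engine's own `-v /var/lib/dagger` is the first form. The volume name `dagger-cache` is just an example.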

#

@orchid nymph gentle ping to see if you were able to get past this

kindred olive
#

one question: Does Dagger not store the cache layers on the host?

Are they all stored in anonymous volumes that all spawned runner containers have access to?

north trench
#

one question: Does Dagger not store the cache layers on the host?

it stores them on the host, but in an anonymous docker volume.

Are they all stored in anonymous volumes that all spawned runner containers have access to?

WDYM by runner containers? The actual pipeline containers? The only container that has access to the cache layers is the engine container. All the other containers spawned from it during the build don't need this, since it's the engine that orchestrates everything.

kindred olive
#

right, sry, I wasn't very clear.

Before a new run using an SDK (Go in my example), a new dagger-engine-someuniqueid is started that can use cached data from previous runs. But I got confused and thought it was bind mounting data generated in-container to /var/lib/docker.

What I think I was dealing with originally goes all the way back to the stone-ages, or v0.3.4. New engines were spawned differently then (without volumes maybe?) which I solved by mounting a volume. And since that just-worked™️ I just stuck with it 😄

https://github.com/dagger/dagger/blob/v0.3.4/internal/engine/docker.go#L49-L55


north trench