#Run Dagger on Kubernetes | Dagger


prisma leaf
#

I assume that the runner I'm running would need to have kubectl installed as well as the appropriate kubeconfig to talk to the k8s API? Definitely feels like the docs are missing a step or two to get this wired up properly.

empty turret
#

Hey @prisma leaf!! Are you setting up your runners on kubernetes and the dagger engine as a DaemonSet? Meaning, will runners connect to a container pod on the same host?

prisma leaf
#

Yes, I believe I have all that set up properly. The piece I'm not sure I have set up properly is the Persistent Volume with the local cache, but... I figured if I could get some dagger jobs running with the engine on the node, I would learn pretty quickly if that PV was set up improperly.

#

But I guess the question is... If the engine is running on the same node, shouldn't I just be able to point the gitlab runner to a specific port on the node?

empty turret
#

That is correct. Not a port directly, but rather a UDS (Unix domain socket). You can mount the Unix socket and then set the environment variable that the Dagger CLI uses to connect to the engine. Here is a quick example we have for GitHub runners:
The pod that will run the dagger CLI has the volume mount:

- name: varrundagger
  mountPath: /var/run/buildkit

And the env variable:

env:
- name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
  value: unix:///var/run/buildkit/buildkitd.sock

This works because the Dagger Helm chart deploys the DaemonSet and volume-mounts the /var/run/buildkit directory from the host.

prisma leaf
#

This is what I get on the describe for my dagger pod

#

So... I imagine I just need to alter what you gave me a bit for my varrundagger?

empty turret
#

The daemonset pod looks correct. What I was referring to is the runner pod that needs the changes 👍

prisma leaf
#

Hmmm, so I need to mount that path on the Gitlab runners is what you're saying?

empty turret
#

Exactly! On the GitLab runner you are going to run the Dagger CLI. The CLI needs to connect to the engine via a Unix domain socket (there are other ways to connect to the engine, but for your use case this one seems the most appropriate). That is why you mount the socket the engine listens on directly on the runner pod.

We'll review the docs and make some updates to make it more clear!

prisma leaf
#

My gitlab job...

  extends: [.dagger]
  variables:
    _EXPERIMENTAL_DAGGER_RUNNER_HOST: "unix:////var/run/dagger/dagger.sock"
  rules:
    - if: $TRIGGER_ACTION == 'dagger-help'
  script:
    - dagger --help

I'll attempt to figure out how to get that mount path established on the gitlab runner. Wish me luck. Hah.

empty turret
#

I think I messed up my explanation a bit. When I say the runner pod, I'm referring to the actual GitLab runner, not the workflow itself. Are you deploying your own runners?

prisma leaf
#

Yeah, I am deploying my own runners. So I need to figure out how to modify the Helm chart for the GitLab runners to mount that path.

empty turret
prisma leaf
#

Ahh great! I was doing something else, and this looks like the right answer. Once I have this set up, how would I verify that the Dagger CLI is talking to the engine properly?

arctic cave
#

The k8s guide has a dagger command you can use.

prisma leaf
#

Hmm, the GitLab runner isn't happy with that mount. I have a feeling something isn't set up properly with my persistent volume.

empty turret
#

You are missing the volumes section:

volumes:
- name: varrundagger
  hostPath:
    path: /var/run/dagger
empty turret
prisma leaf
#

The volume addition seemed to make the pod happy. I missed the command from the guide. My bad. I'll try that now.

#

So close, but so far it seems. Seems like it's hanging on starting the engine.

#

This is how I am running it.

empty turret
#

The env variable is incorrect; I did not pick it up before. It's supposed to be:

env:
- name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
  value: unix:///var/run/buildkit/buildkitd.sock

Remember that if you configure it on the runner pod, it is not necessary to configure it on the job!

#

Wait, I'm messing something up myself

#

Give me a sec haha. I'm reviewing the message history again; I got a bit lost with another case I was looking into.

#

Just checked. I think we are looking good. Quick review:

  • Dagger helm chart does a host mount of /var/run/buildkit
  • Runner pod lists the volume and mounts it on /var/run/buildkit
  • Runner pod exposes env variable of _EXPERIMENTAL_DAGGER_RUNNER_HOST to point to the socket found on the host at /var/run/buildkit/buildkitd.sock
#

Try fixing the env variable to have the correct socket at /var/run/buildkit. Make sure that the runner pod mounts that volume at /var/run/buildkit as well, not /var/run/dagger.

prisma leaf
#

I'll try getting that setup going. It's a little confusing given the pod template for the engine. ha

empty turret
#

My bad. I confused the host path and the path we mount it at in the engine itself. The engine grabs the host path /var/run/dagger and mounts it into /var/run/buildkit in the engine container. Let's review your runner setup once more: how did you define the volumes and volumeMounts there?

prisma leaf
#

This is how I have my mounts set up for my Gitlab Runner.

#

The actual CI job.

#

I am still getting hangs given this setup.

#

I believe setting up the volume mounts as described in the doc is "correct", but... I'm still having issues getting everything connected.

empty turret
#

Okay. I made a mistake when I wrote the volumes definition above. In your GitLab runner, the volumeMounts is correct. The volumes is incorrect. We need to mount the host path /var/run/dagger into the container path /var/run/buildkit. We could use any paths or names we want here, they don't have to be those; the important thing is that the components look in the right places.
The volumes should be:

volumes:
- name: varrundagger
  hostPath:
    path: /var/run/dagger

Leave the volume mount as is. The environment variable should point to unix:///var/run/buildkit/buildkitd.sock

#

The connections that happen here are:

  1. The Dagger engine mounts its local /var/run/buildkit to the host's /var/run/dagger
  2. The GitLab runner mounts its local /var/run/buildkit to the host's /var/run/dagger
  3. The GitLab runner sets _EXPERIMENTAL_DAGGER_RUNNER_HOST to point at the Unix socket named buildkitd.sock under the mounted /var/run/buildkit, which connects the GitLab runner to the Dagger engine container
#

Basically all we have to do is connect the socket created by the dagger engine to the gitlab runner so that the CLI can talk directly to it
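
Put together, the three connections above could look roughly like this on the pod that runs the Dagger CLI. This is a minimal sketch and not the chart's actual template: the pod name, container name, and image are placeholders, and only the volume name, the paths, and the environment variable come from the messages above.

```yaml
# Sketch of a pod that runs the Dagger CLI against the engine's socket.
apiVersion: v1
kind: Pod
metadata:
  name: dagger-cli-runner                      # hypothetical name
spec:
  containers:
    - name: runner                             # hypothetical container name
      image: example.com/ci-runner:latest      # placeholder; assumed to ship the Dagger CLI
      env:
        # Tells the Dagger CLI to use the existing engine instead of starting one.
        - name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
          value: unix:///var/run/buildkit/buildkitd.sock
      volumeMounts:
        # Path inside this container where the socket directory appears.
        - name: varrundagger
          mountPath: /var/run/buildkit
  volumes:
    # Host directory that the engine DaemonSet also mounts.
    - name: varrundagger
      hostPath:
        path: /var/run/dagger
```

The host path and the container path differ on purpose: the host side has to match what the engine uses (/var/run/dagger), while the container side only has to match the path in the environment variable.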

prisma leaf
#

So I think I've woven all that in for my runners. Whenever my GitLab runner pods spin up to grab these jobs, they end up exiting with code 255, and I'm not quite sure yet, but I suspect it's related to mounting issues.

#

The 255 might be totally unrelated, but I'm seeing the "starting engine" step hang.

empty turret
#

Is it possible for you to deploy a gitlab runner pod and exec into it? So that we can debug if the socket is correctly mounted

prisma leaf
#

GitLab runners are set up as a "Deployment" and each job gets its own pod set up for the run. I can exec and get a shell on the container running the deployment, but I can't /bin/bash exec into the actual pods where the jobs run.

empty turret
#

Mmm, I see. I was not aware of that architecture; I was assuming it worked like GitHub runners. What if you get the Pod specification of the job that was launched? With that you should be able to spawn a pod and exec into it.
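
If copying the job's Pod spec is awkward, a throwaway pod with the same mount is a possible stand-in. The manifest below is only a sketch under assumptions: the pod name, node name, and image are hypothetical, and the volume settings mirror the ones discussed above.

```yaml
# Throwaway pod for checking that the engine socket is visible from a pod.
apiVersion: v1
kind: Pod
metadata:
  name: dagger-socket-debug          # hypothetical name
spec:
  nodeName: worker-node-1            # hypothetical; a node where the engine DaemonSet pod runs
  restartPolicy: Never
  containers:
    - name: debug
      image: busybox:1.36            # assumed; any image with a shell works
      command: ["sleep", "3600"]     # keep the pod alive long enough to exec into it
      volumeMounts:
        - name: varrundagger
          mountPath: /var/run/buildkit
  volumes:
    - name: varrundagger
      hostPath:
        path: /var/run/dagger
```

After applying it, exec into the pod and list /var/run/buildkit. If the mount is right and the pod landed on a node that runs the engine, buildkitd.sock should show up there.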

prisma leaf
#
Defaulted container "build" out of: build, helper, init-permissions (init)
cache  empty  lib    local  lock   log    mail   opt    run    spool  tmp

I can't get an actual shell for the running pod where the "starting engine" is stalled, but I can run commands on it.

empty turret
#

Okay. Let's start by checking the pod specification that was created. Can you confirm that the pod spec has the correct mounts?

prisma leaf
#

Seems like it? Based on how I mounted the volumes on the runner

#

Looks like it's mounted properly?

#

I don't have it running in that namespace. I could try changing that

empty turret
#

Namespace shouldn't be a problem. As long as the DaemonSet pod is running on the same node as the runner job pod, it should work.

empty turret
prisma leaf
#

Workin on it

#

Looks like the dir is empty

#

The unix socket is definitely there on the dagger engine

#

Yeah, I don't know why the socket isn't getting mounted there. Everywhere else I try to mount it, it seems to be working just fine. But inside the runner it has issues.

#

Ugh... I think I may have figured it out. I'll report back here in a few....

#

Yup.... I had to set up my runners to use node affinity / node selectors so the job pods run on the proper node. I thought I checked this earlier. But this config seems to have done the trick.
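
The exact runner config isn't reproduced in this thread. Purely as an illustration, pinning a pod to a particular node with a node selector looks something like the fragment below in a plain pod spec. The node name is hypothetical, and how this is expressed for GitLab runner job pods depends on the runner's Helm chart and executor configuration.

```yaml
# Pod spec fragment: schedule only onto the node whose hostname label matches.
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-node-1   # hypothetical node; one where the engine pod runs
```

The point is the same either way: a hostPath volume only sees files on the node the pod lands on, so the job pod has to be scheduled onto a node where the engine has created the socket.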

#

@empty turret Thanks a ton for all the help with this today. I'm sure I derailed your day a bit, but I appreciate the help.