# Run Dagger on Kubernetes | Dagger
I assume that the runner I'm running would need to have kubectl installed as well as the appropriate kubeconfig to talk to the k8s API? Definitely feels like the docs are missing a step or two to get this wired up properly.
Hey @prisma leaf! Are you running your runners on Kubernetes, with the Dagger engine deployed as a DaemonSet? In other words, will the runners connect to an engine pod on the same host?
Yes, I believe I have all that set up properly. The piece I'm not sure about is the Persistent Volume for the local cache, but I figured that if I could get some Dagger jobs running against the engine on the node, I would learn pretty quickly whether that PV was set up improperly.
But I guess the question is: if the engine is running on the same node, shouldn't I just be able to point the GitLab runner at a specific port on the node?
That is correct, except it's not a port but a Unix domain socket (UDS). You can mount the socket and then set the environment variable that the Dagger CLI uses to connect to the engine. Here is a quick example we have for GitHub runners:
The pod that will run the dagger CLI has the volume mount:
- name: varrundagger
  mountPath: /var/run/buildkit
And the env variable:
env:
  - name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
    value: unix:///var/run/buildkit/buildkitd.sock
This works because the Dagger Helm chart deploys the engine as a DaemonSet and mounts the /var/run/buildkit directory from the host.
This is what kubectl describe shows for my Dagger pod:
So... I imagine I just need to alter what you gave me a bit for my varrundagger?
The DaemonSet pod looks correct. What I was referring to is the runner pod; that's the one that needs the changes 👍
Hmmm, so I need to mount that path on the Gitlab runners is what you're saying?
Exactly! The GitLab runner is where you run the Dagger CLI. The CLI needs to connect to the engine via a Unix domain socket (there are other ways to connect to the engine, but for your use case this one seems the most appropriate). That is why you mount the socket the engine listens on directly into the runner pod.
We'll review the docs and make some updates to make it more clear!
My gitlab job...
extends: [.dagger]
variables:
  _EXPERIMENTAL_DAGGER_RUNNER_HOST: "unix:////var/run/dagger/dagger.sock"
rules:
  - if: $TRIGGER_ACTION == 'dagger-help'
script:
  - dagger --help
I'll attempt to figure out how to get that mount path established on the gitlab runner. Wish me luck. Hah.
I think I muddled my explanation a bit. By the runner pod, I mean the actual GitLab runner, not the workflow itself. Are you deploying your own runners?
Yeah, I am deploying my own runners. So I need to figure out how to modify the Helm chart for the GitLab runners to mount that path.
No problem. If you are using the GitLab-provided Helm chart, you can configure volumes and volumeMounts right here: https://gitlab.com/gitlab-org/charts/gitlab-runner/blob/main/values.yaml#L633-L644
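For reference, here is a minimal sketch of those keys in a values file. It uses the volume name varrundagger from this thread and the host path /var/run/dagger, and it assumes the top-level volumes and volumeMounts keys at the linked lines. As far as we can tell, the chart applies these keys to the runner manager's own pod. The per-job pods that the Kubernetes executor launches are configured separately, through the executor settings in runners.config, so treat this sketch as illustrative only.

```yaml
# Illustrative gitlab-runner chart values (top-level keys, as linked above).
# As far as we can tell, these apply to the runner manager pod only.
volumeMounts:
  - name: varrundagger          # must match the volume name below
    mountPath: /var/run/buildkit
volumes:
  - name: varrundagger
    hostPath:
      path: /var/run/dagger     # host directory shared with the Dagger engine
```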
Ahh, great! I was doing something else, and this looks like the right answer. Once I have this set up, how would I verify that the Dagger CLI is talking to the engine properly?
The k8s guide has a dagger command you can use.
Hmm, the GitLab runner isn't happy with that mount. I have a feeling something isn't set up properly with my persistent volume.
You are missing the volumes section:
volumes:
  - name: varrundagger
    hostPath:
      path: /var/run/dagger
You could use that one, or you could try calling a module. For example:
dagger call -m github.com/shykes/daggerverse/hello hello
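As a sketch, the module call above could be wrapped in a GitLab job shaped like the one shown earlier. The job name dagger-hello and the trigger value 'dagger-hello' are hypothetical. .dagger is the existing template from that job, and its contents are assumed to provide the Dagger CLI. The job also assumes _EXPERIMENTAL_DAGGER_RUNNER_HOST already reaches it, either from the runner or from job variables.

```yaml
# Hypothetical job name and trigger value; `.dagger` is the existing template.
dagger-hello:
  extends: [.dagger]
  rules:
    - if: $TRIGGER_ACTION == 'dagger-hello'
  script:
    # Calls the `hello` function of the hello module via the engine the CLI connects to.
    - dagger call -m github.com/shykes/daggerverse/hello hello
```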
The volume addition seemed to make the pod happy. I missed the command from the guide. My bad. I'll try that now.
So close, yet so far. It seems to be hanging while starting the engine.
This is how I am running it.
The env variable is incorrect; I didn't catch it before. It's supposed to be:
env:
  - name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
    value: unix:///var/run/buildkit/buildkitd.sock
Remember that if you configure it on the runner pod, it's not necessary to configure it on the job!
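If you go that route, one way to set the variable for every job is the environment list in the runner's config.toml, which the GitLab Runner chart embeds as a string under runners.config. Here is a minimal sketch, assuming the socket path from the message above, with the rest of the runner configuration omitted. In an existing config, the environment line has to sit in the [[runners]] entry before any [runners.kubernetes] sub-table. In TOML, keys placed after a sub-table header belong to that sub-table.

```yaml
# gitlab-runner chart values: config.toml embedded as a string.
runners:
  config: |
    [[runners]]
      # Added to the environment of every job this runner executes,
      # so individual jobs no longer need a `variables:` entry for it.
      environment = ["_EXPERIMENTAL_DAGGER_RUNNER_HOST=unix:///var/run/buildkit/buildkitd.sock"]
```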
Wait, I'm messing something up myself
Give me a sec, haha. I'm reviewing the message history again; I got a bit lost with another case I was looking into.
Just checked. I think we are looking good. Quick review:
- The Dagger Helm chart does a host mount of /var/run/buildkit
- The runner pod lists the volume and mounts it on /var/run/buildkit
- The runner pod exposes the env variable _EXPERIMENTAL_DAGGER_RUNNER_HOST to point to the socket found on the host at /var/run/buildkit/buildkitd.sock
Try fixing the env variable so it points at the correct socket under /var/run/buildkit. Also make sure that the runner pod mounts that volume at /var/run/buildkit, not /var/run/dagger.
I'll try getting that set up. It's a little confusing given the pod template for the engine, ha.
My bad. I confused the host path with the path where we mount it in the engine itself. The engine takes the host path /var/run/dagger and mounts it at /var/run/buildkit in the engine container. Let's review your runner setup once more: how did you define the volumes and volumeMounts there?
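To make that mapping concrete, here is an illustrative fragment of an engine DaemonSet spec. It is not the Dagger chart's actual manifest. The container name is hypothetical, and the image, args, and security settings are left out. It only shows the host path /var/run/dagger being mounted at /var/run/buildkit, which is where the engine's buildkitd.sock lives inside the container.

```yaml
# Illustrative DaemonSet .spec fragment (not the chart's real template).
spec:
  template:
    spec:
      containers:
        - name: dagger-engine                # hypothetical name; image etc. omitted
          volumeMounts:
            - name: varrundagger
              mountPath: /var/run/buildkit   # engine's buildkitd.sock lives here
      volumes:
        - name: varrundagger
          hostPath:
            path: /var/run/dagger            # same node directory the runner job pods mount
```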
This is how I have my mounts set up for my Gitlab Runner.
The actual CI job.
I am still getting hangs given this setup.
I also wonder if I am defining the volumes wrong, given this page: https://archive.docs.dagger.io/0.9/488564/openshift-gitlab#step-2-configure-gitlab-runner There they are defined in a different spot in the GitLab runner setup. But some of these docs seem old, and I'm not sure they are still relevant.
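For comparison, the job pods that the Kubernetes executor launches take their volumes from the executor section of the embedded config.toml, not from the chart's top-level volumes. Here is a minimal sketch of a host_path volume there. It assumes the volume name and paths used in this thread and omits the existing executor settings.

```yaml
# gitlab-runner chart values: volumes for *job* pods live in config.toml.
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        # existing executor settings (image, namespace, ...) go here
      [[runners.kubernetes.volumes.host_path]]
        name = "varrundagger"
        mount_path = "/var/run/buildkit"   # path inside the job's containers
        host_path = "/var/run/dagger"      # node directory shared by the Dagger engine
```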
I believe setting up the volume mounts as described in the doc is "correct", but I'm still having issues getting everything connected.
Okay. I made a mistake when I wrote the volumes definition above. In your GitLab runner, the volumeMounts section is correct, but the volumes section is incorrect. We need to mount the host path /var/run/dagger at the container path /var/run/buildkit. We could use any paths or names we want here; they don't have to be these. The important thing is that each component looks in the right place.
The volumes should be:
volumes:
  - name: varrundagger
    hostPath:
      path: /var/run/dagger
Leave the volume mount as is, and point the environment variable at unix:///var/run/buildkit/buildkitd.sock.
The connections that happen here are:
- The Dagger engine mounts the host's /var/run/dagger at its local /var/run/buildkit.
- The GitLab runner mounts the host's /var/run/dagger at its local /var/run/buildkit.
- The GitLab runner exposes _EXPERIMENTAL_DAGGER_RUNNER_HOST to connect to the Unix socket buildkitd.sock mounted at /var/run/buildkit, which connects the GitLab runner to the Dagger engine container.
Basically, all we have to do is expose the socket created by the Dagger engine to the GitLab runner so that the CLI can talk to it directly.
So I think I've woven all that into my runners. Whenever my GitLab runner pods spin up to grab these jobs, they end up exiting with code 255. I'm not sure yet, but I suspect it's related to mounting issues.
The 255 might be totally unrelated, but I'm seeing it hang at "starting engine".
Is it possible for you to deploy a GitLab runner pod and exec into it, so that we can check whether the socket is correctly mounted?
GitLab runners are set up as a Deployment, and each job gets its own pod for the run. I can exec into the container running the Deployment and get a shell, but I can't /bin/bash exec into the actual pods where the jobs run.
Mmm, I see. I was not aware of that architecture; I was assuming it worked like GitHub runners. What if you get the Pod specification of the job that was launched? With that, you should be able to spawn a pod and exec into it.
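Here is a minimal sketch of such a debug pod, assuming the mapping used in this thread. The pod name is hypothetical. nodeName is a placeholder: replace it with the node shown in the launched job's pod spec so the debug pod lands on the same node as the job. Once the pod is running, kubectl exec dagger-socket-debug -- ls -l /var/run/buildkit should list buildkitd.sock if the engine on that node is sharing its socket.

```yaml
# Hypothetical debug pod mirroring the job pod's Dagger mount and env.
apiVersion: v1
kind: Pod
metadata:
  name: dagger-socket-debug
spec:
  nodeName: REPLACE-WITH-JOB-NODE    # placeholder: node from the job pod spec
  containers:
    - name: debug
      image: alpine:3.19
      command: ["sleep", "3600"]     # keep the container alive for kubectl exec
      env:
        - name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
          value: unix:///var/run/buildkit/buildkitd.sock
      volumeMounts:
        - name: varrundagger
          mountPath: /var/run/buildkit
  volumes:
    - name: varrundagger
      hostPath:
        path: /var/run/dagger
```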
Defaulted container "build" out of: build, helper, init-permissions (init)
cache empty lib local lock log mail opt run spool tmp
I can't get an actual shell for the running pod where the "starting engine" is stalled, but I can run commands on it.
Okay. Let's start by checking the pod specification that was created. Can you confirm that the pod spec has the correct mounts?
Seems like it? Based on how I mounted the volumes on the runner
Looks like it's mounted properly?
Oh wait, this doc has the runner running in the dagger namespace: https://archive.docs.dagger.io/0.9/488564/openshift-gitlab#step-2-configure-gitlab-runner
I don't have it running in that namespace. I could try changing that
The namespace shouldn't be a problem. As long as the DaemonSet pod is running on the same node as the runner job pod, it should work.
Can you show the contents of the buildkit directory?
Working on it
Looks like the dir is empty
The unix socket is definitely there on the dagger engine
Yeah, I don't know why the socket isn't getting mounted there. Everywhere else I try to mount it, it seems to work just fine, but inside the runner it has issues.
Ugh... I think I may have figured it out. I'll report back here in a few....
Yup. I had to set up my runners to use node affinity selectors so they run on the proper node. I thought I had checked this earlier, but this config seems to have done the trick.
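For anyone landing here later, here is a minimal sketch of pinning job pods to the right nodes. It uses the Kubernetes executor's node_selector table, which is a simpler alternative to the affinity rules used above. It assumes a hypothetical node label dagger-engine=true applied to exactly the nodes where the Dagger engine DaemonSet runs.

```yaml
# gitlab-runner chart values: schedule job pods only on nodes running the engine.
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        # Hypothetical label; it must match the nodes where the Dagger engine runs.
        [runners.kubernetes.node_selector]
          "dagger-engine" = "true"
```

The label itself could be added with kubectl label node <node-name> dagger-engine=true. If the engine DaemonSet already uses its own node selector or tolerations, reusing that same label keeps the engine and the job pods on the same nodes.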
@empty turret Thanks a ton for all the help with this today. I'm sure I derailed your day a bit, but I appreciate the help.