#Gitlab CI with Kubernetes Executor minimal example?


sly trout
#

So, I'm trying to evaluate Dagger at my company, and I'm struggling to get a minimal GitLab job that runs the way it does locally. I have seen the advanced Kubernetes setup, but getting my infra team to set something like that up would be easier if I could show how it works on a small scale. We use a self-hosted GitLab instance, with the runners running via the Kubernetes executor on a GKE cluster. My issue seems to be that our pods are not privileged, so I need to somehow get the Dagger engine connected to something that can be spun up within the GitLab job. I know it would cause issues with caching, but that's for later. I've tried a bunch of things, with no luck so far. First is the example from the docs, which fails, presumably because the k8s executor we use is not privileged. Here are the errors:

1: starting engine 
failed to list containers: exit status 1
1: starting engine [0.67s]
1: connect ERROR: new client: failed to run container: Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
: exit status 1
new client: failed to run container: Failed to initialize: unable to resolve docker endpoint: open /certs/client/ca.pem: no such file or directory
: exit status 1
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: command terminated with exit code 1

I then tried to use the rootless buildkit image, but similar issues came up: https://dille.name/blog/2020/06/01/using-buildkit-for-cloud-native-builds-in-gitlab/
Is there an example Dagger GitLab CI config that works without root with the k8s executor? Maybe something like a dedicated dagger/dagger:engine image that does this for you?
Thanks in advance!

grizzled vault
#

hey @sly trout! sorry we missed this. Were you able to get unblocked?

sly trout
#

Still working on it. I can't test GitLab CI connections to the Dagger engine without the engine installed; I should hopefully be getting it soon.

grizzled vault
sly trout
#

@grizzled vault so no, it does not. I just had it deployed to the cluster, and got the following:

dagger run go run ci/main.go
1: connect
1: > in init
1: starting engine 
failed to list containers: exit status 1
1: starting engine [0.78s]
1: connect ERROR: new client: failed to run container: docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.
: exit status 125
new client: failed to run container: docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.
: exit status 125
#

here is my job in gitlab:

dagger:
  stage: experiment
  variables:
    CI: "false"

  image: golang:alpine
  before_script:
    - apk add docker-cli curl
    - cd /usr/local && { curl -L https://dl.dagger.io/dagger/install.sh | sh; cd -; }
  script:
    - dagger run go run ci/main.go
  needs: []
  rules:
    - allow_failure: true
#

that error above is from the gitlab pipeline job itself

#

and it looks like all the pods for dagger are running:

NAME                                                 READY   STATUS    RESTARTS   AGE
cloud-gitlab-runner-6557885b6f-5gbwc                 1/1     Running   0          15d
cloud-gitlab-runner-6557885b6f-5rpvr                 1/1     Running   0          35d
cloud-gitlab-runner-6557885b6f-6w7t2                 1/1     Running   0          15d
cloud-gitlab-runner-6557885b6f-7d4sg                 1/1     Running   0          15d
cloud-gitlab-runner-6557885b6f-8s94c                 1/1     Running   0          15d
cloud-gitlab-runner-6557885b6f-94vgm                 1/1     Running   0          35d
cloud-gitlab-runner-6557885b6f-ntbz8                 1/1     Running   0          35d
cloud-gitlab-runner-6557885b6f-nxgt7                 1/1     Running   0          15d
cloud-gitlab-runner-6557885b6f-qpxjb                 1/1     Running   0          15d
cloud-gitlab-runner-6557885b6f-rpc2j                 1/1     Running   0          15d
dagger-dagger-helm-engine-2gb4m                      1/1     Running   0          2m49s
dagger-dagger-helm-engine-84pll                      1/1     Running   0          2m49s
dagger-dagger-helm-engine-crt6p                      1/1     Running   0          2m49s
dagger-dagger-helm-engine-grmtd                      1/1     Running   0          2m49s
dagger-dagger-helm-engine-nffsg                      1/1     Running   0          2m49s
runner-u1g9wmvxm-project-355-concurrent-2-ul2gbpud   2/3     Error     0          7d11h
sly trout
#

Am I missing some sort of DAGGER_ENGINE_HOST=dagger-helm-engine.cloud-runners.svc.cluster.local?

grizzled vault
grizzled vault
grizzled vault
sly trout
#

Ah, what should I set it to?

sly trout
#

Ah, so I tried it with _EXPERIMENTAL_DAGGER_RUNNER_HOST: "kube-pod://dagger-dagger-helm-engine-2gb4m?container=dagger-engine"

#

But it failed with a context timeout:

$ dagger run go run ci/main.go
1: connect
1: > in init
1: starting engine 
1: starting engine [600.0s]
1: connect ERROR: new client: context deadline exceeded
new client: context deadline exceeded
Cleaning up project directory and file based variables
#

I think it might have to do with the fact that I picked a random one of the pods, is there a way to pick the one that runs on the same node?

#

I guess I don't understand how to reference the correct host in k8s

grizzled vault
grizzled vault
#

do you have a dedicated cluster for this? or you only have a set of nodes dedicated for the gitlab runners?

#

_EXPERIMENTAL_DAGGER_RUNNER_HOST should be set to unix:///var/run/buildkit/buildkitd.sock

#

the volume mounts are also important
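For reference, a minimal sketch of how the job posted earlier might look with that variable set. It reuses the image and commands from the job above; the socket path is the value given in this thread, and the `test -S` pre-check is an added assumption to fail fast when the socket is not mounted. The directory has to match wherever the engine socket is actually mounted inside the job pod (the writeup later in this thread mounts `/var/run/dagger`), so treat the path as a placeholder to adjust.

```yaml
# .gitlab-ci.yml (sketch, trimmed from the job posted earlier in the thread)
dagger:
  stage: experiment
  image: golang:alpine
  variables:
    # Point the Dagger CLI at an already-running engine instead of letting it
    # try to start one through Docker. Adjust the directory to match the
    # volume mount in your job pods.
    _EXPERIMENTAL_DAGGER_RUNNER_HOST: "unix:///var/run/buildkit/buildkitd.sock"
  before_script:
    - apk add docker-cli curl
    - cd /usr/local && { curl -L https://dl.dagger.io/dagger/install.sh | sh; cd -; }
    # Assumed pre-check: stop early with a clear failure if the engine socket
    # is missing, rather than waiting for the connect timeout.
    - test -S /var/run/buildkit/buildkitd.sock
  script:
    - dagger run go run ci/main.go
```

If the pre-check fails, the job pod most likely does not have the host-path volume mounted, which is separate from any mounts on the runner pods themselves.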

sly trout
#

Sorry about the delay, are those the volume mounts for the gitlab runners, or the dagger helm chart?

grizzled vault
vale plaza
grizzled vault
grizzled vault
vale plaza
grizzled vault
#

nope, I get redirected to an empty page. I guess you can try opening the link in incognito mode

vale plaza
#

mh... the publish function seems to be broken

vale plaza
glass echo
#

Thanks, I will review (most likely near the end of this week or early next week)

sly trout
#

I've made sure both the engine and runners are running privileged, restarted both deployments, added the volume mounts, and I'm still unable to make this work. here are the k8s logs for the runner pod:

helper {"script": "/scripts-355-3685196/prepare_script"}
helper Running on runner-u1g9wmvxm-project-355-concurrent-0-npq7khg1 via cloud-gitlab-runner-54497b67f5-sttsb...
helper
helper {"command_exit_code": 0, "script": "/scripts-355-3685196/prepare_script"}
helper {"script": "/scripts-355-3685196/get_sources"}
helper Fetching changes with git depth set to 20...
helper Initialized empty Git repository in /builds/cloud/docs/.git/
helper Created fresh repository.
helper Checking out a2bb1e4f as detached HEAD (ref is main)...
helper
helper Skipping Git submodules setup
helper
helper {"command_exit_code": 0, "script": "/scripts-355-3685196/get_sources"}
build {"script": "/scripts-355-3685196/step_script"}
build $ dagger run go run ci/main.go
build 1: connect
build 1: > in init
build 1: starting engine
build 1: starting engine [600.0s]
build 1: connect ERROR: new client: context deadline exceeded
build new client: context deadline exceeded
build
build {"command_exit_code": 1, "script": "/scripts-355-3685196/step_script"}
Stream closed EOF for cloud-runners/runner-u1g9wmvxm-project-355-concurrent-0-npq7khg1 (init-permissions)
#

I exec'd into the container, and the expected /var/run/dagger path is not there, which could explain it. Still looking into it

grizzled vault
#

@sly trout what's usually your timezone? I can offer to jump into a quick #911305510882513037 if that works

tired otter
#

Thinking @edgy hull might also have insight

sly trout
#

Got it working!

#

Lots of weirdness, I'll do a writeup here of what needs to be done

#

happy to jump on a call some time as well, I'm in MST

grizzled vault
#

🙌

#

looking forward to the writeup

edgy hull
# tired otter Thinking @edgy hull might also have insight

yes in fact we are doing 2 different things (based on the projects):

  • either docker-in-docker in the gitlab runners hosted in kubernetes - using https://github.com/nestybox/sysbox/ as our container runtime for the kube cluster. this works well for small projects (using buildkit's remote caching feature with gitlab's registry. not perfect, but better than nothing)
  • or using a dedicated docker daemon hosted outside of the cluster - so we can benefit from local caching

sly trout
#

So, it turns out the big issue was setting the correct volume mounts on the pods that the GitLab runners create to run jobs, not on the runners themselves. That is configured in the runner's config.toml file, and the config looks something like this:

    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "ubuntu:22.04"
        [[runners.kubernetes.volumes.host_path]]
          name = "dagger-socket"
          mount_path = "/var/run/dagger"
          read_only = false
#

How you set this depends on how you install the runners. If you use Helm, the config can be set from the values.yaml as part of the install
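A minimal sketch of what that could look like, assuming the gitlab-runner Helm chart and its `runners.config` key, which takes the config.toml contents as a templated string. The TOML is copied from the snippet above. The explicit `host_path` line is an addition for clarity (when it is omitted, GitLab Runner uses the same path as `mount_path`), and `/var/run/dagger` as the engine's socket directory on the node is an assumption to check against your engine deployment.

```yaml
# values.yaml for the gitlab-runner Helm chart (sketch, not a complete file)
runners:
  # Rendered into the runner's config.toml. The chart templates this string,
  # so {{.Release.Namespace}} is expanded at install time.
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "ubuntu:22.04"
        # Mounted into every job pod the runner creates, not into the runner pod.
        [[runners.kubernetes.volumes.host_path]]
          name = "dagger-socket"
          mount_path = "/var/run/dagger"
          # Assumption: the engine exposes its socket under this node directory.
          host_path = "/var/run/dagger"
          read_only = false
```

Only job pods created after the runner picks up the new config get the mount, so it is worth re-running a job rather than inspecting an old pod.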

#

docker-in-docker doesn't work on GKE clusters because of the default privilege setup, which I think makes sense.