#Dagger Engine not working with GitLab CI/CD


somber dawn
#

Hi! I am working on getting a Dagger Engine to work with our GitLab runner through a systemd unit and have run into a problem. The engine and the runner are both running and are on the same network, so they can communicate with each other. When a job runs on GitLab, it echoes the correct Dagger Engine address and pings it successfully. However, once it runs the main.py file, which contains the Dagger script, it stalls and eventually fails with one of two errors:

  1. 'dagger.SessionError: Failed to start Dagger engine session: Command '/root/.cache/dagger/dagger-0.8.1 session --label dagger.io/sdk.name:python --label dagger.io/sdk.version:0.8.1' returned non-zero exit status 1.'
  2. 'dagger.SessionError: Failed to start Dagger engine session: No connection params'

I cannot reproduce this error on my local runner. The only difference is that my local runner does not use systemd. Any pointers or direction on how to fix this?

zenith imp
#

cc @worthy scaffold seems like the GitLab job is not able to talk to the engine for some reason. IIRC you've been doing some thinking on how to make Dagger work in self-hosted GitLab, correct?

#

@somber dawn are you setting the _EXPERIMENTAL_DAGGER_RUNNER_HOST variable to a specific value so that your Dagger pipeline can talk to your self-hosted engine?

somber dawn
worthy scaffold
#

If not, I don't think 8000 is the right port. Can you try tcp://dagger-engine:1234?

somber dawn
somber dawn
#

The error persists even after changing the port to 1234. To elaborate a little on what happens:

$ echo $_EXPERIMENTAL_DAGGER_RUNNER_HOST
tcp://dagger-engine:1234
$ ping -c 2 dagger-engine
PING dagger-engine (<ip>): 56 data bytes
64 bytes from <ip>: seq=0 ttl=64 time=0.595 ms
64 bytes from <ip>: seq=1 ttl=64 time=0.339 ms
--- dagger-engine ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.339/0.467/0.595 ms
$ python main.py
...

It hangs on python main.py for a few minutes before the job fails and one of the two previous errors shows up.
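
A successful ping only shows that the host answers ICMP; it does not show that anything accepts TCP connections on the configured port. A minimal sketch of a port check that could run in the job before main.py, assuming the tcp://host:port form of _EXPERIMENTAL_DAGGER_RUNNER_HOST shown above. The helper names parse_runner_host and can_connect are hypothetical and not part of the Dagger SDK.

```python
import os
import socket
from urllib.parse import urlparse


def parse_runner_host(value):
    """Split a tcp://host:port runner address into (host, port)."""
    parsed = urlparse(value)
    if parsed.scheme != "tcp" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected tcp://host:port, got {value!r}")
    return parsed.hostname, parsed.port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Read the same variable the job echoes before running main.py.
    address = os.environ.get("_EXPERIMENTAL_DAGGER_RUNNER_HOST", "")
    host, port = parse_runner_host(address)
    print(f"{host}:{port} reachable over TCP: {can_connect(host, port)}")
```

A True result only means that something is listening on that port. It says nothing about whether the listener is a Dagger Engine that the SDK in use can talk to.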

worthy scaffold
#

Are you sure that the engine spins up correctly? Do you have logs?

somber dawn
worthy scaffold
#

Can you share the docker-compose.yml?

somber dawn
#

The engine seems to be spinning up correctly. The systemd unit is running, and when I check the engine logs during a job, they look normal. The logs are mostly lines like these:

time="2023-08-08T21:11:37Z" level=debug msg="engine metrics" cpu-count=12 cpu-idle=1657198671 cpu-iowait=92443 cpu-irq=0 cpu-nice=184 cpu-softirq=24923 cpu-steal=60918 cpu-system=987523 cpu-total=1660569840 cpu-user=2205178 disk-available-/=34396438528 disk-available-/var/lib/dagger=14310424576 disk-free-/=34396438528 disk-free-/var/lib/dagger=14310424576 disk-size-/=85856358400 disk-size-/var/lib/dagger=18238930944 goroutine-count=24 loadavg-1=0.15 loadavg-15=0.07 loadavg-5=0.09 mem-active=1844932608 mem-available=15278800896 mem-buffers=2158592 mem-cached=5100400640 mem-committed=1554841600 mem-free=10333360128 mem-inactive=3637399552 mem-mapped=252973056 mem-page-tables=7831552 mem-shmem=105074688 mem-slab=478867456 mem-swap-cached=0 mem-swap-free=2147479552 mem-swap-total=2147479552 mem-total=16654426112 mem-vmalloc-used=154996736 proc-self-mem-anonymous=17076224 proc-self-mem-private-clean=14282752 proc-self-mem-private-dirty=17076224 proc-self-mem-pss=31358976 proc-self-mem-referenced=31363072 proc-self-mem-rss=31363072 proc-self-mem-shared-clean=4096 proc-self-mem-shared-dirty=0 proc-self-mem-swap=0 proc-self-mem-swap-pss=0 uptime=384h38m0s
time="2023-08-08T21:11:41Z" level=trace msg="healthcheck completed" actualDuration=3.001959ms spanID=03056b14ae433d5d timeout=30s traceID=0035f1290099ab6102a3ef47874e7ad4
time="2023-08-08T21:11:46Z" level=trace msg="healthcheck completed" actualDuration=2.756521ms spanID=03056b14ae433d5d timeout=30s traceID=0035f1290099ab6102a3ef47874e7ad4
time="2023-08-08T21:11:51Z" level=trace msg="healthcheck completed" actualDuration=4.140553ms spanID=03056b14ae433d5d timeout=30s traceID=0035f1290099ab6102a3ef47874e7ad4
#

As I was looking through the file, I saw that the version of the Dagger SDK I was using and the version in the docker-compose file were different. I updated it so that they match, and that did the trick! Thank you @worthy scaffold @zenith imp!

worthy scaffold
#

Which versions were not working?

somber dawn
#

v0.6.4 wasn't working, but v0.8.1 is.
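
Since the fix was lining up the SDK version with the version in the docker-compose file, a small guard in CI could catch the mismatch before the job hangs. This is a sketch only: the docker-compose file is not shown in this thread, so the image reference registry.dagger.io/engine:<tag> is an assumption, and image_tag and versions_match are hypothetical helpers, not Dagger APIs. Image references pinned by digest are not handled.

```python
def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    name, sep, tag = image.rpartition(":")
    # A "/" after the last ":" means the colon belonged to a registry port.
    if not sep or "/" in tag:
        return None
    return tag


def versions_match(sdk_version, engine_image):
    """Compare an SDK version with an engine image tag, ignoring a leading 'v'."""
    tag = image_tag(engine_image)
    return tag is not None and tag.lstrip("v") == sdk_version.lstrip("v")


if __name__ == "__main__":
    sdk_version = "0.8.1"  # SDK version from the error message in this thread
    for image in (
        "registry.dagger.io/engine:v0.6.4",  # assumed image name, old tag
        "registry.dagger.io/engine:v0.8.1",  # assumed image name, matching tag
    ):
        print(image, "matches SDK:", versions_match(sdk_version, image))
```

With the versions from this thread, the check fails for the v0.6.4 tag and passes for v0.8.1.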
