#Engine-as-cli
Starting a thread because I'd really like to get this working. I've just switched from some old ci-base image I used (which I no longer have the Dockerfile for but I think was FROM docker:cli + installing Dagger) to the engine-as-cli and hitting errors. Debugging the instances now to ensure it's not those first
Getting a lot of failed to GC client DBs on the dagger host but I presume they don't matter
This is how I'm starting the engine, can you sense check @covert agate :
docker run \
  -v /var/lib/dagger \
  -p ${dagger_engine_port}:${dagger_engine_port} \
  --rm \
  --privileged \
  --name dagger-engine \
  registry.dagger.io/engine:v0.12.7 \
  --addr tcp://0.0.0.0:${dagger_engine_port} \
  --addr unix:///run/buildkit/buildkitd.sock
Swapped back to the old base image and it connects to the remote engine fine, so it's definitely the switch to engine-as-cli.
I'm going to try and recreate that old Dockerfile, I think it was just docker:cli and Dagger
Off the top of my head, yes? That should work I think
I'm fairly sure it does, the engine is running and the runner jobs can connect to it
There's something not working with engine-as-cli for my setup, so I'm getting the base image updated and working as a baseline then I'll debug further
Yep, that works
FROM docker:cli
USER root
RUN apk add curl
RUN curl -fsSL https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/local/bin sh
☝️ Works as a job base image
image: ghcr.io/dagger/engine:9823a005d5c1c131345c3278bd6ef197c65d1c8b # 0.12.7
☝️ Doesn't work as a job base image
Am I misunderstanding, or should that work?
cc @supple abyss
Hey @next relic !
My understanding is that you are trying to use our Engine image as the CLI, and it doesn't work when connecting to a remote Engine.
What are you setting the _EXPERIMENTAL_DAGGER_RUNNER_HOST environment variable to in the engine-as-cli container?
Did you confirm that variable via env & connection via nc before running the cli?
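A minimal sketch of the check being asked about here, assuming the variable has the tcp://host:port form used later in this thread and that nc is available in the job image. The function names are made up for illustration.

```shell
#!/bin/sh
# Split a tcp://host:port runner address into "host port".
# Assumes the tcp://host:port form; other schemes are not handled.
parse_runner_host() {
  addr="${1#tcp://}"                       # drop the scheme
  printf '%s %s\n' "${addr%:*}" "${addr##*:}"
}

# Print the variable and probe the TCP port.
# Returns non-zero if the variable is unset or the port is unreachable.
check_runner_host() {
  if [ -z "${_EXPERIMENTAL_DAGGER_RUNNER_HOST:-}" ]; then
    echo "_EXPERIMENTAL_DAGGER_RUNNER_HOST is not set" >&2
    return 1
  fi
  echo "runner host: $_EXPERIMENTAL_DAGGER_RUNNER_HOST"
  # Unquoted on purpose: split "host port" into $1 and $2.
  set -- $(parse_runner_host "$_EXPERIMENTAL_DAGGER_RUNNER_HOST")
  nc -vzw 1 "$1" "$2"
}
```

Calling check_runner_host as the first line of the job script, before any dagger command, separates "variable missing" from "engine unreachable" from "CLI problem".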
No, but when I use my base image with no other changes the pipeline runs and shows the engine ID, and I've checked that engine ID on the dagger instance and it's correct
1 : connect
2 : connecting to engine
2 : | engine name=00357ec13459 version=v0.12.7 client=l727fisxuv3helpkux2fefppm
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
00357ec13459 registry.dagger.io/engine:v0.12.7 "dagger-entrypoint.s…" About an hour ago Up About an hour 0.0.0.0:12346->12346/tcp, :::12346->12346/tcp dagger-engine
Top = runner logs, bottom = dagger instance host
OK, got it. Is this Docker? Can you share a few steps for me to reproduce?
Docker is provisioning the engine yes: https://discordapp.com/channels/707636530424053791/1280546777291952152/1280555357542027324
There's no Docker in the Gitlab job at all, nor any docker engine available to it
OK, so Engine is running in Docker, as provisioned by Gitlab.
You are running another container in the same Docker instance, but using it as a CLI to connect to the Engine container.
Is that correct?
I will try to reproduce using the following steps:
- Start Engine container using your docker command - let's call this dagger-engine-0-12-7
- Start another Engine container with sh - let's call this dagger-cli-0-12-7
- Ensure that nc -vzw 1 dagger-engine-0-12-7 1234 works in the dagger-cli-0-12-7 container
- Ensure that _EXPERIMENTAL_DAGGER_RUNNER_HOST=dagger-engine-0-12-7 dagger core version works in the dagger-cli-0-12-7 container
Do they look right to you?
Above steps worked as expected. Attaching step-1.sh & step-2.sh which you can use to reproduce in your Docker environment.
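The attached step-1.sh and step-2.sh are not reproduced in this thread. What follows is only a rough sketch of what the steps above could look like, not the actual scripts. Assumptions: the network name dagger-repro is hypothetical, the tcp:// scheme in the runner host value is a guess, and the engine flags mirror the docker run command posted earlier in the thread.

```shell
#!/bin/sh
# Sketch only: names and port follow the steps above, NET is hypothetical.
ENGINE=dagger-engine-0-12-7
CLI=dagger-cli-0-12-7
IMAGE=registry.dagger.io/engine:v0.12.7
PORT=1234
NET=dagger-repro

# Echo the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Step 1: engine container listening on TCP, on a shared Docker network.
start_engine() {
  run docker network create "$NET"
  run docker run -d --rm --privileged --network "$NET" --name "$ENGINE" \
    -v /var/lib/dagger "$IMAGE" \
    --addr "tcp://0.0.0.0:$PORT" \
    --addr unix:///run/buildkit/buildkitd.sock
}

# Step 2: same image, but with sh as the entrypoint so it acts as a CLI.
check_from_cli() {
  run docker run --rm --network "$NET" --name "$CLI" --entrypoint sh \
    -e "_EXPERIMENTAL_DAGGER_RUNNER_HOST=tcp://$ENGINE:$PORT" \
    "$IMAGE" -c "nc -vzw 1 $ENGINE $PORT && dagger core version"
}
```

Setting DRY_RUN=1 before calling start_engine and check_from_cli prints the docker commands instead of running them, which is handy for reviewing them first.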
@next relic is there any chance you could share a snippet of your gitlab pipeline definitions?
that'd probably give us a bit more context to help you out 🙌
Have just got back, will check this shortly. And yep - can share some snippets, there's really not much to it
Ok, info dump
Pipeline:
stages:
  - scan
variables:
  _EXPERIMENTAL_DAGGER_RUNNER_HOST: tcp://10.0.0.251:12346
  # DAGGER_MODULE: github.com/mjb141/daggerverse/kics@main # You can also set module as a variable
scan-dir:
  image: mikebrown008/ci-cli-base:0.5
  stage: scan
  tags:
    - runnerdemo
  script:
    - dagger -m "github.com/mjb141/daggerverse/kics@main" call scan --dir .
Instance 1: Dagger engine - started with https://discordapp.com/channels/707636530424053791/1280546777291952152/1280555357542027324
Instance 2: Gitlab runner
Networking is verified working correctly, Dagger listens on 12346, security group allows it, Gitlab can reach Dagger on 12346 (host is set in gitlab pipeline)
Module works locally, works remotely with my base image. Switching my base image (which is just https://discordapp.com/channels/707636530424053791/1280546777291952152/1280561039934492743) to the dagger engine results in a lot of what look like engine logs:
Executing "step_script" stage of the job script 04:35
Using docker image sha256:4b144fde936571a7b3786fd63e4f134f97c9a3a1b09917f35065c0056591b0a1 for ghcr.io/dagger/engine:9823a005d5c1c131345c3278bd6ef197c65d1c8b with digest ghcr.io/dagger/engine@sha256:1f73f772dd1ba5631917527b9b8ec8db557abe6930e005a7de342aa8f33f456f ...
#!/bin/sh
set -e
cat $0
# cgroup v2: enable nesting
# see https://github.com/moby/moby/blob/38805f20f9bcc5e87869d6c79d432b166e1c88b4/hack/dind#L28
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
# move the processes from the root group to the /init group,
# otherwise writing subtree_control fails with EBUSY.
# An error during moving non-existent process (i.e., "cat") is ignored.
mkdir -p /sys/fs/cgroup/init
xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
# enable controllers
sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
> /sys/fs/cgroup/cgroup.subtree_control
fi
# expect more open files due to per-client SQLite databases
# many systems default to 1024 which is far too low
ulimit -n 1048576
exec /usr/local/bin/dagger-engine --config /etc/dagger/engine.toml "$@"
time="2024-09-03T15:30:12Z" level=info msg="detected mtu 1500 via interface eth0"
time="2024-09-03T15:30:12Z" level=debug msg="engine name: runner-wpb7afjjg-project-48059828-concurrent-0"
time="2024-09-03T15:30:12Z" level=debug msg="creating engine GRPC server"
time="2024-09-03T15:30:12Z" level=debug msg="creating engine lockfile"
dnsmasq[35]: started, version 2.90 cachesize 150
dnsmasq[35]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth no-cryptohash no-DNSSEC loop-detect inotify dumpfile
(and a lot more)
If I leave that running it just spins logs like this
time="2024-09-03T15:34:12Z" level=debug msg="engine metrics" cpu-count=2 cpu-idle=215568 cpu-iowait=1553 cpu-irq=0 cpu-nice=349 cpu-softirq=101 cpu-steal=728 cpu-system=2087 cpu-total=226322 cpu-user=5936 dagger-session-count=0 disk-available-/=4845293568 disk-available-/var/lib/dagger=4845293568 disk-free-/=4862070784 disk-free-/var/lib/dagger=4862070784 disk-size-/=8132173824 disk-size-/var/lib/dagger=8132173824 goroutine-count=11 loadavg-1=0.01 loadavg-15=0.05 loadavg-5=0.06 mem-active=408961024 mem-available=3437682688 mem-buffers=30724096 mem-cached=2028695552 mem-committed=1091330048 mem-free=1560211456 mem-inactive=1776697344 mem-mapped=282435584 mem-page-tables=3502080 mem-shmem=999424 mem-slab=150265856 mem-swap-cached=0 mem-swap-free=0 mem-swap-total=0 mem-total=4013543424 mem-vmalloc-used=14491648 proc-self-mem-anonymous=9252864 proc-self-mem-private-clean=37634048 proc-self-mem-private-dirty=9252864 proc-self-mem-pss=46886912 proc-self-mem-referenced=46891008 proc-self-mem-rss=46891008 proc-self-mem-shared-clean=4096 proc-self-mem-shared-dirty=0 proc-self-mem-swap=0 proc-self-mem-swap-pss=0 uptime=18m53s
Until I kill it, so... it's trying to start an engine?
Oh hang on, you've changed the entrypoint on the container... I presume the default behaviour is to start an engine, so I need to change that too
That's done it. Always something simple, thanks for that @supple abyss / @manic warren !
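For anyone landing here later, a small sketch of the idea behind the fix, shown with plain Docker rather than the GitLab config that was actually changed. It assumes the dagger binary is on the PATH inside the engine image. The helper name is made up. In a GitLab job the equivalent override would go on the job image's entrypoint setting.

```shell
#!/bin/sh
# Print a docker command that runs the dagger CLI from the engine image,
# bypassing the image's default entrypoint (which starts an engine).
cli_from_engine_image() {
  image="$1"
  shift
  echo docker run --rm --entrypoint dagger "$image" "$@"
}

# Prints the command; pipe the output to sh to actually run it.
cli_from_engine_image registry.dagger.io/engine:v0.12.7 version
```

Without the override, the container runs the engine entrypoint script and never reaches the job's dagger commands, which matches the endless engine logs above.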
Yes, that was it! Glad that we could help @next relic 💪