#Engine-as-cli


next relic
#

Starting a thread because I'd really like to get this working. I've just switched from an old ci-base image (I no longer have its Dockerfile, but I think it was FROM docker:cli plus a Dagger install) to engine-as-cli and I'm hitting errors. Debugging the instances now to make sure the problem isn't there first

#

Getting a lot of "failed to GC client DBs" messages on the Dagger host, but I presume they don't matter

#

This is how I'm starting the engine, can you sense check @covert agate :

docker run \
  -v /var/lib/dagger \
  -p ${dagger_engine_port}:${dagger_engine_port} \
  --rm \
  --privileged \
  --name dagger-engine \
  registry.dagger.io/engine:v0.12.7 \
  --addr tcp://0.0.0.0:${dagger_engine_port} \
  --addr unix:///run/buildkit/buildkitd.sock
#

Swapped back to the old base image and it connects to the remote engine fine, so it's definitely the switch to engine-as-cli.

I'm going to try and recreate that old Dockerfile, I think it was just docker:cli and Dagger

covert agate
#

Off the top of my head, yes? That should work I think

next relic
#

I'm fairly sure it does, the engine is running and the runner jobs can connect to it

#

There's something not working with engine-as-cli for my setup, so I'm getting the base image updated and working as a baseline then I'll debug further

#

Yep, that works

#
FROM docker:cli

USER root

RUN apk add curl

RUN curl -fsSL https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/local/bin sh

☝️ Works as a job base image

image: ghcr.io/dagger/engine:9823a005d5c1c131345c3278bd6ef197c65d1c8b # 0.12.7

☝️ Doesn't work as a job base image

#

Am I misunderstanding, or should that work?

covert agate
#

cc @supple abyss

supple abyss
#

Hey @next relic !

My understanding is that you are trying to use our Engine image as the CLI & it doesn't work when connecting to a remote Engine.

What are you setting the _EXPERIMENTAL_DAGGER_RUNNER_HOST environment variable to in the engine-as-cli container?

next relic
#

_EXPERIMENTAL_DAGGER_RUNNER_HOST: tcp://10.0.0.251:12346

#

(It can reach that IP/port)

supple abyss
#

Did you confirm that variable via env & connection via nc before running the cli?
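
A minimal sketch of that check as a throwaway GitLab job, assuming the job image ships an nc that accepts the -vzw flags used elsewhere in this thread. The job name check-engine is made up for illustration; the image, runner tag, address and port are the ones mentioned in this thread.

```yaml
# Hypothetical preflight job: confirm the variable and the TCP path before running dagger.
check-engine:
  image: mikebrown008/ci-cli-base:0.5   # any image with env, grep and nc would do
  stage: scan                           # assumes a "scan" stage is defined in the pipeline
  tags:
    - runnerdemo
  variables:
    _EXPERIMENTAL_DAGGER_RUNNER_HOST: tcp://10.0.0.251:12346
  script:
    # Fails the job if the variable is not present in the job environment.
    - env | grep _EXPERIMENTAL_DAGGER_RUNNER_HOST
    # Fails the job if nothing accepts a connection on the engine port within 1 second.
    - nc -vzw 1 10.0.0.251 12346
```

If both lines pass, the variable and the network path are fine, which points the finger at how the job image itself starts up.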

next relic
#

No, but when I use my base image with no other changes the pipeline runs and shows the engine ID, and I've checked that engine ID against the Dagger instance and it's correct

#
1   : connect
2   :   connecting to engine
2   :   | engine name=00357ec13459 version=v0.12.7 client=l727fisxuv3helpkux2fefppm
CONTAINER ID   IMAGE                               COMMAND                  CREATED             STATUS             PORTS                                           NAMES
00357ec13459   registry.dagger.io/engine:v0.12.7   "dagger-entrypoint.s…"   About an hour ago   Up About an hour   0.0.0.0:12346->12346/tcp, :::12346->12346/tcp   dagger-engine
#

Top = runner logs, bottom = dagger instance host

supple abyss
#

OK, got it. Is this Docker? Can you share a few steps for me to reproduce?

next relic
#

There's no Docker in the Gitlab job at all, nor any docker engine available to it

supple abyss
#

OK, so Engine is running in Docker, as provisioned by Gitlab.

You are running another container in the same Docker instance, but using it as a CLI to connect to the Engine container.

Is that correct?

#

I will try to reproduce using the following steps:

  1. Start Engine container using your docker command - let's call this dagger-engine-0-12-7
  2. Start another Engine container with sh - let's call this dagger-cli-0-12-7
  3. Ensure that nc -vzw 1 dagger-engine-0-12-7 1234 works in the dagger-cli-0-12-7 container
  4. Ensure that _EXPERIMENTAL_DAGGER_RUNNER_HOST=dagger-engine-0-12-7 dagger core version works in the dagger-cli-0-12-7 container

Do they look right to you?

#

Above steps worked as expected. Attaching step-1.sh & step-2.sh which you can use to reproduce in your Docker environment.

manic warren
#

@next relic is there any chance you could share a snippet of your gitlab pipeline definitions?

#

that'd probably give us a bit more context to help you out 🙌

next relic
#

Have just got back, will check this shortly. And yep - can share some snippets, there's really not much to it

#

Ok, info dump

#

Pipeline:

stages:
  - scan

variables:
  _EXPERIMENTAL_DAGGER_RUNNER_HOST: tcp://10.0.0.251:12346
  # DAGGER_MODULE: github.com/mjb141/daggerverse/kics@main # You can also set module as a variable

scan-dir:
  image: mikebrown008/ci-cli-base:0.5
  stage: scan
  tags:
    - runnerdemo
  script:
    - dagger -m "github.com/mjb141/daggerverse/kics@main" call scan --dir .

Instance 1: Dagger engine - started with the docker run command posted earlier in this thread: https://discordapp.com/channels/707636530424053791/1280546777291952152/1280555357542027324

Instance 2: Gitlab runner

Networking is verified working correctly, Dagger listens on 12346, security group allows it, Gitlab can reach Dagger on 12346 (host is set in gitlab pipeline)

Module works locally, and works remotely with my base image. Switching my base image (which is just the Dockerfile posted earlier: https://discordapp.com/channels/707636530424053791/1280546777291952152/1280561039934492743) to the dagger engine image results in a lot of what look like engine logs:

#
Executing "step_script" stage of the job script 04:35
Using docker image sha256:4b144fde936571a7b3786fd63e4f134f97c9a3a1b09917f35065c0056591b0a1 for ghcr.io/dagger/engine:9823a005d5c1c131345c3278bd6ef197c65d1c8b with digest ghcr.io/dagger/engine@sha256:1f73f772dd1ba5631917527b9b8ec8db557abe6930e005a7de342aa8f33f456f ...
#!/bin/sh
set -e
cat $0
# cgroup v2: enable nesting
# see https://github.com/moby/moby/blob/38805f20f9bcc5e87869d6c79d432b166e1c88b4/hack/dind#L28
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    # move the processes from the root group to the /init group,
    # otherwise writing subtree_control fails with EBUSY.
    # An error during moving non-existent process (i.e., "cat") is ignored.
    mkdir -p /sys/fs/cgroup/init
    xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
    # enable controllers
    sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
> /sys/fs/cgroup/cgroup.subtree_control
fi
# expect more open files due to per-client SQLite databases
# many systems default to 1024 which is far too low
ulimit -n 1048576
exec /usr/local/bin/dagger-engine --config /etc/dagger/engine.toml "$@"
time="2024-09-03T15:30:12Z" level=info msg="detected mtu 1500 via interface eth0"
time="2024-09-03T15:30:12Z" level=debug msg="engine name: runner-wpb7afjjg-project-48059828-concurrent-0"
time="2024-09-03T15:30:12Z" level=debug msg="creating engine GRPC server"
time="2024-09-03T15:30:12Z" level=debug msg="creating engine lockfile"
dnsmasq[35]: started, version 2.90 cachesize 150
dnsmasq[35]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth no-cryptohash no-DNSSEC loop-detect inotify dumpfile

(and a lot more)

#

If I leave that running it just spins logs like this

time="2024-09-03T15:34:12Z" level=debug msg="engine metrics" cpu-count=2 cpu-idle=215568 cpu-iowait=1553 cpu-irq=0 cpu-nice=349 cpu-softirq=101 cpu-steal=728 cpu-system=2087 cpu-total=226322 cpu-user=5936 dagger-session-count=0 disk-available-/=4845293568 disk-available-/var/lib/dagger=4845293568 disk-free-/=4862070784 disk-free-/var/lib/dagger=4862070784 disk-size-/=8132173824 disk-size-/var/lib/dagger=8132173824 goroutine-count=11 loadavg-1=0.01 loadavg-15=0.05 loadavg-5=0.06 mem-active=408961024 mem-available=3437682688 mem-buffers=30724096 mem-cached=2028695552 mem-committed=1091330048 mem-free=1560211456 mem-inactive=1776697344 mem-mapped=282435584 mem-page-tables=3502080 mem-shmem=999424 mem-slab=150265856 mem-swap-cached=0 mem-swap-free=0 mem-swap-total=0 mem-total=4013543424 mem-vmalloc-used=14491648 proc-self-mem-anonymous=9252864 proc-self-mem-private-clean=37634048 proc-self-mem-private-dirty=9252864 proc-self-mem-pss=46886912 proc-self-mem-referenced=46891008 proc-self-mem-rss=46891008 proc-self-mem-shared-clean=4096 proc-self-mem-shared-dirty=0 proc-self-mem-swap=0 proc-self-mem-swap-pss=0 uptime=18m53s

Until I kill it, so... it's trying to start an engine?

#

Oh hang on, you've changed the entrypoint on the container... I presume the default behaviour is to start an engine, so I need to change that too

#

That's done it. Always something simple, thanks for that @supple abyss / @manic warren !
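
For anyone landing here later: the thread doesn't show the exact change, but in GitLab CI the image entrypoint is normally overridden with the image:name / image:entrypoint keys. A sketch under that assumption, reusing the scan-dir job from the pipeline above with the engine image swapped in:

```yaml
scan-dir:
  image:
    name: ghcr.io/dagger/engine:9823a005d5c1c131345c3278bd6ef197c65d1c8b # 0.12.7
    # Assumed fix: clear the image's default entrypoint (which starts an engine)
    # so the runner executes the script lines in a shell instead.
    entrypoint: [""]
  stage: scan
  tags:
    - runnerdemo
  script:
    - dagger -m "github.com/mjb141/daggerverse/kics@main" call scan --dir .
```

With the entrypoint cleared, dagger should run as a plain CLI and pick up _EXPERIMENTAL_DAGGER_RUNNER_HOST from the pipeline variables to reach the remote engine, rather than starting a second engine inside the job container.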

supple abyss
#

Yes, that was it! Glad that we could help @next relic 💪