# Error while running Dagger in Kubernetes and Argo Workflows


zenith fable

I have set up Dagger in K8s as per the documentation and set up a workflow with Argo Workflows.

However, I get this error whenever I run the workflow, even with the example from the docs:

! start session: poll for session: context deadline exceeded
✘ starting session 5m0.1s
! poll for session: context deadline exceeded
┃ Failed to connect; retrying... make request: Post "http://dagger/query": rpc error: code = Unknown desc = failed to get client metadata for session call: failed to JSON-unmarshal x-dagger-client-metadata: json: cannot unmarshal object into Go struct field ClientMetadata.labels of type []pipeline

gentle bronze

Hi @zenith fable, what does your Argo Workflows invocation of the Dagger CLI look like?
I'm guessing you have an engine on each node in a DaemonSet?
Is there an env var set to point to the Dagger Engine pod?

(example below from our Helm instructions)

DAGGER_ENGINE_POD_NAME="$(kubectl get pod \
    --selector=name=dagger-dagger-helm-engine --namespace=dagger \
    --output=jsonpath='{.items[0].metadata.name}')"
export DAGGER_ENGINE_POD_NAME

_EXPERIMENTAL_DAGGER_RUNNER_HOST="kube-pod://$DAGGER_ENGINE_POD_NAME?namespace=dagger"
export _EXPERIMENTAL_DAGGER_RUNNER_HOST

https://docs.dagger.io/integrations/104820/kubernetes/
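The same lookup can be wrapped in a small reusable function. This is a minimal sketch assuming a POSIX sh in the workflow image; the name dagger_runner_host is hypothetical, the namespace defaults to dagger as in the Helm example above, and the function refuses to build a URL from an empty pod name (for example, when the label selector matched nothing).

```shell
# dagger_runner_host (hypothetical helper): print a kube-pod:// URL for
# _EXPERIMENTAL_DAGGER_RUNNER_HOST, following the format in the Helm docs.
# Usage: dagger_runner_host POD_NAME [NAMESPACE]   (NAMESPACE defaults to "dagger")
dagger_runner_host() {
  if [ -z "$1" ]; then
    # An empty name usually means the pod lookup found nothing.
    echo "dagger_runner_host: empty pod name" >&2
    return 1
  fi
  printf 'kube-pod://%s?namespace=%s\n' "$1" "${2:-dagger}"
}

# Example, reusing the lookup from the Helm instructions:
#   DAGGER_ENGINE_POD_NAME="$(kubectl get pod --selector=name=dagger-dagger-helm-engine \
#       --namespace=dagger --output=jsonpath='{.items[0].metadata.name}')"
#   _EXPERIMENTAL_DAGGER_RUNNER_HOST="$(dagger_runner_host "$DAGGER_ENGINE_POD_NAME")" || exit 1
#   export _EXPERIMENTAL_DAGGER_RUNNER_HOST
```

Failing loudly on an empty name avoids exporting a URL like kube-pod://?namespace=dagger, which would only surface later as a connection timeout.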


zenith fable

Yes, I have deployed the DaemonSet with Helm as per the docs. Below is the relevant part of the workflow:

  container:
    image: africlouds/dagger-cli:latest-amd64
    command: ["sh","-c"]
    #args: ["dagger -m github.com/kpenfound/dagger-modules/golang@v0.1.5 call build --project=. --args=."]
    args: [
     "dagger call build --src=.. --username={{workflow.parameters.registry_user}} --password={{workflow.parameters.registry_pass}} --server={{workflow.parameters.registry_server}}"
    ]
    #args: ["printenv"]
    workingDir: /work
    env:
    - name: "_EXPERIMENTAL_DAGGER_RUNNER_HOST"
      value: "unix:///var/run/dagger/buildkitd.sock"
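A unix:// runner host only works if the engine's socket actually exists inside the workflow container, presumably via a hostPath mount of /var/run/dagger like the test pod later in this thread; the snippet above does not show whether that mount is present. As a hedged sketch, a pre-flight check like this one fails immediately instead of waiting for the session poll to time out. The function check_unix_runner is hypothetical and not part of the Dagger CLI; a POSIX sh is assumed.

```shell
# check_unix_runner (hypothetical helper): fail early when
# _EXPERIMENTAL_DAGGER_RUNNER_HOST (or the first argument) points at a unix
# socket that is not present in this container.
check_unix_runner() {
  host="${1:-$_EXPERIMENTAL_DAGGER_RUNNER_HOST}"
  case "$host" in
    unix://*)
      sock="${host#unix://}"
      if [ ! -S "$sock" ]; then
        echo "no socket at $sock; is the host directory mounted into this pod?" >&2
        return 1
      fi
      ;;
  esac
  # Other schemes (e.g. kube-pod://) are not checked here.
  return 0
}

# Example, before the build step in the Argo container:
#   check_unix_runner && dagger version
```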

Running simple commands like dagger version works, but the build fails, even with the example from the docs (the commented-out args line).


Just tried this value as _EXPERIMENTAL_DAGGER_RUNNER_HOST: "kube-pod://dagger-dagger-helm-engine-7v47g?namespace=dagger"

It seems I have gotten past the above error.
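Hard-coding a DaemonSet pod name such as dagger-dagger-helm-engine-7v47g works, but that name changes whenever the pod is recreated, and a DaemonSet runs one engine per node. If you want the workflow step to use the engine on its own node, one possible approach, sketched under assumptions: list engine pods with their node names via kubectl, then pick the one matching the step's node. The helper pick_engine_pod is hypothetical, and NODE_NAME is assumed to be injected with the Kubernetes Downward API (fieldRef spec.nodeName).

```shell
# pick_engine_pod (hypothetical helper): read "POD NODE" lines on stdin and
# print the first pod scheduled on NODE. Returns 1 if no pod matches.
pick_engine_pod() {
  awk -v node="$1" '
    $2 == node { print $1; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Example (assumes NODE_NAME comes from the Downward API, fieldRef spec.nodeName):
#   kubectl get pod --selector=name=dagger-dagger-helm-engine --namespace=dagger \
#     --output=jsonpath='{range .items[*]}{.metadata.name} {.spec.nodeName}{"\n"}{end}' \
#     | pick_engine_pod "$NODE_NAME"
```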

gentle bronze

nice!

zenith fable

I'm getting a different one, though:

Error: start engine: attach to telemetry: context deadline exceeded
time="2024-04-16T14:39:07.645Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 1

obtuse crescent

Hm, that's a pretty generic error indicating it can't reach the engine (connecting to telemetry is the first thing it does, so that you can see what's going on). Sorry, it would be more helpful if it surfaced the underlying error instead of just the retry-logic timeout.
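To illustrate the point about surfacing the underlying error: a loop that retries until a deadline can either report only "deadline exceeded", or keep the last underlying error and print it. Below is a small sketch of the second variant in POSIX sh. retry_until is a hypothetical helper for illustration only; it is not how the Dagger CLI is implemented.

```shell
# retry_until (hypothetical helper): retry a command until it succeeds or
# SECONDS have elapsed. On timeout, report the command's last stderr output
# rather than only saying that the deadline was exceeded.
# Meant for probes: the command's stdout is discarded.
# Usage: retry_until SECONDS COMMAND [ARGS...]
retry_until() {
  deadline=$(( $(date +%s) + $1 ))
  shift
  last_err=""
  while :; do
    # Capture stderr, discard stdout.
    if last_err=$("$@" 2>&1 >/dev/null); then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "deadline exceeded; last error: $last_err" >&2
      return 1
    fi
    sleep 1
  done
}
```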

balmy void

@zenith fable Happy to jump into a quick #911305510882513037 session if you have the time to 👀 together 🙏

zenith fable

@balmy void Sure, I would appreciate that.

zenith fable

Yes, I am around, @balmy void.

balmy void
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    image: alpine
    command: ["sleep"]
    args: ["1d"]
    volumeMounts:
    - mountPath: /var/run/dagger
      name: example-volume
  volumes:
  - name: example-volume
    hostPath:
      path: /var/run/dagger
balmy void

@zenith fable, keep us posted on your progress 🙏

zenith fable

I am still stuck trying to mount these two directories with the remote engine. The same pipeline runs fine on a local engine, @balmy void.


Yet it's still something to do with my pipeline, since the hello-world example runs fine through the Argo workflow.

zenith fable

@balmy void I am online now

zenith fable

Yes, sure.

balmy void

@zenith fable, can you remind me which k8s version you're using in your DO cluster?

Was it 1.29?

zenith fable

yes

balmy void

OK, creating a cluster now to test. cc @graceful idol

zenith fable

👍

balmy void

@zenith fable I was able to reproduce it, using a k8s DO cluster with the same config as yours!

@zenith fable I just verified that the issue should go away with a 2-vCPU machine.

Can you try that out whenever you can and let me know if it works?

zenith fable

That's good, I will try it and confirm tomorrow.

zenith fable

Yes, it worked with a bigger node.
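The resolution in this thread was node size: the problem reproduced on the smaller DO node and went away with a 2-vCPU machine. A quick way to spot undersized nodes, sketched under assumptions: feed node names and CPU capacity from kubectl into a filter. The helpers to_millicores and small_nodes are hypothetical, the 2000-millicore threshold is simply the figure from this thread (not a documented Dagger minimum), and only whole cores or m-suffixed quantities are handled.

```shell
# to_millicores (hypothetical helper): convert a Kubernetes CPU quantity such
# as "2" or "1500m" to millicores. Fractional forms like "0.5" are not handled.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# small_nodes (hypothetical helper): read "NODE CPU" lines on stdin and print
# the nodes whose CPU is below MIN_MILLICORES (first argument).
small_nodes() {
  while read -r name cpu; do
    [ -n "$name" ] || continue
    if [ "$(to_millicores "$cpu")" -lt "$1" ]; then
      echo "$name"
    fi
  done
}

# Example:
#   kubectl get nodes \
#     --output=jsonpath='{range .items[*]}{.metadata.name} {.status.capacity.cpu}{"\n"}{end}' \
#     | small_nodes 2000
```

Capacity rather than allocatable is used here because allocatable CPU is typically reduced by system reservations, so a 2-vCPU node could otherwise be flagged.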