#codegen issue with custom engine in version > 0.11.1


round verge
#

Hi,
with dagger version 0.11.1 everything works fine for us in our environment with podman and a custom dagger engine that is configured with:
podman run -d --name customized-dagger-engine --privileged -v dagger-engine:/var/lib/dagger registry.dagger.io/engine:v0.11.1
export _EXPERIMENTAL_DAGGER_RUNNER_HOST=podman-container://customized-dagger-engine

As soon as I update to 0.11.2 or also 0.11.4 the code generation stops working. Tested in an existing repo with dagger develop but to provide a simple repro case, this also occurs in a fresh directory:

  • dagger init - works, created dagger.json
  • dagger init --sdk=go - fails with the error below
  • dagger init --sdk=python - fails with a similar error (just tried for testing...)
  • Any call to a function with dagger call ... fails with the same error
    ✘ Container.directory(path: "/src"): Directory! 0.1s
    ! process "/usr/local/bin/codegen --output /src --module-context-path /src/dagger --module-name dagger-test --introspection-json-path /schema.json" did not complete successfully: exit code: 1
      ✘ exec /usr/local/bin/codegen --output /src --module-context-path /src/dagger --module-name dagger-test --introspection-json-path /schema.json 0.1s
      ! process "/usr/local/bin/codegen --output /src --module-context-path /src/dagger --module-name dagger-test --introspection-json-path /schema.json" did not complete successfully: exit code: 1

Error: failed to generate code: input: moduleSource.withContextDirectory.withName.withSDK.withSourceSubpath.asModule resolve: failed to create module: select: failed to update codegen and runtime: failed to generate code: failed to get modified source directory for go module sdk codegen: select: process "/usr/local/bin/codegen --output /src --module-context-path /src/dagger --module-name dagger-test --introspection-json-path /schema.json" did not complete successfully: exit code: 1
pine salmon
#

Hey! Thanks for the report!

Did you just update the CLI, or also the engine it runs against?

round verge
#

This happens after updating the Engine, but I updated the CLI to the same version in all tests

pine salmon
#

Could you try cleaning and pruning your dagger engine before retrying?
Maybe some stale state got mixed in and is causing the issue.

#

I tried on my computer and the init works fine, so I'm quite confused, because the error is related to code generation.
For example:

dagger init --name="a" --sdk=go

Is perfectly fine

round verge
#

Unfortunately, yes.

I have just tried on my computer (Debian 12 WSL) cleaning with
podman system prune --all and podman system reset and then executed dagger init --name="a" --sdk=go and I am getting the same error.

The same behaviour also exists on 2 RHEL8 servers with podman and a custom dagger engine. I've also already tried different go versions, including the newest 1.22.3 with no effect

royal minnow
#

I have the same issue if I have a go.work file in the root directory. If I don't and let dagger update the main go.mod file, everything works :/
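To check whether a go.work file is in play, here is a minimal sketch that walks from a directory up to / and prints the first go.work it finds. The function name `find_go_work` is made up for illustration. It only mirrors how the go tool looks in the current directory and its parents; whether Dagger's codegen reacts to that file is what this thread is still trying to establish.

```shell
#!/usr/bin/env bash
# Walk from a starting directory (default: the current one) up to / and
# print the path of the first go.work found.
# Returns 1 and prints nothing when there is none.
find_go_work() {
  local dir
  dir="$(cd "${1:-.}" && pwd)" || return 1
  while :; do
    if [ -f "$dir/go.work" ]; then
      echo "$dir/go.work"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      return 1
    fi
    dir="$(dirname "$dir")"
  done
}

find_go_work || echo "no go.work found"
```

With a Go toolchain installed, `go env GOWORK` reports the workspace file the go command would use, which is a quicker way to get the same answer.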

pine salmon
#

Since it looks like it's more specific to go, maybe @open dock can give his opinion.

@tepid mango When you wrote the Podman guide, did you hit this kind of issue?

@pine temple Based on the Podman guide we wrote, you might need to set an extra configuration https://docs.dagger.io/integrations/528320/podman/

modprobe iptable_nat


#

I think this issue is more related to Podman than to the SDKs themselves though. Could you try with Docker just once to see if it works?

round verge
#

On the RHEL8 servers modprobe iptable_nat is active (the engine does not work at all without it)

The problem does not even seem to be limited to init or code updates; it also affects function calls like the one from the docs: dagger -m github.com/shykes/daggerverse/hello@v0.1.2 call hello

I was able to verify (on my personal computer) that everything works fine with Docker and also with containerd and nerdctl

pine salmon
#

Okay so it seems like a Podman issue :/
@worthy jacinth May I delegate this issue to you then? I'm not an expert of Podman :/

worthy jacinth
#

@round verge / @royal minnow when you have a chance could you share the output of the dagger engine container logs after hitting that error? I think it's just podman logs <name of container> but don't have podman installed anywhere to confirm.

#

The error message being returned to the client in this case is not especially descriptive, so I just want to check if there are any more clues in there.

#

Another user recently hit some strange cgroup issues when using podman, but that was only when using custom CA certs installed in /usr/local/share/ca-certificates of the engine container, which doesn't appear to be something either of you are doing based on the commands

royal minnow
#

In my case the error appears on Docker For Mac so my issue might be unrelated but the error message is the same as far as I can tell. If I run init in debug mode, I get the following message in addition to the error above: Error: load package "dagger": no packages found in /src/dagger

Is this related or should I open a separate thread?

round verge
#

These are the logs after executing dagger init --name="a" --sdk=go

round verge
# pine salmon Is the container ran in `--privileged` mode? Dagger engine needs to be run in `-...

yes, these are the commands that I used to verify:

export _EXPERIMENTAL_DAGGER_RUNNER_HOST=podman-container://customized-dagger-engine
podman run -d --name customized-dagger-engine --privileged -v dagger-engine:/var/lib/dagger registry.dagger.io/engine:v0.11.1
dagger init --name="a" --sdk=go
  --> works as expected
podman stop customized-dagger-engine
podman rm customized-dagger-engine
podman run -d --name customized-dagger-engine --privileged -v dagger-engine-0115:/var/lib/dagger registry.dagger.io/engine:v0.11.5
dagger init --name="a" --sdk=go
  --> Error: failed to generate code: input: moduleSource.withContextDirectory.withName.withSDK.withSourceSubpath.asModule resolve: failed to create module: select: failed to update codegen and runtime: failed to generate code: failed to get modified source directory for go module sdk codegen: select: process "/usr/local/bin/codegen --output /src --module-context-path /src/dagger --module-name a --introspection-json-path /schema.json" did not complete successfully: exit code: 1
      Stderr:
      runc run failed: unable to start container process: unable to apply cgroup configuration: mkdir /sys/fs/cgroup/cpuset/buildkit: permission denied
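
The Stderr line above is the useful clue, and it is easy to miss in a long engine log. A minimal sketch for pulling such lines out of any log text follows; the function name `filter_cgroup_errors` is made up for illustration, and the patterns are simply the strings seen in this thread, not an exhaustive list.

```shell
#!/usr/bin/env bash
# Print only the lines that look like the cgroup failure reported in this thread.
# Reads log text on stdin. Follows grep's convention: exit status 0 if at
# least one line matched, 1 otherwise.
filter_cgroup_errors() {
  grep -E 'unable to apply cgroup configuration|/sys/fs/cgroup/'
}

# Usage, assuming the engine container name used earlier in the thread:
#   podman logs customized-dagger-engine 2>&1 | filter_cgroup_errors
```

The `2>&1` in the usage line is there because container logs can arrive on either stream.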
pine salmon
#

Maybe it would make more sense to change the engine version too

#

But hang on, I think you said in your first message that it started to break after upgrading to 0.11.2?

#

As soon as I update to 0.11.2 or also 0.11.4 the code generation stops working. Tested in an existing repo with dagger develop but to provide a simple repro case, this also occurs in a fresh directory:

round verge
#

The engine stops working with 0.11.2, just the more specific error message appeared with 0.11.5

#

The error messages are also equal in both versions, just with the added line in 0.11.5

pine salmon
#

Because it seems that 0.11.2 broke it, and there aren't many commits that could cause this issue, except maybe that one

neon kestrel
#

definitely not that

#

there's potentially some path funkiness changes in this version

#

out of curiosity @round verge, what's the $PWD of the directory you're running dagger init in?

#

long shot but worth checking

neon kestrel
#

we removed the shim in 0.11.5

#

out of curiosity, if you do a simple dagger query against an engine, using the default example on https://play.dagger.cloud/, does that work? it could be that all dagger withExecs are just broken

#

you're not overriding the container entrypoint? it's at /usr/local/bin/dagger-entrypoint.sh, and does some initial cgroup setup to prevent pretty much this exact issue

round verge
# neon kestrel out of curiosity, if you do a simple `dagger query` against an engine, using the...

this is the output of the example query (with client and engine 0.11.5):

1   : connect
2   :   connecting to engine
10:08:42 INF Connected to engine name=85a8a637203e version=v0.11.5
2   :   connecting to engine DONE [0.3s]
3   :   starting session
3   :   starting session DONE [0.2s]
1   : connect DONE [0.5s]

4   : Container.from(address: "alpine"): Container!
4   : Container.from: Container! DONE [1.6s]

5   : Container.stdout: String!
6   :   exec apk add curl
6   :   [0.1s] | runc run failed: unable to start container process: unable to apply cgroup configuration: mkdir /sys/fs/cgroup/cpuset/buildkit: permission denied
7   :   remotes.docker.resolver.HTTPRequest
7   :   remotes.docker.resolver.HTTPRequest DONE [0.1s]
8   :   remotes.docker.resolver.HTTPRequest
8   :   remotes.docker.resolver.HTTPRequest DONE [0.1s]
6   :   exec apk add curl ERROR [0.3s]
6   :   ! process "apk add curl" did not complete successfully: exit code: 1
5   : Container.stdout: String! ERROR [0.8s]
5   : ! process "apk add curl" did not complete successfully: exit code: 1
Error: make request: input: container.from.withExec.withExec.stdout resolve: process "apk add curl" did not complete successfully: exit code: 1
round verge
#

actually I (accidentally) tested without a custom engine and the behaviour is the same

neon kestrel
#

hmm okay

#

does /sys/fs/cgroup/cgroup.subtree_control exist on your host system? vs in the context of the dagger podman container?

#

if it does, you have cgroupsv2, there's potentially some accidental cgroups changes at some point
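
The check described above can be sketched as a small function. The name `cgroup_version` and the optional root argument are made up for illustration (the argument only exists so the function can be tried against a scratch directory), and the v1 branch is a heuristic based on the per-controller directories such as the cpuset one in the error message.

```shell
#!/usr/bin/env bash
# Report which cgroup layout is mounted under a given root.
# $1 is the cgroup mount point; it defaults to /sys/fs/cgroup.
cgroup_version() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/cgroup.subtree_control" ]; then
    echo "v2"        # unified hierarchy: control files sit at the mount root
  elif [ -d "$root/cpuset" ] || [ -d "$root/memory" ]; then
    echo "v1"        # legacy hierarchy: one directory per controller
  else
    echo "unknown"
  fi
}

# On the host:
cgroup_version

# Inside the engine container, assuming the container name used in this thread:
#   podman exec customized-dagger-engine ls /sys/fs/cgroup/cgroup.subtree_control
```

A hybrid setup, with v1 controller directories plus a separate unified mount, would be reported as v1 by this sketch.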

round verge
#

no, it exists neither on the hosts nor in the dagger podman container

neon kestrel
#

okay, so it looks like potentially this is a cgroupsv1 issue, that's good to know 😄