#Hi @solomon !

cedar depot
#

When run locally the logs are slightly different

┃ registry.gitlab.com/<path>/syncer2:syncer-v0.1.1@sha256:4bb984f5f8c4a6aa9806294cd674957ad9b2571923d4cfd338385f6bb1d16c4c
│ │ ✘ remotes.docker.resolver.HTTPRequest 1.0s
│ │ │ ✘ HTTP HEAD 1.0s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.3s
│ │ │ ✘ HTTP HEAD 0.3s
│ │ ✘ remotes.docker.resolver.HTTPRequest 1.0s
│ │ │ ✘ HTTP HEAD 1.0s
│ │ ✘ remotes.docker.resolver.HTTPRequest 1.0s
│ │ │ ✘ HTTP HEAD 1.0s
│ │ ✘ remotes.docker.resolver.HTTPRequest 1.0s
│ │ │ ✘ HTTP HEAD 1.0s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.1s
│ │ │ ✘ HTTP HEAD 0.1s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.1s
│ │ │ ✘ HTTP HEAD 0.1s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.1s
│ │ │ ✘ HTTP HEAD 0.1s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.1s
│ │ │ ✘ HTTP HEAD 0.1s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.2s
│ │ │ ✘ HTTP HEAD 0.2s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.1s
│ │ │ ✘ HTTP HEAD 0.1s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.1s
│ │ │ ✘ HTTP HEAD 0.1s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.2s
│ │ │ ✘ HTTP HEAD 0.2s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.1s
│ │ │ ✘ HTTP HEAD 0.1s
│ │ ✘ remotes.docker.resolver.HTTPRequest 0.2s
│ │ │ ✘ HTTP HEAD 0.2s

I kinda remember reading somewhere that HEAD showing up as an error is "normal" (related to some wrong log level), and that at the end of the day the whole POST or PUT still succeeds.
Also, although I am running dagger with the same verbose mode (-v), no other request types (GET, POST, etc.) are shown like they are when running via the K8s executor.

wintry root
#

I was thinking of a dagger cloud url, it's a web view of the whole trace where you can drill down in more detail

cedar depot
#

Oh .. ok . I haven't been using the WebUI;
Let me create an account then and rerun that thing.

#

thx!

cedar depot
#

All good. I can see the traces in Dagger Cloud . Not sure how I can share those though? 🤔

wintry root
# cedar depot All good. I can see the traces in Dagger Cloud . Not sure how I can share those ...

At the moment there are only 2 sharing options:

  1. You can make all traces public for a given repository (I think @pearl cobalt ?) - probably not what you want
  2. You can share a trace URL with a Dagger team member, if they have support/admin privileges we can look at it to help debug --> this is what you want

TLDR: you can safely just share a trace URL here with no special setting, and by default only Dagger team & your own org members can see it

cedar depot
#

just did 🙂

wintry root
#

Sorry I only see one trace URL, it seems to be from a CI run and seems to fail

cedar depot
#

This one is the local one succeeding ☝️

wintry root
#

Mmm, quick question: on your local machine, does your regular docker config have permissions to push that image?

#

(without using the token)

cedar depot
#

For what it's worth,
The crossplane function running the same way in the k8s environment manages to push to gitlab using the same token as well.

cedar depot
#

testing now.

#

It worked: I could publish.

wintry root
#

Ah ok. Then my theory is wrong 😦

#

I was observing that you don't give the same image address to withRegistryAuth and publish

#

So was thinking maybe the token doesn't actually get used - and it only works locally because you're logged in

wintry root
#
  • withRegistryAuth: .../platform/crossplane/syncer2
  • publish: .../platform/crossplane/syncer2-debug:syncer-v0.1.1
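
A side note on the two addresses above, as a minimal sketch. It assumes (this is not confirmed anywhere in the thread) that registry credentials are matched on the registry host rather than on the full repository path. Under that assumption both addresses point at the same host, so the mismatch alone would not stop the token from being used. `registry_host` is a hypothetical helper that applies the usual Docker reference rule, and `<path>` is the placeholder from the log at the top, not a real path.

```python
def registry_host(address: str) -> str:
    """Return the registry host of an image address.

    Simplified Docker reference rule: the first path component is a host
    only if it contains '.' or ':' or is 'localhost'; otherwise the image
    lives on the default registry (docker.io).
    """
    first, sep, _rest = address.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


# Addresses modeled on the log above; <path> is a placeholder, not a real path.
auth_address = "registry.gitlab.com/<path>/syncer2"
publish_address = "registry.gitlab.com/<path>/syncer2-debug:syncer-v0.1.1"

# Compare only the host part of the two addresses.
same_registry = registry_host(auth_address) == registry_host(publish_address)
print(same_registry)
```

If the hosts differed, the credentials would be bound to a different registry than the one being pushed to, which is a more likely way for this kind of mismatch to matter.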
cedar depot
#

yes very true.

#

I could change it to see ?

wintry root
#

Worth a shot

#

Another possible root cause, of course, could be that env://GL_TOKEN just doesn't get the right token in the CI environment. But I assumed you already checked that.

cedar depot
#

does not explain why this is working locally though ..

cedar depot
#

Also, it worked when the only change was switching the CI part to use the docker executor.

#

Ok .. so I just tried locally with the same URL for both withRegistryAuth and publish .. and it still works.
Testing now via CI

#

OK . running again 'cause it failed for another reason.

#

😦 same thing I am afraid ...

#

There is one difference between the local and ci trace: the METHOD for a successful local run is "PUT" while the failed CI run is "POST"
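
One possible reading of the PUT versus POST difference, as a hedged sketch. In the OCI Distribution protocol a push opens an upload session with POST and finishes with PUT, so a run whose last request is a failed POST may have been rejected before any data was uploaded, while a run that reaches PUT got through. Which request the trace actually highlights is an assumption here; the table only lists the standard endpoints, with `<name>`, `<digest>`, `<location>` and `<reference>` as placeholders. A HEAD on something that is not in the registry yet returns 404, which may be why HEAD shows up as failed even on a successful run.

```python
# Typical request sequence for pushing one blob and then the manifest,
# following the OCI Distribution spec (monolithic upload variant).
# <name>, <digest>, <location> and <reference> are placeholders.
PUSH_STEPS = [
    ("HEAD", "/v2/<name>/blobs/<digest>", "check whether the blob already exists"),
    ("POST", "/v2/<name>/blobs/uploads/", "open an upload session"),
    ("PUT", "<location>?digest=<digest>", "send the blob and close the session"),
    ("PUT", "/v2/<name>/manifests/<reference>", "publish the manifest under its tag"),
]


def steps_using(method: str) -> list[str]:
    """Return the descriptions of the push steps that use an HTTP method."""
    return [what for verb, _path, what in PUSH_STEPS if verb == method.upper()]


print(steps_using("POST"))
print(steps_using("PUT"))
```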

#

🤔 If I run locally but change the GitLab URL to something I know is wrong, it fails with the exact same trace and messages.

#

(and I don't know what to do with this 🙂 yet )

pearl cobalt
#

@cedar depot I'd run docker logout locally to reproduce the auth issue

cedar depot
#

@pearl cobalt I tried while logged out and it works locally

#

I am logged out @pearl cobalt

cedar depot
#

very much empty yes.

#
{
        "auths": {}
}
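
A small sketch of what "empty" means for a Docker config. An empty `auths` map is not the whole story in general, because credentials can also sit behind `credsStore` or `credHelpers`; the config pasted above has neither, so it really holds no credentials. The helper names are hypothetical and nothing here is specific to Dagger.

```python
import json
import os
from pathlib import Path


def summarize_docker_config(text: str) -> dict:
    """List every place a Docker config can reference stored credentials."""
    config = json.loads(text)
    return {
        "auths": sorted(config.get("auths", {})),
        "credsStore": config.get("credsStore"),
        "credHelpers": sorted(config.get("credHelpers", {})),
    }


def may_have_credentials(summary: dict) -> bool:
    """True if any of the three credential sources is populated."""
    return bool(summary["auths"] or summary["credsStore"] or summary["credHelpers"])


def default_config_path() -> Path:
    """Docker reads $DOCKER_CONFIG/config.json, else ~/.docker/config.json."""
    base = os.environ.get("DOCKER_CONFIG")
    return (Path(base) if base else Path.home() / ".docker") / "config.json"


# The config pasted above has no credential source at all.
print(may_have_credentials(summarize_docker_config('{"auths": {}}')))
```

To check a real machine, pass `default_config_path().read_text()` to `summarize_docker_config`.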
#

Also the same code & token works using the docker executor via CI

#

So it feels to me that this is not really an auth issue.

#

Whatever the message keeps telling me 🙂

pearl cobalt
#

looks like an auth issue

#

I understood it was still working locally? Sorry, I'm a bit confused

cedar depot
#

No, the latest test was me changing the URL to see what kind of message I would get. And it turns out this is the same "not sufficient permissions" kinda message I got too.

#

just running it now with the proper URL and being logged out

#

and it works.

pearl cobalt
#

ok, what happens if you pass an incorrect token? Does it fail?

cedar depot
#

yes

#

running it now with a bad token locally

pearl cobalt
#

ok, I saw that. Have you validated that in CI the GL_TOKEN is correctly set?

cedar depot
#

Oh yes

pearl cobalt
#

can you share a snippet of your .gitlab-ci.yml file?

cedar depot
#

sure

#

thx

pearl cobalt
#

feel free to DM if you can't make it public

cedar depot
#

I have actually debugged it all the way to the code and displayed the plaintext() version of the secret.

#

So I am pretty sure that the value is the same I am using locally.
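
For comparing the local and CI token without printing it, a minimal sketch: log a short hash and the length instead of the plaintext, and flag stray whitespace, which is easy to pick up when a CI variable is pasted. `GL_TOKEN` comes from the thread; the function names are hypothetical.

```python
import hashlib
import os


def token_fingerprint(token: str) -> str:
    """Short, non-reversible fingerprint that is safe to print in CI logs."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]


def describe_env_token(name: str = "GL_TOKEN", env=None) -> str:
    """Describe a token taken from the environment without revealing it."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        return f"{name} is not set or empty"
    # A trailing newline or space changes the token and is invisible in most UIs.
    note = " (has leading/trailing whitespace)" if value != value.strip() else ""
    return f"{name}: length={len(value)} sha256[:12]={token_fingerprint(value)}{note}"


print(describe_env_token())
```

Running this both locally and in the CI job gives two lines that can be compared by eye: same length and same fingerprint means the same token.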

cedar depot
#

Well yes, I did a manual push

#

after a docker login using this very token.

#

So yeah, I feel as solid as I can be on the token side .. with such an obvious error message, I had to 1000x check this.

wintry root
#

(sorry @cedar depot I am in a meeting... will come back to this after)

pearl cobalt
#

had a quick call with Seb and something strange is definitely happening. Will try to repro using the same settings he's using in gitlab 🙏

#

we weren't able to make it work

cedar depot
#

Thanks for your help both. Really appreciated 🤗

wintry root
#

OK - sorry about that @cedar depot and thank you for your patience

pearl cobalt
#

ok, I was also able to make it work with the local k8s executor here
let's continue checking tomorrow Seb

#

Seb, one thing that I'd like to try if you can tomorrow is to recycle your dagger engine nodes just to make sure there's no stale caching problem that might be happening here 🙏

cedar depot
#

Thanks a lot @pearl cobalt for trying to reproduce the issue.
And I think you got it right. It was a cache issue. (On second thought, I don't think we really cleaned up the cache yesterday.)
This morning all I did is:

  • Go on each dagger engine pod
  • Delete everything under /var/lib/dagger
  • Restart the daemonset
  • Run the GitLab CI with 0 changes

-> 🎉 The job succeeded
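
The steps above can be sketched as a dry run that only builds and prints the kubectl commands. The namespace `dagger`, the daemonset name `dagger-engine` and the pod names are hypothetical; only `/var/lib/dagger` comes from the thread. Nothing is executed, so the output can be reviewed before running anything by hand.

```python
import shlex


def cache_reset_commands(pods, namespace="dagger", daemonset="dagger-engine",
                         cache_dir="/var/lib/dagger"):
    """Build (not run) the kubectl commands for the manual cache reset."""
    commands = []
    for pod in pods:
        # Wipe the engine state inside each engine pod.
        # Note: the '*' glob does not match dotfiles.
        commands.append([
            "kubectl", "exec", "-n", namespace, pod, "--",
            "sh", "-c", f"rm -rf {cache_dir}/*",
        ])
    # Restart all engine pods so they start from the empty directory.
    commands.append([
        "kubectl", "rollout", "restart", f"daemonset/{daemonset}", "-n", namespace,
    ])
    return commands


# Hypothetical pod names; list the real ones with `kubectl get pods -n <namespace>`.
for command in cache_reset_commands(["dagger-engine-abc12", "dagger-engine-def34"]):
    print(shlex.join(command))
```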

#

Only one of the pods had some cache (8GB). I will dig further to see whether it could have been a disk space issue (running TalOS as the underlying OS).

#

I was wondering 🤔 Is there a way to clean up the cache in a k8s context other than starting a pod with the dagger CLI and running the prune command (on each node)?

#

Again, thanks a lot for the time spent investigating. Let me know if you need me to test things any further.

wintry root
#

Amazing! Glad you were able to solve it. Good thinking @pearl cobalt 🙂

wintry root
# cedar depot I was wondering 🤔 Is there a way to clean up the cache in a k8s context other ...

I think that's probably the best way available... We are working on making Dagger work better in a cluster. Right now we are concentrating our efforts on two major blockers:

  1. Finally decoupling cache storage from compute in the engine architecture

  2. Stabilizing the interfaces for remote engines.

Removing those two blockers paves the way to a stateless and cluster-aware engine, and from there a serverless auto-scaling engine... the sky is the limit 🙂

cedar depot
#

Using the HostPath does feel a little odd for the dagger-engine to me. Also, if we could avoid running runners and engine with elevated privileges, that would be awesome. In TalOS, the default pod policy enforced does not allow this, hence a little more tweaking is needed to make dagger-engine work in that environment.
Looking forward to the coming changes!

wintry root
#

Once we have a stateless engine with stable interfaces for remote access, we won't need the crutch of daemonset + hostpath 👍

Escalated privileges: this is a framing issue. Dagger is not a containerized app: it's a container runtime and orchestrator. It's only packaged as an OCI image for convenience, but architecturally it's a host service. You can use docker and kubernetes to provision it, but fundamentally they operate at the same layer. And in the future it might even replace them for specialized workloads (although that is not our goal)

From a security standpoint: approach securing dagger the way you approach securing docker or kubernetes. Mostly that means treating it like a host service and making the host machine the security boundary.

cedar depot
#

🤔 something is definitely weird here. I just realized I had left in some debug stuff we did yesterday with @pearl cobalt .
I removed it, ran the CI again and got this error (unrelated to the part that was previously failing)

https://v3.dagger.cloud/clovrlabs/traces/62cd3f1abd2851c122ef7478fcbeaa7f?span=aca2bf1b4522bb0d

It is complaining that "semantic-release" is not available in PATH while on the previous with_exec it just ran.
I will upgrade to the latest dagger version to see.

cedar depot
#

FYI: I upgraded to 0.18.5 and so far all is fine. I did try to use the latest version but runners would fail with some dependency errors.

pearl cobalt
#

also just to make sure, are you committing the python generated sdk folder into your repo? That might cause some issues as well

near needle
#

looking at the code, i think i would expect this to happen when a v0.18.5 cli calls a v0.18.6 engine

cedar depot
#

thx @near needle , bumping the version to 0.18.6 at the engine level did the trick. Might be good to add it somewhere in the documentation. When running locally the upgrade is done for you automatically, so it was not obvious (at least for me 🙂 )
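
Since a remote engine is not upgraded automatically, a CI pre-flight check along these lines could catch the mismatch early. This is a minimal sketch that assumes an exact match is the safe setup; the thread only shows that a 0.18.5/0.18.6 mix failed. How the two version strings are obtained is left out, and the function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse 'v0.18.6' or '0.18.6' into (0, 18, 6), ignoring any suffix."""
    core = version.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def check_cli_engine(cli: str, engine: str) -> str:
    """Return a short verdict on whether CLI and engine versions agree."""
    if parse_version(cli) == parse_version(engine):
        return f"ok: CLI and engine are both {cli}"
    return f"mismatch: CLI {cli} vs engine {engine}"


print(check_cli_engine("v0.18.5", "v0.18.6"))
```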

wintry root
#

this is one of the reasons the interfaces are not yet stabilized... cli/engine versioning matrix hell
