#maintainers
Yes I agree, that's what Tibor was suggesting I think, use the engine version in dagger.json
Ah yes, exactly
So, does this mean that the current state of main was generated with a main engine, which was required for CI-on-main to pass, and now everyone developing their branch with 0.20.8 is reverting that codegen every time we run 'dagger generate'?
So in other words, for my PR to pass CI, I have to install a main build of dagger
Very sneaky way to force everyone to dogfood, I like it
sorry, in the meantime:
curl -fsSL https://dl.dagger.io/dagger/install.sh | DAGGER_COMMIT=head BIN_DIR=... sh
Hm. we appear to be getting rate limited trying to obtain the token that we use to not get rate limited by dockerhub? if I'm parsing this correctly?
failed to fetch anonymous token: unexpected status from GET request to https://auth.docker.io/token?scope=repository%3Alibrary%2Falpine%3Apull&service=registry.docker.io: 429 Too Many Requests
([from here](https://dagger.cloud/dagger/checks/github.com/dagger/dagger@47ffb4fcb7a777930118b6178e06e183688d8f30?check=typescript-sdk:test-nodejs-lts&viewMode=trace#1cfebb13d540ad54:L1814))
Or does that mean we aren't even using our token at all?
so when you are not authenticated to dockerhub at all, you still get a token, just an anonymous one? i thought you just wouldn't send a token at all. that's why I'm confused
(according to the robot, yes that's how it works)
yes
so yeah I guess our token is just missing somehow
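For reference, a minimal Go sketch of the anonymous token request seen in the error above. It only rebuilds the URL from the log line and does not contact Docker Hub; the function name `anonymousTokenURL` is made up for illustration and is not taken from the engine code.

```go
package main

import (
	"fmt"
	"net/url"
)

// anonymousTokenURL (hypothetical helper) builds the token request that a
// registry client sends before an unauthenticated pull of the given repository.
func anonymousTokenURL(repository string) string {
	q := url.Values{}
	q.Set("scope", "repository:"+repository+":pull")
	q.Set("service", "registry.docker.io")
	u := url.URL{
		Scheme:   "https",
		Host:     "auth.docker.io",
		Path:     "/token",
		RawQuery: q.Encode(), // keys are emitted in sorted order: scope, service
	}
	return u.String()
}

func main() {
	// No credentials are attached, so the token handed back is an anonymous
	// one, and this request itself is what got the 429 in the trace.
	fmt.Println(anonymousTokenURL("library/alpine"))
}
```

An authenticated client would send the same request with credentials attached, which is why seeing "anonymous token" in the error suggests the configured token was not used.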
approved first PR, but second one has a suspicious failure: https://dagger.cloud/dagger/checks/github.com/dagger/dagger@47ffb4fcb7a777930118b6178e06e183688d8f30?check=test-split:test-modules&listen=5781d34c0450706a&listen=cd3368d638728193&test=TestModule/TestFunctionCacheControl/go/never_cache&viewMode=tests
Ya, I didn't realize that the latest CI was still running when I sent that message, so I was cursed the instant I said it. I'm looking into it. Unclear if it's related to the PR or not yet, we'll see
hey; i want to contribute to Dagger, want to know if a Linux machine or VM is required for development, or macOS is sufficient for most of the work. What is the recommended local setup for contributors?
Thanks for offering to contribute! macOS is sufficient as long as you have a Linux container runtime installed: Docker, Podman, or Apple's new container tool. The dagger CLI on your Mac will auto-discover it and should work out of the box. Let us know if you get stuck.
I'm seeing this failure a lot (trace):
❯ ~/src/dagger/better-tests/bin/dagger check
load module: go 0.4s
load module: midterm 0.4s
Error: loading modules: loading module "github.com/dagger/go@main": resolving module source "github.com/dagger/go@main": failed to load git dir: zjcrh039ljmw4is4s66bm0zv7: not found
When it happens it's always triggered by usage of github.com/dagger/go - I've observed this in dagger/dagger and now in https://github.com/vito/midterm (where I just adopted it and it's working great otherwise btw - living the drop-in toolchain dream)
I'm running my own branch's CLI + dev engine above so there's a chance it's my own fault, feel free to ignore until I have a repro on main, just raising in case someone else is hitting it too
EDIT: can repro easily on main against https://github.com/vito/midterm - see #1504168990279205055 message
I taught codex how to trigger re-runs in CI (using my auth token from my homedir), let it run overnight to try to shake flakes out, created this nice looking trace summary page https://dagger.cloud/dagger/checks/github.com/dagger/dagger@2be47918021e5b05a4ca422c8c09f36761a3b0e5
I never triggered the original flake I wanted, but did hit some other interesting ones (besides uninteresting 500 errors from external network deps). Very tempting to just let this run indefinitely in a loop and hand legit looking errors off to a human and/or agent 
i have this locally too (ah sorry, just saw the thread)
it's a bug triggered by cache pruning, so you could find some garbage to delete on your machine so you stay below the threshold
Ran into this very niche bug while moving SDK commands to a module: https://github.com/dagger/dagger/issues/13139
@still garnet does dang SDK give me direct access to loadFooFromID?
yep
@still garnet I think I may be hitting the limits of Changeset + Workspace...
https://github.com/shykes/dagger-go-sdk
Example:
$ cd ./my/workspace
$ dagger -m github.com/shykes/dagger-go-sdk call mod --path=./path/to/a/go/module generate
Apply changes?
│ toolchains/engine-dev/dagger.gen.go +1136
│ toolchains/engine-dev/go.mod +10 -10
│ toolchains/engine-dev/go.sum +22 -20
│ toolchains/engine-dev/internal/
│ toolchains/engine-dev/internal/dagger/
│ toolchains/engine-dev/internal/dagger/alpine.gen.go +322
│ toolchains/engine-dev/internal/dagger/codegen.gen.go +152
│ toolchains/engine-dev/internal/dagger/dagger-cli.gen.go +348
│ toolchains/engine-dev/internal/dagger/dagger.gen.go +16008
│ toolchains/engine-dev/internal/dagger/go.gen.go +768
│ toolchains/engine-dev/internal/dagger/notify.gen.go +225
│ toolchains/engine-dev/internal/dagger/version.gen.go +239
│ toolchains/engine-dev/internal/dagger/wolfi.gen.go +183
│
│ 13 files changed, +19413 -30 lines
│
│ Apply Discard
applying changes 0.0s ERROR
Changeset.export(path: "."): String! 0.0s ERROR
! internal error: Directory.diff received different relative paths: "/before" != "/after"
working through a bunch of fix and improvement PRs currently getting them solid, but I think this one crossed the threshold now https://github.com/dagger/dagger/pull/13123
reduces allocations and containerd boltdb writes a ton by just getting rid of unnecessary lease creation
approved
@civic yacht Do you have any ideas for https://github.com/dagger/dagger/issues/13060 ? Asking because if we want to do a release we'll need some kind of solution, even a stopgap.
still on my todo list, just towards the bottom since the other stuff popping up is less obscure
agree it's needed for the release, but yes worst case scenario if we just fall back to "private == locked" for now that'll suffice
I'll open a draft PR for that and if there are no other blockers we can decide
@still garnet in my module I execute a helper tool that prints out include filters to stdout. Completely internal, not meant for users to see. But with dagger check it is prominently shown to the user as "logs". Is there an otel attribute incantation to prevent that, without breaking other things?
Love a nice readable error message:
failed to resolve dep to source: failed to load sdk for dir module source: invalid SDK: "unknown" [traceparent:d905b195ff 456a765cca663a0618-8ad7cabfcbac0874]
There's a 'verbose' attribute you can set on the log records themselves, but the tricky part here is your tool is just writing to stdout, not really in a position to set log attributes
if the bulk of the complaint is the [traceparent:...] part, lemme know where that's popping up and I'll strip it out if possible. The downside of propagating error origins through strings is having to play whack-a-mole in the UI, but at least it's super durable and there are finite places in the UI to clean it up. (It can still end up showing up places like test logs, but that's saved my ass sometimes when it shows up)
This is just venting - I just have no idea what's going on
If it helps the tool already is a nested dagger client, so if needed it can emit custom spans no problem (not sure if it helps)
--> ah I see it has to be on the otel log entry itself
I guess I can change to writing it to a file
oh, in that case you can wrap it in a span that sets the Boundary attribute
(I figured out what was going on: my go-sdk module was trying to generate from an intentionally broken module (dagql/idtui/viztest/broken-dep-sdk). So I'm figuring out the best way to tell the SDK: "skip generating this one")
while poking around go tool trace of the engine I noticed we were blocked for like 300ms before each container invoking CNI, which wasn't expected since there's supposed to be pooled re-use of CNI netns'es. Turns out that got accidentally disabled in effect at some point due to a default hostname being set on each withExec.
Fixed that, TestModule takes 30% less time to run for me locally now: https://github.com/dagger/dagger/pull/13144
Holy!
That's on every single function invocation too then?
Yeah, it's like 300ish ms (some up to almost a second), so it might be noticeable in practice for a long chain of functions getting invoked
You're a hero
Hey, I made a PR to add CA certificate passthrough for NixOS: https://github.com/dagger/dagger/pull/13137. I stumbled on this and found a TODO in the codebase so I went ahead and tackled it. Feel free to ping me if you need anything or have questions.
Another obscure bug I hit while developing go-sdk... https://github.com/dagger/dagger/issues/13152
Another set of test telemetry tweaks for whoever's got time (they all do the same thing)
FYI bunch of networking errors in CI right now from go module proxy, e.g. engine/client/drivers/container.go:17:2: github.com/google/go-containerregistry@v0.21.4: read "https://proxy.golang.org/github.com/google/go-containerregistry/@v/v0.21.4.zip": stream error: stream ID 1; INTERNAL_ERROR; received from peer
This has happened in the past too when proxy.golang.org was having issues, they don't have a status page that I know of unfortunately.
Blocker for moving SDKs to modules -> https://github.com/dagger/dagger/issues/13157
discord discussion about a user looking to implement a --fail-on-cache-miss flag to better understand if their pipelines are consistent cc @civic yacht
Maven seems to be mad at us today, lots of Java tests failing in CI randomly due to 429s from maven
does anyone know why apk fails to work in our engine-dev playground container?
dagger call -m github.com/dagger/dagger engine-dev playground with-exec apk,add,openssh
@still garnet are self calls available in Dang? I guess not, but just wanted to be sure (and vote to add them)
My use case: I'm changing the way SDKs work, in particular I'm removing the moduleTypes (going back to the previous version with entrypoint and empty function name for various reasons).
With that, codegen phase is in fact three things:
- analysis of the module source code (extract types and functions exposed): this requires the introspection json
- generation of the entrypoint (equivalent of the moduleTypes and the switch to invoke the right function): this requires the result from 1.
- generation of the dep bindings: this requires the introspection json, and 1. if we think about self calls
My main idea is to have all that under one single call from the engine perspective (because it simplifies things). But for caching purposes I'd like to be able to call the different phases as separate calls so they can be cached.
And the way to get that would be to use self calls, so each function is cached individually instead of as a single call. So better performance.
It's because we are using wolfi as the base. And even if we are providing the apk-tools package, there's no available package database or repository configuration.
I've added a new arg to install packages:
dagger call -m github.com/eunomie/dagger@playground-extra-packages engine-dev playground --extra-packages openssh terminal
Allow to install Wolfi packages in the playground
$ dagger call engine-dev playground --extra-packages openssh
I think i'd rather be able to run apk add inside the container
Yeah actually you can set playground --base to set the base image of your choice. The problem with an abstraction like extra-packages is that it breaks if you change the base image (but we could still support it, we just need to check for those 2 flags, and fail if they're both set)
I actually have the same problem in github.com/dagger/go : you can set a go version, or a custom base image, but not both
It's because we are using wolfi as the base
Really it's because we're using our wolfi module as the base, though, and we could change that module. Wolfi itself (cgr.dev/chainguard/wolfi-base:latest) does have apk and package repos
we're using our wolfi package as the base
Aaaah. I thought it was some special wolfi thing I didn't know about
But what's special about our base?
I would have assumed it's just the regular base, since it's supposed to be a generic wolfi module?
afaik it's the way we build the rootfs. Our wolfi module wraps our alpine module which uses our apko module to build the packages and add them to the fs
and it's not actually the wolfi container, it's just a container which we've added binaries to from the wolfi package registry
i guess you could argue what is a wolfi container, but in our case afaik it is not cgr.dev/chainguard/wolfi-base, but rather a scratch container with packages added
FYI we're hitting 429 too many requests to docker hub. Look at last commit from https://github.com/dagger/dagger/commits/main/
We do that because it saves over a GB of disk space across an engine build (vs just running "apk add" on a base image) by not duplicating tons of data on disk.
We can add support for running apk in it. There's a comment in the module code on how to do so. We just didn't need it previously
I think we just need that (dropping in apk and the package repo configs) as an option for the playground base, not necessarily in the engine build
Note: I have an easy workaround already: dagger call -m github.com/dagger/dagger engine-dev playground --base=alpine terminal
I think I may even have added that --base argument specifically because of this problem, and forgotten
My full command now:
dagger call -m github.com/dagger/dagger \
engine-dev \
playground --base=alpine \
with-unix-socket --path=/tmp/ssh.sock --source=$SSH_AUTH_SOCK \
with-env-variable --name=SSH_AUTH_SOCK --value=/tmp/ssh.sock \
terminal
Boom playground with ssh and ssh agent forwarding out of the box. Now I can actually develop and push code straight from the playground.
Now just add some sort of persistent workspace concept, support for it in dagger cloud, and we've got ourselves one of those fashionable sandbox hosting services
Quick question: I am changing all the non-digest pulls to pull by digest, at least in our tests. Should I change modules/alpine to pull by digest too, or is that not a good idea?
i think lockfile already solves this without changing module code, doesn't it?
Yes. In main you can already use that feature and delete all hardcoded digests from module code
(but not in 0.20.8 I believe)
Ah that would be sweet!
You need to run with dagger --lock=pinned in CI, to disable looking up tags that are already in the lockfile
and --lock=live locally I believe, in main lockfile is disabled by default
dagger call -m github.com/dagger/dagger \
engine-dev \
playground --base=alpine \
with-exec --args=apk,add,openssh \
with-unix-socket --path=/tmp/ssh.sock --source=$SSH_AUTH_SOCK \
with-env-variable --name=SSH_AUTH_SOCK --value=/tmp/ssh.sock \
with-exec --args=ssh,-T,-o,StrictHostKeyChecking=no,git@github.com \
stdout
Hi shykes! You've successfully authenticated, but GitHub does not provide shell access.
Is the golang:check failure known, related to Helm ?
In main? Not known to me. What's the error?
I looked at that briefly, it was something about not being able to re-use the same name for something. I suspect the problem is we were starting to actually get caching on it. It probably relied on never being cached accidentally. I bet there's just some state persisted in a cache mount that is resulting in us trying to re-use a name that already exists in that state
That error should go away for now again since I flipped the infra cache mount off again temporarily, but we can fix it once that's back on
@leaden glade I saw that we have code that does a cloud engine scale out for generators: https://github.com/dagger/dagger/blob/909a48fb2c58c8ac40c3699bad459af6205d36c8/core/modtree.go#L518
Which I'm confused by because scale out can not yet support transferring directories or anything besides scalars, so shouldn't work in theory. In practice when I just tried dagger generate --scale-out on my machine it error'd out with confusing errors.
You good to just delete it? I ran into this because I'm fixing a bug with how ModTree gets its cache serialized to disk, but then ran into a wall with tryRunGeneratorScaleOut since the code would essentially have needed to serialize state that is on a remote engine.
I think we also have some flakes on moduleRuntime Python
link?
confirming rn by rerunning without the cache volume + clean state, but i have 2 PRs touching python and leading to the exact same failure, and can't repro locally: https://dagger.cloud/dagger/checks/github.com/dagger/dagger@cf18c4984897a843e88a9ca445aebab58fe7c190?check=test-split%3Atest-module-runtimes&run=b2c5c43d-da82-4ae8-915a-dcd332114fde
dagger
31 : Container.withExec ERROR [0.6s]
31 : [0.6s] | Resolved 39 packages in 114ms
31 : [0.6s] | error: Distribution `yarl==1.24.0 @ registry+https://pypi.org/simple` can't be installed because it doesn't have a source distribution or wheel for the current platform
31 : [0.6s] |
31 : [0.6s] | hint: You're using CPython 3.11 (`cp311`), but `yarl` (v1.24.0) only has wheels with the following Python ABI tag: `cp310`
seems like an external dependency problem
googling around seems like similar things have happened to other packages when there was a bad publish to pypi
Maybe it was just a temporary thing, but a rerun after cache volume + clean state didn't trigger the issue, thanks for looking into it
Could be another case then where it's an issue that has always existed but we didn't notice until now when we had actual cache re-use between CI runs
Anyone else seeing TestEnvFile/TestSecretFile flake? It's failed on two separate PRs, so feels unrelated:
- https://dagger.cloud/dagger/checks/github.com/dagger/dagger@b2511a1b34680152be9c1cc5fa4f72f4d022dde5?check=test-split:test-base&listen=61f66c1aaa3d9cc3&test=TestEnvFile/TestSecretFile&viewMode=tests
- https://dagger.cloud/dagger/checks/github.com/dagger/dagger@821604061fe3e60e02f3d1d978dda06b3f190f87?check=test-split:test-base&listen=654d434a3c93ebe1&test=TestEnvFile/TestSecretFile&viewMode=tests
Reran the second one and it passed
I'm trying to debug .python-version_takes_precedence and relaxed_.python-version
https://dagger.cloud/dagger/checks/github.com/dagger/dagger@15f6a4bdc932c275de0b03cd9608d9327ee90d03?check=test-split:test-module-runtimes&listen=aeec109729c3c6ad&test=TestPython/TestVersion/.python-version_takes_precedence&viewMode=tests
And at the same time i'm trying to get to do --lock=live via scaleout ... scaleout has a lot of issues (possibly related to toolchains/workspace)
Error: check "helm:assert-template" not found in module "helm"
(with --scale-out only)
related to this #maintainers message, no? Reruns with no cache worked for me. Thanks for checking
i dont think so. I saw that one too, but this is not hitting yarl afaict
Maybe not the same origin but same symptoms ?
Ah yes you're right... context switching is hard on me
🧵 Checks as "rails" 🧵
we're hitting on workspace branch, which is quite annoying (as we have to modify some of the python tests), any luck ? Otherwise i'll dig a bit too
Made a fix for workspace, not portable
dang self calls
Trying to see if this fixes it https://github.com/dagger/dagger/pull/13189
I'll approve now in case it does work out so you won't be blocked on merge (have to log off soon)
thanks for the fix
Sure, I'll delete it.
Actually don't, I'm about to fix --scale-out
Not sure if it's related or not, but i also see a bunch of errors with --scale-out, and i think it's just due to the prefix dance logic. For instance check helm:lint looks for a helm:lint check inside helm, instead of looking for a lint check inside helm. Possibly generators have the same issue
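To make the suspected prefix problem concrete, here is a small Go sketch. It is not the engine's lookup code: `localCheckName` and `findCheck` are hypothetical, and the map stands in for whatever the module actually exposes.

```go
package main

import (
	"fmt"
	"strings"
)

// localCheckName strips a leading "<module>:" prefix so that "helm:lint"
// becomes "lint" when the lookup happens inside the helm module.
func localCheckName(module, check string) string {
	return strings.TrimPrefix(check, module+":")
}

// findCheck (hypothetical) looks up a possibly prefixed check name among a
// module's own checks.
func findCheck(module string, checks map[string]bool, check string) error {
	name := localCheckName(module, check)
	if !checks[name] {
		return fmt.Errorf("check %q not found in module %q", check, module)
	}
	return nil
}

func main() {
	helm := map[string]bool{"lint": true, "assert-template": true}
	fmt.Println(findCheck("helm", helm, "helm:lint"))    // found once the prefix is stripped
	fmt.Println(findCheck("helm", helm, "helm:missing")) // genuinely absent
}
```

Without the strip step, the lookup would search for the literal name "helm:lint" inside helm and fail, which matches the symptom described.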
It's not possible yet to support generators since we can't transfer directories engine-to-engine yet
I already have a pr with the deletion in it, so no worries if you didn't get to this yet
Ah ok. Wasn't sure. Either way i need --scale-out to be fixed to test --lock=live
I did https://github.com/dagger/dagger/pull/13190 but one or the other will be fine
Summary
The generator scale-out path returned a Changeset whose Directory fields reference objects that only exist on the remote engine, so the local engine cannot materialize the generated files ...
FYI im fixing the check generator in main
Ah i didnt see you fixed it already thanks yves
shouldnt the tryGeneratorAsCheckScaleOut also go ?
seems to be related to some iptables thing. I'd assume related to the recent kernel nft changes
<@&1506565370385793125> regression in workspace branch
<@&1506565370385793125> @tidal spire github.com/dagger/eslint maintenance 🧵
<@&1506565370385793125> @tidal spire github.com/dagger/vitest maintenance
Couple of fixes for some bugs when persisting/loading cache state between engine stop/start cycles: https://github.com/dagger/dagger/pull/13173
The low disk space problem we hit with cache volumes enabled in CI was because we were hitting those bugs, then the engine correctly fell back to "invalid state, throwing cache away" but then didn't delete the on-disk state for that cache being discarded, so it leaked. That's also fixed there too, it correctly deletes the state in that case.
any reason why TestTelemetry/TestGolden is not run in parallel? I see almost no CPU usage
k, seems to be related to the k3s image not working with nftables by default https://docs.k3s.io/known-issues#iptables. I'm checking if using the xtables-nft-multi binary in the k3s image works the same way we do it in the engine. The biggest caveat is that the official k3s image doesn't have any package manager so I need to bundle those binaries in my k3s module. cc @civic yacht
https://github.com/dagger/dagger/pull/13194 seems to fix it ? Nvm it'd be just a stopgap
Re-pinging for review on this if anyone available, need it merged to be able to continue testing in native CI. Feel pretty good about it at this point, all the problems it fixes were repro'able in tests and pass after the fixes
checking. sorry too many issues
@charred lotus definitely sus given that the last commit in main passes but when I try to run things locally on my machine I'm getting iptables issues. I'll dig a bit deeper here
I think we can close your PR
Debug cannot define methods on objects from outside this module\n\n" does not contain "Cannot query field \\\"echo\\\" on type \\\"Container\\\""
https://dagger.cloud/dagger/checks/github.com/dagger/dagger@99dbe1fde222bd6009606e9dd812b9e459c32d9b?check=test-split:test-module-runtimes&listen=c17d3bcab938680e&listen=fc8fc5025920677c&listen=29912704650fb9f3&listen=2998a7c4a75f2774#d501c1a6a9fc4059
PR cleanup @icy cliff
@leaden glade just got a hypothesis failure in CI on main: https://dagger.cloud/dagger/checks/github.com/dagger/dagger@9a3341d4b58120a1e5d61cda7aa3e5eb691efea3?check=python-sdk%3Apython-311%3Aunit&run=7c06266c-aa44-4c75-9e99-39d199d0f933
Guessing it's inconsistent because hypothesis is inherently non-deterministic right?
@icy cliff can you verify what the hypothesis framework uses as a rng seed ?
1.0 docs
what's the reason for having this indirection? https://github.com/search?q=repo%3Adagger/dagger+SetModuleSourceSDKLoader&type=code, e.g. is there a case where we don't want it set?
trying to understand some of the sdk code, and a lot of it has some fun/weird indirections like this, and i'm trying to figure out to what end all of this is for
Just a guess, but maybe simply to avoid circular dependencies.
In core/modulesource.go if we wanted to write moduleSourceSDKLoader = sdk.NewLoader().SDKForModule... the problem is the sdk package imports core. And in that case core would like to also import sdk
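If that guess is right, the pattern looks roughly like the single-file Go sketch below. The identifiers `SetModuleSourceSDKLoader`, `moduleSourceSDKLoader` and `SDKForModule` appear in the messages above, but every signature and type here is assumed for illustration and the real code differs; the comments mark which package each part would live in.

```go
package main

import "fmt"

// ---- would live in package core (cannot import sdk) ----

// SDKLoader is the only thing core knows about: an interface, not the sdk package.
type SDKLoader interface {
	SDKForModule(name string) (string, error)
}

// moduleSourceSDKLoader stays nil until another package registers an implementation.
var moduleSourceSDKLoader SDKLoader

// SetModuleSourceSDKLoader is the hook called from outside core at startup.
func SetModuleSourceSDKLoader(l SDKLoader) { moduleSourceSDKLoader = l }

// loadSDK (hypothetical) is how core would use the loader without importing sdk.
func loadSDK(name string) (string, error) {
	if moduleSourceSDKLoader == nil {
		return "", fmt.Errorf("no SDK loader registered")
	}
	return moduleSourceSDKLoader.SDKForModule(name)
}

// ---- would live in package sdk (imports core) ----

type loader struct{}

func (loader) SDKForModule(name string) (string, error) {
	return "sdk-for-" + name, nil
}

// ---- wiring, done once by whoever imports both packages ----

func main() {
	SetModuleSourceSDKLoader(loader{})
	sdk, err := loadSDK("my-module")
	fmt.Println(sdk, err)
}
```

The dependency arrow only ever points from sdk to core; core sees an interface value that was injected at runtime, which is what avoids the import cycle.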
<@&946480760016207902> we're about to cut v0.21.0 (two PRs left). Any other issues/PRs we absolutely need to block the release on that you're aware of ?
Also, @civic yacht do you mind if we push https://github.com/dagger/dagger/pull/13138 from v0.21.0 to v0.21.1 ? It's a draft and a nice to have.
I would prefer to get some more testing today with cache volumes enabled in ci infra, something went wrong with the infra yesterday and I was not actually testing reuse of cache volumes #1506858288313143417 message
Sounds good!
<@&1506565370385793125> open a PR that implements the suggestion in https://github.com/dagger/dagger/issues/13174
1.0 dogfood 🧵
<@&1506565370385793125> Error executing template: array index out of bounds https://dagger.cloud/dagger/checks/github.com/dagger/dagger@34076821241410d8d7da8a0579325f5eccbbd403?check=golang:check&listen=cb1b3134b5c65959&listen=ccf20c8d376e377d&listen=b9b2bf56fbb293fe&listen=a0c7ab9fde311dfc#b9b2bf56fbb293fe
<@&1506565370385793125> prototype a change to the CLI progress renderers:
- I'm assuming from memory that the default non-TUI renderer is --progress=main. If I'm misremembering just substitute the correct name in the rest of my message
- Move the current "plain" to "classic". No longer the default
- Change "plain" to be the most usable possible format for LLMs. Take inspiration from all other renderers: logs, report, dots.
known issues today:
- too much verbosity and internal details by default, "like stracing make". Especially for every day use ("which checks passed and how do I fix it?") as opposed to deep dive mode "what happened with this build, why was it so slow?" which doesn't belong in the default renderer
- pipelines can run for a long time. Some more concise renderers will not stream any progress for minutes, leaving LLMs wondering if they are hanging
- missing contextual information that is available to humans in the TUI: how much time elapsed; cpu/memory/disk pressure; filesync is happening; (spitballing)
1.0 migration 🧵
Can I merge this PR https://github.com/dagger/dagger/pull/13215? The failed go test doesn't seem related to it.