#> so I started work on removing the
1 messages ยท Page 1 of 1 (latest)
I did notice once significant use of LLB left in core/schema/query.go for the schema introspection json file. I don't think it will be too hard to dag-op-ify though
after that there's probably some small loose ends scattered about but I think the schema file is last significant one?
oh..... that's right! I think I ran into it when I was tackling the container.File work, and it got pretty hairy so I left it as is, but I can give it a try next.
I think it was mostly the codegen portion of the build would break, and it made the dev cycles a bit more difficult, if I recall correctly.
Also, Dockerfile builds technically still use LLB, but don't worry about those.
I have an idea on how to isolate+bridge such that dockerfiles still go through buildkit but then the results are converted to what dagql understands. Only doable as part of the solver-removal work though, not beforehand.
I'm starting to feel a bit stuck on debugging the schemaJSONFile work. I'm trying to setup a dagop under https://github.com/dagger/dagger/compare/main...alexcb:dagger:dagop-schemaJSONFile?expand=1#diff-a2bab836c6f25d97c19d9ec4f5cbe987a91b5584bfa04e3488e7728d08be10f4R154
but when I try to do a build, I get errors during the codegen process:
Error: failed to serve module: failed to load dependencies as modules: failed to load module dependencies: failed to load dependencies as modules: failed to load module dependencies: failed to call module "codegen" to get functions: Error: "go" not found
It seems like the sdk runtime container that's being returned is wrong (since the go binary isn't there):
I'll keep hacking away at it, but was just curious if @onyx tangle had any ideas.
here's my relevant printfs
ACB eval of {ClientID:s31ezi1njh9zu349dplftsjzk SessionID: SecretToken: Hostname: ClientStableID: ExecID:vn0adwxstriet6vdr3sskxc79 Internal:true CallID:0xc002aa86c0 EncodedModuleID:ChV4eGgzOjk1M2EzY2U5NTU5NDYwYWMSnQEKFXh4aDM6MTNiNzgxM2I2YzI0NTU0YRKDARIQCgxNb2R1bGVTb3VyY2UYARoMbW9kdWxlU291cmNlIjMKCXJlZlN0cmluZxImOiQvaG9tZS9hbGV4L2RhZ2dlci9kYWdnZXIvY21kL2NvZGVnZW4iEwoNZGlzYWJsZUZpbmRVcBICGAFKFXh4aDM6MTNiNzgxM2I2YzI0NTU0YVgBEl8KFXh4aDM6OTUzYTNjZTk1NTk0NjBhYxJGChV4eGgzOmU4YWY3ZGNlNWJmNGRiOGQSCgoGTW9kdWxlGAEaCGFzTW9kdWxlShV4eGgzOjk1M2EzY2U5NTU5NDYwYWNYARJ2ChV4eGgzOmU4YWY3ZGNlNWJmNGRiOGQSXQoVeHhoMzoxM2I3ODEzYjZjMjQ1NTRhEhAKDE1vZHVsZVNvdXJjZRgBGgh3aXRoTmFtZSIRCgRuYW1lEgk6B2NvZGVnZW5KFXh4aDM6ZThhZjdkY2U1YmY0ZGI4ZA== EncodedFunctionCall:[123 34 78 97 109 101 34 58 34 34 44 34 80 97 114 ... CallerClientID: ParentIDs:map[] CacheMixin:xxh3:7bef3336d1891a0a HostAliases:map[] ExtraSearchDomains:[] RedirectStdinPath: RedirectStdoutPath: RedirectStderrPath: SecretEnvNames:[] SecretFilePaths:[] SystemEnvNames:[] EnabledGPUs:[] SSHAuthSocketPath: NoInit:false AllowedLLMModules:[] ClientVersionOverride:} failed exit code: 2 [traceparent:38c0ecd6fcd97e67fba1c479ec4ef72a-96a4f0a7a62b3685]
ACB load failed Error: "go" not found
having not read it yet, my immediate thought is the cache key might be wrong in that the cache key for the object passed around is not content hashed based on the schema. That would explain the behavior at least.
I suspect this API might be best following a different pattern than the usual dag-op. I don't think we need the whole lazy thing where the same API gets invoked two times, once outside dagop and once inside. It can actually be much simpler.
The whole point of this API is to be able to pass around the schema json as a File, but avoiding creating an ID that's passed between clients which contains the entire schema json string (which can be MBs).
Ideas:
- Don't use dag-op wrappers or anything. Just directly create a new MutableRef, write the schema.json to it, make it an ImmutableRef and then return a
Filewith that ref set. The only risk is whether our code correctly handlesRefs created outside of DagOp yet, not sure since I don't know if we've tried yet. - Update
Fileto support an additional field underLLBandRef:Contents []byte. IfContentsis set, then just pretend its a file with the given bytes. Would require updating all the relevant methods on File to handle that case in addition to the existing LLB+Ref cases, but probably not too bad
Those would be mutually exclusive options. But the end result of either of them would be that you can pass around references to the schema json file without having to actually pass the entire contents as an ID string from one client to another (i.e. during SDK module calls, the ID of the schema json file are won't actually have the full string in it).
If I had to pick, 2. might be the simplest?
Also, I'm just thinking out loud here, so take with salt
those are definitely helpful ideas! I'll keep on hacking at it -- I too was tempted to try creating a ref outside of the dagop, which might be possible especially since the schema should always be added to an empty directory.
I also added a panic incase we were passing around an empty digest, which never got hit, so I think that's a good thing ๐
yeah that's important! But also important that the digest of the arg being passed around changes when the schema changes. I guess you could check by println'ing schemaJSONFile.ID().Digest() for the id that's used here
So I'm running into the issue of s.srv.View getting lost when I get a callback in the dagop context, which I discovered due to a printf under: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/core/schema/query.go#L274
I initially tried passing along the expected view as an arg, e.g.
WithArgument(call.NewArgument(
"expectedView",
call.NewLiteralString(s.srv.View.String()),
false,
))
then tried changing the Server.Schema() func to accept a specific View argument: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/dagql/server.go#L450-L493
however I don't think that worked, since the tests kept on failing -- in particular some of the go client code isn't being generated correctly since the wrong schema is still being returned, which was ultimately giving me some ./main.go:15:13: dag.ContextGit undefined (type *dagger.Client has no field or method ContextGit) errors.
so I put in a terribly ugly hack, which I store the schema under a global map with a random ID, and pass along a reference to that ID instead, under: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/core/schema/query.go#L330
But that's not an acceptable solution (memory leak, super hacky, etc).
So the ultimate question: How can I keep a reference the dagql.Server of the original call (or the equivalent original view) from within the dagop callback?
@onyx tangle I got all the tests passing for this now under https://github.com/dagger/dagger/pull/11733 however I'm still stuck figuring out how to get the correct view inside the dagop context.
sorry yes on my queue to look into soon, I will let you know!
ah thanks, no worries.
I have to run out at some point today to deal with a flat tire on the car ๐
but I can shift on to https://github.com/dagger/dagger/pull/11734 (completely unrelated) so I'm not blocked.
When you passed the view as an argument, did you also include the view in the digest of the operation? Right now with dagql IDs, if you override the digest then it doesn't matter if you change the args, so if any args change you gotta incorporate that into any custom digest
I actually may fix this soon by changing custom ID digests to work by just accepting a list of extra digest mixins, but doesn't exist yet
Oh I just read this, sorry to hear and no problem!
no rush at all
I found the friendliest tire shop, got it all sorted out and am back home. took all of 10-15mins at the shop.
I'm pretty sure I tried that... I had it under this commit: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/core/schema/query.go#L344
newID := dagql.CurrentID(ctx).
WithArgument(call.NewArgument(
"expectedView",
call.NewLiteralString(uglyID),
//call.NewLiteralString(s.srv.View.String()),
false,
)).
WithDigest(hashutil.HashStrings(
// dgst.String(),
fmt.Sprintf("%v", rand.Float64()),
s.srv.View.String(), <-- I had this in here, even before I started tossing in random values :)
))
ctxDagOp := dagql.ContextWithID(ctx, newID)
I think part of the issue I was running into is when I'm inside the dagop, it always points to the same server that's configured under the worker: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/engine/buildkit/worker.go#L113
What happens if you A) consistently use s.srv rather than CurrentDagqlServer(ctx) and B) Force a sync evaluation of the dagop (i.e. eval the LLB) inside the implementation?
Trying to plumb context through the buildkit solver in a way that actually works consistently is essentially impossible (which is why we still get random CI flakes about sessions missing), but if you do it all synchronously it might avoid the problem? Also shouldn't be a regression since we were implicitly evaling before this anyways I think
Side note: I've been seeing evaling spans in cloud traces: https://github.com/sipsma/dagger/blob/2408141c2463e4c8e0685cd40e457c5132fcc109/core/container.go#L1865-L1865, was that left in by accident?
hmmm, I think I put that in with the telemetry.Internal() flag to hide it.
It shows up still, for whatever reason
at least I've seen it
I can try removing it, and see if the idtui golden tests pass.
I was initially just using s.srv, but never tried the force sync. I'll give it a try.
oh yeah.... that totally shouldn't be there ๐
I managed to compare the contents of the schema, what I expect on the left (outside the dagop), and what I'm getting (inside the dagop).
somehow there's a test that's changing the schema? ๐
and this is produced using s.srv and an eval done under:
f, err := DagOpFile[*core.Query, schemaJSONArgs](ctxDagOp, s.srv, parent.Self(), args, nil, WithStaticPath[*core.Query, schemaJSONArgs](schemaJSONFilename))
if err != nil {
return inst, err
}
if _, err := f.Evaluate(ctx); err != nil { <--------------------------- here
return inst, err
}
dop, err := dagql.NewObjectResultForID[*core.File](f, s.srv, newID)
if err != nil {
return inst, err
}
return dop, nil
so given it's "essentially impossible", what are your thoughts in the way I have it implemented -- storing the contents in a global map, and simply passing a key (the hash of the content), via the dagql call?
If you think that's ok, I can clean it up a bit, and possibly add some sort of reference counting to delete the contents once the f.Evaluate(ctx) returns.
No we can't do that. It's too hacky and we're gonna regret it immediately.
Is your latest attempt pushed to your branch? I can take a look to see what's going wrong
one sec.... I'll push the version with the eval call
it contains a call to panic() whenever it detects a missmatch under: https://github.com/dagger/dagger/pull/11733/changes/9f6b81290247fd9b9fa61d3f5f4cb3409cc5fe3e#diff-a2bab836c6f25d97c19d9ec4f5cbe987a91b5584bfa04e3488e7728d08be10f4R162-R165
which I can then pull up a diff of the schema by running:
docker logs dagger-engine.dev |& grep 'ACB want' > /tmp/want
docker logs dagger-engine.dev |& grep 'ACB got' > /tmp/got
meld <(cat /tmp/got | awk '{print $3}' | base64 -d | jq .) <(cat /tmp/want | awk '{print $3}' | base64 -d | jq .)
we're gonna regret it immediately.
๐
Yeah to be clear I'm glad you tried it to verify this is all possible to get working ๐ I just know in my bones that it would become a headache quickly
(looking at what you pushed now)
I snuk in a RangeSorted function, so the fields should be produced in a stable order now,
What were you running to repro the bug? Just anything with a module?
./hack/with-dev go test -count=1 -v -run TestModule/TestContextDirectory ./core/integration/ hit it pretty quickly.
Found one possibility: https://github.com/sipsma/dagger/blob/21f887838c3897cee576f161b795eb027b6ac812/core/schema/wrapper.go#L135-L135
The CacheMap for the dagop is set to return that CacheKey, but it's just a mix of the self digest (which is Query in this case, so nothing) and args digest (which is also not helpful)
I'm profoundly looking forward to the day where we can get rid of those extra layer of cache keys and all we care about is the dagql cache key...
but ironically have to get over this hump first ๐
I'm checking if my theory is correct, just an fyi in case you're looking around still too
eh okay, that may be a factor but not the whole story, still digging
seeing if there's a way to fix the thing with the server being constant in the worker
okay that dagql server field on worker is actually an interface that gets you the "right" dagql server, problem is that it gets you the right one based on your client id.
that doesn't quite work in the way we call it though: https://github.com/sipsma/dagger/blob/21f887838c3897cee576f161b795eb027b6ac812/core/moddeps.go#L216-L216
I think i got something working. I pushed it to your branch as a separate commit.
It basically just passes the schema as an arg in the ID. I'm attempting to do it in such a way that the ID w/ the schema attached stays internal only and doesn't cross IPC/network boundaries
TestModule was passing for me, we'll see if everything else is okay with it
If it works I think that's probably good enough to hold us over. It should become much more trivial once the solver is out of the way
I see you're just passing the contents along over the args -- if that works and you're not worried about the memory overhead, then maybe that's fine?
The current code has a comment under https://github.com/sipsma/dagger/blob/21f887838c3897cee576f161b795eb027b6ac812/core/schema/query.go#L146-L151
// LLB marshalling takes up too much memory when file ops have a ton of contents, so we still go through
// the blob source for now simply to avoid that.
which made it seem like we wanted to avoid passing large data around, but perhaps that's only a LLB limitation and totally fine for dagops? or at least fine for now and we'll just accept the extra dagop serialization overhead?
Right that's why I mentioned above that this approach does it in such a way the schema attached stays internal only and doesn't cross IPC/network boundaries
ah, gotcha!