#> so I started work on removing the

1 messages ยท Page 1 of 1 (latest)

onyx tangle
#

I did notice once significant use of LLB left in core/schema/query.go for the schema introspection json file. I don't think it will be too hard to dag-op-ify though

#

after that there's probably some small loose ends scattered about but I think the schema file is last significant one?

dull portal
#

oh..... that's right! I think I ran into it when I was tackling the container.File work, and it got pretty hairy so I left it as is, but I can give it a try next.

#

I think it was mostly the codegen portion of the build would break, and it made the dev cycles a bit more difficult, if I recall correctly.

onyx tangle
#

Also, Dockerfile builds technically still use LLB, but don't worry about those.

I have an idea on how to isolate+bridge such that dockerfiles still go through buildkit but then the results are converted to what dagql understands. Only doable as part of the solver-removal work though, not beforehand.

dull portal
#

I'm starting to feel a bit stuck on debugging the schemaJSONFile work. I'm trying to setup a dagop under https://github.com/dagger/dagger/compare/main...alexcb:dagger:dagop-schemaJSONFile?expand=1#diff-a2bab836c6f25d97c19d9ec4f5cbe987a91b5584bfa04e3488e7728d08be10f4R154

but when I try to do a build, I get errors during the codegen process:

Error: failed to serve module: failed to load dependencies as modules: failed to load module dependencies: failed to load dependencies as modules: failed to load module dependencies: failed to call module "codegen" to get functions: Error: "go" not found

It seems like the sdk runtime container that's being returned is wrong (since the go binary isn't there):

I'll keep hacking away at it, but was just curious if @onyx tangle had any ideas.

GitHub

Automation engine to build, test and ship any codebase. Runs locally, in CI, or directly in the cloud - Comparing dagger:main...alexcb:dagop-schemaJSONFile ยท dagger/dagger

#

here's my relevant printfs

ACB eval of {ClientID:s31ezi1njh9zu349dplftsjzk SessionID: SecretToken: Hostname: ClientStableID: ExecID:vn0adwxstriet6vdr3sskxc79 Internal:true CallID:0xc002aa86c0 EncodedModuleID:ChV4eGgzOjk1M2EzY2U5NTU5NDYwYWMSnQEKFXh4aDM6MTNiNzgxM2I2YzI0NTU0YRKDARIQCgxNb2R1bGVTb3VyY2UYARoMbW9kdWxlU291cmNlIjMKCXJlZlN0cmluZxImOiQvaG9tZS9hbGV4L2RhZ2dlci9kYWdnZXIvY21kL2NvZGVnZW4iEwoNZGlzYWJsZUZpbmRVcBICGAFKFXh4aDM6MTNiNzgxM2I2YzI0NTU0YVgBEl8KFXh4aDM6OTUzYTNjZTk1NTk0NjBhYxJGChV4eGgzOmU4YWY3ZGNlNWJmNGRiOGQSCgoGTW9kdWxlGAEaCGFzTW9kdWxlShV4eGgzOjk1M2EzY2U5NTU5NDYwYWNYARJ2ChV4eGgzOmU4YWY3ZGNlNWJmNGRiOGQSXQoVeHhoMzoxM2I3ODEzYjZjMjQ1NTRhEhAKDE1vZHVsZVNvdXJjZRgBGgh3aXRoTmFtZSIRCgRuYW1lEgk6B2NvZGVnZW5KFXh4aDM6ZThhZjdkY2U1YmY0ZGI4ZA== EncodedFunctionCall:[123 34 78 97 109 101 34 58 34 34 44 34 80 97 114  ... CallerClientID: ParentIDs:map[] CacheMixin:xxh3:7bef3336d1891a0a HostAliases:map[] ExtraSearchDomains:[] RedirectStdinPath: RedirectStdoutPath: RedirectStderrPath: SecretEnvNames:[] SecretFilePaths:[] SystemEnvNames:[] EnabledGPUs:[] SSHAuthSocketPath: NoInit:false AllowedLLMModules:[] ClientVersionOverride:} failed exit code: 2 [traceparent:38c0ecd6fcd97e67fba1c479ec4ef72a-96a4f0a7a62b3685]
ACB load failed Error: "go" not found
onyx tangle
# dull portal I'm starting to feel a bit stuck on debugging the `schemaJSONFile` work. I'm try...

having not read it yet, my immediate thought is the cache key might be wrong in that the cache key for the object passed around is not content hashed based on the schema. That would explain the behavior at least.

I suspect this API might be best following a different pattern than the usual dag-op. I don't think we need the whole lazy thing where the same API gets invoked two times, once outside dagop and once inside. It can actually be much simpler.

The whole point of this API is to be able to pass around the schema json as a File, but avoiding creating an ID that's passed between clients which contains the entire schema json string (which can be MBs).

Ideas:

  1. Don't use dag-op wrappers or anything. Just directly create a new MutableRef, write the schema.json to it, make it an ImmutableRef and then return a File with that ref set. The only risk is whether our code correctly handles Refs created outside of DagOp yet, not sure since I don't know if we've tried yet.
  2. Update File to support an additional field under LLB and Ref: Contents []byte. If Contents is set, then just pretend its a file with the given bytes. Would require updating all the relevant methods on File to handle that case in addition to the existing LLB+Ref cases, but probably not too bad

Those would be mutually exclusive options. But the end result of either of them would be that you can pass around references to the schema json file without having to actually pass the entire contents as an ID string from one client to another (i.e. during SDK module calls, the ID of the schema json file are won't actually have the full string in it).

#

If I had to pick, 2. might be the simplest?

#

Also, I'm just thinking out loud here, so take with salt

dull portal
#

those are definitely helpful ideas! I'll keep on hacking at it -- I too was tempted to try creating a ref outside of the dagop, which might be possible especially since the schema should always be added to an empty directory.

#

I also added a panic incase we were passing around an empty digest, which never got hit, so I think that's a good thing ๐Ÿ˜…

onyx tangle
#

yeah that's important! But also important that the digest of the arg being passed around changes when the schema changes. I guess you could check by println'ing schemaJSONFile.ID().Digest() for the id that's used here

dull portal
#

So I'm running into the issue of s.srv.View getting lost when I get a callback in the dagop context, which I discovered due to a printf under: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/core/schema/query.go#L274

I initially tried passing along the expected view as an arg, e.g.

        WithArgument(call.NewArgument(
            "expectedView",
            call.NewLiteralString(s.srv.View.String()),
            false,
        ))

then tried changing the Server.Schema() func to accept a specific View argument: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/dagql/server.go#L450-L493

however I don't think that worked, since the tests kept on failing -- in particular some of the go client code isn't being generated correctly since the wrong schema is still being returned, which was ultimately giving me some ./main.go:15:13: dag.ContextGit undefined (type *dagger.Client has no field or method ContextGit) errors.

so I put in a terribly ugly hack, which I store the schema under a global map with a random ID, and pass along a reference to that ID instead, under: https://github.com/dagger/dagger/blob/3ee8da3aa724ffeca12aa7de8b73b33361f4f54a/core/schema/query.go#L330

But that's not an acceptable solution (memory leak, super hacky, etc).

So the ultimate question: How can I keep a reference the dagql.Server of the original call (or the equivalent original view) from within the dagop callback?

dull portal
onyx tangle
dull portal
#

ah thanks, no worries.

#

I have to run out at some point today to deal with a flat tire on the car ๐Ÿ™

onyx tangle
#

I actually may fix this soon by changing custom ID digests to work by just accepting a list of extra digest mixins, but doesn't exist yet

onyx tangle
#

no rush at all

dull portal
dull portal
#
    newID := dagql.CurrentID(ctx).
        WithArgument(call.NewArgument(
            "expectedView",
            call.NewLiteralString(uglyID),
            //call.NewLiteralString(s.srv.View.String()),
            false,
        )).
        WithDigest(hashutil.HashStrings(
            // dgst.String(),
            fmt.Sprintf("%v", rand.Float64()),
            s.srv.View.String(),       <-- I had this in here, even before I started tossing in random values :)
        ))
    ctxDagOp := dagql.ContextWithID(ctx, newID)
onyx tangle
#

Trying to plumb context through the buildkit solver in a way that actually works consistently is essentially impossible (which is why we still get random CI flakes about sessions missing), but if you do it all synchronously it might avoid the problem? Also shouldn't be a regression since we were implicitly evaling before this anyways I think

dull portal
onyx tangle
#

at least I've seen it

dull portal
#

I can try removing it, and see if the idtui golden tests pass.

dull portal
dull portal
#

oh yeah.... that totally shouldn't be there ๐Ÿ™

dull portal
#

I managed to compare the contents of the schema, what I expect on the left (outside the dagop), and what I'm getting (inside the dagop).

#

somehow there's a test that's changing the schema? ๐Ÿ˜•

#

and this is produced using s.srv and an eval done under:

    f, err := DagOpFile[*core.Query, schemaJSONArgs](ctxDagOp, s.srv, parent.Self(), args, nil, WithStaticPath[*core.Query, schemaJSONArgs](schemaJSONFilename))
    if err != nil {
        return inst, err
    }

    if _, err := f.Evaluate(ctx); err != nil {        <--------------------------- here
        return inst, err
    }

    dop, err := dagql.NewObjectResultForID[*core.File](f, s.srv, newID)
    if err != nil {
        return inst, err
    }
    return dop, nil
dull portal
onyx tangle
dull portal
#

one sec.... I'll push the version with the eval call

#

it contains a call to panic() whenever it detects a missmatch under: https://github.com/dagger/dagger/pull/11733/changes/9f6b81290247fd9b9fa61d3f5f4cb3409cc5fe3e#diff-a2bab836c6f25d97c19d9ec4f5cbe987a91b5584bfa04e3488e7728d08be10f4R162-R165

which I can then pull up a diff of the schema by running:

docker logs dagger-engine.dev |& grep 'ACB want' > /tmp/want
docker logs dagger-engine.dev |& grep 'ACB got' > /tmp/got
meld <(cat /tmp/got | awk '{print $3}' | base64 -d | jq .) <(cat /tmp/want | awk '{print $3}' | base64 -d | jq .)
#

we're gonna regret it immediately.
๐Ÿ˜‚

onyx tangle
#

(looking at what you pushed now)

dull portal
#

I snuk in a RangeSorted function, so the fields should be produced in a stable order now,

onyx tangle
dull portal
#

./hack/with-dev go test -count=1 -v -run TestModule/TestContextDirectory ./core/integration/ hit it pretty quickly.

onyx tangle
#

I'm profoundly looking forward to the day where we can get rid of those extra layer of cache keys and all we care about is the dagql cache key...

#

but ironically have to get over this hump first ๐Ÿ˜„

#

I'm checking if my theory is correct, just an fyi in case you're looking around still too

onyx tangle
#

eh okay, that may be a factor but not the whole story, still digging

#

seeing if there's a way to fix the thing with the server being constant in the worker

onyx tangle
onyx tangle
#

I think i got something working. I pushed it to your branch as a separate commit.

#

It basically just passes the schema as an arg in the ID. I'm attempting to do it in such a way that the ID w/ the schema attached stays internal only and doesn't cross IPC/network boundaries

#

TestModule was passing for me, we'll see if everything else is okay with it

#

If it works I think that's probably good enough to hold us over. It should become much more trivial once the solver is out of the way

dull portal
#

I see you're just passing the contents along over the args -- if that works and you're not worried about the memory overhead, then maybe that's fine?

The current code has a comment under https://github.com/sipsma/dagger/blob/21f887838c3897cee576f161b795eb027b6ac812/core/schema/query.go#L146-L151

    // LLB marshalling takes up too much memory when file ops have a ton of contents, so we still go through
    // the blob source for now simply to avoid that.

which made it seem like we wanted to avoid passing large data around, but perhaps that's only a LLB limitation and totally fine for dagops? or at least fine for now and we'll just accept the extra dagop serialization overhead?

onyx tangle
dull portal
#

ah, gotcha!