#Hi @jedevc I have a question regarding

1 messages · Page 1 of 1 (latest)

unborn kiln
#
  • I have a module A with dagger.json configured with engine version - v0.12.5
  • and a module B with dagger.json for it configured with "dev" engine version
  • module A depends on module B
  • engine version running is "dev" version

what I am seeing is that :

  • the generated schema and client code seems like generated as per "dev" version.
  • when i call the function using dagger call, it fails to find function argument and returns error like:
Run 'dagger call inside-old-version --help' for usage.```
#

the skipTParse format is as per dev format.

#

is it possible that on the fly when engine generates the code again, it is generating as per the old version? if so, is there a way to look at that generated code somehow (from the engine side)

livid cargo
#

if you list dagger functions what does it show?

#

or do --help i guess

#

just to confirm - is your cli a dev version as well? you can confirm using dagger version

#

the dagger cli is also served the specific api version that it itself declares (just like a module)

#

the generated schema and client code seems like generated as per "dev" version
for which module? module a should be according to v0.12.5, module b should be according to dev

unborn kiln
#
Setup tracing at https://dagger.cloud/traces/setup. To hide: export SHUTUP=1

✔ connect 0.0s
✔ initialize 0.4s

Name                 Description
config               -
inside-old-version   -```
#

just to confirm - is your cli a dev version as well? you can confirm using dagger version

it is dev version

dagger v0.12.6-010101000000-dev-9bbc6f44dcdc

(I am trying to debug from inside a container started using dev engine)

#

the generated schema and client code seems like generated as per "dev" version

for which module? module a should be according to v0.12.5, module b should be according to dev

it seems like it is generated using "dev" for both A and B

#
Setup tracing at https://dagger.cloud/traces/setup. To hide: export SHUTUP=1

✔ connect 0.0s
✔ initialize 0.4s
✔ prepare 0.0s

USAGE
  dagger call inside-old-version [arguments] <function>

ARGUMENTS
      --skip-t-parse-old string   [required]
#

^^ this is from module A

livid cargo
#

hm okay

unborn kiln
#

when I call this function:

Setup tracing at https://dagger.cloud/traces/setup. To hide: export SHUTUP=1

✔ connect 0.0s
✔ initialize 0.4s
✔ prepare 0.0s
✔ oldversion: Oldversion! 0.2s
✘ Oldversion.insideOldVersion(skipTParseOld: "hello"): Container! 0.3s
! call function "InsideOldVersion": process "/runtime" did not complete successfully: exit code: 2
┃ marshal: json: error calling MarshalJSON for type *dagger.Container: input: minimal.withSecondFunction resolve: failed to set call inputs: failed to fin
┃ arg "skipTParse"                                                                                                                                        
  ✔ minimal: Minimal! 0.2s
  ✘ Minimal.withSecondFunction(skipTParse: "hello"): Container! 0.0s
  ! failed to set call inputs: failed to find arg "skipTParse"

Error: response from query: input: oldversion.insideOldVersion resolve: call function "InsideOldVersion": process "/runtime" did not complete successfully: exit code: 2

Stdout:
marshal: json: error calling MarshalJSON for type *dagger.Container: input: minimal.withSecondFunction resolve: failed to set call inputs: failed to find arg "skipTParse"
Run 'dagger call inside-old-version --help' for usage.
livid cargo
#

do you have module source code?

#

i'm really sorry, i'm trying to do the release today, so a bit busy

#

but, i think maybe this is expected and okay

#

well, not the error or the thing

#

but module A calling module B, should see the "dev" names for module B (if module B is using dev), regardless of the version that module A is using

#

this is maybe a little weird, but also it's okay, because it doesn't break the daggerverse

unborn kiln
#

i'm really sorry, i'm trying to do the release today, so a bit busy

sure, I understand. I will keep digging.

livid cargo
#

right i guess the scenario that matters more is:

  • both a and b are on v0.12.5 - both of them should use the old names (in their schema, and the generated code for b in a)
#

if that makes sense

#

yeah, that's right, sorry, i think i led you down the wrong path before

livid cargo
#

i would look into where the arg matching happens, and see what could be going wrong here - it sounds like maybe something's using the wrong caser instance? or maybe we're looking at OriginalName by accident instead?

#

does the issue go away if you remove the dependency?
can you replicate it using just dagger query as well?

#

just some avenues to potentially explore

unborn kiln
#

both a and b are on v0.12.5 - both of them should use the old names (in their schema, and the generated code for b in a)

I will verify that.

#

but if A is v0.12.5 AND B is "v0.12.6", what would be expected result.

I think our line of changes so far were that when installing on server side:

version A will install as per v0.12.5
version B will install as per v0.12.6

^^ is this the correct assumption

livid cargo
#

yes, this is the right assumption

livid cargo
#

the key for the map in fn.args is from one strcaser
but then we do a lookup using a diffferent strcaser

#

these might not match

#

what i suspect we need to do is not do a map lookup here

#

we need to iterate over the values, and compare strcase on the arg name (likely against the OriginalName), with strcase on the target input

#

kind of similar to how moduleDef.GetObject works today?

#

hmmm, if that is the issue, then FunctionByName and FieldByName also need to take context parameters to do strcase comparisons like that (against the OriginalName)

#

yeah, okay, i think the above is correct - essentially, the typedef Name maps to how it needs to be installed in the server
however, we can't use this for doing lookups using input names, because Name is already strcased - instead, we need to do lookups using OriginalName using the strcase view of the caller doing the lookup (e.g. Module A calls Module B - Module B installs using version v0.12.5 say, but when Module A calls it, then we need to find it's function/args/etc using the strcaser for A (since the input details are using A)

#

hopefully that makes some amount of sense

unborn kiln
#

thank you for checking it out. I had to step out for some work, I will look into it with fresh eyes again tomorrow morning. I also wanted to let you know that I really appreciate your guidance so far. It has been quite helpful in navigating the codebase.

livid cargo
#

no worries, have a good evening 👋salute

unborn kiln
#

Hi @livid cargo , I have a question about how the call chain work for all the functions:

  1. moduleA depends on moduleB
  2. We call dagger call module-a function-in-a --arg some-arg
  3. Dagger starts the engine if it is not already running (which also installs the basic modules/schema onto the engine)
  4. Dagger now initializes the moduleA into the engine
  5. Dagger stores the info about function being called in session (or context)
  6. Dagger calls the main function of the moduleA, which calls dispatch (in dagger.gen.go)

(I am assuming above main function is called via /runtime binary compiled during this function call. is this understanding correct)

  1. The dispatch function calls the server to ask info about what function triggered the previous step.
  2. The dispatch function then calls the invoke function, which then creates the object of Module (by json.Unmarshal) and call the function from main.go (The code that author of module wrote)
  3. now the function in moduleA makes a call to function in moduleB (via dag.ModuleB().FunctionB(arg string))
  4. this in turns calls the function defined in internal/dagger/dagger.gen.go, which then sends the request to server with name of the function and argument being passed.
  5. server tries to find the module/function for it, and then call it.

I am slightly confused on how this last part works and have few questions:

  • when does the engine install the module dependencies? when initializing the main module itself or when the function is called for that dependency
  • how does the server runs the function from a dependency? does it do a codegen for it as well, and then compile a separate /runtime for it or both of them are in same /runtime binary?
  • when the function inside moduleB is called, is this same client session or a new session. my assumption here is that this is new session (or maybe even new engine) that knows only about moduleB and its dependencies. is that assumption correct.
#

sorry for that wall of text here. i am trying to take a step back to understand different interaction points to make better choices when making code changes

livid cargo
#

when does the engine install the module dependencies?
we codegen and prep the runtime for all the dependencies before the module we're actually running. that's because to generate the dagger.gen.go for a module, we need to know all the things it's dependencies declare, so they all need to have run/etc first

#

so when A calls B, then B is already codegened, and has a runtime that's cached, so we just need to start the WithExec for it

#

how does the server runs the function from a dependency?
from the point of view of a module's code, there is no "special dependency handling". there's just codegen against a graphql api. so once we've loaded all the dependencies for a module into it's server object, we do the graphql introspection, and see those module apis just sitting there so can codegen them.
when we call them, the request goes back to the engine, and then it calls that function in another module (just as it the main client had called it)

#

when the function inside moduleB is called, is this same client session or a new session
ah. this is getting a bit tricky from my knowledge, but we treat this as a "nested client session"
it's a different session - so, if you try and do file access through the dagger api or something, it'll all go to the new session.
but it's nested, and i think this maybe has some Service implications? but not for the case we're worried about here i don't think.

#

but the server that the dependency module B is being served only sees B's dependencies - it doesn't see A's

#

this has a really fun implication that both A and B could even depend on different versions of the same module! it's fine, and not inconsistent, because they have their own server that they see

#

hopefully that makes sense 🙂 just dumped a ton of info there, fyi, the place to look around for this will in modules.go and modfunc.go and trace around where .Call is called from to see how that all works

#

It's good to get a handle on this! But also, I think potentially, the only thing wrong in the current strcase implementation is just about the part 11 "server tries to find the module/function for it, and then call it."

unborn kiln
#

yeah, I changed the function to use input.Name directly instead of gqlFormatName(ctx, input.Name) when doing lookup, and IIRC that works. but I wanted to understand other implications of making these changes.

#

is this flow documented somewhere? If not, I think it will be good to document this as something that could be useful for new contributors. (I am happy to document it as the starting point)

#

following up on your comments above, I have couple more questions 🙂 :

has a runtime that's cached

could you please point me to where this runtime for dependency is built and cached

then it calls that function in another module (just as it the main client had called it)

so basically calling the /runtime of that dependency module?

livid cargo
#

i think we added a lot of those gql formatting things around everywhere, some of them may/may not be neccessary any more

livid cargo
#

i think this would definitely be interesting, but let's hold off on doing that right this second? i'd like to get the strcase work wrapped up first

livid cargo
# unborn kiln following up on your comments above, I have couple more questions 🙂 : > has a...

could you please point me to where this runtime for dependency is built and cached
Ah, I think this is the Runtime function in the SDK that returns a Container as the runtime that runs the function. There is no real "explicit" caching for this anywhere, just the caching that dagger has builtin to everything - so if the code stores the Runtime result and nothing in it's inputs has changed, then running it again just looks up the last result (I'd avoid diving down this rabbit hole lol, there's dagql server caching you might have seen in cachedSelect, but there's also much more complex buildkit-level caching which is way way trickier)

unborn kiln
#

i think this would definitely be interesting, but let's hold off on doing that right this second? i'd like to get the strcase work wrapped up first

yep, that is my priority as well 🙂

livid cargo
#

so basically calling the /runtime of that dependency module?
exactly, all calls go through ModuleFunction.Call, which is just invoking /runtime (note that the /runtime is just a go sdk detail - for the python sdk, the entrypoint is gonna be something else, probably something closer to python /path/to/file.py)

unborn kiln
#

👍 , this is useful. thanks again.

i will try to make the change, and verify our tests/compatibility-check etc works

#

is there any additional combination of engine version testcase that you would recommend? or this (plus other testcases we did earlier) should be enough.

livid cargo
#

i think just a "before this change version" and an "after this change version"

#

fyi, we're doing a release today, so the "before version" will now be v0.12.6 😛

#

just to add some more excitement to your life

unborn kiln
#

ha ha. that should be fine. thanks for the heads up

unborn kiln
#

Hi Justin, I have been continuing to debug this to narrow down the problem. it seems like there is a race condition in setting the ctx (we are currently setting it in Server.cachedSelect. ). I am trying to run dagger call without changing anything, and randomly I am seeing different strcase being used (I can see the content in the file internal/dagger/dagger.gen.go being updated when this happens).

#

also seeing a weird (but consistent) behavior that when we change the dagger version of main module to old version, it fails the first time, but works on 2nd run.

#

it might be useful to have a second pair of eyes to identify this issue. would it possible for you to pair with me for few mins on this.

livid cargo
#

hmmm, i'm busy at least until the release goes out a bit later, i really need to get that done asap

#

but this sounds like potential caching issues to me, i don't see how there can be a race in cachedSelect

#

there's a potential issue where a server cache is shared between two servers with different views i think

#

i would confirm this idea by disabling the server.Cache temporarily and see if the error goes away

#

if it does - the solution would be to add the view to the cacheMapContextKey and add it into GetOrInitialize

#

hm, that would be very surprising to me though, since the view is actually encoded into the call id

#

lemme try and get through the release asap, i'll see if i can grab some time with you after that
otherwise, if you run out of time, feel free to push your branch (and chuck some instructions to repro, and i'll have a look)

livid cargo
#

btw, if you're getting very blocked on this, feel free to pick up another task in the meantime? something like the sdk error message we discussed on github previously

unborn kiln
#

ha ha. yeah, i can do that. but this feels so close (yet so far). FWIW, the issue we discussed yesterday works now (the argument name), right now its the function name that I am working on (added an additional testcase for that)

unborn kiln
#

basically I am down to trying the permutation of engine version + (fn name, argument name, module name) to verify where it fails. and ran into the above problem when dependency version is <= v0.12.5 (current cutoff for strcase changes)

unborn kiln
#

I have identified a few places where engine version is not passed, and that might be causing the prob. (I hardcoded the engine version at those places, and it seems to be working after that.). that code is pushed to branch "rajatjindal/dagger:strcase-testcases"

unborn kiln
#

it will be great to pair with you next week to confirm if those fixes make sense.

unborn kiln
#

Hi @livid cargo, if you have some time today, it will be great to pair on this one. I think (and I hope) I have most of the places figured out where ctx is not passed as expected, but I need some help to figure out how to pass the ctx there.

livid cargo
#

yeah i can do right now

unborn kiln
#

oh just saw it. coming in there in a min

livid cargo
#

I think we're getting thrown off by the fact that dagger develop first updates the engineVersion field to the latest version - so it updates to the most recent, and does the most recent strcase implementation always

#

so that explains why we're using the most recent one

#

something that is weird is that we don't actually expect the value to change - since it should be dictated by the dependency

unborn kiln
#

actually, i did used --compat flag (may have missed that in repro instructions)...

#

because without that it was consistent and changing the dagger engine version in dagger.json.

#

(i was tracking it using git repo to be able to point out what was changing with each execution 🙂 )

#

(and was keeping track of dagger.gen.go manually as they are .gitignored otherwise)

#

to be double sure, i was also stopping engine, removing all containers and images, and re-running from scratch (but I was not sure if this is enough for cleaning up the cache)

livid cargo
#

mmm you're right

#

i can repro without this

#

i have detected a difference in some internal caching in ModDeps.lazilyLoadSchema, I'm currently investigating that

#

that would explain the issue i think

unborn kiln
#

🙏 thank you so much for your help Justin.

livid cargo
#

ohhh okay i have a hunch here

#

typedef operations can't be cached in the same way as before right?

#

because now it depends on who calls those functions

#

that's definitely something that could be going on

#

but might not be the thing that's going wrong

#

hm, no even with that change there's still an issue

unborn kiln
#

I was working on a theory that:

if A calls B, then:

  • B/dagger.gen.go should see its functions according to its (B's) engine version and
  • A/internal/dagger/dagger.gen.go - should see B's function according to engine version of A
#

we may want to try this WITHOUT any strcase changes. because if this caching behavior is the problem, we should see it without it as well?

#

we just need to find a testcase to easily reproduce the problem (in this case it is the strcase lib, but I am hoping it could be done with something else as well)

livid cargo
#

right

#

ffs

#

found it

#
diff --git a/dagql/server.go b/dagql/server.go
index ac7e175e2..888a7c838 100644
--- a/dagql/server.go
+++ b/dagql/server.go
@@ -572,7 +572,7 @@ func CurrentID(ctx context.Context) *call.ID {
 func NoopDone(res Typed, cached bool, rerr error) {}
 
 func (s *Server) cachedSelect(ctx context.Context, self Object, sel Selector) (res Typed, chained *call.ID, rerr error) {
-    ctx = compat.AddCompatToContext(ctx, s.View)
+    ctx = compat.AddCompatToContext(ctx, sel.View)
     chainedID, err := self.IDFor(ctx, sel)
     if err != nil {
         return nil, nil, err
#

I'd forgotten this particular intricacy

#

essentially, what's happening is an id is getting cached

#

but the problem is that then when we come to load the id, we're accidentally using the View from the server, not from the id - and since id's can be loaded in servers with different views they were created in, this can cause incorrectness

#

got another patch you'll want as well:

#

this could have caused some weirdness with accidentally caching the results from another typedef

#

can confirm that with these changes at least, things start to work a bit more reasonably again

#

what an absolute pain of a bug to work through

#

hopefully this unblocks you btw - i need to move onto some other stuff, but should hopefully mean that we can at least test this

unborn kiln
#

Awesome, thank you so much. I will try this out, really appreciate the help

unborn kiln
#

Hi Justin, I tried above, but it seems like the View is empty in some places where we would want it to be set to the version of module. e.g.

    dagql.Selection{
        Alias:"withDaggerCLIAlpine", 
        Selector:dagql.Selector{
            Field:"withDaggerCLIAlpine", 
            Args:[]dagql.NamedInput{
                dagql.NamedInput{Name:"stringArg", Value:"dev"}
            }, 
            Nth:0, 
            View:""
        }, 
        Subselections:[]dagql.Selection{
            dagql.Selection{
                Alias:"stdout", 
                Selector:dagql.Selector{
                    Field:"stdout", 
                    Args:[]dagql.NamedInput{}, 
                    Nth:0, 
                    View:"v0.12.8"
                }, 
                Subselections:[]dagql.Selection(nil)
            }
        }
    }
}```

for above, view is empty for function/arg.
#

it almost seems like this will need some more dedicated time from you to take it over the finish line. maybe after v0.13.0 is released?

livid cargo
#

yeah potentially, i'm completely out of bandwidth right now until the release

unborn kiln
#

sounds good. I will update Marcos that I am deferring work on this ticket.

just a curious question: are we expecting v0.13.0 to have breaking changes? if so, can this change land as a breaking change?

livid cargo
#

yes, we can have breaking changes, but we still need version compatibility for those

#

we need to avoid breaking modules in the daggerverse

unborn kiln
#

👍

livid cargo
#

Note that user defined functions do not get set views

#

Only functions that have been annotated in dagql with .View are going to have that selector set

#

The cases we care about are .withFunction and similar

unborn kiln
#

I think one of the mistake I might be doing is to overthink on the format that graph query must use. I should instead check that I am able to call the functions correctly with args? if the function call works with old and new engine version, we should be good

livid cargo
#

👍

#

fyi, i'm off monday+tuesday next week, so won't be around