#Hi @jedevc I have a question regarding
1 messages · Page 1 of 1 (latest)
- I have a module A with dagger.json configured with engine version - v0.12.5
- and a module B with dagger.json for it configured with "dev" engine version
- module A depends on module B
- engine version running is "dev" version
what I am seeing is that :
- the generated schema and client code seems like generated as per "dev" version.
- when i call the function using
dagger call, it fails to find function argument and returns error like:
Run 'dagger call inside-old-version --help' for usage.```
the skipTParse format is as per dev format.
is it possible that on the fly when engine generates the code again, it is generating as per the old version? if so, is there a way to look at that generated code somehow (from the engine side)
if you list dagger functions what does it show?
or do --help i guess
just to confirm - is your cli a dev version as well? you can confirm using dagger version
the dagger cli is also served the specific api version that it itself declares (just like a module)
the generated schema and client code seems like generated as per "dev" version
for which module? module a should be according to v0.12.5, module b should be according to dev
Setup tracing at https://dagger.cloud/traces/setup. To hide: export SHUTUP=1
✔ connect 0.0s
✔ initialize 0.4s
Name Description
config -
inside-old-version -```
just to confirm - is your cli a dev version as well? you can confirm using dagger version
it is dev version
dagger v0.12.6-010101000000-dev-9bbc6f44dcdc
(I am trying to debug from inside a container started using dev engine)
the generated schema and client code seems like generated as per "dev" version
for which module? module a should be according to v0.12.5, module b should be according to dev
it seems like it is generated using "dev" for both A and B
Setup tracing at https://dagger.cloud/traces/setup. To hide: export SHUTUP=1
✔ connect 0.0s
✔ initialize 0.4s
✔ prepare 0.0s
USAGE
dagger call inside-old-version [arguments] <function>
ARGUMENTS
--skip-t-parse-old string [required]
^^ this is from module A
hm okay
when I call this function:
Setup tracing at https://dagger.cloud/traces/setup. To hide: export SHUTUP=1
✔ connect 0.0s
✔ initialize 0.4s
✔ prepare 0.0s
✔ oldversion: Oldversion! 0.2s
✘ Oldversion.insideOldVersion(skipTParseOld: "hello"): Container! 0.3s
! call function "InsideOldVersion": process "/runtime" did not complete successfully: exit code: 2
┃ marshal: json: error calling MarshalJSON for type *dagger.Container: input: minimal.withSecondFunction resolve: failed to set call inputs: failed to fin
┃ arg "skipTParse"
✔ minimal: Minimal! 0.2s
✘ Minimal.withSecondFunction(skipTParse: "hello"): Container! 0.0s
! failed to set call inputs: failed to find arg "skipTParse"
Error: response from query: input: oldversion.insideOldVersion resolve: call function "InsideOldVersion": process "/runtime" did not complete successfully: exit code: 2
Stdout:
marshal: json: error calling MarshalJSON for type *dagger.Container: input: minimal.withSecondFunction resolve: failed to set call inputs: failed to find arg "skipTParse"
Run 'dagger call inside-old-version --help' for usage.
do you have module source code?
i'm really sorry, i'm trying to do the release today, so a bit busy
but, i think maybe this is expected and okay
well, not the error or the thing
but module A calling module B, should see the "dev" names for module B (if module B is using dev), regardless of the version that module A is using
this is maybe a little weird, but also it's okay, because it doesn't break the daggerverse
i'm really sorry, i'm trying to do the release today, so a bit busy
sure, I understand. I will keep digging.
right i guess the scenario that matters more is:
- both a and b are on v0.12.5 - both of them should use the old names (in their schema, and the generated code for b in a)
if that makes sense
yeah, that's right, sorry, i think i led you down the wrong path before
this error message is definitely odd though, and potentially points to a bug in the implementation so far
i would look into where the arg matching happens, and see what could be going wrong here - it sounds like maybe something's using the wrong caser instance? or maybe we're looking at OriginalName by accident instead?
does the issue go away if you remove the dependency?
can you replicate it using just dagger query as well?
just some avenues to potentially explore
both a and b are on v0.12.5 - both of them should use the old names (in their schema, and the generated code for b in a)
I will verify that.
but if A is v0.12.5 AND B is "v0.12.6", what would be expected result.
I think our line of changes so far were that when installing on server side:
version A will install as per v0.12.5
version B will install as per v0.12.6
^^ is this the correct assumption
yes, this is the right assumption
okay, here's my understanding of what the issue here is
the key for the map in fn.args is from one strcaser
but then we do a lookup using a diffferent strcaser
these might not match
what i suspect we need to do is not do a map lookup here
we need to iterate over the values, and compare strcase on the arg name (likely against the OriginalName), with strcase on the target input
kind of similar to how moduleDef.GetObject works today?
hmmm, if that is the issue, then FunctionByName and FieldByName also need to take context parameters to do strcase comparisons like that (against the OriginalName)
yeah, okay, i think the above is correct - essentially, the typedef Name maps to how it needs to be installed in the server
however, we can't use this for doing lookups using input names, because Name is already strcased - instead, we need to do lookups using OriginalName using the strcase view of the caller doing the lookup (e.g. Module A calls Module B - Module B installs using version v0.12.5 say, but when Module A calls it, then we need to find it's function/args/etc using the strcaser for A (since the input details are using A)
hopefully that makes some amount of sense
thank you for checking it out. I had to step out for some work, I will look into it with fresh eyes again tomorrow morning. I also wanted to let you know that I really appreciate your guidance so far. It has been quite helpful in navigating the codebase.
no worries, have a good evening 👋
Hi @livid cargo , I have a question about how the call chain work for all the functions:
- moduleA depends on moduleB
- We call
dagger call module-a function-in-a --arg some-arg - Dagger starts the engine if it is not already running (which also installs the basic modules/schema onto the engine)
- Dagger now initializes the moduleA into the engine
- Dagger stores the info about function being called in session (or context)
- Dagger calls the main function of the moduleA, which calls dispatch (in
dagger.gen.go)
(I am assuming above main function is called via /runtime binary compiled during this function call. is this understanding correct)
- The
dispatchfunction calls the server to ask info about what function triggered the previous step. - The
dispatchfunction then calls theinvokefunction, which then creates the object of Module (byjson.Unmarshal) and call the function frommain.go(The code that author of module wrote) - now the function in moduleA makes a call to function in moduleB (via
dag.ModuleB().FunctionB(arg string)) - this in turns calls the function defined in
internal/dagger/dagger.gen.go, which then sends the request to server with name of the function and argument being passed. - server tries to find the module/function for it, and then call it.
I am slightly confused on how this last part works and have few questions:
- when does the engine install the module dependencies? when initializing the main module itself or when the function is called for that dependency
- how does the server runs the function from a dependency? does it do a codegen for it as well, and then compile a separate
/runtimefor it or both of them are in same/runtimebinary? - when the function inside moduleB is called, is this same client session or a new session. my assumption here is that this is new session (or maybe even new engine) that knows only about moduleB and its dependencies. is that assumption correct.
sorry for that wall of text here. i am trying to take a step back to understand different interaction points to make better choices when making code changes
when does the engine install the module dependencies?
we codegen and prep the runtime for all the dependencies before the module we're actually running. that's because to generate thedagger.gen.gofor a module, we need to know all the things it's dependencies declare, so they all need to have run/etc first
so when A calls B, then B is already codegened, and has a runtime that's cached, so we just need to start the WithExec for it
how does the server runs the function from a dependency?
from the point of view of a module's code, there is no "special dependency handling". there's just codegen against a graphql api. so once we've loaded all the dependencies for a module into it's server object, we do the graphql introspection, and see those module apis just sitting there so can codegen them.
when we call them, the request goes back to the engine, and then it calls that function in another module (just as it the main client had called it)
when the function inside moduleB is called, is this same client session or a new session
ah. this is getting a bit tricky from my knowledge, but we treat this as a "nested client session"
it's a different session - so, if you try and do file access through the dagger api or something, it'll all go to the new session.
but it's nested, and i think this maybe has someServiceimplications? but not for the case we're worried about here i don't think.
but the server that the dependency module B is being served only sees B's dependencies - it doesn't see A's
this has a really fun implication that both A and B could even depend on different versions of the same module! it's fine, and not inconsistent, because they have their own server that they see
hopefully that makes sense 🙂 just dumped a ton of info there, fyi, the place to look around for this will in modules.go and modfunc.go and trace around where .Call is called from to see how that all works
It's good to get a handle on this! But also, I think potentially, the only thing wrong in the current strcase implementation is just about the part 11 "server tries to find the module/function for it, and then call it."
yeah, I changed the function to use input.Name directly instead of gqlFormatName(ctx, input.Name) when doing lookup, and IIRC that works. but I wanted to understand other implications of making these changes.
is this flow documented somewhere? If not, I think it will be good to document this as something that could be useful for new contributors. (I am happy to document it as the starting point)
following up on your comments above, I have couple more questions 🙂 :
has a runtime that's cached
could you please point me to where this runtime for dependency is built and cached
then it calls that function in another module (just as it the main client had called it)
so basically calling the /runtime of that dependency module?
yeahhhhh, that's a good point, i'm not actually sure why we normalize the input name - if the input name is wrong (because the caller got it wrong), we shouldn't be doing a weird transformation. if the test suite is fine with it, then so am i 🎉
i think we added a lot of those gql formatting things around everywhere, some of them may/may not be neccessary any more
this is not, no - partly because it's changed a lot, but i think at this point it's probably worth writing down - it would definitely help future sdk authors as well
i think this would definitely be interesting, but let's hold off on doing that right this second? i'd like to get the strcase work wrapped up first
could you please point me to where this runtime for dependency is built and cached
Ah, I think this is theRuntimefunction in the SDK that returns aContaineras the runtime that runs the function. There is no real "explicit" caching for this anywhere, just the caching that dagger has builtin to everything - so if the code stores theRuntimeresult and nothing in it's inputs has changed, then running it again just looks up the last result (I'd avoid diving down this rabbit hole lol, there's dagql server caching you might have seen incachedSelect, but there's also much more complex buildkit-level caching which is way way trickier)
i think this would definitely be interesting, but let's hold off on doing that right this second? i'd like to get the strcase work wrapped up first
yep, that is my priority as well 🙂
so basically calling the /runtime of that dependency module?
exactly, all calls go throughModuleFunction.Call, which is just invoking/runtime(note that the/runtimeis just a go sdk detail - for the python sdk, the entrypoint is gonna be something else, probably something closer topython /path/to/file.py)
👍 , this is useful. thanks again.
i will try to make the change, and verify our tests/compatibility-check etc works
is there any additional combination of engine version testcase that you would recommend? or this (plus other testcases we did earlier) should be enough.
i think just a "before this change version" and an "after this change version"
fyi, we're doing a release today, so the "before version" will now be v0.12.6 😛
just to add some more excitement to your life
ha ha. that should be fine. thanks for the heads up
Hi Justin, I have been continuing to debug this to narrow down the problem. it seems like there is a race condition in setting the ctx (we are currently setting it in Server.cachedSelect. ). I am trying to run dagger call without changing anything, and randomly I am seeing different strcase being used (I can see the content in the file internal/dagger/dagger.gen.go being updated when this happens).
also seeing a weird (but consistent) behavior that when we change the dagger version of main module to old version, it fails the first time, but works on 2nd run.
it might be useful to have a second pair of eyes to identify this issue. would it possible for you to pair with me for few mins on this.
hmmm, i'm busy at least until the release goes out a bit later, i really need to get that done asap
but this sounds like potential caching issues to me, i don't see how there can be a race in cachedSelect
there's a potential issue where a server cache is shared between two servers with different views i think
i would confirm this idea by disabling the server.Cache temporarily and see if the error goes away
if it does - the solution would be to add the view to the cacheMapContextKey and add it into GetOrInitialize
hm, that would be very surprising to me though, since the view is actually encoded into the call id
lemme try and get through the release asap, i'll see if i can grab some time with you after that
otherwise, if you run out of time, feel free to push your branch (and chuck some instructions to repro, and i'll have a look)
btw, if you're getting very blocked on this, feel free to pick up another task in the meantime? something like the sdk error message we discussed on github previously
ha ha. yeah, i can do that. but this feels so close (yet so far). FWIW, the issue we discussed yesterday works now (the argument name), right now its the function name that I am working on (added an additional testcase for that)
I pushed the code with which I am trying to reproduce the prob here: https://github.com/rajatjindal/dagger-views with readme about how to reproduce the issue.
basically I am down to trying the permutation of engine version + (fn name, argument name, module name) to verify where it fails. and ran into the above problem when dependency version is <= v0.12.5 (current cutoff for strcase changes)
I have identified a few places where engine version is not passed, and that might be causing the prob. (I hardcoded the engine version at those places, and it seems to be working after that.). that code is pushed to branch "rajatjindal/dagger:strcase-testcases"
it will be great to pair with you next week to confirm if those fixes make sense.
Hi @livid cargo, if you have some time today, it will be great to pair on this one. I think (and I hope) I have most of the places figured out where ctx is not passed as expected, but I need some help to figure out how to pass the ctx there.
oh just saw it. coming in there in a min
hm okay, so i have a fun theory here
I think we're getting thrown off by the fact that dagger develop first updates the engineVersion field to the latest version - so it updates to the most recent, and does the most recent strcase implementation always
so that explains why we're using the most recent one
something that is weird is that we don't actually expect the value to change - since it should be dictated by the dependency
actually, i did used --compat flag (may have missed that in repro instructions)...
because without that it was consistent and changing the dagger engine version in dagger.json.
(i was tracking it using git repo to be able to point out what was changing with each execution 🙂 )
(and was keeping track of dagger.gen.go manually as they are .gitignored otherwise)
to be double sure, i was also stopping engine, removing all containers and images, and re-running from scratch (but I was not sure if this is enough for cleaning up the cache)
mmm you're right
i can repro without this
i have detected a difference in some internal caching in ModDeps.lazilyLoadSchema, I'm currently investigating that
that would explain the issue i think
🙏 thank you so much for your help Justin.
ohhh okay i have a hunch here
typedef operations can't be cached in the same way as before right?
because now it depends on who calls those functions
that's definitely something that could be going on
but might not be the thing that's going wrong
hm, no even with that change there's still an issue
I was working on a theory that:
if A calls B, then:
B/dagger.gen.goshould see its functions according to its (B's) engine version andA/internal/dagger/dagger.gen.go- should see B's function according to engine version of A
we may want to try this WITHOUT any strcase changes. because if this caching behavior is the problem, we should see it without it as well?
we just need to find a testcase to easily reproduce the problem (in this case it is the strcase lib, but I am hoping it could be done with something else as well)
right
ffs
found it
diff --git a/dagql/server.go b/dagql/server.go
index ac7e175e2..888a7c838 100644
--- a/dagql/server.go
+++ b/dagql/server.go
@@ -572,7 +572,7 @@ func CurrentID(ctx context.Context) *call.ID {
func NoopDone(res Typed, cached bool, rerr error) {}
func (s *Server) cachedSelect(ctx context.Context, self Object, sel Selector) (res Typed, chained *call.ID, rerr error) {
- ctx = compat.AddCompatToContext(ctx, s.View)
+ ctx = compat.AddCompatToContext(ctx, sel.View)
chainedID, err := self.IDFor(ctx, sel)
if err != nil {
return nil, nil, err
I'd forgotten this particular intricacy
essentially, what's happening is an id is getting cached
but the problem is that then when we come to load the id, we're accidentally using the View from the server, not from the id - and since id's can be loaded in servers with different views they were created in, this can cause incorrectness
got another patch you'll want as well:
this could have caused some weirdness with accidentally caching the results from another typedef
can confirm that with these changes at least, things start to work a bit more reasonably again
what an absolute pain of a bug to work through
hopefully this unblocks you btw - i need to move onto some other stuff, but should hopefully mean that we can at least test this
Awesome, thank you so much. I will try this out, really appreciate the help
Hi Justin, I tried above, but it seems like the View is empty in some places where we would want it to be set to the version of module. e.g.
dagql.Selection{
Alias:"withDaggerCLIAlpine",
Selector:dagql.Selector{
Field:"withDaggerCLIAlpine",
Args:[]dagql.NamedInput{
dagql.NamedInput{Name:"stringArg", Value:"dev"}
},
Nth:0,
View:""
},
Subselections:[]dagql.Selection{
dagql.Selection{
Alias:"stdout",
Selector:dagql.Selector{
Field:"stdout",
Args:[]dagql.NamedInput{},
Nth:0,
View:"v0.12.8"
},
Subselections:[]dagql.Selection(nil)
}
}
}
}```
for above, view is empty for function/arg.
it almost seems like this will need some more dedicated time from you to take it over the finish line. maybe after v0.13.0 is released?
yeah potentially, i'm completely out of bandwidth right now until the release
sounds good. I will update Marcos that I am deferring work on this ticket.
just a curious question: are we expecting v0.13.0 to have breaking changes? if so, can this change land as a breaking change?
yes, we can have breaking changes, but we still need version compatibility for those
we need to avoid breaking modules in the daggerverse
👍
I don't actually think this case is a problem
Note that user defined functions do not get set views
Only functions that have been annotated in dagql with .View are going to have that selector set
The cases we care about are .withFunction and similar
I think one of the mistake I might be doing is to overthink on the format that graph query must use. I should instead check that I am able to call the functions correctly with args? if the function call works with old and new engine version, we should be good
I am currently working on this ticket https://github.com/dagger/dagger/issues/6713. once I finish this one, I will revisit the strcase ticket again to take a fresh look