#I'll echo @jedevc 's surprise that that'

1 messages · Page 1 of 1 (latest)

limber swan
#

It's certainly worth discussing to make sure we get it right. In retrospect I regret not double-checking that we were all on the same page on which context we were talking about.

#

It's certainly worth discussing to make sure we get it right. In retrospect I regret not double-checking that we were all on the same page on which context we were talking about.

#

Taking it from the very top, do we agree there are 3 4 possible choices on the table:

  1. Callee module context
  2. Immediate caller module context
  3. Top-level module context
  4. Client workdir?
#

My assumption was that we all wanted option 3: top-level module context

#

But now seems like a good time to revisit that assumption 🙂

#

(adding @weary knoll )

knotty fjord
#

Let me test my interpretation of 1, since I think I think I can imagine a use case for it: imagine you're consuming an application like Bass in your module, and you just want to do something like dag.Bass().Binary() to mount its binary into a container. I would expect that to load up the Bass context dir for it to build its binary, and for it to not have anything to do with my own module's context dir

Does that track at all with this discussion, or is that a simpler case?

weary knoll
#

Right now, for a top level module we can either get:

  • The context directory (.git pwd)
  • The root dir (dagger.json pwd)

For dependency that use context dir, I would assume they can get the same context as their parent.

weary knoll
#

But how do we make a difference between them?

knotty fjord
# weary knoll But how do we make a difference between them?

Depends on if we also have use cases for 2/3/4 I think - if it turns out 1 is the only practical use case for example, I guess defaultPath in each module is always just resolved relative to that module's context dir

My main concern with dependent modules gaining access to 2/3/4 is it seems like pretty tight coupling - these path specifications tend to be acutely aware of repo layout etc., and maybe there are legit use cases for that (generic tooling for common paths like .github), but it feels more brittle

I would suggest confirming whether my description of 1 is accurate, and then trying to come up with use cases for 2/3/4 to fully scope this thing out

limber swan
knotty fjord
limber swan
#

Well in some cases, the callee module context is also the caller module context. So a lot of this depends on project layout

#

Specifically what happens if I call an external module in the context of my app repo

#
  • dagger call build -> all good, callee=caller
  • dagger call -m github.com/sre-team/platform build -> oh shit which context do I use
knotty fjord
#

ah right - the reusable toolchains case?

limber swan
#

Not to mention the inverted control flow which we're discussing with Fastly, where the "platform" module would import the app module instead of the other way around

#

OK just to be clear, I am contemplating the possibility that I was wrong and that the callee context is actually fine

#

I'm just scared that we get it wrong and cross a horribly one-way door

#

Not just a door but one of these parking lot spiked barriers:

weary knoll
#

I'm still getting a scratch dir with callee context, I need to find out why then

knotty fjord
#

Some strawmen:

  1. Module author can specify path source (callee, caller, user, local, etc) - probably confusing? But maybe only toolchain modules need to think about this?
  2. When you install a module you can override its context dir (use this for toolchain modules). But what if the module needs both?
limber swan
#

Yeah I think this boils down to toolchain modules (in other words the pattern of dagger call -m FOO from the root of your app)

#

I can't think of anything else that "callee context" breaks (should double-check that)

#

So one possible option is we just kill the toolchain module use case

knotty fjord
#

I do think it's a valid and interesting use case, but yeah cutting scope has helped before, and maybe there's a path to it in the future when we can focus solely on it

#

It doesn't seem possible to satisfy both at once without a way to distinguish

weary knoll
#

We could add a context-type metadata to make that happen

limber swan
#

OK I don't want to hold this up. Let's assume we're giving up on the "toolchain modules", at least for now. So let's make the implementation work with "callee context" (so the opposite of what I was saying before), and see how it goes. If you can all keep the "contextual module use case" on your radar, so we watch for the other end of the tradeoff, I would appreciate it.

Concrete consequence of this tradeoff: it's no longer a good idea (probably) to have your app repos run dagger call -m github.com/acme/platform webapp build or similar. That only affects larger organizations, and there are other patterns being considered. So we'll see if it becomes a problem.

limber swan
dapper hound
#

we also need to make sure we have tests for the above as well!

if we can, i'd like to also have a test for Directory.AsModule() and see what happens with that as it loads a context directory, since this is another related codepath to the above

weary knoll
#

@dapper hound Hey, do you know if there's a way from *ModuleSource to know if it's a loaded as a dependency or from the host?
This would be super helpful for the implementation because:

  • If it's local soure -> I resolve it the current way
  • If it's a dependency -> I call ContextDirectory() method, and get the directory at the sourceRootPath (callee context)

However, I don't know how to make the difference for now, I could do it by a check on resolveContextPathFromModule: if it returns /scratch, that means it's a module loaded a dependency but that doesn't look right

#

I kinda found a way, not great but it works

#
    // If relHostPath is empty, it means the module source is called as a dependecy of another module.
    // So we scope the context to the module root directory (callee's context).
    if relHostPath == "" {
        return src.AsLocalSource.Value.RootSubpath, src.AsLocalSource.Value.RootSubpath, true, nil
    }
#

Lol

#

With a parameter isDep set to true of false if relHostPath is empty

#

@dapper hound In case it's a dep module, I can retrieve the ContextDir, meaning that I can also get up to the parent module's

In that case, should I point to this one if the user set a /? And the dep module's if it's relative with . ?

#

For example

test/ <- context dir (in case of dep module)
  dagger.json
  dep/ <- root dir (in case of dep module)
    dagger.json
dapper hound
#

i don't think we can use ContextDirectory

#

we had to build LoadContext so as to avoid this for local sources

weary knoll
#

I tried to add a file foo.txt inside the module and it worked

dapper hound
#

because ContextDirectory has all the filters/etc from the module sdk right?

weary knoll
#

Oh yeah, that's true..

#

I forget about that one...

#

But then what can I do? Because I cannot use buildkit.LocalImport either

dapper hound
#

i'm confused why not

#

oh i guess i see

weary knoll
#

LocalImport check on the host right? But from a depmodule, there's no host

dapper hound
#

well i guess we need localimport for context directories to import all the way from the host

#

but, we need to restrict the paths to read

#

and get the subpath from the host

#

actually not the host, the top-most place where the local directory was defined? maybe?

#

hm i guess a ModuleSource needs to know which session it comes from 🤔

weary knoll
#

Hm

#

That's a lot of guess there haha

dapper hound
weary knoll
#

Hmm how can I do that?

dapper hound
#

good question, it also really screws up caching 😛

weary knoll
#

So hang on, that means the session is different for each module?

dapper hound
#

even though it's a local source, it doesn't have a local fs to pull from

weary knoll
#

Okay

#

And then?

dapper hound
#

yeah, i'm drawing blanks here, one moment

weary knoll
#

Oh wait, I got something interesting

#

This is what I got when I do LocalImport with a path to /

dapper hound
weary knoll
#

I can go in /src/src and there I can get the context dir

#

A LocalSource should track where the context directory it's been loaded from is. This removes the complexity of this specific dance.
I think I fixed that with RelHostPath

#

We should create a ScratchSource type that can be created for Directory.AsModule - this type doesn't map to a directory on the host, so we should avoid all chances of confusion, which this PR steps into.
I don't get it

dapper hound
# weary knoll I can go in `/src/src` and there I can get the context dir

right, a couple things to remember:

  • a local module can have local deps (which need resolving on the host)
  • a git module can also have local deps (which need resolving relative to the git directory, not the host)
  • any module can load a module using AsDirectory (which needs resolving relative to that directory, not the host)
weary knoll
dapper hound
weary knoll
#

I see

#

So how can I solve that?

#

Btw, for git module, I didn't did any tests but I made a difference in LoadContext between the 2

dapper hound
#

actuallllly, i think point 2 is maybe not an issue, when we load dependencies, we do the translation to Git sources on the fly

#

so that's probably fine

#

point 3 is still an issue, but bleh

#

sorry, live thinking through this right now

weary knoll
#

No problem, I know what it is 🤣

dapper hound
#

hm, thinking through this, i think there's a subtle caching issue here that deserves a moment

#

suppose - we have two clients - e.g. in two different modules, they have exactly the same ContextDirectory, but they have some minor file difference (in an excluded file or something)

#

now, today, we can safely just combine these together

#

they're the same, they have the same id

#

but now this is an issue - if we do LoadContext on both of them, they have different results

#

so this is also solved by having LocalDirectorySource having a ClientID in it that indicates who loaded it

#

but this also removes any legitimate caching cases, which feels like it would slow down our tests, and users reloading modules

#

but i'm also struggling to see what the alternative is if we want to do lazy loading of module context directory data at all

weary knoll
#

Is there a chance that we merge the context dir PR only working with caller context, add a big warning on a doc page to say (it's still in progress) (or like a flag or something in the CLI to enable it) and discuss that altogether in a follow up?

I feel the more we dig into context dir, the larger it gets with impact all over the engine and design decision that we're not sure about right now

dapper hound
#

yeah, was thinking somewhat similarly - could we explicitly disable it for module dependencies? and only have it for the top-most module?

weary knoll
#

Yes I can

#

Should I throw an error if it's called over a dependency?

dapper hound
#

i dunno if that effectively makes it useless for @limber swan

weary knoll
#

I don't know, but I seems much more complex that is was originally, and might break more things than actually solving a problem lmao

#

Maybe going a bit slower, merging step by step is the right decision, I don't know

#

@shut condor I think your opinion can help too, there's a lot of context to digest but if you got some time, that would be amazing 🚀

dapper hound
limber swan
#

👋 the feature is not finished without dependencies I think, but whether to do an intermediary merge is an implementation issue, it depends if it helps you get to the finish line faster and more reliably

dapper hound
limber swan
#

I don't know if we'll want to advertise the feature yet, need to think about that

#

I do worry a little about how impossible this feature feels. It's the first time this happens, it smells like technical debt and makes me wonder if I've been underestimating how much we have

weary knoll
#

Maybe it's a sign we need to fix some internal core issues before building more?

dapper hound
weary knoll
#

The PR fixes one by keeping the host path the module is loaded from, which was a big blocker

#

The other right now is, how to access an unfiltered context dir from a dep

#

Maybe another way around would be to have both filered and unfiltered context dir?

#

Is this something possible?

#

I guess not, it would not be efficient.. we need to load it on demand...

dapper hound
#

yeah, was thinking about this just now too 😄

#

um, yeah, the problem feels like:

  • either, there is no laziness - a module source contains everything it will ever need to container
  • or, we have laziness, but then everything needs to accurately track exactly where it came from, otherwise we can't find it
weary knoll
#

Is there a way to access the host from a dep? We did it with the Host API no? So it might be possible?

dapper hound
#

since not all modules are loaded from the host

weary knoll
#

For git it's a bit different, we have to clone the repo to get it, so by default we fetch everything

#

which is equivalent to "no laziness"

dapper hound
#

yeah, that's the easy case 😄

#

"easy case"

#

yeah okay, i'm gonna writeup something on github, with some more structure from this conversation so we can catch up other folks on it

weary knoll
#

Thanks a lot!

dapper hound
#

i feel like we can do this, but both ways of resolving the issue feel tricky, and i think we need to make sure with other people that we make the right clal

#

i think i should have realized this issue earlier, i think i voiced it wrong when we started

weary knoll
#

No problem! You're doing great!

limber swan
#

ok thanks guys, keep me posted please

limber swan
dapper hound
#

@knotty fjord @shut condor if you're around to take a look, that would help with unblocking 😦 maybe i'm overthinking this, for some reason, i'm struggling with understanding this feature 😢

#

re-reading some old comments, i think this is maybe what erik was thinking this might look like

#

we can keep this moving forwards, even if we're not sure where the id comes from

#

here's what needs doing @weary knoll

  • modify LocalModuleSource so that it has a CallerClientID field - this can be populated during ModuleSource.ResolveFromCaller
  • during LoadContext, get that ID from the ModuleSource, and pass it into LocalImport - you'll have to modify the signature of it (as suggested by erik here: https://github.com/dagger/dagger/pull/7744#issuecomment-2189738829)
    • if there is no ID! then we can fallback to ContextDirectory
  • we need tests for this case. we don't need all languages, just go-to-go is fine. we need to make sure that calling modules like this works.
    • a test for a module calling a module that uses a context directory like this
    • a test for a module calling a module, that is loaded from AsModule
      that should do it
#

we might come back and change the first point, but. we can fix that later, it'll be easy.

dapper hound
limber swan
#

It's funny that you reference Directory.asModule in your writeup @dapper hound, I always felt like that should be called Directory.asContext instead, since that directory is not actually being used as a module

#

Directory.asContext().loadModule("./foo")

#

Probably immaterial to this implementation issue, but thought I'd mention it. Funny that sometimes API design smells, point to the same place where you find code smell later

dapper hound
#

yeah, i feel like i would like to refactor every part of module loading 🙂

#

there's just so many moving parts, it's hard to understand what goes where

dapper hound
limber swan
#

I just want to triple-check that the writeup is about the implementation issues and not about accomodating my earlier request to change from callee context to caller context or top-level context. Like I said before I can live with the tradeoff of killing the "toolchain modules" use case for now, to remove constraints for implementation and get the thing shipped. Just double-checking that this gets through clearly

dapper hound
#

yeah, this issue was technically present from the first implementation of this

#

the current implementation isn't doing callee or caller context, it's getting the contents of the caller's scratch directory - which definitely isn't right

limber swan
#

What I don't understand is: the context dir only comes up as an argument. you can already pass local dirs as arguments. So why is this (possibly) local dir different? why is it an issue that that particular argument may have a local client id (codename for "llb local" I suppose?) and not the others

dapper hound
#

because when we pass a local directory explicitly in it's been unlazied

#

we've already loaded the entire thing into the engine already

shut condor
shut condor
#

Put together a fix, details here: https://github.com/dagger/dagger/pull/7744#issuecomment-2298374925 It's a pretty tiny diff all said and done.

As described there, there are obscure corner cases that are truly, deeply detached from anything in the real world (afaict), so I think it would be a genuine waste of our time to care right now. I spent a while internally hemming and hawing about ways of avoiding them before I realized they just don't matter to users and thus shouldn't matter to us. If I'm missing something that would mean they matter, please let me know, worth a double check from others.

@weary knoll my suggestion would be to try that diff I linked to in the comment and add on an extra fix for some path handling when it comes to deps. Let me know if I can help at all tomorrow. I'll be afk from 1pm to about 3pm but otherwise happy to chat about it.

GitHub

This is an implementation of #7647
API changes
This PR updates the Function and FunctionArg from our engine API to support defaultPathFromContext

Function

withArg(
""&...

dapper hound
#

Yeah sounds good to me 🙂

#

We should keep an eye on the new context dir tests though, to see how/if they flake (different module names sounds good to me 🎉 as a deflake attempt)

weary knoll
#

Thank!s Will try that asap

dapper hound
#

hmmm, i can think of a semi-legitimate use case for the caching thing:

  • kubernetes setup as we recommend it and use it today, with one big dagger instance shared between multiple jobs
  • the job makes some changes to a context directory, and then runs the dagger job
  • potential weirdness ensues
    but it feels pretty tricky to stumble into, agreed we shouldn't block on this at all, it's a fiddly amount of work
shut condor
# dapper hound hmmm, i *can* think of a semi-legitimate use case for the caching thing: - kuber...

Shouldn’t be an issue because it’s only a problem when multiple modules have the same source digest within the same session. Across separate sessions you can load different modules with the same source digest and diff context dirs/files and everything should work as expected.

The implication here is that practically speaking the only way to hit the corner case is when using nested execs.

#

And the only user that would be using nested execs to load the same module source with different context dirs/files in the same session is our integ tests

dapper hound
#

ahhhhh right, you're correct

#

i suppose if we allow caching calls across different sessions, then it becomes a bit more of a concern?

shut condor
#

Yes but that would be but one small part of all hell breaking loose if/when we do that cross session dagql caching

#

There would be a lot of similar problems then all over

#

(Not to say we can’t solve those and shouldn’t do that, just that this new thing doesn’t make the situation worse)

dapper hound
#

yeah cool, we're on the same page then, yup 😄 (i had half a message saying exactly that typed out and got distracted by a test run finishing)

#

thanks for helping out ❤️ i hadn't quite clocked how this was similar to secrets/sockets

weary knoll
#

@shut condor Ok so I applied your changes and made couple of tests and it seems quite inconsistent, here's an example of a trace: https://dagger.cloud/Quartz/traces/46513cfd7c242f94f7799254a41bcfbb
I often see the error:

failed to set call inputs: failed to load contextual arg "modSrcDir": failed to load contextual directory "./dagger/sub": failed to resolve context path: failed to stat source root: failed to stat path: failed to get requester session: session for "bep44r1jaiq9ci83xfzsy12gw" not found

Or

failed to set call inputs: failed to load contextual arg "file": failed to load contextual directory "/": failed to import local module src: rpc error: code = Unavailable desc = error reading from server: io: read/write on closed pipe

Is this flakes or something related to me not correctly handling paths? I'm not sure if there's an actual implemention bug on context dir or on the session side.
I'll dig more indeed, but because it's inconsisent makes me unsure about what's really happening :/

shut condor
#

Hm okay I just verified with the repro Helder posted earlier. I will pull this down again and try to repro.

Eating lunch atm and have to go to dr w/ Alice from 1-3, so might be delayed in response but will get back asap

weary knoll
#

I see this issue when running the current context dir test suite, not Helder's repro

#

I wanted to verify first if it was still working for simpler case haha

shut condor
#

Right makes sense, I will use that to see what’s going wrong

weary knoll
#

Thank you so much!

#

Checkout to my PR and dagger -m .dagger call --source=.:default test custom --pkg="./core/integration" --run="Module/TestContextDirectory"
Tests are in module_test.go, starting line 6773

shut condor
#

(Dr took waaaay longer than expected, so just getting back to this now)

shut condor
#

It was a doozy but figured out what's going on with the new approach for tracking client IDs (my original theory this morning was wrong). It breaks due to caching essentially. We were getting old clients that were from nested execs loading the same module. They aren't running anymore so they are all gone and lead to those errors.

It can actually be avoided by just giving each subtest it's own dagger client instance (which separates the sessions and avoids the dagql caching that causes cached nested clients to be used). I will use that as a fallback solution since it's what users would be doing anyways.

Attempting something more robust that would avoid this issue and probably be simpler overall, but if it doesn't pan out then I'll just push that fallback option

shut condor
limber swan
#

nice!

#

beyond a certain level of effort tracking down a bug, it becomes a certainty that you'll feel stupid once you find it. 😛

shut condor
#

@weary knoll Let me know if the fix makes sense to you. Don't want to just push stuff without you understanding, so happy to explain more if helpful.

I'm not actually sure there is an additional bug with paths either, I may have just been misunderstanding the way the context dirs paths work (forgot that if I set the dir to / then it's the .git dir and not the module source dir).

I added a new integ test for this dep case, but just in Go so far. I think Justin also mentioned some other integ tests that were worth adding too. But hopefully this is it and no more gotchas lurking 🙂

limber swan
#

forgot that if I set the dir to / then it's the .git dir and not the module source dir

Now that you've remembered, does it feel unintuitive?

shut condor
#

Paths are eternally confusing, to a surprising degree. But it actually does make intuitive sense that / is the root of the context and . is the module source dir. I can't even imagine what else we'd do.

What I forgot too was that the ignorepaths needed to be relative to the defaultPath. That's the same though in that it's just confusing because paths are always confusing; I can't think of any different approach that would be more intuitive

#

I think we just need some good docs showing the behavior in the various permutations of defaultPath+ignore values (absolute and relative)

#

Then it should click relatively quick I think

limber swan
#

What I forgot too was that the ignorepaths needed to be relative to the defaultPath

ooooh. But what happens if I have an +ignore but an explicit argument? Is the ignore applied to the argument?

weary knoll
weary knoll
#

@shut condor I ran tests, I assert that they pass just fine! Amazing