#Last mile review

1 messages ยท Page 1 of 1 (latest)

pseudo atlas
#
  • Container.Shell() -> Container.Terminal(). I know it's longer to type, but I think it's better to have a clean model where the words are used consistently. Otherwise Container.WithDefaultShell().Shell() will be confusing.
#
  • dag.CurrentModule() is just long... I don't have a great alternative in mind, but feel like it's at least worth thinking of one. We've been pretty good so far at keeping our API from looking like the Windows API... It's hard because it never happens all at once, it just creeps up on you, one longer-than-necessary name at a time. Granted, this is really a nit.
#
  • dagger update: I'll address that one in the PR @thick burrow , but it feels like it has to at least update something (I'm thinking the dependencies). If we don't have time to implement, better to return an error "not yet implement" than change the meaning to something else IMO. Will comment in the PR
#
  • :"root-for": {"source": "."}. I really think that default value should be applied by the config parser, rather than explicitly writing it out on each dagger.json. In other words, dagger init should not write that field by default, and a dagger.json without that field should be valid. I understand the argument that "the complexity is hidden away, you're not supposed to edit the file directly", but it's still valuable to avoid unnecessary complexity. People will read the file, and reason about how it works and what needs to be in it. I should be able to write a valid dagger.json from memory without having to memorize arcane fields I don't actually need
thick burrow
#

I understand the argument that "the complexity is hidden away, you're not supposed to edit the file directly", but it's still valuable to avoid unnecessary complexity. People will read the file, and reason about how it works and what needs to be in it.
We will be adding complexity if we don't do this because the engine needs to do extra work (more find-ups) and in general have more complex internal logic around handling all the permutations of cases (what's a valid config, what's not, etc.). Right now it's very simple in that a valid config needs to have a root-for somewhere, and for the simple case of root dir == source dir, that means root-for is .. If there isn't a root-for set in any find-up, it's invalid.

We can eat that if it's worth it, it's not a dealbreaker level of complexity, but:

  1. It's easy to explain why it exists (there must be a root-for for any valid module)
  2. I really don't want to encourage users to be writing these files themselves. Limiting the config format to something easily writable from memory by a human is a huge constraint, and I just don't know why we'd take that on when we can just say "use the CLI to modify dagger.json, if you write it yourself you are on your own"
#

(Not ignoring your other comments, just need to update/review some PRs first ๐Ÿ™‚ be back here soon)

pseudo atlas
#

Mmmm I just realized there is no source field

sudden heart
#

May I add that since include and exclude paths are relative to the root, it makes more sense to me to keep them next to in root-for.

pseudo atlas
sudden heart
#

It would have to be inside. To be module specific.

pseudo atlas
#

Right now it's very simple in that a valid config needs to have a root-for somewhere, and for the simple case of root dir == source dir, that means root-for is .. If there isn't a root-for set in any find-up, it's invalid.

I don't understand this. If I'm loading a module and its config is {}, I know it is not the root for any sub-modules - only for it self. I can continue the find-up to the next subdirectory. I don't understand how that is not the simplest implementation. It makes me think I'm missing an important part of how find-up actually works.

thick burrow
pseudo atlas
#

But there should be a Source field in that struct, right?

thick burrow
# pseudo atlas > Right now it's very simple in that a valid config needs to have a root-for som...

Example: the module is in a git repo at subdir foo/bar/mod/. Currently, if that module doesn't have a separate root, the engine sees foo/bar/mod/dagger.json has root-for entry of . and knows it can stop looking for the root. If we remove that, the engine has to always keep doing a find-up until it sees that there is no root-for in any dagger.json in any parent dir. Right now the cost is low, but I could see that becoming worse once we want to optimize loading of a large git repos. That's besides the other more complex logic it would entail.

thick burrow
pseudo atlas
#

I think I may have severely misunderstood how the current implementation works

#

But I think what I'm explaining (in that other thread) matches your last comments in the issue

thick burrow
#

Ah sorry, I for some reason left in "source": "." in all of those here, that's my bad: https://github.com/dagger/dagger/issues/6291#issuecomment-1911337000

But yeah all those values are just . because there is always a dagger.json in the source dir of a module (why it's not needed anymore). I'll update that comment for posterity, but it's literally just deleting "source": ".", everything else in the description should be unchanged

GitHub

Currently, by default the engine loads the directory tree at a module's dagger.json and nothing else. However, to support cases where there are files/dirs the module relies on that are outside ...

#

Updated that comment

pseudo atlas
#

OK so at some point along the way we misunderstood each other

#

Can we use my answer to Nipuna as a test case? What should I have answered?

#

I was so happy with my answer too ๐Ÿ˜›

thick burrow
# pseudo atlas Can we use my answer to Nipuna as a test case? What *should* I have answered?

It would be:

That way this works:

$ git clone YOUR_REPO repo
$ cd repo
$ dagger call -m ci test --dir .

Your Dagger config would look like this (at the root of the repo):

$ cat dagger.json
{
 "root-for": [
    {
      source: "./ci"
    }
 ]
}

And there would be a dagger.json in the ci/ dir colocated with your module's source code that looks like this:

$ cat dagger.json
{
 "name": "foo",
 "sdk": "go"
}
pseudo atlas
#

What's the point of root-for in this case? He doesn't need it (assuming the ci module doesn't depend on other modules in the monorepo)

#

And I don't get to drop the -m ci

#

(which was the point of the source field)

thick burrow
pseudo atlas
#

In my mind the source field is pretty much orthogonal to the rest. Just a small optimization so that a module can tell its SDK where to load source code (instead of the default of the module's root)

thick burrow
pseudo atlas
#

Yes totally

#

In this case I just want the root of my repo to be the module

#

So the baseline is: dagger init at the root of my repo; dagger develop at the root of my repo; drop a bunch of go files at the root of my repo. Conceptually simple, works fine, but pollutes my repo root.

So, add source: to tidy things up, but no fundamental change: my repo root is the module

thick burrow
pseudo atlas
#

I think we're just using "source dir" in different ways. There's always a dagger.json in the module's directory - by definition, no dagger.json means no module. But not all modules have source code

thick burrow
thick burrow
pseudo atlas
#

I think the hang up is on whether you can decouple a module's source code from its root directory.

#

(I'm now thinking context-for might have been less confusing, since now I'm using "root" for two different thigns)

thick burrow
#

Okay, I think I'm seeing what you were imagining. What I implemented had essentially two points:

  1. There's a root dir, which is only used by sdks for building. It's where the root-for entry for a module is located
  2. There's a source dir, which is where the module source code is. Must either be the same as the root dir (simple case) or a subdir of it (case where go.mod in parent dir, etc.)

But what you're describing sounds like there's 3 points (correct if wrong):

  1. A root dir, where the root-for entry is
  2. A "source root dir" (? not sure what to call it? context?) which is where the dagger.json with the module's name, sdk, etc. config is
  3. A "source subdir" which is where the actual source code is located
#

If so, the current implementation basically just collapses your point 2+3 to always be the same.

pseudo atlas
#

Yes that's right. Your number 2 I just called "the module directory" but now realize it would have helped to reserve a name for it.

Right now this terminology feels clearer (keeping same numbers as you):

  1. "load context" dir, available to SDKs for loading, and security boundary for accessing ..

  2. Root directory: always where dagger.json is.

  3. Source directory: where the module's source code lives (defaults to the root)

thick burrow
#

One thing is that two modules can't share a root directory (2. in your most recent message), unless we go back to having a list of modules: [ ... ] again.

#

There are probably cases in which that might be annoying, but as far as I can think there would always be alternative options for how to layout directories that could achieve the same end effect.

#

The main thing that becomes possible now is the use case of "clone repo + dagger call (no -m)"

#

And will need to handle corner cases like:

  1. module A having "root directory" set to ./foo and "source directory" ./foo/bar/baz
  2. module B having "root directory" set to ./foo/bar and "source directory" also set to ./foo/bar

Layouts like that would just have to be invalid. Which is fine, but needs to be detected in such a way that it doesn't error out on slightly different but valid cases like:

  1. module A having "root directory" set to ./foo and "source directory" ./foo/bar/baz
  2. module B having "root directory" set to ./foo/otherbar and "source directory" also set to ./foo/otherbar
#

(I'm just stream of consciousness dumping right now as I think of things)

#

Also need to deal with what @sudden heart mentioned before about where include/exclude is set. Obviously it being set in the "root directory" dagger.json makes sense, but it still would make sense for it to be a setting in the "load context directory" context-for entry (presuming it's renamed to that). Just in that maybe a go.mod is in the root of the repo (and maybe it refers to some other go packages that are relative to it), but that doesn't mean we should always load the whole root repo in its entirety...

pseudo atlas
pseudo atlas
#

btw I can talk live if that's easier

#

sorry if I'm making your life difficult today ๐Ÿ˜…

#

worth sweating the details on this

thick burrow
# pseudo atlas I didn't get this part. What is wrong with that example layout?

Started typing out why and realized that we could make it work ๐Ÿ™‚ Thought process:

I might invoke module A via e.g. dagger call -m github.com/org/daggerverse/foo/bar/baz. The engine loads the git repo, starts a find-up at foo/bar/baz until it finds the dagger.json (and also finds the context-for in possibly another parent dagger.json).

The thing we need to be careful about is that it will actually hit module B's dagger.json in the find-up first. I was thinking that would just make this invalid. But actually we can find the "right" dagger.json by checking the source path.

It's the same logic as already exists for finding the "right" dagger.json that contains the root-for a given module, it's just now we need to do that on two levels: what dagger.json contains the source for a given subpath and what dagger.json contains the root-for for a given subpath.

#

We're gonna need a lot integ tests ๐Ÿ˜„ But that's obviously fine, it's manageable I think. And I do like the flexibility

thick burrow
pseudo atlas
#

you're welcome, future selves

#

does this mean that context-for becomes just an array of strings?

#

eg "context-for": ["./foo"]

thick burrow
pseudo atlas
#

yeah I get that. We should use a field other than source though, to avoid confusion

#

path?

thick burrow
pseudo atlas
#

My CUE brain immediately thought: let's just make it a string now, and make it string|{...} later ๐Ÿ˜›