#"dev module" pattern


thick dust
#

🧵

#

I don't want to go back to a ci source directory though

#

If we were to revert the creation of github.com/dagger/dagger/dev, that would leave some unresolved problems:

  1. What do we do with sdk/python/dev? That's a different pattern. Are we willing to remove that module too? And if not, why not? How do we achieve consistency within the repo?

  2. What do we call the source repo? I don't want to go back to ci/. Can we live with .dagger?

  3. What source directory should dagger init default to? The promise of the dev module pattern, is that we could always default to . (since you would explicitly dagger init ./dev). But in honesty maybe that was only moving the problem, since we would still need a way to indicate that ./dev has a special relationship to . (to get rid of -m)

scenic wing
#
  1. Doesn't have to be a different pattern in relation to the name of the module and how it's called, but what I'm hoping for has always been this:
  • /sdk/python/dev includes most of what's in /dev for Python and then some (e.g., updating dependencies, type checking, explicit sub testing, preview docs build, etc..). These should be easily available when developing on /sdk/python in your terminal.
  • Have /dev depend on /sdk/python/dev to implement the SDK interface that's going to be called in CI, but rather than implement those features, it's much simpler because it's making dag.PythonSdkDev() calls. Same benefits from a module having dag.Self()!
  • For other SDKs to follow the same pattern. Implementations in their language (when it's useful), coordinated in the root's /dev.
thick dust
#

Have /dev depend on /sdk/python/dev to implement the SDK interface that's going to be called in CI, but rather than implement those features, it's much simpler because it's making dag.PythonSdkDev() calls. Same benefits from a module having dag.Self()!

Yeah 100%, my initial PR actually went straight to that, but then I gradually scaled back my ambitions with smaller and smaller PRs 😛 What you're describing is still the ultimate goal in my mind

#

But, is it /dev importing /sdk/python/dev, or / importing /sdk/python? That's the fork in the road here I think

#

(pattern 1: /dev submodule, pattern 2: / module + /.dagger or / source dir depending on the situation)

#

In other words, either way we need and want support for submodules in a monorepo. But for each module and submodule, what is the pattern for organizing them

scenic wing
# thick dust But, is it `/dev` importing `/sdk/python/dev`, or `/` importing `/sdk/python`? T...

Yeah, that sgtm (i.e, / importing /sdk/python), I actually moved to that in python's dev module (locally, with sdk/python/dagger.json -> dev) a few months back.

There are real benefits to using .dagger in terms of pattern exclusion: I've regretted putting what we codegen in Py and TS under sdk, but I'd be able to exclude **/.dagger/sdk easily. Why did you move dagger.json to dev? Was it just to align with sdk/python/dev or did you find real issues with having /dagger.json?

So I'm ok putting /sdk/python/dagger.json and having "source" point to dev or even .dagger.

#

However... we found an issue with two modules wanting "ownership" of sdk/<name>. For example both Elixir and PHP have module support now, but they're not bundled so you need to use a path on --sdk. However, they had no simple way to get the SDK's sources in that module, so the easiest solution was to move the runtime module's dagger.json to sdk/php/dagger.json and point "source": "runtime". It makes a lot of sense to say:

dagger init --sdk=github.com/dagger/dagger/sdk/php

Instead of:

dagger init --sdk=github.com/dagger/dagger/sdk/php/runtime

With context directory we'll be able to get the SDK's sources without moving dagger.json to the parent, but that'll bring back the need for runtime on something the users of these SDKs will need to use a lot. We already have a convention for putting that module under a runtime subdir, so we could have the engine just assume that and expect a directory with a runtime included. That gives you best of both worlds (once you have context directory):

  • Keep runtime/dagger.json but make it a requirement when using --sdk= to reference a parent with that subdir. Even if it's a repo with only that.
  • Move dagger.json to the parent for dev so you don't need -m ... while developing.
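
To make the first bullet concrete, here is a minimal sketch of the proposed convention. It assumes the engine would simply append the conventional runtime subdir to whatever is passed to --sdk; runtimeModuleRef is a hypothetical helper name, not something that exists in the engine.

```go
package main

import (
	"fmt"
	"path"
)

// runtimeModuleRef is a hypothetical helper (not part of the Dagger engine).
// It models the proposed convention: the user passes the SDK's parent
// directory to --sdk, and the runtime module is assumed to live in a
// "runtime" subdirectory of it.
func runtimeModuleRef(sdkRef string) string {
	return path.Join(sdkRef, "runtime")
}

func main() {
	// The short ref users would type, resolved to where dagger.json would stay.
	fmt.Println(runtimeModuleRef("github.com/dagger/dagger/sdk/php"))
}
```

With that rule, users keep typing the short --sdk path while runtime/dagger.json stays where it is.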
thick dust
# scenic wing Yeah, that sgtm (i.e, `/` importing `/sdk/python`), I actually moved to that in ...

Link to my reasoning at the time: https://github.com/dagger/dagger/pull/7766#issuecomment-2195022396

TLDR was:

  1. consistency with sdk/python/dev
  2. the fact that if your project is itself a module, it seemed better (because otherwise you have to mash 2 modules together in the same dagger.json: the actual project, and the module to develop the project). However for this point, there is a proposed alternative, which is to make it easier to "mash these two modules together" - with a big focus on a module being able to test itself. So that might change the equation
#

Underpinning all this is uncertainty on what is "the right way" to manage the relationship between upstream software and dagger modules.

Between 1) a module to develop X, and 2) X as a module, what's the difference? Do we support both, or is one better than the other?

#

But yeah, with context directories some of the constraints will be removed - maybe once that domino falls, the rest will follow

scenic wing
#

I'm curious to try /sdk/python/dagger.json with "source": "." (for dev) but relying on "include"/"exclude" to get the right files. That's what was done in PHP I think. In this sense the module and project are the same. And it actually makes sense in this case. That can work today already.

Context directory does make it better though because you can have different functions needing a different slice of the project. I've certainly created several views in Python when that feature came out (for tests, lint, docs, etc), but it was cumbersome with views, so I reduced to just two (lint and default). It also allows you to test the function against another version of the files like in a git remote or PR, but if you still keep dagger.json on the project dir and depend on include/exclude, you may be loading more than you need every time. Unless you move "source": ".dagger" which has best of both worlds I guess.

thick dust
scenic wing
#

Best if you can depend more on includes than on excludes. That was feedback I gave to the PHP folks: they basically relied on excludes, with lots of lines in there, which made it very easy to miss new additions to the SDK. So I agree that relying on include/exclude is harder to get right.
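
A rough illustration of why an allowlist ages better than a denylist. This is only a sketch: the file names are made up, and it uses Go's path.Match (single-level globs, no "**") rather than whatever pattern matching the engine actually applies to "include"/"exclude".

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether name matches at least one glob pattern.
// path.Match has no "**" support, so this sketch sticks to single-level patterns.
func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// loadedWithExclude keeps every file that is not explicitly excluded.
func loadedWithExclude(files, exclude []string) []string {
	var out []string
	for _, f := range files {
		if !matchesAny(exclude, f) {
			out = append(out, f)
		}
	}
	return out
}

// loadedWithInclude keeps only files that are explicitly included.
func loadedWithInclude(files, include []string) []string {
	var out []string
	for _, f := range files {
		if matchesAny(include, f) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Hypothetical SDK directory listing; "bench.tmp" is a newly added file
	// that nobody remembered to list in the patterns.
	files := []string{"composer.json", "src", "docs", "tests", "bench.tmp"}

	fmt.Println(loadedWithExclude(files, []string{"docs", "tests"}))
	fmt.Println(loadedWithInclude(files, []string{"composer.json", "src"}))
}
```

With excludes, the forgotten file is loaded silently; with includes, it stays out until someone adds it on purpose.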

thick dust
#

I will float an old proposal, which was to allocate a source directory per sdk, for clear ownership separation of concerns

#

.dagger/sdk/<NAME> -> reserved for the use of that SDK for this module

#

then you never have to worry about whack-a-mole, or mixing generated and non-generated files, what to ignore etc

#

or for a shorter path: .dagger/<SDK NAME>

scenic wing
thick dust
#

no this would be specific to source directory

#

(so within one module as opposed to relationship between modules)

scenic wing
#

What would you put in .dagger/<SDK NAME>? Can you give an example? Seems like just a rename of ./sdk in non-Go SDKs.

#

Btw, Go works differently here. For Go to work like the others, you'd put the SDK under a subdir and in your module's go.mod have a replace rule (or workspace) to install from local dir instead of "published" package. Then the SDK's dependencies can be isolated to the SDK, instead of being mixed in the user's go.mod.

thick dust
#

@scenic wing resuming this... Not a strong opinion, more of a thought experiment. I was talking about .dagger/python being the default source dir for dagger develop --sdk=python; .dagger/go being the default source dir for dagger develop --sdk=go; etc

analog egret
#

i seem to remember we considered this one before

thick dust
#

But I'm wondering if it would be viable to do it the other way around... @scenic wing do you think the SDKs could handle conflict with an existing codebase (pipeline embedded in an app) by "sneaking" the pipeline source in a native-friendly place in the app source code, so that it doesn't clash with the native tooling?

For example in Go, it might be internal/dagger.io/module or internal/dagger.io/self or some other reserved / non-clashing place?

thick dust
thick dust
scenic wing
#

Depends on the language but the way SDKs are structured, modules are a sub-package, with another sub-package inside (except Go which mixes the SDK with the user's code).

thick dust
#
  1. What replaces dev modules
  2. Where does ci/ go
  3. What does dagger init --source default to
analog egret
#

i think the question of what the default should be and what we do are very different - i don't really mind where we put ours, but i think the most obvious answer to dagger init should always be the current directory

thick dust
analog egret
#

dev/, ci/ can be anywhere, but for our purposes i would rather it not be dagger/ or .dagger (even if others do this)
that means our module would be called github.com/dagger/dagger/dagger which is not really great

scenic wing
thick dust
thick dust
analog egret
#

i think my biggest objection to this is that there's now layers of empty directories, it starts to feel a lot more java-like 🤔

thick dust
#

Well if it's internal/dagger the server code might clash with the generated client code

analog egret
#

i know some tools collapse them down nicely, but i often get very frustrated at tools that have lots of directory nesting

#

also internal has special significance to go code - our repo uses it for example

scenic wing
#

Why not do what other sdks do and put the sdk on a subdir of the module and have a replace rule in go.mod to install the sdk from that subdir?

analog egret
#

it's much more likely to clash

#

so different directories for different sdks?

#

that feels confusing to navigate

thick dust
#

@analog egret let me know when you're at the second iteration of thinking about it 😛

#

The knot:

What does dagger init --source default to

It depends whether the module is "standalone" or "contextual".

  • Standalone: could be anything. The shorter the better (SDK has the module root to itself)
  • Contextual: no good answer. All options seem flawed in some way. I'm looking for more.
analog egret
#

how do you mean? the problem is, personally, unless the default is switched to the current directory, i will just keep using --source=.. it's genuinely difficult for me to weigh up whether i would prefer internal/ci/dev/dagger/long/path/here/etc, because most of the time I will always want --source=.

#

it still doesn't matter for contextual? the root is still the same place, right? we're just deciding about where the source code is inside the root

thick dust
#

@analog egret I also always type --source=. but you and I are not representative of every use case, we also need a great default for "daggerize your app repo", and --source . is not it

thick dust
analog egret
#

i don't understand why we need to use the default

#

i think we should use dev/ or similar. i think the default should be ..

thick dust
#

What source directory should dagger init default to? The promise of the dev module pattern, is that we could always default to . (since you would explicitly dagger init ./dev). But in honesty maybe that was only moving the problem, since we would still need a way to indicate that ./dev has a special relationship to . (to get rid of -m)

👆

analog egret
#

the special relationship today is indicated through the presence of the .git dir/or a dagger.json in the root - but moving dagger.json into dev changed that entirely

thick dust
#

i don't understand why we need to use the default

We don't need to use the default, it's just a convenient way to make sure the default is good. If we don't use the default, we still need to find the right default for everyone else, but without the benefit of dogfooding to be sure we designed it right

analog egret
#

right sorry, i was going to add some more 😛 pressed enter too soon

thick dust
#

We need to use @scenic wing 's mindmapping tool to keep up with the decision tree

scenic wing
#

I think the default should be for source to default to root directory. When you create a new module like this:

$ dagger init --sdk=go foobar

That would create a new directory foobar with dagger.json next to main.go.
But if you try to use an existing directory with files:

$ dagger init --sdk=go
The X directory is not clean and some files could be overwritten. You can specify  `--source` to blah blah...
Do you want to continue? [Y/n]

If --source is specified explicitly then no need for prompt. Also, introduce a -y flag to skip the prompt.
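
Sketching that rule as code, just to pin down the cases. The -y flag is only a proposal here and does not exist today; initOptions and needsConfirmation are illustrative names.

```go
package main

import "fmt"

// initOptions mirrors the inputs discussed above. The Yes field stands for the
// proposed -y flag, which does not exist today.
type initOptions struct {
	SourceExplicit bool // user passed --source
	Yes            bool // user passed the proposed -y
}

// needsConfirmation reports whether the proposed prompt should be shown.
// existingEntries is the number of files already present in the target directory.
func needsConfirmation(existingEntries int, opts initOptions) bool {
	if opts.SourceExplicit || opts.Yes {
		return false
	}
	return existingEntries > 0
}

func main() {
	fmt.Println(needsConfirmation(0, initOptions{}))                      // new or empty directory
	fmt.Println(needsConfirmation(12, initOptions{}))                     // existing files, no flags
	fmt.Println(needsConfirmation(12, initOptions{SourceExplicit: true})) // --source given
	fmt.Println(needsConfirmation(12, initOptions{Yes: true}))            // proposed -y given
}
```

Only the second case would prompt.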

thick dust
#

Decision one: root module (1) or dev module (2)

- if Root module:
    - Where does module source code go?
    - How to co-exist with an app?
    - What if the app is a Dagger module?

- if dev module:
   - How to auto-detect the dev module?
thick dust
scenic wing
#

Root module, put sources in dev, don't be a part of an app, be a sub-package. Putting sources in a subdir fixes that. Same thing if app is a dagger module.

analog egret
# analog egret right sorry, i was going to add some more 😛 pressed enter too soon

my pov summarized:

  • the default source should be . - but you can configure this with --source=<foo>
    • if you try and set --source to a non-empty directory, dagger tells you to not do that, and if it was ., can even suggest you use dev/ci. this would require we fix the go.mod/go.sum thing so that users don't get tripped up by that (but we should do this anyways)
    • why .? relatively standard for most tools to do this. ls/tree/etc tools use the current directory. go mod init/hugo init/npm init all do it. i understand we're a bit special, but not doing the default is going against the grain, and is a fight uphill (that we've already seen from the original proposal to change this)
  • dagger.json should always be the "root" (or the "context" from contextual modules). no .git dir shenanigans (this is a hard sell i guess, but i find this behavior trips me up a lot, but probably this is controversial)
  • we consolidate terminology - the "context" is the same as the "root".
  • we should avoid altering the behavior of commands based on what's currently around - e.g. we shouldn't have behavior that puts --source=dev if . already has an app
thick dust
#

no .git dir shenanigans

Do you mean actual use of the .git directory? I'm not aware of a proposal to do that

analog egret
#

i mean more in our implementation of determining the root - if no dagger.json is found, we determine the root as where we find the .git directory

thick dust
#

There is no planned change to finding the module root though. That's already settled (it's where dagger.json lives, always)

scenic wing
thick dust
analog egret
scenic wing
thick dust
analog egret
scenic wing
thick dust
#

So just to focus on the hardest part (at least for me): when 1) the SDK is initialized (whether via dagger develop --sdk or dagger init --sdk), and 2) the module root is not empty (eg. the module is embedded in an existing project) --> where should the module's source directory point to by default?

It's important to me that there is a default, because it's a very common situation and it's too much work to ask users to choose each time

#

In that situation the possible defaults could be (all paths relative to the module root)

  • . (most requested default, ideal for standalone modules, messy for embedded)
  • .dagger
  • dagger (current default in all cases)
  • .dagger/<sdk>
  • SDK-specific (based on the contents of repo), for example Go SDK might choose internal/dagger_module
scenic wing
#

We already have a default for dagger now and a lot of users would prefer for it to be "." (i.e., next to dagger.json).

thick dust
scenic wing
#

Yeah, but today you have good for embedded but bad for standalone.

scenic wing
#

And none of those options have a sweet spot for both.

analog egret
#

i think we should avoid anything that's based on the contents of the repo - this feels like it will be confusing, having the very first command that most dagger users will run do "magic" that might not always work isn't great

thick dust
#

Guys at this point it's useless to point out only the downsides of this or that option. They all have downsides. We have to talk in terms of tradeoff.

scenic wing
#

Well, one important data point is what's the most common use case going to be? We made a bet on the ./dagger default to favor embedded, but wasn't most of the feedback that users prefer the default to be .? Or we actually don't know if that's what most users feel (since most don't voice their opinion)?

analog egret
#

dagger init should be in cloud actually. could we see what args people tend to run it with? if we want more concrete data there

scenic wing
#

Can you see the values of "source" easily?

thick dust
#

It makes sense that we get the most complaints from standalone module devs, because they're the most active on our discord, and once you're a power user you will primarily be creating standalone modules.

But, there's selection bias. If the default is confusing, or generally adds friction, for embedded repos, that hurts onboarding - and that impact is silently losing people who could have become power users, but instead just gave up

#

I think it makes sense to change the default to . now, as a stopgap while we continue figuring out how to make the onboarding great

#

In theory that will make it harder for me to get traction around improving the default further - but I'm already struggling on that front 😛 so won't make a difference either way

#

SDK-specific (based on the contents of repo), for example Go SDK might choose internal/dagger_module

Specifically this 👆 has promise I think

#

Let the SDK decide how to co-exist with an existing project

#

We kind of do that already with the "re-use existing go.mod" behavior, but currently it hurts more than it helps. Maybe we can improve that capability to make it part of the solution rather than a problem

scenic wing
#

I still don't know why it's better to mix the pipelines in an existing source instead of making it a sub-package. Same could be done in Python for example, but I prefer having in a package separate from the app.

thick dust
#

Sub-package is good. The problem is what to call the sub-package right?

scenic wing
#

Not completely. The go.mod dependencies are mixed and the import changes from module to module. It could be import "dagger.io/dagger", same as non-modules.

scenic wing
#

I've said this multiple times but never got a straight answer. Basically doing what the other SDKs are doing. From the module's sources you'd use the SDK the same as you'd go get dagger.io/dagger but instead of installing from the web, the module's go.mod would have a replace rule to use the one from the subdir, or use a workspace (like I've been doing with uv).

analog egret
#

we could still put the code wherever though right?

scenic wing
#

I like the idea of SDKs standardizing on a subdir name for this. One that's easy to add to exclude patterns without knowing their parent dirs.

thick dust
#

Let's get specific because I am not understanding clearly what you are proposing.

Scenario: "let's daggerize my Go app!" (the app is dagger/dagger in this case)

$ git clone https://github.com/dagger/dagger
$ cd dagger
$ ls dagger.json
dagger.json: no such file or directory
$ ls go.mod go.sum
go.mod
go.sum
$ dagger init --sdk=go

What files have just been created?

scenic wing
#

We're talking about two different things at the same time. I was replying to something you said. To be clear, this suggestion is about where the SDK's files go, inside the source dir. Not where the source dir goes relative to the root dir.

thick dust
#

(but understood)

analog egret
#

A potential other idea to add to the list - if we want to have more opinionated defaults, could we have "templates"? If you want to daggerize your existing go app, dagger init --template=<name-of-a-go-template>, maybe even pulling from a remote git repo

#

We could have the generalized init that way, but still a way of doing more specialized migrations - here's an easy way to daggerize your Django app, here's one for your simple go cli, here's a cargo wrapper, etc

#

To me that feels better than trying to anticipate all generalized use cases, since we could specialize for each of those

scenic wing
#

It's because I see things separate in my mind and mixing things make it more confusing. A Dagger Module is an app in itself, in the language of the chosen SDK. The SDK library should be generated separately from the Module's own source code, as part of a workspace (in its own subdir).

thick dust
#

A Dagger Module is an app in itself, in the language of the chosen SDK

That definitely makes sense, and has been my assumption also. I think in the case of an embedded module, I'm becoming more flexible on that point, and willing to explore alternatives if it solves the user problems

#

In a perfect world, we would choose source directories that let us have it both ways: works if the module is a standalone app; and also works if it's mixed with an existing app. That's why internal/dagger-module (sub-package + a collision-safe name) is attractive to me in Go. It works in both cases.

scenic wing
#

But isn't the advantage of "embedding" just to get access to the "context directory"? What are the use cases for having it be a part of the same existing app in the repo?

scenic wing
#

That confused me before but you're basically saying /internal/dev/internal/dagger, right?

thick dust
scenic wing
#

If it's /internal/<module> and the module has internal/dagger that's what it would be.

thick dust
#

But yeah you're right that maybe source would be ., and the SDK would just change where it puts the files in that source dir

#

No I don't mean <module>

#

I mean the literal "module"

#

github.com/dagger/dagger/internal/dagger/self (hesitating between self and module to convey what I mean)

scenic wing
#

So a hardcoded internal/dagger_module, same as the other proposal for the hardcoded .dagger?

thick dust
#

So my amended proposal:

  • Always default source to .
  • Go SDK changes from <source>/main.go to <source>/internal/dagger/main/main.go (just realized main/ might be a good candidate)
  • Perhaps other SDKs don't need to change anything, because source=. is already fine even if there's an existing python or typescript app there?
scenic wing
#

Other SDKs would conflict too.

thick dust
#

Is there an equivalent trick, where the module's "main" code could live in a sub-package named in a way that avoids any conflicts?

#

And ideally, always use that sub-package location regardless of what's in the source dir? So no more "if the dir is non-clean do X, otherwise do Y". would be more robust and address @analog egret 's concern

scenic wing
#

The way I like to solve this is to consider the module its own "app". Installed as a dependency instead of being vendored on the existing app. Not sure why people would prefer it being mixed. Additionally, Python is a very old language and people have many different ways to structure their applications. The most compatible way to work with that is as a dependency.

thick dust
scenic wing
#

Can't have an existing app there if it's ..

#

It wouldn't overwrite existing files but the module wouldn't work.

thick dust
#

So as a user what do I do? Manually use --source to the path of a new "app"?

scenic wing
#

I'm trying to think about how we can use the [target] argument here.

scenic wing
# thick dust So as a user what do I do? Manually use `--source` to the path of a new "app"?

Yep. But is it too confusing to have --source default to [target]?

dagger init --sdk=python foobar

This would put dagger.json with "source": "." in foobar.

If you use:

dagger init --sdk=python --source=foobar

Then dagger.json in current dir, with "source": "foobar".

If you do:

dagger init --sdk=python

Then dagger.json in current dir, with "source": "dagger"

So default for --source still dagger, unless [target] is used, in which case it defaults to that value.
It works for me, my concern is that it's confusing for others.
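
Written out as a function, in case that reads more clearly than the three examples. This is a sketch of the rule as stated above, not dagger's code; the case where both [target] and --source are given isn't covered above, so letting --source win there is my assumption.

```go
package main

import "fmt"

// resolveInit models the rule described above; it is a sketch, not dagger's code.
// target is the optional positional argument, source the optional --source value
// (empty string means "not given"). It returns the directory receiving
// dagger.json and the "source" value written into it.
func resolveInit(target, source string) (root, src string) {
	root = "."
	if target != "" {
		root = target
	}
	switch {
	case source != "":
		src = source // explicit --source always wins (assumption when target is also set)
	case target != "":
		src = "." // a named target is assumed to be a standalone module
	default:
		src = "dagger" // current default
	}
	return root, src
}

func main() {
	fmt.Println(resolveInit("foobar", "")) // dagger init --sdk=python foobar
	fmt.Println(resolveInit("", "foobar")) // dagger init --sdk=python --source=foobar
	fmt.Println(resolveInit("", ""))       // dagger init --sdk=python
}
```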

thick dust
#

yeah I find it confusing

#

(actually didn't get it on first read)

scenic wing
#

Yeah, so that's what I'm trying to reconcile, I just need to focus for a bit. Some variation of that. Maybe --source is complicating things, so I want to see if there's a good simplification here.

thick dust
#

If the repo has an existing Python app, is there nothing in the layout of that app that we know for sure will be true? For example where subpackages are stored

#

If so, the solution might be to always make the module's "main" code a subpackage. Since it's not actually a real main app anyway.

#

If we adopted the rule that "your module's code is a library" it would solve a lot of things

scenic wing
#

Basic idea is to default to putting the "module" in a subdir, but if you specify a target directory (non-existent), then assume standalone.

thick dust
#

Or leave source to be always ., and move the boilerplate from src/main/__init__.py to src/dagger/main/__init__.py, and the sdk from sdk/ to src/dagger/sdk/ so that it always works at the same paths

scenic wing
#

Still don't like vendoring the module's source in the repo's app like that. You also need to reconcile pyproject.toml, etc.

thick dust
#

I take note of the fact that you don't like it. Indulge me in the thought experiment? What would it mean to reconcile pyproject.toml?

scenic wing
#

Not feasible. There's a lot of fragmentation in the ecosystem, multiple package managers, with different lock file formats, and ways to structure a project. It would take a lot from us to attempt it. Not to mention that it's just too invasive. The only sensible way this can work is to require the user to add some config in their pyproject.toml instead of doing it automatically.

thick dust
#

OK but what kind of change are we talking about? Why is pyproject.toml involved at all

scenic wing
#

It's the equivalent of go.mod and package.json. It describes where the sources are and what the dependencies are, etc...