#Filters for directory arguments


proper rain
#

I'm dealing with a large monorepo, where individual builds only ever need a subset of files/directories.
AFAIK the only way to filter a directory is to specify +ignore pragmas in the module code.
What would be a lot more practical in my case is to run a script prior to invoking dagger that would compute all the filters and populate a .env file.
But for that to work I need dagger.Directory arguments to be able to accept these filters.
I created a PR (https://github.com/dagger/dagger/pull/11891), which implements exactly that.
I wonder if someone can take a look.
Thank you.

sonic pine
proper rain
#

oh boy! 🙂
It'll take me a bit to figure it out, I only see the commit, not sure if there are examples on how to use it. It looks like it should work very well for handling go mod replace, or similar scenarios.

My use-case may be a bit tricky though... We have a tool that generates code based on files that can be anywhere in the big monorepo.
We don't know ahead of time what it will read, but we can ask it. To give the answer, however, the tool needs access to everything.
So it would be a lot easier to determine the includes before we invoke dagger.

If Workspace is a new type that can be "pre-populated" with includes, that'll work. But if it's only meant to be expanded from within a function we'll have to basically replicate the logic of our tool in dagger code.

Perhaps the two can complement each other. After all there is no reason why dagger.Directory, which natively supports include/exclude, should be unable to take them from command line.

sonic pine
proper rain
#

yes, it can dump a file with includes.
my plan was to then turn that into a string like path/to/monorepo?include=dir1,dir2 and add that to .env before I call dagger
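
A minimal sketch of that wrapper step in Go, assuming the `?include=` syntax proposed in the PR (it is not a released Dagger feature). The key `SRC_ROOT` and both helper names are hypothetical, and the include list would really come from the file dumped by the scanning tool.

```go
package main

import (
	"fmt"
	"strings"
)

// filteredDirArg builds a value such as "path/to/monorepo?include=dir1,dir2".
// The "?include=" syntax is the one proposed in this thread (assumption).
func filteredDirArg(root string, includes []string) string {
	if len(includes) == 0 {
		return root
	}
	return root + "?include=" + strings.Join(includes, ",")
}

// envLine formats one KEY=VALUE line for a .env file.
func envLine(key, value string) string {
	return fmt.Sprintf("%s=%s\n", key, value)
}

func main() {
	// Hard-coded here; in practice read from the scanning tool's output.
	includes := []string{"dir1", "dir2"}
	// SRC_ROOT is a hypothetical key, use whatever name your user default expects.
	fmt.Print(envLine("SRC_ROOT", filteredDirArg("path/to/monorepo", includes)))
}
```

The program prints the line to stdout, so the wrapper script can redirect it into `.env` right before calling dagger.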

#

my typical build would then use srcRoot *dagger.Directory, projectDir string

#

my use-case aside, I wonder if allowing filters on dir is generally an ok idea. it'll be backwards compatible, because it uses ? as a separator, and no directory can have that in its name

sonic pine
#

We need to first see how many of the current user problems are definitively solved by dynamic filters (with proper docs & examples of course) - then if any problems are left unsolved, we will look into layering more features (with the corresponding cost in complexity)

#

@proper rain may I ask:

  • what is the feature scope of the scanning tool? Is it a lightweight tool that only scans, or is it more of a swiss-army knife or giant monolith?
  • what language is your tool written in?
  • what language is your dagger module written in (the one that needs the filters)?
#

(I do agree that it would be nice to have a syntax to specify include/excludes directly in a CLI argument..)

#

Ah I think I remember, typescript on both sides? nevermind 😛

proper rain
#

both are golang, the tool is pretty much a monolith with a bunch of subcommands, most of the code is in internal/... so wasn't meant to be a library. So technically doable, but probably not as easy as my original plan.

The reason I felt comfortable implementing this the way I did is because it looks like directory parsing already detects a special syntax for a git directory. So expanding it didn't seem like a terrible idea

#

I was actually going to follow up with another proposal: pull user defaults from the current env with a predefined prefix, like DAGGER_ENV_USERDEFAULT1=.... That way I'd have a nice dagger wrapper and no intermediate files. That's for another time, nm

sonic pine
proper rain
#

🙂 yes, but my understanding is that they can only be loaded from .env. If that could be augmented with environment variables, I wouldn't have to create a .env file and add it to .gitignore
a predefined prefix would make that safe, because it would mean you intend to share this var with dagger
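
A rough sketch of the prefix idea in Go. This is only the proposal from this thread, not existing Dagger behavior. The prefix `DAGGER_ENV_` follows the example above, and `userDefaults` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// userDefaults keeps only the variables that start with prefix and strips
// the prefix from their names. Anything without the prefix is never shared.
func userDefaults(environ []string, prefix string) map[string]string {
	defaults := make(map[string]string)
	for _, kv := range environ {
		key, value, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(key, prefix) {
			continue
		}
		name := strings.TrimPrefix(key, prefix)
		if name == "" {
			continue // the bare prefix is not a usable name
		}
		defaults[name] = value
	}
	return defaults
}

func main() {
	// DAGGER_ENV_ is the prefix suggested in the thread (assumption).
	defaults := userDefaults(os.Environ(), "DAGGER_ENV_")
	names := make([]string, 0, len(defaults))
	for name := range defaults {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%s\n", name, defaults[name])
	}
}
```

Because the filter is an explicit opt-in by name, unrelated variables such as `HOME` or credentials are never passed along.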

sonic pine
#

Ah I see. You're in luck, we're working on a part 2 to the workspace API, which is a first-class workspace configuration 🙂 See #1468070450524459029

This includes a .dagger/config.toml which will centralize and simplify user-facing dagger configuration, including user defaults --> https://github.com/dagger/dagger/pull/11812

GitHub

What
Introduces workspaces (aka "modules v2"), a new project configuration layer that separates workspace config from module definitions, and gives modules a first-class API to i...

proper rain
#

saw that one too 🙂

sonic pine
#

(will be backwards compatible with existing dagger installs)

proper rain
#

workspace cannot be a field, that's pretty clever, I was wondering how that would work

sonic pine
#

@proper rain I understand the friction of having to "flip" the control flow, from a wrapper+generator tool, to an embedded library... Even though it's the best long-term design (runtime support always trumps generator/wrapper when available), I'm looking for short-term conveniences that might help

proper rain
#

Migrating to a hermetic build tool is not trivial 🙂

sonic pine
#

Could your tool expand slightly in scope, and actually do the work to be done inside the dagger function? That way your function would simply hand off to your tool (which would be dagger-aware)

#

An important detail is that dagger supports nesting out of the box. When dagger executes a tool in a container, if that tool attempts to connect to dagger, out of the box it will discover the underlying engine, and connect to that (with clean scoping). The same way a unix process can spawn a child process, and that child-process itself can make further system calls with scoped privileges

#

But depending on the specifics of your codebase, it might just be simpler to inline logic into the dagger function - at the risk of causing duplication like you said.

Just wanted to point out that there's all the plumbing needed to move all dagger logic to the tool, if desired.

proper rain
#

running it as an executable would be expensive because it would mean transferring the entire monorepo inside dagger to let it run, no?
But I control the source so anything is doable, can make it into a library.

Workspace is obviously a very comprehensive solution, and covers a bunch of scenarios.

So I would maybe just consider "directories with filters" on its own merits. One can say --directory-arg=some-dir, why not --directory-arg=some-dir?exclude=test_data
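
To make the proposed argument syntax concrete, here is a hedged Go sketch of splitting `some-dir?exclude=test_data` into a path and its filters. It is not the code from the PR. `dirFilter` and `parseDirArg` are hypothetical names, and treating the part after `?` as a URL query string is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dirFilter is a hypothetical holder for a directory path and its filters.
type dirFilter struct {
	Path    string
	Include []string
	Exclude []string
}

// splitList splits a comma-separated value and drops empty entries.
func splitList(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// parseDirArg splits "path?include=a,b&exclude=c" at the first "?".
// An argument without "?" is returned unchanged, which keeps plain
// directory arguments working as before.
func parseDirArg(arg string) (dirFilter, error) {
	path, query, found := strings.Cut(arg, "?")
	f := dirFilter{Path: path}
	if !found {
		return f, nil
	}
	values, err := url.ParseQuery(query)
	if err != nil {
		return dirFilter{}, fmt.Errorf("parsing filters in %q: %w", arg, err)
	}
	for key, vals := range values {
		for _, v := range vals {
			switch key {
			case "include":
				f.Include = append(f.Include, splitList(v)...)
			case "exclude":
				f.Exclude = append(f.Exclude, splitList(v)...)
			default:
				return dirFilter{}, fmt.Errorf("unknown filter %q in %q", key, arg)
			}
		}
	}
	return f, nil
}

func main() {
	f, err := parseDirArg("some-dir?exclude=test_data")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("path=%s include=%v exclude=%v\n", f.Path, f.Include, f.Exclude)
}
```

One consequence of query-string parsing is that characters such as `+` and `%` in patterns would need escaping. A real implementation might prefer a simpler hand-rolled split.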

sonic pine
#

running it as an executable would be expensive because it would mean transferring the entire monorepo inside dagger to let it run, no?

No, because the tool would dynamically call the Dagger API (specifically, the workspace API) to find matching files; construct a properly filtered dagger.Directory; then do the work from there.

Thanks to the workspace API, no files would be uploaded unless specifically requested.

proper rain
#

I might end up doing that eventually. tbh I didn't expect "LiveDir" to materialize quite so soon. Can't wait for next release.

Still wouldn't mind the flexibility to filter directories from cli.

I can't be sure but maybe the "wrapper prepares user defaults before calling dagger" approach is a lot easier to start with when you're migrating a non-hermetic build.

proper rain
#

....
just got a chance to play with Workspace for a bit.
Have a quick question.
The root of the workspace seems to be the root of the git repo. So by default FindUp will only look in root, right?
Is there a way to get the current/working subdirectory within the workspace?

sonic pine
#

we've been thinking about it.

The root of the workspace is actually marked by .dagger

The workspace is inside a root filesystem which includes the whole git repo (if there is one)

  • absolute paths are rooted in the rootfs (possibly git repo)

  • relative paths are (or will be soon) relative to the workspace root
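
The two path rules can be sketched in a few lines of Go. This only illustrates the rules as stated here, not the Workspace API itself. `resolve` and the example paths are hypothetical, and the relative-path rule is described above as possibly not shipped yet.

```go
package main

import (
	"fmt"
	"path"
)

// resolve maps a path used by a module to a location inside the rootfs.
// workspaceRoot is the absolute rootfs path of the directory holding .dagger.
func resolve(workspaceRoot, p string) string {
	if path.IsAbs(p) {
		// Absolute paths are rooted in the rootfs (possibly the git repo).
		return path.Clean(p)
	}
	// Relative paths are taken relative to the workspace root.
	return path.Join(workspaceRoot, p)
}

func main() {
	root := "/services/api" // hypothetical workspace root inside the repo
	for _, p := range []string{"cmd/main.go", "/go.work", "../lib"} {
		fmt.Printf("%-12s -> %s\n", p, resolve(root, p))
	}
}
```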

we've talked about going further and exposing the client's workdir (which is what you're talking about I think) but not sure how the module would be supposed to use that info

proper rain
#

That's exactly what I meant. Ideally workspace would have a WorkDir string field, which would be a relative path from its root.

sonic pine
#

Some modules might ignore everything outside workdir. Others might scan the whole workspace

#

There's a risk of fragmentation

proper rain
#

Not sure I follow...
Let's say I want to configure a go build based on my current dir.
I would first FindUp("go.mod", start=WorkDir), then parse go.mod for replace statements, build up a tree, and start the build from a subdirectory of that new tree

#

I should clarify probably that my monorepo is a bunch of pretty much independent projects

sonic pine
#

Now let's say you want to also support building all the artifacts in the workspace. Now the developer of that module needs to add an interface for the end user to configure that behavior. Then each module dev has to re-invent that wheel, with their own custom configuration system.

sonic pine
proper rain
#

well... not entirely. Typically we'd release a library and then use it, but if I want to work on a library and app at the same time, I might add a replace, so having the workspace as the root of the repo is very helpful

sonic pine
proper rain
sonic pine
#

@proper rain to be clear, I'm not dismissing your question on workdir. @modest hemlock had the exact same question. Let's say it's an unresolved design point. Your use case will help resolve it 🙂

proper rain
#

🙂 I feel like I'm missing something... I do want to understand the counterargument.
if a module is meant to work on the entire workspace, function names would make that clear, wouldn't they? build-all or whatever.

sonic pine
proper rain
#

FWIW there’s prior art to this, gradle has a settings.gradle, which is the equivalent of a workspace root, and any number of build.gradle under that root. All tasks are contextual by default, unless you specifically say something like “:build”.

sonic pine
#

There are countless instances of prior art 🙂 Basically every build and scripting system has a variation of this problem.

Do you have a positive experience of Gradle's approach?

proper rain
#

Yes, I find it intuitive. No build system is perfect, but this part never confused me.

#

I think the general idea of a workspace containing other projects is pretty established. And usually (that may be subjective) people work on a project and then that project can pull other projects from the same workspace as needed.

#

btw today I tried using a dev engine for the first time, didn't know what to expect, built a dagger client and it somehow managed to initialize a new engine on the fly. Some black magic.

sonic pine
proper rain
#

I used that before for testing, but that gives me a container to play in.
today I did dagger -v call cli binary --platform darwin/arm64 export --path ~/.local/bin/dagger-dev, and then ran dagger-dev and it just created the image somehow

sonic pine
proper rain
#

oh, that's a lot less voodoo. I thought it figured out a way to build the engine from source

sonic pine