I'm dealing with a large monorepo, where individual builds only ever need a subset of files/directories.
AFAIK the only way to filter a directory is to specify +ignore pragmas in the module code.
What would be a lot more practical in my case is to run a script prior to invoking dagger that would compute all the filters and populate a .env file.
But for that to work I need dagger.Directory arguments to be able to accept these filters.
I created a PR (https://github.com/dagger/dagger/pull/11891), which implements exactly that.
Wonder if someone can take a look.
Thank you.
#Filters for directory arguments
we've got something better for you to try: a new workspace API that allows you to dynamically request files from your workspace, passing filters as regular arguments to regular function calls.
(basically we shipped https://github.com/dagger/dagger/issues/11287)
oh boy!
It'll take me a bit to figure it out, I only see the commit, not sure if there are examples on how to use it. It looks like it should work very well for handling go mod replace, or similar scenarios.
My use-case may be a bit tricky though... We have a tool that generates code based on files that can be anywhere in the big monorepo.
We don't know what it will read ahead of time, but we can ask it, however to give the answer the tool needs access to everything.
So it would be a lot easier to determine the includes before we invoke dagger.
If Workspace is a new type that can be "pre-populated" with includes, that'll work. But if it's only meant to be expanded from within a function we'll have to basically replicate the logic of our tool in dagger code.
Perhaps the two can complement each other. After all there is no reason why dagger.Directory, which natively supports include/exclude, should be unable to take them from command line.
if you already have a tool, and you were prepared to generate dagger filters from it, then your tool must have an intermediate representation of its scan result?
yes, it can dump a file with includes.
my plan was to then turn that into a string like path/to/monorepo?include=dir1,dir2 and add that to .env before I call dagger
my typical build would then use srcRoot *dagger.Directory, projectDir string
my use-case aside, I wonder if allowing filters on dir is generally an ok idea. it'll be backwards compatible, because it uses ? as a separator, and no directory can have that in its name
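A minimal sketch of how a string like path/to/monorepo?include=dir1,dir2 could be split into a path and filter lists. It assumes the syntax proposed in the PR, which is not a released Dagger feature; DirArg, parseDirArg and splitList are hypothetical names, and the query part is decoded with Go's net/url rules, so characters such as + or % in a pattern would need escaping.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// DirArg is a hypothetical parsed form of a directory argument with filters.
// It is not a Dagger type.
type DirArg struct {
	Path    string
	Include []string
	Exclude []string
}

// parseDirArg splits "path?include=a,b&exclude=c" at the first "?" and
// reads the include/exclude lists from the query part.
func parseDirArg(arg string) (DirArg, error) {
	p, query, _ := strings.Cut(arg, "?")
	out := DirArg{Path: p}
	if query == "" {
		return out, nil
	}
	vals, err := url.ParseQuery(query)
	if err != nil {
		return DirArg{}, err
	}
	out.Include = splitList(vals["include"])
	out.Exclude = splitList(vals["exclude"])
	return out, nil
}

// splitList flattens repeated, comma-separated values and drops empty items.
func splitList(values []string) []string {
	var out []string
	for _, v := range values {
		for _, item := range strings.Split(v, ",") {
			if item = strings.TrimSpace(item); item != "" {
				out = append(out, item)
			}
		}
	}
	return out
}

func main() {
	d, err := parseDirArg("path/to/monorepo?include=dir1,dir2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", d)
}
```

A plain path without "?" passes through unchanged, which is the backwards-compatible case.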
not a priority given the current focus on dynamic filters which are strictly more powerful
We need to first see how many of the current user problems are definitively solved by dynamic filters (with proper docs & examples of course) - then if any problems are left unsolved, we will look into layering more features (with the corresponding cost in complexity)
@proper rain may I ask:
- what is the feature scope of the scanning tool? Is it a lightweight tool that only scans, or is it more of a swiss-army knife or giant monolith?
- what language is your tool written in?
- what language is your dagger module written in (the one that needs the filters)?
(I do agree that it would be nice to have a syntax to specify include/excludes directly in a CLI argument..)
Ah I think I remember, typescript on both sides? nevermind
both are golang, the tool is pretty much a monolith with a bunch of subcommands, most of the code is in internal/... so wasn't meant to be a library. So technically doable, but probably not as easy as my original plan.
The reason I felt comfortable implementing this the way I did is because it looks like directory parsing already detects a special syntax for a git directory. So expanding it didn't seem like a terrible idea
I was actually going to follow up with another proposal - pull user defaults from the current env with a predefined prefix, like DAGGER_ENV_USERDEFAULT1=.... That way I'd have a nice dagger wrapper and no intermediate files (that's for another time, nm)
Isn't that the user defaults feature that already exists? (Sorry can't resist answering even though you crossed it out)
yes, but my understanding is that they can only be loaded from .env, if that could be augmented with environment variables, I wouldn't have to create an .env file and add it to .gitignore
a predefined prefix would make that safe, because it would mean you intend to share this var with dagger
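A rough sketch of the prefix idea, under stated assumptions: the DAGGER_ENV_ prefix is hypothetical (Dagger does not define it), and stripping the prefix to get the default's name is a guess at the intended behavior. userDefaults is an illustrative helper, not part of any Dagger SDK.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// userDefaultPrefix is a hypothetical prefix; Dagger does not define it.
const userDefaultPrefix = "DAGGER_ENV_"

// userDefaults returns NAME=value lines for every entry in environ whose key
// starts with prefix, with the prefix stripped. Entries without the prefix
// are never shared, which is the safety property discussed above.
func userDefaults(environ []string, prefix string) []string {
	var out []string
	for _, kv := range environ {
		key, value, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(key, prefix) {
			continue
		}
		name := strings.TrimPrefix(key, prefix)
		if name == "" {
			continue // a bare prefix carries no name
		}
		out = append(out, name+"="+value)
	}
	sort.Strings(out) // stable output regardless of environment order
	return out
}

func main() {
	for _, line := range userDefaults(os.Environ(), userDefaultPrefix) {
		fmt.Println(line)
	}
}
```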
Ah I see. You're in luck, we're working on a part 2 to the workspace API, which is a first-class workspace configuration. See #1468070450524459029
This includes a .dagger/config.toml which will centralize and simplify user-facing dagger configuration, including user defaults --> https://github.com/dagger/dagger/pull/11812
saw that one too
(will be backwards compatible with existing dagger installs)
workspace cannot be a field, that's pretty clever, I was wondering how that would work
@proper rain I understand the friction of having to "flip" the control flow, from a wrapper+generator tool, to an embedded library... Even though it's the best long-term design (runtime support always trumps generator/wrapper when available), I'm looking for short-term conveniences that might help
Migrating to a hermetic build tool is not trivial
Could your tool expand slightly in scope, and actually do the work to be done inside the dagger function? That way your function would simply hand off to your tool (which would be dagger-aware)
An important detail is that dagger supports nesting out of the box. When dagger executes a tool in a container, if that tool attempts to connect to dagger, out of the box it will discover the underlying engine, and connect to that (with clean scoping). The same way a unix process can spawn a child process, and that child-process itself can make further system calls with scoped privileges
But depending on the specifics of your codebase, it might just be simpler to inline logic into the dagger function - at the risk of causing duplication like you said.
Just wanted to point out that there's all the plumbing needed to move all dagger logic to the tool, if desired.
running it as an executable would be expensive because it would mean transferring the entire monorepo inside dagger to let it run, no?
But I control the source so anything is doable, can make it into a library.
Workspace is obviously a very comprehensive solution, and covers a bunch of scenarios.
So I would maybe just consider "directories with filters" on its own merits. One can say --directory-arg=some-dir, why not --directory-arg=some-dir?exclude=test_data
running it as an executable would be expensive because it would mean transferring the entire monorepo inside dagger to let it run, no?
No, because the tool would dynamically call the Dagger API (specifically, the workspace API) to find matching files; construct a properly filtered dagger.Directory; then do the work from there.
Thanks to the workspace API, no files would be uploaded unless specifically requested.
oh that would require a rewrite, the tool is not dagger-aware at all. not dissimilar to me making it a library I can use from a dagger module
I might end up doing that eventually. tbh I didn't expect "LiveDir" to materialize quite so soon. Can't wait for next release.
Still wouldn't mind the flexibility to filter directories from cli.
I can't be sure but maybe the "wrapper prepares user defaults before calling dagger" approach is a lot easier to start with when you're migrating a non-hermetic build.
....
just got a chance to play with Workspace for a bit.
Have a quick question.
The root of the workspace seems to be the root of the git repo. So by default FindUp will only look in root, right?
Is there a way to get the current/working subdirectory within the workspace?
we've been thinking about it.
The root of the workspace is actually marked by .dagger
The workspace is inside a root filesystem which includes the whole git repo (if there is one)
- absolute paths are rooted in the rootfs (possibly git repo)
- relative paths are (or will be soon) relative to the workspace root
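The two path rules can be sketched in plain Go as follows. This only illustrates the rule as described here and is not the Workspace API; resolve is a hypothetical helper, and workspaceRoot stands for the absolute rootfs path of the directory that holds .dagger.

```go
package main

import (
	"fmt"
	"path"
)

// resolve maps a path, as a module might pass it, to a location in the rootfs.
// Absolute paths are taken as rooted in the rootfs; relative paths are joined
// onto the workspace root. Both results are cleaned.
func resolve(workspaceRoot, p string) string {
	if path.IsAbs(p) {
		return path.Clean(p)
	}
	return path.Join(workspaceRoot, p)
}

func main() {
	// Hypothetical layout: the workspace lives at /services/api in the repo.
	fmt.Println(resolve("/services/api", "go.mod"))
	fmt.Println(resolve("/services/api", "/libs/common"))
}
```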
we've talked about going further and exposing the client's workdir (which is what you're talking about I think) but not sure how the module would be supposed to use that info
there's a 3rd layer to the workspace feature: dynamic artifacts, with tracking of their workspace provenance. This layer will allow filtering of artifacts by source path. That can use client workdir.
see proposal https://gist.github.com/shykes/aa852c54cf25c4da622f64189924de99
That's exactly what I meant. Ideally workspace would have a WorkDir string field, which would be a relative path from its root.
Yeah but, how is the module dev supposed to use it? If you think about it, it adds a dimension of complexity
Some modules might ignore everything outside workdir. Others might scan the whole workspace
There's a risk of fragmentation
Not sure I follow...
Let's say I want to configure a go build based on my current dir.
I would first FindUp("go.mod", start=WorkDir), then parse go.mod for replace statements, build up a tree, and start the build from a subdirectory of that new tree
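A self-contained sketch of that flow, with loud assumptions: findUp here is a plain Go stand-in that takes an exists callback, not the Workspace FindUp call (its real signature is not shown in this thread), and localReplaces is a simplified line-based go.mod reader that only picks up ./ and ../ targets. A real module would more likely use golang.org/x/mod/modfile.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// findUp walks from start toward "/" and returns the first directory
// for which exists(dir/name) is true.
func findUp(exists func(p string) bool, start, name string) (string, bool) {
	dir := path.Clean("/" + start)
	for {
		if exists(path.Join(dir, name)) {
			return dir, true
		}
		if dir == "/" {
			return "", false
		}
		dir = path.Dir(dir)
	}
}

// localReplaces returns the relative directory targets of replace directives,
// covering both the single-line and the "replace ( ... )" block form.
func localReplaces(gomod string) []string {
	var out []string
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if i := strings.Index(line, "//"); i >= 0 {
			line = strings.TrimSpace(line[:i]) // drop trailing comment
		}
		switch {
		case line == "replace (":
			inBlock = true
			continue
		case inBlock && line == ")":
			inBlock = false
			continue
		case strings.HasPrefix(line, "replace "):
			line = strings.TrimPrefix(line, "replace ")
		case !inBlock:
			continue // not a replace directive
		}
		_, target, ok := strings.Cut(line, "=>")
		if !ok {
			continue
		}
		target = strings.TrimSpace(target)
		if strings.HasPrefix(target, "./") || strings.HasPrefix(target, "../") {
			out = append(out, target)
		}
	}
	return out
}

func main() {
	// Hypothetical in-memory repo standing in for the workspace rootfs.
	files := map[string]string{
		"/apps/api/go.mod": "module example.com/api\n\ngo 1.22\n\nreplace example.com/lib => ../../libs/lib\n",
	}
	exists := func(p string) bool { _, ok := files[p]; return ok }

	modDir, ok := findUp(exists, "apps/api/cmd/server", "go.mod")
	if !ok {
		panic("no go.mod found")
	}
	include := []string{modDir}
	for _, r := range localReplaces(files[path.Join(modDir, "go.mod")]) {
		include = append(include, path.Join(modDir, r))
	}
	fmt.Println(include) // directories the build would need
}
```

The resulting list is what would be requested from the workspace, with the build then started from the module directory inside that tree.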
I should clarify probably that my monorepo is a bunch of pretty much independent projects
Now let's say you want to also support building all the artifacts in the workspace. Now the developer of that module needs to add an interface for the end user to configure that behavior. Then each module dev has to re-invent that wheel, with their own custom configuration system.
Then the answer might be to make them independent workspaces. Or, the artifacts feature might allow modules to expose all available go projects within your workspace, and let you select the one you want to build - but in a standard way
well... not entirely. Typically we'd release a library and then use it, but if I want to work on a library and app at the same time, I might add a replace, so having the workspace as the root of the repo is very helpful
You don't have to worry about that, regardless of where you drop a workspace, the rootfs will always be the whole git repo (for the reason you explain - it's just not practical otherwise)
can't a module just ignore WorkDir?
Some will. Others won't. Better check the README to make sure. That's what fragmentation looks like.
@proper rain to be clear, I'm not dismissing your question on workdir. @modest hemlock had the exact same question. Let's say it's an unresolved design point. Your use case will help resolve it
I feel like I'm missing something... I do want to understand the counterargument.
if a module is meant to work on the entire workspace, function names would make that clear, won't they? build-all or whatever.
The risk is making every module more complicated to write, and more complicated to use, because each module has to reinvent their own way of handling build vs. build-all, with their own convention. The result is a lower-quality user experience and software ecosystem. It's about drawing the right line of standardization vs. flexibility. Typical platform design problem.
FWIW there's prior art to this, gradle has a settings.gradle, which is the equivalent of a workspace root, and any number of build.gradle under that root. All tasks are contextual by default, unless you specifically say something like ":build".
There are countless instances of prior art. Basically every build and scripting system has a variation of this problem.
Do you have a positive experience of Gradle's approach?
Yes, I find it intuitive. No build system is perfect, but this part never confused me.
I think the general idea of a workspace containing other projects is pretty established. And usually (that may be subjective) people work on a project and then that project can pull other projects from the same workspace as needed.
btw today I tried using a dev engine for the first time, didn't know what to expect, built a dagger client and it somehow managed to initialize a new engine on the fly. Some black magic.
Oh yeah that works great. What command did you use?
What I use for manual testing, is a full playground build: dev engine + dev CLI all packaged in an ephemeral container:
dagger -m github.com/dagger/dagger@main call engine-dev playground terminal
I used that before for testing, but that gives me a container to play in.
today I did dagger -v call cli binary --platform darwin/arm64 export --path ~/.local/bin/dagger-dev, and then ran dagger-dev and it just created the image somehow
Oh yeah, I think dev CLI builds will auto-download engine from main. Be advised it's not exactly the same commit (we don't carry every engine image for every commit). Basically main CLI will auto-download main engine
oh, that's a lot less voodoo. I thought it figured out a way to build the engine from source
That would be cool. But CLI without an engine is very limited