#🚲Conor's test case
I'll take the opportunity to try out a simpler terminology we were just hammering out after the demo. To see if it makes sense to you.
Why?
- Every software project relies on scripts for the various tasks involved in delivery: build, test, deployment, release, etc.
- These scripts are generally a source of pain: hard to test, hard to compose, hard to reproduce, no API, don't work the same in pre-push and post-push, etc.
- Dagger solves this by allowing each project to encapsulate these scripts into functions, that can be composed in powerful ways, shared and reused at will, and all this cross-language.
How to use it?
- A software team chooses a Dagger SDK, and uses it to write functions that encapsulate the basic building blocks of delivering their software. One function might produce the project's standard build environment, another might invoke a build with the right set of arguments, another might run the linter, another release to a package registry, etc.
- The functions are packaged into a Dagger module, which is basically a directory with code + a dagger.json config file
- A module can import other modules, just like a regular packaging system. This allows functions from one module to invoke and compose functions from other modules.
- A module can be published anywhere that code can be published - github, registry etc. Dagger will provide an official publishing system ("universe") but ad-hoc distribution will be possible also. Some security considerations here, how to trust the content etc.
- To use a module, you run the dagger CLI and specify which module to load. The CLI will initialize the Dagger Engine, which will load the module and serve its API. From there, the user can query the API in a variety of ways: from the CLI; from a language interpreter; from the GraphQL playground; from a web UI, possibly embedded in eg. an IDE plugin; or from a third-party tool
Connecting this with your example use case:
- Alice, Bob and Charlie all seem to have unique Dagger functions which they can contribute to the platform. So they would each individually create their own Dagger module.
- Alice actually seems to have 2 different modules to contribute: a low-level collection of utility functions; and a high-level "end-to-end" module with functions built on top of Alice's and Bob's frontend and backend functions. So that would be a total of 4 modules, with a "diamond" dependency layout.
- Note that this is one possible layout, but not necessarily the one we advocate. We'll have to figure out best practices together 🙂
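A rough sketch of the "diamond" layout described above, in Python. The module names (utils, frontend, backend, e2e) and the exact edges are assumptions made for illustration, not something the thread specifies; the point is only that four modules with a shared base can always be loaded in an order where each module comes after its imports.

```python
from graphlib import TopologicalSorter

# Hypothetical module names. Each key maps to the set of modules it imports.
# Assumed shape: e2e -> {frontend, backend} -> utils (the "diamond").
DEPENDENCIES = {
    "utils": set(),
    "frontend": {"utils"},
    "backend": {"utils"},
    "e2e": {"frontend", "backend"},
}


def load_order(deps):
    """Return an order in which every module appears after the modules it imports."""
    return list(TopologicalSorter(deps).static_order())


order = load_order(DEPENDENCIES)
```

Here utils always comes first and e2e always comes last; the relative order of frontend and backend is not significant.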
Are there any rudimentary proof of concepts or demos for cross-language support? Not having it means that getting people to adopt Dagger within my company is a herculean task and almost dead in the water.
yes there is a working implementation
that we plan to release in October or November
Nice. Excited.
You can try the dev branch right now too, it works (but API not yet stable). See https://github.com/dagger/dagger/tree/main/zenith
I only now got the chance to read back through this, and I like the terms here!
FYI @tepid dust @viral depot @opaque tundra @charred solar , a written version of what I was trying to describe on the call
@bright root
@viral depot @opaque tundra I'm going to use this thread for the lightweight graphql bikeshedding we discussed, since the terminology above is the most up-to-date version of what I described
So I'll just take the story above, and try to express it as graphql 🙂
OK so I took my earlier draft and removed everything that I felt could be dropped from a first version... And the result might surprise you 😛
extend type Query {
  # This part left intentionally blank :-P
}
On the consumer side, maybe... nothing is needed?
- Each session loads one entrypoint module. Controlled at session init, so no need for an Environment or Module type
- Workdir is set for the session. No need for withWorkdir or withContext
- Dependencies are passed as strings in dagger.json
- Main module (aka "entrypoint") is configured at session init (eg. dagger --entrypoint git://...etc) so no need to expose that as code
- "Daggerverse" aka search & download modules doesn't make sense without the above (ie. not done by code)
So what's left is the "init" API for SDKs to register functions...
Of course this means no arbitrary loading of modules as code. But I'm thinking that can wait?
wdyt @viral depot @opaque tundra ?
I like starting minimal. There will need to be something though because when the CLI starts up it needs to A) identify that it is being given an entrypoint module (i.e. it sees dagger.json in workdir or is given --module) and B) load that into the engine as an environment MODULE (sorry 🙂 )
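A minimal sketch of step A (identifying the entrypoint module), in Python. The function name find_entrypoint is hypothetical, and the rule that an explicit --module value takes precedence over a dagger.json in the workdir is an assumption; the thread only says that either signal identifies an entrypoint.

```python
from pathlib import Path
from typing import Optional


def find_entrypoint(workdir: Path, module_flag: Optional[str] = None) -> Optional[str]:
    """Return the entrypoint module reference, or None if there is none.

    Assumed precedence: an explicit --module value wins; otherwise the
    workdir itself is the module if it contains a dagger.json file.
    """
    if module_flag:
        return module_flag
    if (workdir / "dagger.json").is_file():
        return str(workdir)
    return None
```

Step B (loading the result into the engine) is the part that would need an API call such as the installModule discussed below.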
Ah right, I was picturing that being passed as CLI flags or config file to the engine, but that engine may be long-running and shared, so that doesn't make sense
One way or the other it has to be passed within a session
Maybe that is THE one valid use of a graphql mutation? 🙂
The most utterly minimal schema I could imagine would be:
extend type Query {
  installModule(source: Directory!): Void!
}
(again, discounting the internal-only APIs SDKs use)
Since chaining is not needed, and concurrency is a non-issue, and it DOES modify the state of the system...
that would also be a neat way to make it clearly separate, and probably not in codegen
without actually hiding it
And the neat thing is, you can still pass IDs
Yeah it does make sense as a mutation. The only reason to avoid it is that we'll need to update every SDK to handle Mutation (even if it's just "ignore")
We can do that, it's not a big deal. I personally don't care either way about that because the underlying difference between Query and Mutation is irrelevant to us still. So it's mostly just API aesthetics.
At least right now
Agree that it's aesthetics. It's my favorite option so far. Maps well to admin/not admin too. If you interpret mutations as "anything that can alter the state of the engine itself", you can consider that to be privileged, and maybe apply access control or policies later
so for admins it groups all the "scary" things in one place which is reassuring
can we use it to set a workdir / context directory too? I'm less sure about that because it pulls in the local dir wiring, remote API vs local etc
I was just assuming we'd only support the default of workdir being the caller's workdir, in which case no param is needed (engine can load caller "." without the caller telling it anything). But configurable workdir would probably just mean:
type Mutation {
  installModule(source: Directory!, workdir: Directory): Void!
}
Oh I see - so workdir could be configured in the CLI then right?
assuming all clients are either the CLI, or wrapping the CLI via dagger session
Yes, something like --module git://... --workdir <whatever local or git path you want>
Yeah, in the case of a raw graphql client, it will need to connect to a dagger listen instance, so by default workdir would be the "." of dagger listen. But we could also handle that specially because dagger listen can disable loading of any local dirs. So in that case, engine would see "oh I can't load ".", I'll just default the workdir to the module source dir", or something
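The fallback described above could be sketched like this in Python. Everything here is illustrative: resolve_workdir and its parameters are hypothetical names, and the order of the checks is an assumption based on the messages above (explicit workdir first, then the caller's ".", then the module source dir when local dirs cannot be loaded).

```python
from typing import Optional


def resolve_workdir(requested: Optional[str], local_dirs_allowed: bool, module_source: str) -> str:
    """Pick the workdir for a session.

    1. An explicitly requested workdir (e.g. from a --workdir flag) is used as-is.
    2. Otherwise default to the caller's "." when local dirs can be loaded.
    3. Otherwise fall back to the module's own source directory.
    """
    if requested is not None:
        return requested
    if local_dirs_allowed:
        return "."
    return module_source
```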
Oh one thing to keep in mind for later: in the CUE version of this, pretty quickly module authors wanted to attach files to their module, to access as resources at runtime. So we'll need a way to do something like query { module { file("data.csv") { ... } } }
Here module simply means "the current module"
Sure, the API in Zenith Checks has a CurrentEnvironment api, which is essentially that. No problem supporting it. But it also isn't strictly necessary because module code (running inside the container) has the dag.Host().Directory api, and it also has both the source + workdir available inside the container.
CurrentModule (or just Module, etc.) would still be mildly convenient, just not as necessary anymore relative to CUE days (IIUC)
for access inside the container, how do we standardize the path? You just open . to get your module source?
host workdir less important to have in the container since there’s already Host.Directory
we agree Host.Directory is not the same as module source right?
"." would be workdir (think that needs to stay for logical consistency). We can either standardize something like /src is module source. Or to avoid hardcoding string assumptions, just have the CurrentModule-like API.
So now that I "type that out loud", yeah probably want CurrentModule API 🙂
Different question, do you want to introduce a concept of a context directory, or is Host.directory(“.”) enough?
difference is subtle but if in the future we get requests for multiple context directories (see the frontend / backend use case above) then workdir can’t do that
but maybe that’s covered by passing directories as argument to your functions
so nothing else is needed
Yeah I think that would be the simplest way. Workdir would remain special because it not only is an input to a function, it's also mounted into the module's container and thus available to be read directly. So that makes me feel okay saying that . is workdir and that's all that's needed for now.
One thing to deal with is modules calling out to other modules. Like say I run a function from some module A from the CLI, but that module internally calls out to a function from a different module B. Should B's workdir be the same workdir of A?
I'd argue no, because it would mean that B's execution is cached based on my workdir. Sometimes that's correct, but if B doesn't actually need anything from the workdir, now we've needlessly invalidated it. So in that case, B's workdir would probably just fallback to the default of being its source directory. Then, if B actually did need something from the workdir, it would just have to accept a Directory input.
And that whole line of thought makes me wonder if workdir even needs to be a thing at all... Like maybe if a function really needs your workdir, it has to accept a Directory as input and the caller has to provide "." as a parameter (from the cli, similar for other interfaces to the api)
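To make the caching argument concrete, here is a toy sketch in Python. It is not how the engine computes cache keys; it just assumes a key is a hash over the function name and its inputs, and compares two designs: one where the caller's workdir is implicitly part of every call, and one where a function only sees the inputs it explicitly accepts. All names are hypothetical.

```python
import hashlib
import json


def _digest(function: str, inputs: dict) -> str:
    """Hash a function name and its inputs into a stable key."""
    payload = json.dumps({"function": function, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def key_with_implicit_workdir(function: str, inputs: dict, workdir_digest: str) -> str:
    """Design 1: the caller's workdir is always folded into the key."""
    return _digest(function, {**inputs, "__workdir__": workdir_digest})


def key_with_explicit_inputs(function: str, inputs: dict) -> str:
    """Design 2: only the inputs the function declares affect the key."""
    return _digest(function, inputs)
```

Under design 1, any change in the caller's workdir changes the key for module B even if B never reads it. Under design 2, B's key only changes if B accepts a Directory input and that input changes.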
maybe important for exporting files to host workdir
unless we dust off the idea of a function that returns a directory
and the CLI treating that in a special way
For that specifically, I like the model where the function has to return a Directory. Then it's totally in control of the client whether to export that, where to export it, etc.
Yeah that 🙂
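A small Python sketch of that model, with a plain dict standing in for a Dagger Directory. The names build and export are hypothetical, and nothing here uses the real Dagger API; it only illustrates the split where the function returns its output and the client alone decides whether and where to write it.

```python
from pathlib import Path
from typing import Dict

# Stand-in for a Dagger Directory: relative path -> file contents.
Directory = Dict[str, bytes]


def build() -> Directory:
    """A hypothetical module function: returns its output, never touches the host."""
    return {"bin/app": b"binary", "README.txt": b"built by the sketch\n"}


def export(directory: Directory, dest: Path) -> None:
    """Client-side step: write the returned directory wherever the caller chooses."""
    for rel_path, data in directory.items():
        target = dest / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
```

A CLI-like client could call build() and then choose to call export(result, Path("./out")), print a listing, or do nothing at all.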
so we would deprecate directory.export then?
works for me I think, more hermetic
which means more reusable modules on average
Well the CLI would still use it, so probably still needs to exist in one form or another. Also, Directory.Export is helpful inside module code. E.g. the module does a bunch of dagger API calls to build up a directory and then wants to have it locally (maybe it's using a library that requires an actual filesystem, not a "pseudo-filesystem from our api")
But in general I'm onboard with refactoring that whole API (have tangential ideas), but I think it would be orthogonal to all this.
eventually with full remote graphql api, a client will need to download the data from a directoryID right? and simple clients will wrap the CLI to do that
Yeah that would involve adding support for streaming data, which would enable us to stream e.g. a tar of the directory to them
so maybe the module that needs that just invokes the dagger CLI with dagger download xxx or something
so same as today except there wouldn’t be a graphql query for it
assuming all SDKs guarantee the dagger CLI is installed in the runtime container
Sure yeah, the SDKs would need lots of updating since they are pretty much just pure "codegen from graphql schema" today, but anything like that could be made to work with enough effort.
My tangential idea is something like dag.Host().Directory(".").WithDirectory(<whatever dir I want to export>).Sync(). Which would remove the verb export from our API
But I think for the short term, we would probably just have to still retain Directory.export, just for practical reasons. And then if we refactor that API, it will be picked up in modules and the CLI too. So I think we can consider it orthogonal as long as we're all on board with the idea of functions returning Directory and the CLI handling the rest (if it needs to be exported, etc.)
So, going back to the API, I think if we are onboard with Functions not getting workdir by default and instead requiring they ask for it specifically, then the minimalistic API would go back to:
type Mutation {
  installModule(source: DirectoryID!): Void!
}
With that in mind, I will add back a basic "daggerverse" API then
since the CLI will need it to get an ID to pass installModule
by the way isn't "daggerverse" nice? 🙂
I was actually wondering if daggerverse could just be a hardcoded git reference. I.e. CLI v0.10.0 is hardcoded to use universe as github.com/dagger/dagger at tag v0.10.0.
I'm not philosophically opposed to it being burned into our API, but I do think we could avoid it if we wanted.
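The hardcoded-reference idea could look roughly like this in Python. The function name universe_ref and the "repo@tag" string format are assumptions for illustration; the thread only says that a given CLI version would point at github.com/dagger/dagger at the matching tag.

```python
UNIVERSE_REPO = "github.com/dagger/dagger"


def universe_ref(cli_version: str) -> str:
    """Map a CLI version to the git reference of the matching universe.

    Accepts versions with or without a leading "v". The "repo@tag"
    format is an assumed convention, not a documented one.
    """
    tag = cli_version if cli_version.startswith("v") else f"v{cli_version}"
    return f"{UNIVERSE_REPO}@{tag}"
```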
And yes, love it. It's not just any old universe, it's the daggerverse 😎
I think the API is useful because the daggerverse is more than looking up a directory by name: it’s also search, reputation management, possibly content signing, download stats, maybe eventually publishing too? Perhaps also looking up metadata like “what git remotes work well as a workdir for this module?” or the reverse: “what modules work well with this git repo?”
That makes total sense. It does make me wonder if it would actually just be an API to our cloud service, rather than to the engine.
well it’s both 🙂
think of daggerverse as a builtin module
a dagger module is actually a great way to expose a cloud API, especially one that produces or consumes artifacts
way less boilerplate for the client
Right, but I can't currently imagine why it would have to go client->engine->cloud rather than just client->cloud. But I also lack imagination in that whole realm, so yeah, like I said, not opposed to it.
That makes sense, less api clients in our client 🙂
Cool SGTM