#Hey Scott! Are you looking for example
@primal pebble - Thanks for replying. I'm looking for something that shows a more "A to Z" usage of Dagger in a k8s environment. My problem is, I don't believe I've gotten the paradigm switch to how Dagger should be used. For instance, I can create a "runner pod" that has the Dagger CLI, can connect to a Dagger Engine, and can run Dagger modules, etc., but what should the runner run? Only CI code written for Dagger? Or should the runner run things like tests itself? Or should Dagger be given all the tasks that should be run for the CI run/workflow?
I also need to communicate back to the repository management system, to show passing or failing CI steps in the PR commits. Is that done via the runner? Or is it done via the tasks run by Dagger?
@hoary bronze an excellent question, which has a 2 part answer:
- Part 1: what is possible and defined today.
- Part 2: complete and stable best practices. This is work in progress.
Since we shipped Functions & Daggerverse, production-readiness is our #1 priority, and this is part of it
I can describe where we're aiming to be (part 2), if you'd like, and then we can talk about the road to get there.
Puh. If you have the time. I'd love it. I think what might even be written could be put into a blog article, more than likely. Would be a waste just to answer my own lack of understanding. I bet there might be others in the same boat? Maybe?
Absolutely. That is precisely what is work in progress
Talking about it with you is part of the process - it's why we're here. We build the general case, one specific case at a time.
Great!
If I can help. I must add, I'm not the most experienced with CI.
And it amazes me how many have already taken what is already offered and have run with it so easily. Why I have such a mental block is frustrating.
So, the first important principle, is that 1) what to run on Dagger, and 2) how and where to run Dagger, are always decoupled. You can decide one independently of the other.
The first is about how much of your pipeline logic you want to daggerize - this depends entirely on the constraints of your project.
The second is about your compute infrastructure and how you want to use it to run Dagger. In this case, Kubernetes. But also your local machine.
Right. I was figuring the answer on how to use Dagger would be "it depends".
Yes exactly.
Our advice is to Daggerize as much as you can get away with, within the constraints of your project.
In the extreme version of this, you can collapse your entire CI into a single Dagger Function, which receives the raw GitHub/GitLab event, and takes it from there. Literally the entire CI workflow expressed in code.
Ok. I'm thinking the runner would be more like a controller. It will handle running tasks, but also communicating with the outside world. Does that make sense?
On the extreme other end, you only daggerize simple individual tasks - one Function per task - and manually configure your CI to call each function. This allows for very incremental adoption of Dagger in an existing stack that is hard to change
Yeah. I was wondering why not just make a "CI API" application that can take in webhooks, but uses Dagger to run the different CI steps.
Would you consider the runner/ controller approach a way to do things? I'm seeing Dagger modules as tasks, but wondering about the difficulty of making them abstract enough to be reusable. Maybe it is the abilities to be abstract that are eluding me?
For instance, it shouldn't matter which repo is being used, a test-suite run should just run. The examples all offer some simple "hello world", which is why I was asking about a more expansive example.
To see how the abstraction to run, for example, any test suite (in specially configured repos, I must add) can be achieved.
Yes you can do that. There is a project to do exactly that called "PocketCI" by @primal pebble and @frosty wren.
When you follow this pattern, the simplest design is to simply pass the whole event to a Dagger Function, which in turn can dispatch to other functions. No need to reinvent a pseudo-code to do that, when you can just write real code.
Yes, Dagger Functions are very good at giving you a reusable abstraction for any task. That is basically the killer feature
Here's an example, where our own official Markdown linter is available as a function, which you can apply to any source directory:
I saw the Pocket CI demo and was thinking, how cool is that, but it also generates Engines, if I recall correctly.
```shell
dagger call -m github.com/dagger/dagger/linters/markdown \
  lint --source=https://github.com/shykes/daggerverse \
  checks \
  diff

dagger call -m github.com/dagger/dagger/linters/markdown \
  lint --source=https://github.com/shykes/daggerverse \
  checks \
  format --text 'Error at {{.FileName}}:{{.LineNumber}}: {{.RuleDescription}}'
```
docs for that module: http://daggerverse.dev/mod/github.com/dagger/dagger/linters/markdown
> Would you consider the runner/ controller approach a way to do things?
Could you explain what you mean by "runner / controller approach"?
I must add, we are wanting to create a fully remote development platform. So, we (and hopefully customers/ tenants) will be developing in workspace pods, inside k8s. No local dev at all.
I'll have a deep look into the linter.
And if I have that, there is one last question I think open in my mind. Should the runner make CLI calls in a script for the tasks? Or can I run just one CI workflow app and call modules through the code? I'm sort of lost there too. LOL!
Yeah CLI is the standard entrypoint. The one universal interface every CI system understands.
Specifically you call functions with dagger call
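As a rough sketch of a runner that uses the CLI as its entrypoint, the helper below assembles a `dagger call` command line and runs it as a subprocess. The helper names (`dagger_call_argv`, `run_task`) are made up for illustration; the only CLI shape assumed is the one already shown in this thread: `dagger call -m <module>` followed by a chain of function names, each with optional `--flag=value` arguments.

```python
import shlex
import subprocess
from typing import List, Mapping, Sequence, Tuple

# One step in the call chain: a function name plus its --flag=value arguments.
Step = Tuple[str, Mapping[str, str]]


def dagger_call_argv(module: str, steps: Sequence[Step]) -> List[str]:
    """Build the argv for `dagger call -m <module> <fn> [--flag=value]...`."""
    argv = ["dagger", "call", "-m", module]
    for name, args in steps:
        argv.append(name)
        argv.extend(f"--{flag}={value}" for flag, value in args.items())
    return argv


def run_task(module: str, steps: Sequence[Step]) -> bool:
    """Run the call; a zero exit code counts as a passing task."""
    result = subprocess.run(dagger_call_argv(module, steps), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Rebuilds the Markdown linter command quoted earlier in the thread.
    argv = dagger_call_argv(
        "github.com/dagger/dagger/linters/markdown",
        [
            ("lint", {"source": "https://github.com/shykes/daggerverse"}),
            ("checks", {}),
            ("diff", {}),
        ],
    )
    print(shlex.join(argv))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a value contains spaces or template braces.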
> Could you explain what you mean by "runner / controller approach"?
Dagger Engine is "dumb" i.e. it won't do anything without the runner/ controller saying "do x, then y, then z".
Ok. I think I'm getting a better model of what should be happening. Can I try to write out a use case and you can correct me, if I am off-track?
Of course
That's entirely up to you. You simply decide how much logic you want inside and outside Dagger. The inside part is your Dagger Function, the outside part is whatever runs dagger call. Your function can be very small with no dependencies, or huge with 100 sub-functions. In the end it's all just code, so you're in control.
A typical trend is to start with a thin Dagger Function called by a thick controller. Then over time, the functions multiply, and grow thicker, as you find new ways to use the platform. Over time, Dagger "eats" all your glue scripts and other messiness, and spits out simpler, cleaner code. This process can happen very quickly, or very slowly, depending on your project.
**Use case:**
We have a developer and a dev lead (can be the same person or someone else), a repo/version management system (VMS), a CI system, and a CD system as actors (the CI system including Dagger, of course).
1. The developer develops "locally", i.e. in her own workspace pod inside a dev cluster, and has all the CLI commands to run tests locally. Linting is done automatically for the developer at commit time (locally).
2. The dev creates a PR. This kicks off the CI via a webhook to Argo Workflows via Argo Events.
3. Argo Workflows creates and runs a CI runner pod, pre-built with the Dagger CLI and our Dagger module. The CI runner pod runs, via our Dagger module, linting (again), testing and any other "checks" we need to run, to allow the code to be promoted (PR merged) to the next stage on the platform. The runner also reports the status of each task back to the VMS.
4. A dev lead then reviews and approves the PR for merging. Once merged, another CI runner starts the build process, i.e. builds the container for the application (using Dagger again).
5. Once this build step is accomplished, the update to the repo (usually just a version number increment, made via the runner to the VMS) kicks off the CD process with Kargo, to promote the code to the next stage (depending on how the tenant has formulated her QA/staging process).
Boom. Process done.
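For the "runner reports back to the VMS" part of this use case, here is a hedged sketch that assumes the VMS is GitHub and uses its commit status REST endpoint (`POST /repos/{owner}/{repo}/statuses/{sha}`). The thread does not say which VMS is used, so treat GitHub as an assumption; the function name `status_request` and the context strings are hypothetical. The function only builds the request, so nothing is sent until the caller passes it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def status_request(
    repo: str,
    sha: str,
    context: str,
    passed: bool,
    token: str,
    api: str = "https://api.github.com",
) -> urllib.request.Request:
    """Build (but do not send) a commit status update for one CI task.

    repo is 'owner/name'; context is the label shown next to the commit,
    for example 'ci/lint' (a hypothetical naming scheme).
    """
    body = {
        "state": "success" if passed else "failure",
        "context": context,
        "description": f"{context} {'passed' if passed else 'failed'}",
    }
    return urllib.request.Request(
        url=f"{api}/repos/{repo}/statuses/{sha}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Whether this call lives in the runner or inside a Dagger Function is the same inside/outside decision discussed above; the request itself is identical either way.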
If that sounds like an "ok" path to you, I think I have my mental model set and can now work on the process. Do let me know if I'm "off track" in any way, even from a process standpoint (like I said, I'm actually inexperienced with the CI/CD process workflow). If it is ok, and when I can get this done, would you be interested in me demoing it in a Community Call?
yeah this seems right. A few small notes:
- you don't need to pre-build specific functions into the pod. Dagger builds it all on the fly from source
- step 5. (CD) can possibly be a Dagger Function (hand off to kargo)
- Infra consideration: you will need the dagger pods to be privileged
- Infra consideration 2: cache persistence is important to performance, this is an area with a lot of work left to do, but there are optimizations available already
> you don't need to pre-build specific functions into the pod. Dagger builds it all on the fly from source

But wouldn't pre-building the Dagger code be faster? So the pod would just run the dagger call to the pre-built module, and it also works?
> step 5. (CD) can possibly be a Dagger Function (hand off to kargo)

Sure. I can see that. My understanding on k8s is there needs to be a set config available though. It's an "actual state" in the config to fall back on or reconcile from. So, the update to the VMS is necessary from what I know. I can be wrong though.
> Infra consideration: you will need the dagger pods to be privileged

Yes, I caught that being noted a couple of times in conversations here. We've already been devising some ideas around this from a platform standpoint. Certain parts of the tenant's repo will be managed by the platform. If those parts are "messed with" at all outside of the platform, the pull of the repo into any task will fail before any code is run. If the tenant wishes to proceed, they must revert any changes made to those sensitive areas. Also, another part is dependency acceptance. We will also have other "checks" in place for security. Basically, we want to offer flexibility, but not at the cost of safety.
> Infra consideration 2: cache persistence is important to performance, this is an area with a lot of work left to do, but there are optimizations available already

Yep. Had a conversation already about this. I believe it was Lev who noted that the engines, since they are non-ephemeral, will hydrate cache over time. Not sure how well that will work with many different tenants, but we'll see. If we have paying customers (and we're a long way from that), we will have funds to "pass on" to our dependencies. We are also working on a business plan that includes making sure a cut of revenues goes to OSS projects that want support or have Enterprise offerings. We'll see how that goes too.
I believe this OSS support idea will be innovative.
Cause, there is absolutely no way we could make this platform happen without OSS and not to give back to those looking for support would be criminal.
Pre-building won't be faster. Dagger caches everything, so first run will take care of pre-building. All subsequent runs re-use that.
As we improve the caching system, this will get faster without extra work.
Cool. Ok. Good to know about the caching.
It also means one less step in the process for developing the CI runners.
Which will dogfood the platform. Haha!
Sort of a chicken and egg dilemma... LOL!
Initially it's more work for us to build a caching system powerful enough to do all this work. But it pays off 1000x by saving every user boatloads of engineering and compute cycles over time.
I keep telling myself, if I am running into chicken and egg dilemmas, and I do run into them often with this project, it must mean I'm either doing something wrong, or it is new.
A future improvement is that your Argo/GitHub/GitLab/PocketCI worker can itself run as a long-running Dagger function on the same cluster, instead of an external service. At that point you can bootstrap your entire CI stack as a Kub deployment.
So Dagger runs your CI worker as a function, which itself calls more functions.
Essentially Dagger becomes a self-hosted distributed OS.
That would be cool. I keep thinking how I can just make an API app (as I noted earlier) that runs the Dagger modules as needed.
So, the API application is a runner that just runs all the time.
Yes exactly. Then you run that API application in Dagger.
And there you have me again. The paradigm change isn't there yet I guess...
You can safely ignore this last part. It's not yet an established practice - just me imagining a wonderful future
Sure, I can run a go application as a Dagger module, and that app is, in itself, an API app too.
Your current scenario is good. And can be incrementally improved over time as the platform gets new features
Or are you saying, Dagger would fire up a container and run the app?
yes that
Hmm. Ok. Back to not being in the paradigm. How would the container be "hooked up" into the k8s environment?
I was imagining the container work Dagger does being its own little island so-to-speak. And only inputs and outputs of the container being handled by the module.
I mean, nothing Dagger does speaks to the k8s API, right?
Yes it's an island, but the function itself is run by an engine inside a pod. So kub sees the dagger pod "doing stuff"
You scale your dagger pods just like any other pods, with full control over scheduling, storage etc.
Basically the same best practices for running any CI on Kubernetes, would apply to Dagger
"your dagger pods" - at this point, I only have engines and runners with the Dagger CLI. Are there more/ others?
I mean the engines (which kub runs in pods)
Right. But, say an Engine creates a container to do stuff, how can it be wired up to k8s to run as if it were a pod? I think I'm totally missing this understanding/ the capability.
No that capability won't exist. The engine runs containers itself, without interacting with k8s. But it does this while running inside a pod. So: containers inside containers. K8s supports this just fine (you just need the pod to be privileged).
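To make the "containers inside a privileged pod" point concrete, here is a small sketch that generates a Pod manifest with `securityContext.privileged: true` and prints it as JSON, which `kubectl apply -f -` accepts. The pod name and the image reference are placeholders, not values from this thread, and a real deployment would more likely use a DaemonSet or StatefulSet with persistent cache storage; only the `privileged` field is the point here.

```python
import json


def engine_pod(name: str, image: str) -> dict:
    """Return a minimal Pod manifest whose single container runs privileged.

    The engine starts its own containers inside this pod without talking to
    the Kubernetes API, which is why the privileged flag is needed.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": "engine",
                    "image": image,
                    "securityContext": {"privileged": True},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; substitute the engine image you actually run.
    manifest = engine_pod("dagger-engine", "example.com/dagger-engine:placeholder")
    print(json.dumps(manifest, indent=2))
```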
Obviously this only applies when running Dagger on Kubernetes.
Ahh... ok. Yeah, and thus I wouldn't want the Engines running anything that is outside our control. Though, I do see the runners being API apps themselves. That is doable. And the "workhorse" to do tasks (theoretically anything) is the Engine.
Exactly. You can think of the Dagger Engine as a coprocessor that you throw work at
And, currently I only see do-or-die workflow state with Dagger. I can, however, imagine Temporal included to run the "activities" (Dagger Modules) and with that, we get retries, number of attempts, i.e. guaranteed execution. Oohhh. that just came to me too...
Thanks for this very valuable conversation Solomon. I appreciate it so much, because it gets me going with a plan I can follow. My mental model was totally murky (a mental block even) before we started. Now I feel much better and I can "see" where I'm/ we're going.
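The retry idea can be illustrated without Temporal at all. The sketch below is a plain retry loop with exponential backoff, not Temporal's API and not a durable-execution guarantee; `run_with_retries` and its parameters are hypothetical names. The `task` callable stands for anything that returns `True` on success, such as a function that shells out to `dagger call`.

```python
import time
from typing import Callable


def run_with_retries(
    task: Callable[[], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run task until it returns True; return the attempt number that passed.

    Waits base_delay, then 2x, then 4x ... between attempts. Raises
    RuntimeError if every attempt fails. The sleep function is injectable
    so the backoff can be observed without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        if task():
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"task failed after {max_attempts} attempts")
```

A workflow engine adds what this loop cannot: state that survives a runner crash, so a retry can resume after the process that started it is gone.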
And again, if you are interested after I get this going, I'd love to demo what we came up with. Might take a while though.
Awesome! Nighty night. Don't let the bed bugs bite!