#Most of my pipelines have to git clone
👋
You could achieve the same behavior as Jenkins by running a git clone or git pull in a pipeline and storing the result in a cache volume.
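A minimal sketch of the clone-or-pull idea, written as plain Python rather than Dagger SDK code. It only decides which git commands to run against a persistent directory (standing in for the cache volume). The function names `sync_commands` and `run_all`, the default ref `main`, and the choice of a forced checkout of `FETCH_HEAD` are assumptions for illustration, not something from this thread.

```python
import subprocess
from pathlib import Path
from typing import List


def sync_commands(repo_url: str, cache_dir: Path, ref: str = "main") -> List[List[str]]:
    """Return the git commands that bring cache_dir up to date with ref.

    If cache_dir already holds a clone, only new objects are fetched.
    Otherwise a full clone is needed (the first run on a fresh cache).
    """
    if (cache_dir / ".git").is_dir():
        return [
            # Fetch only what changed since the last run.
            ["git", "-C", str(cache_dir), "fetch", "origin", ref],
            # Move the working tree to the freshly fetched commit.
            ["git", "-C", str(cache_dir), "checkout", "--force", "FETCH_HEAD"],
        ]
    return [["git", "clone", "--branch", ref, repo_url, str(cache_dir)]]


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Inside a pipeline step, something like `run_all(sync_commands(url, Path("/cache/repo")))` would do the work, with `/cache/repo` being wherever the cache volume is mounted (a hypothetical path).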
My plan is to mount the reference repo as a read-only directory and git clone using it (I will run multiple pipelines in parallel).
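For context, a rough sketch of what cloning against a reference repo looks like on the git side. `--reference` and `--dissociate` are real `git clone` flags; the helper name `reference_clone_command` and the example paths are made up for illustration. Without `--dissociate` the new clone keeps borrowing objects from the reference repo, so git only needs to read from it, which is why a read-only mount can work.

```python
from typing import List


def reference_clone_command(
    repo_url: str,
    dest: str,
    reference_repo: str,
    dissociate: bool = False,
) -> List[str]:
    """Build a `git clone` command that borrows objects from a local reference repo.

    With dissociate=True, git copies the borrowed objects into the new clone,
    so the clone no longer depends on the reference repo afterwards.
    """
    cmd = ["git", "clone", "--reference", reference_repo]
    if dissociate:
        cmd.append("--dissociate")
    cmd += [repo_url, dest]
    return cmd
```

For example, `reference_clone_command("https://example.com/app.git", "/work/app", "/mnt/ref/app.git")` gives the argument list to hand to `subprocess.run`. Each parallel pipeline would use its own `dest` while sharing the same reference path.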
This could work, but if the repo is quite big (has a lot of files) there could be a performance penalty, as Dagger will have to calculate the diff to transfer whatever changed each time.
I think I'd prefer to follow my initial suggestion and perform the clone or pull within your pipeline, so it works the same for everyone who runs it.
Not sure if @runic quartz's supergit module helps with any of this. Which reminds me: we need better support for adding a module description somewhere, since Daggerverse doesn't currently help with that.
EDIT: Now I'm fully awake after having some coffee 😄
So the current pipeline in Jenkins is a fan-out from the first step.
Every "branch" runs the same steps: clone, npm install, lint, build, deploy (prod or preview/dev).
Of course this is highly inefficient, as the only differences between all of those happen during build.
(Different config files for each one).
End goal is to remove all the steps from Jenkins and only use it as a tool to drive builds from events.
(So I can easily replace it at any time).
While going from 30 clones + npm install to 1 is great, bandwidth usage is an issue.
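A toy sketch of that restructuring: the shared steps run once, and only the build fans out per config file. Everything here is a stand-in (the function names, the string "workspace", the config file names); the real steps would be actual pipeline stages.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def prepare_workspace(log: List[str]) -> str:
    """Run the steps that are identical for every variant, exactly once."""
    for step in ("clone", "npm install", "lint"):
        log.append(step)  # placeholder for the real work
    return "workspace"


def build_variant(workspace: str, config: str) -> str:
    """Build one variant from the shared workspace and its own config file."""
    return f"{workspace} built with {config}"


def run_pipeline(configs: List[str]) -> Tuple[List[str], List[str]]:
    """Prepare once, then build every config in parallel."""
    log: List[str] = []
    workspace = prepare_workspace(log)
    with ThreadPoolExecutor() as pool:
        # pool.map keeps results in the same order as configs.
        results = list(pool.map(lambda c: build_variant(workspace, c), configs))
    return log, results
```

With 30 configs, the log still records a single clone and a single npm install, while the build runs 30 times.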
Those builds are triggered on every PR, and as you can imagine, quite a few of them can stack up.
For npm I have a shared cache, and for git I have a regularly updated bare repo,
which I leverage during CI runs.
I'll start creating a "simple" pipeline in Dagger next week and see what ways I can figure out
to accomplish this!
Will keep you posted! Thanks for your answer!
Wouldn’t Dagger’s built-in git operations take care of this, by caching git fetch operations automatically?
My supergit module does aim to experiment with possible improvements to the core Git API, possibly including fine-grained control over caching, but I strongly recommend starting from the core Git operations and observing how well they do.
Maybe I misunderstood, but it looks like OP’s original Jenkins logic is a workaround for lack of cache-aware git operations
the workaround might not be needed at all
also, hey @supple cedar 👋😁
hey @runic quartz !
Running it multiple times looks like it does cache it.
But I haven't checked what happens if a new commit is added. Wouldn't that invalidate the cache?
I might just be saying things that have no basis, because I haven't tried it yet. I'm just going over the docs etc. before jumping in!
I'll do some tests and get back to you with results!
It will be nice to reduce build times that currently take 20-30m depending on the worker load 😄
Yeah that’s a lot! I’m confident we can bring that down
The exact cache invalidation logic is somewhere in the buildkit source code. Which is why I’m experimenting with supergit: at some point it would be cool to have a 100% userland implementation that is at least as fast. To be clear, that is not the case today. I don’t even use cache volumes in supergit yet. It’s not obvious how exactly to use the volumes, since they are globally shared on your machine, not namespaced by project or anything. My next step is to try to mount only the raw objects on the cache volume, and hope it doesn’t break anything 😁
Maybe I misunderstood, but it looks like OP’s original Jenkins logic is a workaround for lack of cache-aware git operations
This is how Jenkins works "by design" generally when you have stateful agents. IIUC OP wants to achieve similar behavior, since using the Git operation for every new commit might be slow due to the size of the repository, as it involves a shallow clone.
Currently all my agents are ephemeral k8s pods with a couple of host paths mounted (caches), which is why I really needed the git reference repo, so the clone was mostly done locally, just fetching new objects.
With Dagger, I'll probably deploy a daemonset to have a stable agent on each host, and Jenkins will just send jobs to each agent (e.g. jobs / available agents).
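One possible reading of "jobs / available agents" is an even round-robin split, sketched below. This is only an assumption about the intended scheduling; the helper name `assign_jobs` and the agent names are hypothetical, and a real setup would likely let Jenkins handle the dispatch.

```python
from typing import Dict, List


def assign_jobs(jobs: List[str], agents: List[str]) -> Dict[str, List[str]]:
    """Spread jobs across agents round-robin so each gets a near-equal share."""
    if not agents:
        raise ValueError("at least one agent is required")
    assignment: Dict[str, List[str]] = {agent: [] for agent in agents}
    for index, job in enumerate(jobs):
        # Job 0 goes to the first agent, job 1 to the second, and so on, wrapping around.
        assignment[agents[index % len(agents)]].append(job)
    return assignment
```

Pinning a job to a stable per-host agent is what would let it reuse that host's local caches instead of starting cold.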
Gotcha. As Solomon mentioned, I'd start by using the default Dagger git operation and see how the baseline performance is. If doing a shallow clone on each new commit is costly, you can try the cache volume approach. Let me know if you need some help with anything. Pretty certain we can bring those numbers down considerably