#CI & hydrated (partial) execution

iron forum
#

As stated in my other post (https://discord.com/channels/707636530424053791/1049373535849693225), one of the things GitLab CI solves is re-running a pipeline from a given state. In GitLab, this takes the shape of re-running a certain job. The jobs prior to it do not need to run again to regenerate that state; the re-run job relies instead on the artefacts passed between jobs, so a specific job (a set of commands) can be re-run multiple times.

In Dagger, this might theoretically be solved by really good caching. However, it is reassuring in GitLab to know that "all these things will re-run from this point". As also mentioned in the other thread, I tend to write quite small containers, which gave me the idea of "checkpoints".

A checkpoint marks a point from which execution should start, uncached (maybe?), based on existing state. A checkpoint stores exactly the state which is required for execution to proceed (as determined by the DAG). The checkpoint data is then used to hydrate further execution, allowing for partial pipeline execution in Dagger.
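To make "exactly the state which is required" concrete, here is a minimal sketch in plain Python. It does not use any Dagger API. The step names and the `DEPS` graph are hypothetical, and it assumes each step's output is stored as one unit. Given a checkpoint step, it works out which steps re-run and which upstream outputs the checkpoint would have to store.

```python
from collections import deque

# Hypothetical pipeline DAG: each step maps to the steps whose outputs it consumes.
DEPS = {
    "fetch": [],
    "build": ["fetch"],
    "lint": ["fetch"],
    "test": ["build"],
    "package": ["build", "test"],
    "deploy": ["package"],
}


def steps_to_rerun(deps, checkpoint):
    """Return the checkpoint step plus every step that transitively depends on it."""
    # Invert the graph so we can walk from a step to its consumers.
    dependents = {step: [] for step in deps}
    for step, inputs in deps.items():
        for parent in inputs:
            dependents[parent].append(step)

    rerun = {checkpoint}
    queue = deque([checkpoint])
    while queue:
        for child in dependents[queue.popleft()]:
            if child not in rerun:
                rerun.add(child)
                queue.append(child)
    return rerun


def checkpoint_state(deps, checkpoint):
    """Return the steps whose stored outputs are needed to hydrate the re-run.

    These are the steps that do not re-run themselves but feed directly
    into a step that does.
    """
    rerun = steps_to_rerun(deps, checkpoint)
    return {parent for step in rerun for parent in deps[step] if parent not in rerun}


if __name__ == "__main__":
    print(sorted(steps_to_rerun(DEPS, "test")))
    print(sorted(checkpoint_state(DEPS, "test")))
```

With this graph, a checkpoint at `test` re-runs `test`, `package` and `deploy`, and only the output of `build` has to be stored. `fetch` and `lint` are not needed at all.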


leaden lance
#

Thanks for posting this, there has been some discussion on our end around similar-ish ideas previously. First I want to double check I fully understand what you're looking for though.

> However, it is reassuring in GitLab to know that "all these things will re-run from this point".
I agree, but I'm curious to hear specific examples of this from you, in particular situations you don't want to happen that this feature should prevent. Is it a reproducibility issue? Performance? A need to run different parts of a DAG in different execution environments? Something else?

> In Dagger, this might theoretically be solved by really good caching
Yeah, caching is the basis of our thinking around this. You can think of Dagger's cache as the same thing: a cache of artifacts. One difference is that our cache's artifacts are all just generic filesystem directories identified by a content hash.
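As a rough illustration of "a directory identified by a content hash", here is a small standard-library Python sketch. It is not Dagger's actual hashing scheme. The function name `directory_digest`, the use of SHA-256 and the length-prefixed encoding are all assumptions made for the example, and it ignores file modes, symlinks and empty directories.

```python
import hashlib
from pathlib import Path


def directory_digest(root):
    """Return a digest that depends only on relative file paths and file contents."""
    root = Path(root)
    hasher = hashlib.sha256()
    # Sort the files so the digest does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so that different (path, content) splits
        # cannot produce the same byte stream.
        hasher.update(len(rel).to_bytes(8, "big"))
        hasher.update(rel)
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    return "sha256:" + hasher.hexdigest()
```

Two directories with identical contents get the same digest wherever they live on disk, which is what lets a cached artifact be looked up by content rather than by the job that produced it.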

But I agree that caching in and of itself isn't enough. I think we might also need the ability to enforce that the cache is re-used and/or the ability to enforce that "roots" of the DAG (i.e. sources such as git repositories, container images, etc.) are using certain pinned versions.
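Those two kinds of enforcement could look roughly like the following Python sketch. It is purely illustrative and not an existing Dagger feature. `run_step`, `require_cached`, `CacheMiss` and `unpinned_roots` are hypothetical names. The cache is a plain dict, and "pinned" is assumed to mean a container image reference that carries an `@sha256:` digest.

```python
class CacheMiss(Exception):
    """Raised when a step was required to come from cache but was not found."""


def run_step(cache, key, compute, require_cached=False):
    """Return the cached result for key, computing it only if that is allowed.

    With require_cached=True a miss is an error instead of a silent re-run,
    so "re-run from this point" cannot quietly re-execute earlier steps.
    """
    if key in cache:
        return cache[key]
    if require_cached:
        raise CacheMiss(f"no cached result for {key!r}")
    cache[key] = compute()
    return cache[key]


def unpinned_roots(roots):
    """Return the names of image roots that are not pinned to a digest."""
    return [name for name, ref in roots.items() if "@sha256:" not in ref]


if __name__ == "__main__":
    cache = {}
    run_step(cache, "build", lambda: "binary-v1")  # first run fills the cache
    print(run_step(cache, "build", lambda: "binary-v2", require_cached=True))
    print(unpinned_roots({"base": "alpine:3.19", "tool": "golang@sha256:" + "a" * 64}))
```

In this sketch a required-but-missing artifact surfaces as an explicit `CacheMiss` error rather than a silent recomputation.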

In GitLab, is it possible for you to submit a pipeline that references artifacts that don't exist anymore? If so, I'm guessing that would just result in an error?

leaden lance
#

And then on top of that, make it as easy as possible to avoid cache misses too of course 🙂
