#How does Dagger establish if a pipeline


reef basalt
#

How does Dagger establish if a pipeline is cached or needs to run?
The shortest possible answer is "based on the Module code and args provided to the Function in that Module".

More details/caveats include:

  • Each invocation of a Function with a given set of args will run once per session. A session starts with a CLI invocation and includes the Function invoked by the CLI and any Functions transitively invoked via deps between modules.
    • However, this only applies to the Function code itself; any usage of the core API from Functions (e.g. Container, Directory, etc. calls from your module) is cached the same way it was pre-Zenith.
    • We are also going to add more control over this in the near-ish future, which would allow Functions to be persistently cached across sessions in the same way the core API is.
  • The cache key for a given arg depends on its type. For example, the key for a Container is a hash of the base image digest + any changes that have been made to it (withExec, etc.)
    • The cache key for a Directory depends on its source. For example, a Directory loaded from your local filesystem is keyed by a content hash of the loaded files+dirs.
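
To make the bullets above a bit more concrete, here is a rough Python sketch of the idea. It is not the engine's actual code: `container_key`, `call_key` and `Session` are hypothetical names made up for illustration, and the real key derivation is more involved. It only shows per-session memoization keyed on module code + args, with a Container-style key built from a base image digest plus the changes applied on top.

```python
import hashlib
import json


def container_key(base_image_digest: str, ops: list[str]) -> str:
    """Hypothetical Container key: base image digest plus every change applied to it."""
    h = hashlib.sha256(base_image_digest.encode())
    for op in ops:
        # Separator keeps ["ab"] and ["a", "b"] from hashing to the same value.
        h.update(b"\0" + op.encode())
    return h.hexdigest()


def call_key(module_digest: str, function: str, arg_keys: dict[str, str]) -> str:
    """Combine the module code digest, the Function name and the per-arg keys."""
    payload = json.dumps([module_digest, function, sorted(arg_keys.items())])
    return hashlib.sha256(payload.encode()).hexdigest()


class Session:
    """Memoizes Function calls for the lifetime of one session only (assumption)."""

    def __init__(self):
        self._results = {}
        self.runs = 0  # how many times a Function body actually executed

    def call(self, module_digest, function, arg_keys, fn):
        key = call_key(module_digest, function, arg_keys)
        if key not in self._results:
            self.runs += 1
            self._results[key] = fn()
        return self._results[key]
```

Calling `Session.call` twice with the same module digest and arg keys runs `fn` once; changing the module digest (i.e. the Module code) or any arg key causes a new run, and a fresh `Session` starts with an empty cache.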
#

Does it check this on the host or tar up and send to BuildKit first?
I'm assuming you're referring to when you invoke a function from a local module.

The engine loads the files+directories required to load the module. This is based on the SDK and any include/exclude settings in dagger.json. It tries to load only the minimal set required rather than the whole repo.

That loading of local dirs/files is also more like rsync than "tar up and transfer". The engine caches each local dir load and will re-use that cache across sessions, so it only ever transfers files that have actually changed. This means that if you need to load a lot of files from a local dir, it should be a lot faster after the first time (provided not too many files have changed since the last sync).
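
As a toy illustration of "only transfer what changed", here is a small Python sketch. It is an assumption-heavy stand-in, not how the engine does it: `sync` is a hypothetical helper, directories are modelled as in-memory dicts of path to bytes, and change detection is a plain digest comparison rather than whatever the engine actually uses.

```python
import hashlib


def file_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sync(local: dict[str, bytes], remote: dict[str, bytes]) -> list[str]:
    """Bring `remote` in line with `local`; return the paths actually transferred."""
    transferred = []
    for path, data in local.items():
        # Transfer only files that are new or whose content differs.
        if path not in remote or file_digest(remote[path]) != file_digest(data):
            remote[path] = data
            transferred.append(path)
    # Drop files that no longer exist locally.
    for path in list(remote):
        if path not in local:
            del remote[path]
    return sorted(transferred)
```

The first `sync` against an empty `remote` transfers everything; a second call with nothing changed transfers nothing, and editing one file transfers just that file.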

#

But once loaded, those dirs/files are content hashed, and that hash becomes part of the cache key for the Function invocation.
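
If it helps, content hashing a directory can be pictured roughly like this in Python. This is only a sketch under my own assumptions (`dir_content_hash` is a hypothetical name, and the engine's real hashing scheme will differ in details such as file modes and symlinks): the hash covers relative paths and file contents, so it changes only when one of those changes.

```python
import hashlib
import os
import tempfile


def dir_content_hash(root: str) -> str:
    """Hash relative paths + file contents under `root` in a deterministic order."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk's traversal order stable
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as f:
                content_digest = hashlib.sha256(f.read()).digest()
            h.update(rel.encode() + b"\0" + content_digest)
    return h.hexdigest()


# Usage: same content gives the same hash; editing a file changes it.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w") as f:
        f.write("one")
    before = dir_content_hash(d)
    with open(os.path.join(d, "a.txt"), "w") as f:
        f.write("two")
    after = dir_content_hash(d)
    print(before != after)
```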

#

(Sorry for the wall of text, there's just a lot of details that are all relevant to answering your question 😅 )