I am hoping to develop a stronger intuition for Dagger's internal mechanisms by understanding when a DAG node will or won't be recalculated. (Aside: I think I saw in a recent community call that Dagger Cloud now shows whether a step was pulled from cache, so forgive me for not just locking in and testing all of this for myself -- but in any case, I'd like to know if my mental model here makes sense.)
First, what I mean by content-based or topological: consider the DAG Input(A) -> Job(B) -> Value(C) -> Job(D), which has been run and cached. Then the DAG is rerun with Input(Z), which when fed to Job(B) also outputs Value(C), i.e. Input(Z) -> Job(B) -> Value(C) -> Job(D). Will Job(D) be recalculated, or will it be retrieved from cache? If cache semantics are content-based, it will be retrieved from cache, since Value(C) matches the input to Job(D) from the initial run. If cache semantics are topological, Job(D) will be recalculated, since it is a descendant of Input(Z), which has not been cached.
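To make the distinction concrete, here is a toy sketch of the two keying strategies in Python. This is only my mental model, not Dagger's actual implementation: `digest`, `job_b`, `job_d`, and `run_pipeline` are hypothetical names I made up, and the cache is just a dict. In "content" mode a job's cache key is derived from the value it receives; in "topological" mode it is derived from the cache key of its upstream node.

```python
import hashlib


def digest(*parts: str) -> str:
    """Hash a sequence of strings into one hex key."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") != ("a", "bc")
    return h.hexdigest()


def job_b(x: str) -> str:
    # Hypothetical Job(B): both Input(A) and Input(Z) yield Value(C).
    return "C"


def job_d(c: str) -> str:
    # Hypothetical Job(D): any deterministic function of Value(C).
    return c.lower() + "-processed"


def run_pipeline(input_value: str, mode: str, cache: dict) -> list:
    """Run Input -> Job(B) -> Job(D); return the jobs that actually executed."""
    executed = []
    upstream_key = digest("input", input_value)
    value = input_value
    for name, fn in (("job_b", job_b), ("job_d", job_d)):
        if mode == "content":
            key = digest(name, value)         # keyed on the input *value*
        else:
            key = digest(name, upstream_key)  # keyed on the upstream *node*
        if key not in cache:
            cache[key] = fn(value)
            executed.append(name)
        value = cache[key]
        upstream_key = key
    return executed


for mode in ("content", "topological"):
    cache = {}
    first = run_pipeline("A", mode, cache)
    second = run_pipeline("Z", mode, cache)
    print(mode, first, second)
```

In this toy model, the rerun with Input(Z) executes only Job(B) under content-based keys, while under topological keys it executes both Job(B) and Job(D), which is exactly the difference I am asking about.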
Of course, if we are optimizing for efficiency, content-based cache semantics are superior, as long as you can guarantee that the output of Job(D) actually will be the same in both cases. As far as I know, Dagger can't rigorously guarantee this (e.g. if a container makes an external HTTP request that retrieves a timestamped payload, does the container node's hash now incorporate the payload data?), but if the purpose of the Dagger API is simply to allow you to model your dependency graph with the semantics that are germane to your use case, then this may be "close enough" to catch what matters and ignore what doesn't.
In reading https://docs.dagger.io/features/caching/, it seems to me like the answer might be "both" -- assuming I understand correctly. Running out of characters on this post; will reply with what I mean by this.