FYI vito al Helder Correia 👆 I know you | Dagger

bright glen Sep 26, 2023, 7:48 PM

#

Yeah, was chatting with @winged finch about it in our 1:1, I figure it'll be the immediate next iteration (as in post-merge). Do you think merging should block on it?

normal elbow Sep 26, 2023, 7:51 PM

#

Depends how much work merging is

#

My first instinct is: we plan to merge without it, and as we do the work to merge, we start actively discussing a design to fix it

#

Then depending on where the design discussion goes, maybe a super short term fix emerges - or maybe not

bright glen Sep 26, 2023, 7:53 PM

#

It'd at least be good to start getting our ideas in writing, yeah

normal elbow Sep 26, 2023, 7:53 PM

#

I think I agree with @winged finch that we should consider just disabling caching on function exec, for now

#

Solving the problem is not worth infinite pain, but it's worth incurring some pain. I don't know exactly what the side effects are of disabling caching of runtime exec, so I may be underestimating the pain 🙂

raw tapir Sep 27, 2023, 12:28 PM

#

My idea for an improvement:

Add an optional arg to withExec for cache: Boolean = true (or set a duration, so 0 would disable and default would be always). This would add an (internal) env var for the shim to interpret.
In Zenith, allow registering a function as not being cached, which would add the cache option to withExec when invoking it. Same perhaps if there's a secret in the inputs or some other metric.
In the Go SDK, set it to false for all functions, until we find a DX for it. The problem is with the implicit nature of the current design there's no current way to add metadata.
In Python this can be supported by the explicit use of the decorator, with @function(cache=False).

I know there's a desire to flag as not cached by the caller, but I think this solves a lot of use cases in a simple way.
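A minimal sketch of the module-side opt-out described above, assuming a decorator that records a cache preference and an engine step that turns it into an env var on the invoking withExec. The decorator here is a toy stand-in, not the real Python SDK, and the env var name `_EXAMPLE_FUNCTION_NO_CACHE` and helper `exec_env_for` are hypothetical; the thread only says "an (internal) env var".

```python
from typing import Callable, Dict, Optional

# Hypothetical name: the thread does not say what the internal env var is called.
NO_CACHE_ENV = "_EXAMPLE_FUNCTION_NO_CACHE"


def function(_fn: Optional[Callable] = None, *, cache: bool = True):
    """Toy stand-in for an SDK decorator that records a cache preference."""

    def wrap(fn: Callable) -> Callable:
        # Metadata that a runtime could read when registering the function.
        fn._cache_enabled = cache
        return fn

    # Support both the bare @function form and @function(cache=False).
    return wrap(_fn) if _fn is not None else wrap


def exec_env_for(fn: Callable) -> Dict[str, str]:
    """Env vars that might be added to the withExec invoking fn (assumption)."""
    if getattr(fn, "_cache_enabled", True):
        return {}
    return {NO_CACHE_ENV: "1"}


@function
def build() -> str:
    return "built"


@function(cache=False)
def deploy() -> str:
    return "deployed"
```

Here `exec_env_for(build)` is empty, while `exec_env_for(deploy)` carries the marker a shim could interpret as "do not cache this exec".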

normal elbow Sep 27, 2023, 3:04 PM

#

Thanks Helder. I’m not convinced functions ever need caching - and if I had to choose as a module developer, I’m not sure I would know how to choose

raw tapir Sep 27, 2023, 3:20 PM

#

There are some use cases for it. Other than the usual performance benefit, it allows caching the usage of third-party libraries like an AWS client or GCR. The current workaround for those is to use a CLI (if one exists) and run it in a withExec. Not only that, I've seen quite a bit of computation in some users' pipelines for getting something out of a container with dagger, doing some work on it language-side, then using that to feed into dagger again. Caching the function would help these cases as well.

Besides, we wanted to focus our CLI usage around querying data, instead of execution. Even though cache is always best effort, it's harder for me to imagine always executing instead of getting cached results.

I understand what you're saying though. You need to grok dagger caching a bit to take full advantage, but it may be too much for those that don't yet. I'd personally want to be able to cache the whole function if I wanted to though.

bright glen Sep 27, 2023, 4:33 PM

#

I think there's a strong argument that functions should be cached for the exact same reasons anything else should be cached: they should be wholly dependent on their inputs, and if changing-over-time is a reality of the use case, time itself should be represented as an input (perhaps an automated one). It's the necessary magic to bring non-DAG code (i.e. native language libraries) on the same level as the rest of Dagger without having to awkwardly move code into a WithExec or something.

Even aside from semantics, I imagine you'll want caching as soon as your stack is calling a bunch of functions, and each function call is spinning up a container, running code, streaming out the result, etc. (edit: the fact modules themselves will run by calling the SDK runtime module to build themselves will make this even more apparent!)

IMO we really need to start analyzing each use case. Right now I'm mostly hearing "they shouldn't be cached" or "they should be cached" but without any info on the use case that drove that opinion. I don't want to over-index on toy code.

#

I'm OK with disabling caching until we have a better story, my only worry is that depending on how quickly the ecosystem grows, it'll be harder to add caching-by-default later than to have caching-by-default now and continue using cache busters as a crutch for the ideally brief amount of time until we implement a better solution. (People will develop the wrong mental model for functions, and any functions written assuming no-caching will not take kindly to caching by default.)

bright glen Sep 27, 2023, 5:03 PM

#

Erik and I have also talked about adding generic query-level caching, which is a huge part of my mental equation for v2 IDs, and would cleanly replace some of the current per-resolver caching (Container.import, Container.build)

bright glen Sep 27, 2023, 5:31 PM

#

Building on the idea of query-level caching, it seems like cache control could be supported on both the caller-side and module-side via query/schema directives:

One-off query with cache control:

query {
  myModule {
    myFunction(...) @cacheEvery(duration: "1m") {
      ...
    }
  }
}

Module schema:

type MyModule {
  myFunction(...): MyResult! @cacheEvery(duration: "1m") # or 0, if you want no caching at all
}

This @cacheEvery directive could work by passing an additional now: <RFC time format> value to the function on every invocation. Something like that. Lots of stuff to bikeshed here; using Go's duration format is a bit of a shortcut.

And then we'd have to figure out how to translate that to each SDK. 😅
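One way to picture the `now` input behind the proposed @cacheEvery directive, as a rough sketch only: bucket the current time by the duration and fold the bucket into the cache key, so calls within the same window share a key. The helper names `time_bucket` and `cache_key` are hypothetical, durations are plain seconds rather than Go's duration format, and an integer bucket stands in for the RFC-formatted timestamp mentioned above.

```python
import hashlib
import json
from typing import Any, Dict


def time_bucket(now_s: float, every_s: float) -> str:
    """Value passed as the extra `now` input.

    every_s <= 0 models "no caching at all": each distinct timestamp
    becomes its own bucket.
    """
    if every_s <= 0:
        return repr(now_s)
    return str(int(now_s // every_s))


def cache_key(function: str, args: Dict[str, Any], now_s: float, every_s: float) -> str:
    """Hash the function name, its arguments, and the time bucket together."""
    payload = {"fn": function, "args": args, "now": time_bucket(now_s, every_s)}
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()
```

With a 60 second window, calls at t=120 and t=179.9 produce the same key, and a call at t=180 produces a new one. This keeps the function "wholly dependent on its inputs" by making time one of them.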

normal elbow Sep 27, 2023, 6:58 PM

#

Here’s where I stand so far:

We don’t know enough (yet) to produce the finished design for function caching. We need more modules to be written and used to do real things.
Therefore we should not (yet) ship explicit cache controls: we don’t (yet) know enough to design them right
In the meantime we should disable caching of runtime exec. The penalty is slower runs (but we don’t know how much slower in practice). If we did the opposite and cached by default, the penalty would be mostly unusable modules, so it’s an obvious choice for me.
My suspicion (I could be wrong) is that uncached runtime exec will have very little actual impact since 1) runtime build is still cached (so importing libraries is not affected) and 2) the function itself is just a vehicle for making dagger API calls, and those will still be cached

raw tapir Sep 27, 2023, 7:28 PM

#

On that last point, on 1) it's not about installing dependencies, it's about doing work inside functions using third-party or your own language-level code (non Dagger calls, so no engine cache). Those potentially do I/O (calling other APIs) or CPU-intensive work. These will at least be parallelized by the engine but the work will still be repeated. On 2) we don't have query level caching yet so caching is done on each node of the DAG. In the case of functions it's the withExec with the SDK runtime that's used to invoke a function, and they only get invoked on API calls (except another function from the current module, depending on SDK implementation).

normal elbow Sep 27, 2023, 7:35 PM

#

raw tapir On that last point, on 1) it's not about installing dependencies, it's about doi...

it's not about installing dependencies, it's about doing work inside functions

Ah I see what you mean. To me, that goes in the "we don't know enough yet" bucket. It may be that saying "move CPU-intensive work to sub-containers for better caching" is enough. Or maybe it will become the number one pain point. We'll find out soon enough!

we don't have query level caching yet so caching is done on each node of the DAG. In the case of functions it's the withExec with the SDK runtime that's used to invoke a function, and they only get invoked on API calls (except another function from the current module, depending on SDK implementation).

I didn't understand this at all, sorry if I'm missing something obvious. I'm talking about the dagger-in-dagger calls made by the function itself. My understanding is that if I call a function Foo, which in turn calls Container().From().WithExec(), this will happen:

- From core: <module-runtime>.WithExec(<the runtime binary>) -> not cached
  - From function Foo: Container().From() -> cached
  - From function Foo: .WithExec() -> cached

So I'm just talking about regular buildkit caching of the actual work done by the module: I assume those operations would be cached?
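The behavior described above can be modeled with a toy simulation. This is not the engine or buildkit, just a dictionary-backed cache; `ToyEngine`, `op`, and `call_function` are hypothetical names. The runtime exec that invokes the function is never cached, while the operations the function issues are.

```python
from typing import Callable, Dict, List


class ToyEngine:
    """Tiny stand-in for an engine that caches operations by key."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.log: List[str] = []

    def op(self, key: str, cached: bool = True) -> str:
        # Return the stored result when caching is allowed and the key is known.
        if cached and key in self.cache:
            self.log.append(f"{key}: cache hit")
            return self.cache[key]
        self.log.append(f"{key}: executed")
        result = f"result({key})"
        if cached:
            self.cache[key] = result
        return result

    def call_function(self, name: str, body: Callable[["ToyEngine"], str]) -> str:
        # The exec that runs the function is uncached, so the body always runs.
        self.op(f"runtime.WithExec({name})", cached=False)
        return body(self)


def foo(engine: ToyEngine) -> str:
    # The function only issues engine operations, which stay cached.
    engine.op("Container().From()")
    return engine.op("Container().From().WithExec()")


engine = ToyEngine()
engine.call_function("Foo", foo)
engine.call_function("Foo", foo)
```

On the second call only the runtime exec is executed again; both inner operations are cache hits. Any plain language-level work placed inside `foo` (for example a call to a third-party client) would not go through `op` and would be repeated on every call.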

raw tapir Sep 27, 2023, 7:44 PM

#

I see what you meant now. Yes you're right, but while you see functions as just vehicles for API calls, from what I've seen from users' code I'm convinced they will want to do more than that. I agree that we'll find out more, so I'm not pushing for anything. If a function is just building queries, I agree that a non-cached function will have low impact.

As for third-party API calls, I remember having discussions about that, including feedback from @winged finch after a presentation. He realized that while we pointed to the ability to leverage external libraries in your language (like AWS or GCR) as a strength of Dagger (at the time at least), those calls wouldn't be cached, so you need to nudge people to adopt the CLIs instead, which is not as nice to develop with, or to containerize as much as possible. I remember realizing with Erik that this default function cache would solve this.

normal elbow Sep 27, 2023, 7:45 PM

#

OK glad we're aligned 🙂

raw tapir Sep 27, 2023, 7:45 PM

#

To be clear, I think we all agree that we need to disable for now, but I'm with @bright glen in wanting to recover it when possible.

normal elbow Sep 27, 2023, 7:46 PM

#

Switching to speculative bikeshedding on future design: my guess is that the default will remain uncached, with an opt-in for "hermetic" functions with performance benefits as the reward. I think the opposite will just be too surprising to too many people to work as a default.

raw tapir Sep 27, 2023, 7:56 PM

#

I lean on that opinion as well, but I hope for the opposite 🙂 Curious how it'll turn out.

bright glen Sep 27, 2023, 7:57 PM

#

Yeah - if we do no-caching by default, it's nice that there are natural incentives for marking things as hermetic for both performance and provenance (the whole idea around publishing a v2 ID as a recipe)

normal elbow Sep 27, 2023, 8:23 PM

#

raw tapir I lean on that opinion as well, but I hope for the opposite 🙂 Curious how it'll...

https://tenor.com/view/the-two-towers-lord-of-the-rings-aragorn-helm’s-deep-hope-gif-8669068073397169641


bright glen Sep 28, 2023, 3:14 AM

#

btw, re: function call overhead w/out caching, I'm seeing a no-op dagger mod sync take 0.26-0.43s with caching, and ~5s without. This is much more noticeable now that SDKs are modules. 😅 https://gist.github.com/vito/b3801634007d54cbf9fdefb867c528fe

We can special-case SDK runtimes to always cache until we figure everything out, so no pressure to bring caching back, just interesting and kind of surprising how quickly the overhead stacked up. Maybe there's some low-hanging fruit to be found.

#

I went back to before the SDKs-as-modules merge and the same dagger mod sync performance stayed at ~0.5s with and without caching.

bright glen Sep 28, 2023, 1:42 PM

#

Added caching for internal calls, dagger mod sync back down to 0.25-0.48s. https://github.com/shykes/dagger/commit/a13a3eeb784fa2ae2e3e0b02fcd9932afd6527c0
