#Caching functions when successful

storm dragon
#

Hey, I am new to dagger but am really excited to learn. I love that dagger offers strict type enforcement on IO from jobs and caching.

I'm trying to create a dagger function using the python sdk that:

  • runs a service that can sometimes fail
  • if the service succeeds, the result is cached (I don't need to rerun it in the future)
  • if the service fails, nothing is cached (next time I run the job I can retry)

The dumbest way i can describe this with a toy example is something like

@function
async def test_cache(self, low: int, high: float, limit: float) -> float:
    data = random.uniform(low, high)
    if data < limit:
        # if I hit this I don't want caching to occur
        raise ValueError(f"{data} < {limit}")
    return data

And when I run it from dagger shell multiple times I'd get:

first try random draws a bad value - no caching because we failed

test_cache 0 10 7
ValueError( 6.234 < 7 )

second try random draws a bad value - no caching because we failed

test_cache 0 10 7
ValueError( 1.111 < 7 )

third try random draws a good value

test_cache 0 10 7
8.54

fourth try just uses the cache because we succeed

test_cache 0 10 7
8.54

Is there a way to achieve what I want?

agile cobalt
storm dragon
#

@agile cobalt That's how I thought it worked. So should I expect the toy example I posted to eventually succeed and then cache?

agile cobalt
#

yes it should work if I understood correctly

storm dragon
#

So what I see when I run it from the dagger shell is that it still caches the failing value. Note how the 'random' number is the same each time, and it fails since it's less than the limit.

▼ myModule | test-cache 0 10 7 0.0s ERROR r jump ↴
! FunctionError: 6.40750401550802 < 7.0
├╴✔ detect module: /home/mike/AutomationExample/replicant 0.0s
╰╴✘ test-cache 0.0s ERROR r jump ↴
! FunctionError: 6.40750401550802 < 7.0
▼ Replicant.testCache(low: 0, high: 10.000000, limit: 7.000000): Float! 0.0s ERROR
! FunctionError: 6.40750401550802 < 7.0
├╴$ Module.runtime: Container! 0.0s CACHED
├╴$ directory: Directory! 0.0s CACHED
╰╴$ Container.withMountedDirectory(
┆ path: "/.daggermod"
┆ source: directory: Directory!
): Container! 0.0s CACHED
▼ myModule | test-cache 0 10 7 0.0s ERROR r jump ↴
▼ Replicant.testCache(low: 0, high: 10.000000, limit: 7.000000): Float! 0.0s ERROR
! FunctionError: 6.40750401550802 < 7.0
├╴$ Module.runtime: Container! 0.0s CACHED
├╴$ directory: Directory! 0.0s CACHED
╰╴$ Container.withMountedDirectory(
┆ path: "/.daggermod"
┆ source: directory: Directory!
): Container! 0.0s CACHED

agile cobalt
#

not familiar with python random but I would double check it's actually sufficiently random, as in: if pseudorandom, does it need a time-based seed

storm dragon
#

It is. I get unique pseudo-random results within the same python shell

Python 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:09:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Ctrl click to launch VS Code Native REPL

import random

random.uniform(0,10)
7.180891201099147
random.uniform(0,10)
7.349566460838779
random.uniform(0,10)
0.5471245897755284
random.uniform(0,10)
7.843426749884054

Within different python shells

Python 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:09:02) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Ctrl click to launch VS Code Native REPL

import random

random.uniform(0,10)
4.701366147760128
random.uniform(0,10)
9.847136444620151

#

thanks for your help btw

#

I can also just change my limit field (new inputs) and i do actually get a new number

▶ myModule | test-cache 0 10 8 1.9s
9.140582000302997

storm dragon
#

in fact even if i force a new seed on execution

@function
async def test_cache_2(self, low: int, high: float, limit: float) -> float:
    import random
    import time
    import os

    # Explicitly seed with something non-deterministic
    seed = int(time.time() * 1000) + os.getpid()
    random.seed(seed)
    print(f"Using seed: {seed}")

    data = random.uniform(low, high)
    if data < limit: 
        raise ValueError(f"{data} < {limit}")

    return data

I get the same result
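
One way to read this result: if the failed call is being replayed from a session-level cache, the function body never runs again, so reseeding inside it cannot change anything. Below is a minimal plain-Python sketch of that situation. `BuggySessionCache` and `draw` are hypothetical names made up for illustration; this is not Dagger's engine code.

```python
import random
import time


class BuggySessionCache:
    """Hypothetical cache that (wrongly) remembers failures as well as successes."""

    def __init__(self):
        self._outcomes = {}  # key -> ("ok", value) or ("err", exception)
        self.body_runs = 0   # how many times the wrapped function really executed

    def call(self, fn, *args):
        key = (fn.__name__, args)
        if key not in self._outcomes:
            self.body_runs += 1
            try:
                self._outcomes[key] = ("ok", fn(*args))
            except Exception as exc:
                self._outcomes[key] = ("err", exc)
        kind, payload = self._outcomes[key]
        if kind == "err":
            raise payload  # replays the stored error without running fn again
        return payload


def draw(low, high, limit):
    random.seed(time.time_ns())  # fresh seed on every real execution
    data = random.uniform(low, high)
    if data < limit:
        raise ValueError(f"{data} < {limit}")
    return data


cache = BuggySessionCache()
messages = []
for _ in range(3):
    try:
        cache.call(draw, 0, 1, 2)  # limit is above high, so the draw always fails
    except ValueError as exc:
        messages.append(str(exc))
```

All three attempts report the same number even though `draw` reseeds itself, because `draw` only executed once.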

agile cobalt
#

mmmm

frank schooner
#

Hi @storm dragon, I've been testing locally and I seem to have the proper behavior ?

#

Can you try without the shell: dagger call test-cache --low 0 --high 10 --limit 7

storm dragon
#

yes of course

frank schooner
#

We might have an issue when in shell mode ?

If i'm correct and read properly, what you expect is:

  1. as long as an error is returned, it shouldn't be cached for the same inputs
  2. when a success is hit, cache it and return just that, again, for the same inputs
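
The two rules above can be sketched in plain Python, independent of Dagger. `SuccessOnlyCache` and `draw_above_limit` are hypothetical names used only for this illustration; the sketch shows the expected semantics, not how the engine implements them.

```python
import random


class SuccessOnlyCache:
    """Hypothetical memoizer: stores return values, never exceptions."""

    def __init__(self):
        self._results = {}

    def call(self, fn, *args):
        key = (fn.__name__, args)  # same function + same inputs -> same key
        if key in self._results:
            return self._results[key]
        value = fn(*args)  # an exception propagates here and nothing is stored
        self._results[key] = value
        return value


def draw_above_limit(low, high, limit):
    data = random.uniform(low, high)
    if data < limit:
        raise ValueError(f"{data} < {limit}")
    return data


cache = SuccessOnlyCache()
attempts = 0
while True:
    attempts += 1
    try:
        first = cache.call(draw_above_limit, 0, 10, 7)
        break
    except ValueError:
        continue  # failure was not cached, so the next attempt draws again

again = cache.call(draw_above_limit, 0, 10, 7)  # served from the cache
```

After the first success, every further call with the same inputs returns the stored value; changing any input produces a new key and a fresh draw.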

storm dragon
#

That's what I want to see. I'm not sure whether that is "correct dagger behavior" or not because I'm just learning.

frank schooner
#

That should be the default

storm dragon
#

can you send me exactly what your module looked like?

frank schooner
#

Nonetheless, if it works with the CLI and not in the shell, it's a bug

#

import dagger
import random
from dagger import dag, function, object_type


@object_type
class Test:
    @function
    async def test_cache(self, low: int, high: float, limit: float) -> float:
        data = random.uniform(low, high)
        if data < limit:
            # if I hit this I don't want caching to occur
            raise ValueError(f"{data} < {limit}")
        return data

On latest dagger, with this command dagger call test-cache --low 0 --high 10 --limit 7

#

created with dagger init --sdk=python --name=test

storm dragon
#

confirmed

frank schooner
storm dragon
#

yep

#

here's the CLI output

#

here's the dagger shell output

frank schooner
#

Ok, so using the CLI it's doing the right caching behavior

#

When in the shell, you're in a long-lived session. It's a bug, thanks 🙏

storm dragon
#

test | test-cache 0 10 9 1.9s ERROR r jump ↴
┇ ...1 lines hidden...
┃ Traceback (most recent call last):
┃ File "/src/xxh3:6dc10608b1b5157d/test/sdk/src/dagger/mod/_module.py", line 430, in call
┃ result = await cast(typing.Awaitable[R], result)
┃ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
┃ File "/src/xxh3:6dc10608b1b5157d/test/src/test/main.py", line 14, in test_cache
┃ raise ValueError( f"{data} < {limit}"

#

▼ test | test-cache 0 10 9 0.0s ERROR r jump ↴
▼ Test.testCache(low: 0, high: 10.000000, limit: 9.000000): Float! 0.0s ERROR
! FunctionError: 1.6079510533143893 < 9.0
├╴$ Module.runtime: Container! 0.0s CACHED
├╴$ directory: Directory! 0.0s CACHED
╰╴$ Container.withMountedDirectory(
┆ path: "/.daggermod"
┆ source: directory: Directory!
): Container! 0.0s CACHED

frank schooner
#

Can you please open an issue? I'll take it.

storm dragon
#

sure

#

@frank schooner im actually really glad to hear this - this sort of 'no cache on fail' behavior is one of my favorite features of dagger

frank schooner
#

To unblock you, keep going with the CLI. We already fixed a somewhat similar issue with racing sessions and cache, so thank you 🙏

#

Thanks for the repro. It was easy to verify 🙏 And don't hesitate to ask and ping me on the issue 😇

storm dragon
#

My real use case is much more complicated than this. This repro was just the dumbest way I could think of to illustrate the issue.

#

@frank schooner lmk if you need any assistance verifying the bug fix whenever you get to it

frank schooner
# storm dragon @frank schooner lmk if you need any assistance verifying the bug fix whene...

This PR is a fix, but the wrong one. I'm currently working on the proper one, as suggested by Erik: https://github.com/dagger/dagger/pull/11488

GitHub

Fixes #11465
Problem
In a long-lived engine session (interactive shell), function calls that previously failed were incorrectly replayed when invoked again with the same inputs.
The correct behavio...

storm dragon
#

nice - @frank schooner can i ask a follow up question....If i make things slightly more complicated by running a batch of these cached randoms

# Methods of the module's @object_type class; needs `import asyncio` and `import random`.
@function
async def batch(self, low: float, high: float, limit: float, n: int) -> list[float]:
    tasks = []
    for k in range(n):
        tasks.append(self.single_retry(low, high, limit, k))

    # return_exceptions=True puts raised exceptions into the result list
    r = await asyncio.gather(*tasks, return_exceptions=True)
    return r

async def get_range(self, low, high, limit, k):
    u = random.uniform(low, high)
    if u < limit:
        raise ValueError(f"BAD {u} {k}")
    return u

@function
async def single_retry(self, low: float, high: float, limit: float, k: int) -> float:
    o = await self.get_range(low, high, limit, k)
    return float(o)

With a command like
dagger call batch --low 0 --high 10 --limit 7 --n 10

obviously some of the random draws will pass some will fail....what sort of caching behavior should i expect to see for the cases that "pass" versus the cases that fail?

What I'm hoping for is that if 3/10 calls to single_retry "succeed", then calling batch again would use cached versions of the three "successes" and retry the 7 failures.

frank schooner
#

Hey @rain cape,

I've been trying the changes proposed in https://github.com/dagger/dagger/pull/11488#pullrequestreview-3507541529. Your proposed changes do seem to stop errors from being cached, which is what we want 🙏

However, when the error bubbles up, flightcontrol seems to wrap it as RetryableError and auto-retries. On retry I hit: client "xxx" already exists with different secret token

What I think is happening:

  • First container runs → registers daggerClient with SecretToken "abc" → fails → container exits

  • flightcontrol sees the error, wraps it as RetryableError, and retries automatically

  • New container starts → new SecretToken "xyz" → same ClientID (baked in LLB) → finds old client → token mismatch

The daggerClient from the first attempt outlives the container, so the retry can't register with its new token.
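
The sequence above can be modelled with a toy registry. `ClientRegistry` is a hypothetical stand-in written for this illustration, not the engine's actual client bookkeeping; only the error text and the `"abc"`/`"xyz"` tokens come from the description above.

```python
class ClientRegistry:
    """Hypothetical registry keyed by client ID; entries outlive their container."""

    def __init__(self):
        self._tokens = {}

    def register(self, client_id, secret_token):
        known = self._tokens.get(client_id)
        if known is not None and known != secret_token:
            raise RuntimeError(
                f'client "{client_id}" already exists with different secret token'
            )
        self._tokens[client_id] = secret_token


registry = ClientRegistry()
registry.register("xxx", "abc")      # first attempt; the function then fails

try:
    registry.register("xxx", "xyz")  # automatic retry: same ID, new token
    retry_error = None
except RuntimeError as exc:
    retry_error = str(exc)
```

Because nothing removes the first entry when the container exits, the retry is rejected before the function gets a chance to run again.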

I looked at distinguishing "application errors" (don't retry) from "transient errors" (retry), but that seems fragile. I also looked at allowing the secret to be overridden in that very particular case (which seems OKish, I think?), without success.

Here's a trace and my branch if helpful:

  • https://dagger.cloud/dagger/traces/1210a4931ff8afc8a221bf23ff369baa

  • https://github.com/grouville/dagger/tree/fix-in-session-caching-error-experiment

Any thoughts on the right approach here?

I agree with your comment that it would be best not to add some race conditions in dagger ahah 😇

frank schooner
# storm dragon nice - @frank schooner can i ask a follow up question....If i make things ...

With your current batch code, calling it again will return the entire cached result, including the failures. The batch function itself succeeds (because return_exceptions=True), so it gets cached as a whole.
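
The reason `batch` itself counts as a success can be seen with plain `asyncio`, outside Dagger. In this sketch the random draw is replaced by a deterministic stand-in (even `k` succeeds, odd `k` fails), which is an assumption made only so the outcome is predictable.

```python
import asyncio


async def single_retry(k: int) -> float:
    # Deterministic stand-in for the random draw: even k succeeds, odd k fails.
    if k % 2:
        raise ValueError(f"BAD {k}")
    return float(k)


async def batch(n: int) -> list:
    tasks = [single_retry(k) for k in range(n)]
    # Exceptions are returned as list items instead of being raised.
    return await asyncio.gather(*tasks, return_exceptions=True)


results = asyncio.run(batch(4))
failed = [r for r in results if isinstance(r, Exception)]
```

`batch` returns normally with a list that mixes floats and `ValueError` objects, so from the caller's point of view nothing failed.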

To get the behavior you want (successes cached, failures retry), call from CLI instead of nesting:

  for k in $(seq 0 9); do
      dagger call single-retry --low 0 --high 10 --limit 7 --k $k
  done

Run it twice: successes return instantly (0.0s, cached), failures retry with fresh random values.
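
The same loop can be driven from Python if the per-item outcomes need to be collected. This is a sketch: `single_retry_command` and `run_batch` are hypothetical helper names, and it assumes the `dagger` CLI is on `PATH`, is run from the module directory, and exits with a non-zero status when the function raises.

```python
import subprocess


def single_retry_command(k: int, low: float = 0, high: float = 10, limit: float = 7) -> list[str]:
    """Build the argv for one `dagger call single-retry` invocation."""
    return [
        "dagger", "call", "single-retry",
        "--low", str(low), "--high", str(high), "--limit", str(limit),
        "--k", str(k),
    ]


def run_batch(n: int) -> dict[int, bool]:
    """Run each call separately; map k to whether the CLI exited with status 0."""
    outcomes = {}
    for k in range(n):
        proc = subprocess.run(single_retry_command(k), capture_output=True, text=True)
        outcomes[k] = proc.returncode == 0
    return outcomes
```

Calling `run_batch(10)` twice mirrors running the shell loop twice: each `k` is a separate top-level call.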

Why? Dagger caches call paths through the DAG, not isolated function results. When batch calls single_retry, the cache key includes the parent context. When you call single_retry directly from the CLI, each call has a clean cache key based only on its inputs.
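
A toy model of that explanation, assuming a key is a digest over the parent's key, the function name, and the inputs. `call_key` is a hypothetical helper for illustration only; Dagger's real key derivation is not shown in this thread and may differ.

```python
import hashlib
import json


def call_key(function: str, args: dict, parent_key: str = "") -> str:
    """Toy cache key: digest of the parent's key, the function name, and its inputs."""
    payload = json.dumps([parent_key, function, args], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


args = {"low": 0, "high": 10, "limit": 7, "k": 3}

# Called directly: the key depends only on the function and its inputs.
direct = call_key("single_retry", args)

# Called from inside batch: the parent's key is mixed in, giving a different key.
batch_key = call_key("batch", {"low": 0, "high": 10, "limit": 7, "n": 10})
nested = call_key("single_retry", args, parent_key=batch_key)
```

In this model the direct and nested calls never share a cache entry, even with identical inputs.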