# List of chainable objects does not cache elements


calm lake

Hey!

I think I may have found a bug in the way lists of chainable objects are cached.

When I access an element of the list, the result of the function that generated the list is not cached, so that function is evaluated again.
This causes a bunch of issues if the list is built from the state of another system (e.g. a list of PRs or comments in GitHub).
It also really hurts performance when the function creating the list is slow and the list has many elements.

I've created a reproduction example. See this module that generates a list with a timestamp:

```go
package main

import (
    "time"
)

type Generator struct{}

type Element struct {
    A string
}

func (m *Generator) Make() []Element {
    time.Sleep(5 * time.Second)
    t := time.Now().Format(time.DateTime)
    return []Element{
        {A: "[1] " + t},
        {A: "[2] " + t},
        {A: "[3] " + t},
        {A: "[4] " + t},
    }
}
```

And a module that retrieves the value of each element:

```go
package main

import (
    "context"
)

type Consumer struct{}

func (m *Consumer) Do(ctx context.Context) (string, error) {
    elems, err := dag.Generator().Make(ctx)
    if err != nil {
        return "", err
    }

    out := ""
    for _, e := range elems {
        s, err := e.A(ctx)
        if err != nil {
            return "", err
        }
        out += s + "\n"
    }

    return out, nil
}
```

I would expect the list to be created only once, in `elems, err := dag.Generator().Make(ctx)`.
However, each call to `s, err := e.A(ctx)` calls it again.

Since `Generator.Make` takes 5s, the whole thing takes roughly 25s (one initial call plus one re-run per element) when it should only take 5s.
I also get a different timestamp for each value (in a real scenario, this means I don't retrieve the correct GitHub PR or comment).
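The timing follows directly if each element returned by `Make` is resolved by re-running `Make`. Below is a toy model in plain Go (no Dagger SDK, no engine) that assumes exactly that, purely to illustrate the arithmetic. `elementHandle`, `resolve` and the `makeCalls` counter are hypothetical names, not Dagger's API, and a counter replaces the 5s sleep.

```go
package main

import "fmt"

// makeCalls counts how often the (normally slow) list-producing function runs.
var makeCalls int

// makeList stands in for Generator.Make; the real one sleeps for 5s first.
func makeList() []string {
	makeCalls++
	return []string{"[1]", "[2]", "[3]", "[4]"}
}

// elementHandle is a hypothetical lazy reference meaning
// "call makeList, then take element i".
type elementHandle struct {
	index int
}

// resolve replays the parent call because, in this model, nothing is cached.
func (h elementHandle) resolve() string {
	return makeList()[h.index]
}

// consume mirrors Consumer.Do: one call to obtain the list,
// then one resolve per element.
func consume() []string {
	list := makeList() // the initial call
	handles := make([]elementHandle, len(list))
	for i := range list {
		handles[i] = elementHandle{index: i}
	}
	out := make([]string, 0, len(handles))
	for _, h := range handles {
		out = append(out, h.resolve())
	}
	return out
}

func main() {
	fmt.Println(consume())
	// 1 initial call + 1 replay per element.
	fmt.Println("makeList calls:", makeCalls)
}
```

With four elements the counter ends at 5, which at about 5s per call matches a run of roughly 25s and the five `make` spans in the console output.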

Here is the console output:

```
❯ dagger call do
✔ connect 0.3s
│ ✔ starting engine 0.1s
│ │ ✔ create 0.1s
│ │ │ ✔ exec docker ps -a --no-trunc --filter name=^/dagger-engine- --format {{.Names}} 0.1s
│ │ │ ┃ dagger-engine-v0.15.1
│ │ │ ✔ exec docker start dagger-engine-v0.15.1 0.0s
│ │ │ ┃ dagger-engine-v0.15.1
│
│ ✔ connecting to engine 0.1s
│ ✔ starting session 0.1s

✔ load module 2.6s
│ ✔ finding module configuration 2.6s
│ ✔ initializing module 2.4s
│ ✔ inspecting module metadata 0.1s
│ ✔ loading type definitions 0.1s

✔ parsing command line arguments 0.0s

✔ consumer: Consumer! 0.0s
✔ .do: String! 26.1s
│ ✔ generator: Generator! 0.0s
│ ✔ .make: [GeneratorElement!]! 5.2s
│
│ ✔ Generator.make: GeneratorElement! 5.2s
│ ✔ .a: String! 0.0s
│
│ ✔ Generator.make: GeneratorElement! 5.2s
│ ✔ .a: String! 0.0s
│
│ ✔ Generator.make: GeneratorElement! 5.2s
│ ✔ .a: String! 0.0s
│
│ ✔ Generator.make: GeneratorElement! 5.2s
│ ✔ .a: String! 0.0s

[1] 2025-01-13 12:12:04
[2] 2025-01-13 12:12:09
[3] 2025-01-13 12:12:14
[4] 2025-01-13 12:12:19
```

solemn sage

cc @simple comet

calm lake

I don't think it's exactly the same - in the fan-out example, the slow part is the functions called on the elements of the list.
In my example, it's the function creating the list itself that is slow.

And the list-creating function is re-evaluated whenever I access an element, when I thought it would be cached.

simple comet

Unfortunately, module function calls are never cached at the moment.

It's a longstanding TODO with a lot of complexity attached, but we're getting closer (cc @onyx locust)

calm lake

Module function calls are not cached from inside the module, right? In my example I call one module from another module, which should be cached if I understand correctly.

astral wasp
calm lake

We must be talking about a different kind of caching - I reliably get cache hits when calling another module's function inside the same session. Happy to screen share at some point to show what I mean 🙂

astral wasp
calm lake

I'm still not sure I understand; I think I'm seeing different behavior:

I have the following function that takes 5s to run:

```go
func (m *Generator) Single() Element {
    time.Sleep(5 * time.Second)
    t := time.Now().Format(time.DateTime)
    return Element{A: "[Single] " + t}
}
```

This gets called by another module, and the whole call only takes 5s because calls two and three are cached:

```go
func (m *Consumer) Foo(ctx context.Context) (string, error) {
    one, err := dag.Generator().Single().A(ctx)
    if err != nil {
        return "", err
    }

    two, err := dag.Generator().Single().A(ctx)
    if err != nil {
        return "", err
    }

    three, err := dag.Generator().Single().A(ctx)
    if err != nil {
        return "", err
    }
    return one + two + three, nil
}
```

I was initially surprised because this is not the case when returning lists, where there is no module-to-module caching.
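The single-object case is consistent with a per-session memo table keyed by the call chain: three identical `dag.Generator().Single()` chains map to the same key, so only the first one runs. The sketch below is a hypothetical stand-in in plain Go; `sessionCache`, its string keys and `single` are illustrative names and make no claim about how the Dagger engine actually keys or stores results.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionCache is a hypothetical per-session memo table keyed by call chain.
type sessionCache struct {
	mu      sync.Mutex
	results map[string]string
}

func newSessionCache() *sessionCache {
	return &sessionCache{results: make(map[string]string)}
}

// call returns the cached result for key, running fn only on the first request.
// The lock is held while fn runs, so each key is computed at most once.
func (c *sessionCache) call(key string, fn func() string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.results[key]; ok {
		return v
	}
	v := fn()
	c.results[key] = v
	return v
}

// singleCalls counts executions of the (normally slow) function.
var singleCalls int

// single stands in for Generator.Single; the real one sleeps for 5s first.
func single() string {
	singleCalls++
	return "[Single]"
}

func main() {
	cache := newSessionCache()
	// Three identical chains, like dag.Generator().Single().A(ctx) in Consumer.Foo.
	one := cache.call("generator.single", single)
	two := cache.call("generator.single", single)
	three := cache.call("generator.single", single)
	fmt.Println(one + two + three)
	fmt.Println("single calls:", singleCalls)
}
```

Holding the mutex while `fn` runs keeps the sketch simple but serializes unrelated keys; a real cache would more likely track in-flight calls per key (for example with `golang.org/x/sync/singleflight`).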

calm lake
solemn sage

I wonder what happens in your example if, instead of returning a `[]Element`, you return a struct containing multiple `Element`s. I'd expect the function to run only once.

that would confirm there's something weird with the array resolution

calm lake
# solemn sage I wonder what happens in your example if instead of returning a `[]Element` you ...

I changed the generator function to return a struct with a list inside:

```go
type ElemList struct {
    List []Element
}

func (m *Generator) Make2() ElemList {
    time.Sleep(5 * time.Second)
    t := time.Now().Format(time.DateTime)
    return ElemList{
        List: []Element{
            {A: "[1] " + t},
            {A: "[2] " + t},
            {A: "[3] " + t},
            {A: "[4] " + t},
        },
    }
}
```

and the consumer:

```go
func (m *Consumer) Do2(ctx context.Context) (string, error) {
    list, err := dag.Generator().Make2().List(ctx)
    if err != nil {
        return "", err
    }

    out := ""
    for _, e := range list {
        s, err := e.A(ctx)
        if err != nil {
            return "", err
        }
        out += s + "\n"
    }

    return out, nil
}
```

Success! The list is cached correctly in this case.
So it seems the issue has to do with returning lists directly.
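One reading of this result, stated as an assumption rather than a diagnosis: when `Make2` returns an object, that object is cached and each element read indexes into the cached value, whereas element reads from a directly returned list go back through an uncached `Make`. The plain-Go sketch below models both paths side by side; `memo`, `cachedCall`, `readAll` and the `cacheable` flag are hypothetical and only mirror the call counts observed in this thread.

```go
package main

import "fmt"

var makeCalls, make2Calls int

// memo is a hypothetical session cache shared by both paths.
var memo = map[string][]string{}

// makeList stands in for Generator.Make (list returned directly).
func makeList() []string {
	makeCalls++
	return []string{"[1]", "[2]", "[3]", "[4]"}
}

// make2List stands in for Generator.Make2().List (list wrapped in an object).
func make2List() []string {
	make2Calls++
	return []string{"[1]", "[2]", "[3]", "[4]"}
}

// cachedCall memoizes fn under key when cacheable is true; otherwise it always reruns fn.
func cachedCall(key string, cacheable bool, fn func() []string) []string {
	if cacheable {
		if v, ok := memo[key]; ok {
			return v
		}
	}
	v := fn()
	if cacheable {
		memo[key] = v
	}
	return v
}

// readAll mimics the consumer loop: one call to get the list, then every
// element read goes back through the parent call and indexes into its result.
func readAll(key string, cacheable bool, fn func() []string) []string {
	n := len(cachedCall(key, cacheable, fn))
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, cachedCall(key, cacheable, fn)[i])
	}
	return out
}

func main() {
	// Direct list: modelled as uncached, matching the original report.
	readAll("generator.make", false, makeList)
	// Struct wrapper: the returned object is cached, so element reads reuse it.
	readAll("generator.make2.list", true, make2List)
	fmt.Println("Make calls:", makeCalls)   // 5
	fmt.Println("Make2 calls:", make2Calls) // 1
}
```

The 5-versus-1 call count mirrors the re-evaluation in the first report and the single evaluation seen with `Make2`.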

Should I open a bug report on your repo for this?

solemn sage

yes please!

thanks for the super detailed reproducer, i'll try digging into this tomorrow morning 😄

astral wasp

Thanks for the extra details @solemn sage, I didn't actually realize we had same-session function caching! It makes sense, though.