# What state is used to determine a cache key?


dusk herald

Hello,

I think I’m missing some concepts around cache management.

We’re writing a module for which we want to offer a fluent interface,
something like:
MyModule().withOption1().run()

with withOption1(value: bool) implemented as follows:
self.option1 = value

and run as:
ctr = ctr.withExec(["x"]) if self.option1 else ctr.withExec(["y"])

Our problem is that all successive calls to run use the same cache, regardless of the prior withOption1(true|false) call.

In other words, MyModule().withOption1(true).run() (wrongly) uses the cache produced by a previous MyModule().withOption1(false).run().

Is there some documentation on which state is used to compute a cache key?

As a side question, is there a way to manually define a cache key?
[Edit: that doesn't seem to be possible yet, according to this doc: https://docs.dagger.io/cookbook#invalidate-cache]

Thanks !


autumn blaze

@dusk herald, we're working on more powerful cache control. See https://github.com/dagger/dagger/issues/7428

In your case, a change in your object's state should normally invalidate the cache. Are you sure you're passing the state around correctly? Would you mind sharing your code?

Also, calls to Dagger Functions are currently never cached across sessions.
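To make the idea of "state feeds the cache key" concrete, here is a toy model of content-addressed caching in plain Python: the key is a digest of an operation and its inputs, so different exec arguments give different keys. The cache_key helper is hypothetical and deliberately simplified. It is not how Dagger or BuildKit actually derive their keys.

```python
import hashlib
import json


def cache_key(operation: str, args: list[str]) -> str:
    """Toy content-addressed key (hypothetical, not Dagger's algorithm):
    a SHA-256 digest over the operation name and its arguments."""
    payload = json.dumps({"op": operation, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


option1 = True
# Mirrors the question's run(): the exec arguments depend on option1.
args = ["x"] if option1 else ["y"]
key = cache_key("withExec", args)

# Identical inputs give the same key (a cache hit); different inputs do not.
print(key == cache_key("withExec", ["x"]))  # True
print(key == cache_key("withExec", ["y"]))  # False
```

In this model, suppose run() always ends up with the same arguments, for example because option1 was never actually stored. Then the key never changes and the cached result is reused.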


fast temple

Hello @autumn blaze, thanks for your reply!
We finally managed to get everything working as intended.

We started with code that looked like this:

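```python
from typing import Self

import dagger
from dagger import dag, function, object_type


@object_type
class Dotnet:
    def __init__(self):
        # Default set in a hand-written __init__ (no field declaration).
        self.nuget_package_source = "nuget.org"

    @function
    def with_nuget_package_source(self, source: str) -> Self:
        self.nuget_package_source = source
        return self  # omitted in the original paste (see follow-up below)

    @function
    def run(self, source: dagger.Directory) -> dagger.Container:
        # Assumption: the original snippet was cut off here; returning the
        # container is a guess at how it continued.
        return (
            dag.container()
            .from_("mcr.microsoft.com/dotnet/sdk:8.0")
            .with_directory("/app", source)
            .with_workdir("/app")
            .with_mounted_cache("/root/.nuget/packages", dag.cache_volume("nuget-cache"))
            .with_exec(["dotnet", "restore", "-s", self.nuget_package_source])
        )
```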

At this point, the provided NuGet package source was never taken into account: the default value set in the constructor was always used instead.
For instance, Dotnet().with_nuget_package_source("https://my.private.nuget.repo.net").run() would never use our private repository.

The logs also showed that everything was retrieved from the cache, hence the question.


Then, after checking some Python modules from the Daggerverse, we went for this version:

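```python
from typing import Self

import dagger
from dagger import dag, function, object_type


@object_type
class Dotnet:
    # Annotated attribute with a default value.
    nuget_package_source: str | None = "nuget.org"

    @function
    def with_nuget_package_source(self, source: str) -> Self:
        self.nuget_package_source = source
        return self

    @function
    def run(self, source: dagger.Directory) -> dagger.Container:
        # Assumption: as above, the snippet was truncated; returning the container is a guess.
        return (
            dag.container()
            .from_("mcr.microsoft.com/dotnet/sdk:8.0")
            .with_directory("/app", source)
            .with_workdir("/app")
            .with_mounted_cache("/root/.nuget/packages", dag.cache_volume("nuget-cache"))
            .with_exec(["dotnet", "restore", "-s", self.nuget_package_source])
        )
```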

The nuget_package_source variable was changed from an instance variable to a class variable,
and now any provided NuGet source seems to be taken into account correctly.

Looks like this has more to do with our lack of Python experience than with Dagger itself. 🤷‍♂️
Thanks again for taking the time to take a look!
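The following sketch reproduces the difference between the two versions in plain Python. It rests on a simplifying assumption: that the engine persists an object between chained calls by serializing its dataclass fields and rebuilding the object through its constructor. The roundtrip helper and both classes are hypothetical stand-ins, and no Dagger SDK is involved.

```python
import dataclasses
import json


def roundtrip(obj):
    """Hypothetical model of state persistence between chained calls:
    serialize only the dataclass fields, then rebuild via the constructor."""
    state = {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}
    return type(obj)(**json.loads(json.dumps(state)))


@dataclasses.dataclass
class WithInit:
    # @dataclass keeps a hand-written __init__, and since nothing is
    # annotated at class level, this class has no dataclass fields at all.
    def __init__(self):
        self.nuget_package_source = "nuget.org"

    def with_nuget_package_source(self, source: str) -> "WithInit":
        self.nuget_package_source = source
        return self


@dataclasses.dataclass
class WithField:
    # Annotated attribute with a default: a real dataclass field.
    nuget_package_source: str = "nuget.org"

    def with_nuget_package_source(self, source: str) -> "WithField":
        self.nuget_package_source = source
        return self


private = "https://my.private.nuget.repo.net"
a = roundtrip(WithInit().with_nuget_package_source(private))
b = roundtrip(WithField().with_nuget_package_source(private))
print(a.nuget_package_source)  # nuget.org: the value was not part of the state
print(b.nuget_package_source)  # https://my.private.nuget.repo.net
```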

fast temple

Hello @amber nest, thank you for your help!

Regarding returning self at the end of with_nuget_package_source: that was just a bad copy/paste of the original code. The line is actually there. Sorry about that 😬

Thank you for pointing me to those pages. I'm a bit ashamed I didn't find them myself!

I have two questions about them:

  • I've seen some modules using a __post_init__ method to set the value of the inner Container instance based on the values of the class attributes, while others do some lazy initialization in a container() function. Is there a preferred way to do this?
  • From the State and Getters documentation, I understand there is no way to have a field that is exposed neither as a constructor argument nor as a function in the public API and that still survives the serialization/deserialization process. Can you confirm?
amber nest
> Replying to fast temple: Hello @amber nest, thank you for your help! Regarding the returning o...

Don't think of them as class attributes. They're actually instance attributes. The @dataclass decorator (which @dagger.object_type wraps) essentially generates a constructor for you, to reduce boilerplate. Fundamentally, this:

```python
from dataclasses import dataclass

@dataclass
class Dotnet:
    nuget_package_source: str | None = "nuget.org"
```

Is equivalent to this (i.e., non-decorated class):

```python
class Dotnet:
    nuget_package_source: str | None

    def __init__(self, nuget_package_source: str | None = "nuget.org"):
        self.nuget_package_source = nuget_package_source
```

But the dataclass also adds a bunch of other methods that make these instances easier to work with (e.g., string representation, equality comparison, and optional immutability via frozen=True). So the @dataclass decorator normally generates an __init__ for you, unless you define one yourself. In that case it keeps yours and skips generating one.

If you wanted to make a class variable, you’d define it with:

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Dotnet:
    nuget_package_source: ClassVar[str | None] = "nuget.org"
```

Then that attribute doesn't become an instance attribute. It isn't a dataclass field either, so it isn't part of the generated __init__.


You want to take advantage of the generated __init__, but sometimes you still need to run custom initialization in a class's constructor. That's what __post_init__ is for: if you define one, @dataclass calls it at the end of the generated __init__. So this:

```python
from dataclasses import dataclass


@dataclass
class Dotnet:
    nuget_package_source: str | None = None

    def __post_init__(self):
        if self.nuget_package_source is None:
            self.nuget_package_source = "nuget.org"
```

Generates something like this:

```python
class Dotnet:
    nuget_package_source: str | None

    def __init__(self, nuget_package_source: str | None = None):
        self.nuget_package_source = nuget_package_source

        # post init
        if self.nuget_package_source is None:
            self.nuget_package_source = "nuget.org"
```
amber nest

> I've seen some modules using a __post_init__ method to set the value of the inner Container instance based on the values of the class attributes, while others do some lazy initialization in a container() function. Is there a preferred way to do this?

To answer your question: whether to use __post_init__ or a container() function depends on what you're trying to do and how you want people to be able to use your module. Just do the simplest thing that fits.
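The sketch below shows the general trade-off in plain Python. Strings stand in for dagger.Container, and the class names and the with_image()-style call are hypothetical. A value built in __post_init__ is computed once, at construction. A container() method builds it from whatever the fields hold when it is called. This only illustrates the general pattern. It makes no claim about how Dagger rebuilds objects between calls.

```python
from dataclasses import dataclass, field


@dataclass
class Eager:
    image: str = "alpine"
    # Excluded from the constructor; built once, right after __init__.
    base: str = field(init=False)

    def __post_init__(self):
        self.base = f"container from {self.image}"


@dataclass
class Lazy:
    image: str = "alpine"

    def container(self) -> str:
        # Built on demand from whatever the fields hold at call time.
        return f"container from {self.image}"


eager, lazy = Eager(), Lazy()
eager.image = lazy.image = "ubuntu"  # simulates a later with_image()-style call
print(eager.base)        # container from alpine (built before the change)
print(lazy.container())  # container from ubuntu
```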

You can even define a default in the attribute itself via a default factory function, if you can put it in a normal function without any arguments:

```python
import dataclasses
import dagger
from dagger import dag

@dagger.object_type
class Dotnet:
    nuget_package_source: str | None = None
    ctr: dagger.Container = dataclasses.field(default_factory=lambda: dag.container().from_("alpine"))
```
amber nest

> From the State and Getters documentation, I understand there is no way to have a field that is exposed neither as a constructor argument nor as a function in the public API and that still survives the serialization/deserialization process. Can you confirm?

No. The limitation you're probably thinking of exists in Go (and Java). In Go, it's about the field having to be public in the language, not about exposing it in the API. You can still exclude it from the public API with a // +private pragma. That pragma is needed because public fields in Go are registered on the API by default.

In Python there's no such limitation. Also, API registration in Python is explicit (opt-in) rather than the other way around. If you don't want to expose a field in the public API's constructor, exclude it from the constructor with dataclasses.field(init=False) and set it in __post_init__.

Use dagger.field(init=False) when you want to exclude a field from the constructor but still expose the field itself to the Dagger API as a function. That's the difference between dagger.field and dataclasses.field.
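The Python side of this can be checked with the standard library alone. A dataclasses.field(init=False) attribute disappears from the constructor signature but is still a dataclass field. It therefore stays part of the object's state and can be set in __post_init__. The dagger.field(init=False) variant is not shown because it needs the SDK. The class and the restore_args attribute are hypothetical.

```python
import inspect
from dataclasses import dataclass, field, fields


@dataclass
class Dotnet:
    nuget_package_source: str = "nuget.org"
    # Not a constructor parameter, but still a field (i.e. part of the state).
    restore_args: list[str] = field(init=False)

    def __post_init__(self):
        self.restore_args = ["dotnet", "restore", "-s", self.nuget_package_source]


print(list(inspect.signature(Dotnet).parameters))  # ['nuget_package_source']
print([f.name for f in fields(Dotnet)])            # ['nuget_package_source', 'restore_args']
print(Dotnet("https://my.private.nuget.repo.net").restore_args[-1])  # https://my.private.nuget.repo.net
```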