# Caching vs. backend state


fickle hound

I am building a simple video transcription pipeline using Dagger. Part of it fetches the list of files in a Backblaze B2 bucket (similar to S3) by running the `b2` command-line tool in a container. I then capture the container's STDOUT as a string with `client.Container()...Stdout(ctx)`.

The first invocation of my program returns the list of files as expected. But when I make a change in the bucket (add or remove a file) and invoke the program a second time, it seems that Dagger returns the cached results from the previous run. I was expecting the container to run again and return a fresh list of files in the bucket.

Am I misunderstanding how Dagger behaves? What determines whether a container run is cached? After all, Dagger has no insight into the backend my container connects to, does it?

I made a minimal example that demonstrates the issue: https://github.com/suhlig/spike-dagger-caching

cyan grotto

Hi @fickle hound, Dagger invalidates the cache for an exec when one of its inputs changes… but only for inputs it knows about. In your case there is a “hidden” input: the contents of that remote bucket. Dagger can’t know about it, because finding out would require executing your container in the first place.

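We plan on adding ways to customize cache invalidation, but for now the solution is to disable caching for that exec. You can do that with a technique called “cache busting”: give the exec an extra input that changes on every run, so it never matches a previously cached result.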