#[SOLVED] What is the most optimal approach for Dagger cache mechanism?

errant spire
I think I've optimized my Dagger code (Python SDK) as much as possible by using a lot of volumes: one volume for the Linux apt cache, another for pip, another for PHP Composer dependencies, and so on. For all my projects, I think I've gone as far as I can (i.e., as far as I know how).

But this optimization has a huge weakness: changing one character in my project's codebase.

Here is an example:

Below I'm calling my phpunit function (which runs unit tests in PHP). It does a lot of initialization and downloading (think apt-get install, pecl install (redis, phpcov), composer install, building a postgres and a redis container, and much more). That's a lot of work, and it can take up to 2 minutes.

dagger call --source-directory=/app/src/php --config-directory=/app/src/php/.config phpunit

If I rerun that command immediately (no change at all), caching is very powerful and the run takes just a few seconds. But as soon as I change one character in my codebase (folder /app/src/php) or in my configuration files (folder /app/src/php/.config), the cache seems fully invalidated and the next call to my function redoes the full process.

In fact, if I just make a trivial change in my configuration file, I don't expect Dagger to download Linux binaries (apt-get), Composer dependencies, and so on all over again.

Composer dependencies, to take that example, should only have to be rebuilt if I change my composer.json.

But, for illustration: once I've run my phpunit function, I can call phpstan or phan or phpcbf or ... (static analysis tools for PHP), and here, because I haven't changed my codebase, all caches are reused (it's so great!) and the actions run fast.

Am I doing something wrong / how can I use Dagger cache as much as possible?

Thanks!

oblique scarab
Hello 👋

A few things to look at:

  • It sounds like you're already using cache volumes where possible https://docs.dagger.io/api/cache-volumes. These are for that exact case where your input has invalidated the cache and you want a volume to avoid re-downloading dependencies.
  • Filter down the input directories as much as possible to avoid unnecessary cache invalidation https://docs.dagger.io/api/filters#pre-call-filtering
  • Pin images to a specific version or sha (i.e., avoid :latest).
  • Try to order the chains in your pipeline from least changing -> most changing, if that makes sense. For example:
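# Assumes `from dagger import dag` and that `source` is a dagger.Directory argument
(
    dag.container()
    .from_("alpine:3")
    # Rarely changing steps come first, so they stay cached
    .with_exec(["apk", "add", "curl"])
    .with_mounted_cache("/root/.cache/pip", dag.cache_volume("python-311"))
    # The frequently changing source comes last
    .with_directory("/src", source)
    .with_workdir("/src")
    .with_exec(["pip", "install", "-r", "requirements.txt"])
)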

The source is mounted after running apk add, since the source changes frequently but the previous operations in the chain rarely change.

Another optimization, usually unnecessary with cache volumes, is to install dependencies before adding in the entire source input:
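To see why the ordering matters, here is a toy model in plain Python. It is only a sketch under a stated assumption (each step's cache key is a hash of the previous key, the operation, and its input), not Dagger's actual implementation. The helper names `chain_keys`, `reused`, and `pipeline` are made up for the illustration.

```python
import hashlib


def chain_keys(steps):
    """Return one cache key per step; each key depends on every earlier step."""
    keys, parent = [], ""
    for op, input_digest in steps:
        parent = hashlib.sha256(f"{parent}|{op}|{input_digest}".encode()).hexdigest()
        keys.append(parent)
    return keys


def reused(before, after):
    """Count the leading steps whose keys are unchanged (cache hits)."""
    hits = 0
    for old, new in zip(before, after):
        if old != new:
            break
        hits += 1
    return hits


def pipeline(source_digest, source_first):
    """Build the example chain, adding the source either early or late."""
    base = ("from alpine:3", "")
    apk = ("apk add curl", "")
    src = ("with_directory /src", source_digest)
    pip = ("pip install -r requirements.txt", "")
    if source_first:
        return [base, src, apk, pip]
    return [base, apk, src, pip]


# Compare two runs where only the source digest changed ("v1" -> "v2").
early = reused(chain_keys(pipeline("v1", True)), chain_keys(pipeline("v2", True)))
late = reused(chain_keys(pipeline("v1", False)), chain_keys(pipeline("v2", False)))
print(early, late)  # 1 2
```

With the source added early, only the base image step is reused after a source change. With the source added late, the apk step is reused as well, and pip still benefits from its cache volume when it reruns.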

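# Assumes `from dagger import dag` and that `source` is a dagger.Directory argument
(
    dag.container()
    .from_("alpine:3")
    .with_exec(["apk", "add", "curl"])
    .with_mounted_cache("/root/.cache/pip", dag.cache_volume("python-311"))
    .with_workdir("/src")
    # Copy only the dependency manifest, so the install step is invalidated only when it changes
    .with_file("/src/requirements.txt", source.file("requirements.txt"))
    .with_exec(["pip", "install", "-r", "requirements.txt"])
    # Add the full source afterwards
    .with_directory("/src", source)
    .with_exec(["pytest"])
)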

But like I said, in practice, cache volumes used appropriately make this unnecessary.

When you pass a directory to a Dagger Function as argument, Dagger uploads everything in that directory tree to the Dagger Engine. For large monorepos or directories containing large-sized files, this can significantly slow down your Dagger Function while filesystem contents are transferred. To mitigate this problem, Dagger lets you apply filter...
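The effect of filtering an input directory can be sketched with another toy model in plain Python. The assumption is that the directory's cache identity is a digest of the files that actually get uploaded. The file names and the `tree_digest` helper are hypothetical. For the real filter syntax in the Python SDK, see the filters documentation linked above.

```python
import fnmatch
import hashlib


def tree_digest(files, ignore=()):
    """Hash a {path: content} mapping, skipping paths that match an ignore pattern."""
    h = hashlib.sha256()
    for path in sorted(files):
        if any(fnmatch.fnmatch(path, pattern) for pattern in ignore):
            continue  # filtered-out files never contribute to the digest
        h.update(path.encode() + b"\0" + files[path].encode() + b"\0")
    return h.hexdigest()


# Hypothetical project tree; only README.md differs between the two snapshots.
before = {
    "src/App.php": "<?php echo 1;",
    "README.md": "v1",
    ".config/phpunit.xml": "<phpunit/>",
}
after = dict(before)
after["README.md"] = "v2"

unfiltered_changed = tree_digest(before) != tree_digest(after)
filtered_changed = tree_digest(before, ["*.md"]) != tree_digest(after, ["*.md"])
print(unfiltered_changed, filtered_changed)  # True False
```

In this model, editing a file that the function never needs changes the unfiltered digest but leaves the filtered one untouched, which is why narrowing the input avoids invalidating the steps that depend on it.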

errant spire
Hi Kyle, and MANY THANKS. I thought I had done everything, but I was also doing something very stupid, and this statement did the trick:

The source is mounted after running ...

Oh, I feel stupid... You're right, of course, and I knew it. I just didn't see this in my code; damn.

Thanks, Kyle, for taking the time to reply with such a valuable answer.
