#Is it possible to invalidate the cache without having to build the container again?

stone belfry
#

I'm doing some e2e testing where a Django app is involved. At some point in the pipeline I have to do something like this:

WithExec([]string{"/src/manage.py", "migrate", "--noinput"})

I'm introducing a time variable to invalidate the cache, but this also makes the build slower, because the Dockerfile also runs things like manage.py collectstatic and manage.py compilemessages, so I have to wait an extra 30s there. I'm trying to figure out whether this is something I should address by rearranging my Dockerfile (e.g. multiple stages), or whether I'm doing something wrong in my Dagger pipeline.

Thank you in advance!

lapis folio
#

Is it feasible to make a dump of the database schema before that step? That would act as input so the step would only need to execute if needed. Or something else that can serve the same function.

stone belfry
#

Would you mind elaborating on that idea? I fail to see what that would look like. I was hoping to always run the migrate command, because the developer may be developing new migrations locally that we'd want to apply.

lapis folio
#

First of all, you may just need to reorder steps. collectstatic and compilemessages should really run before migrate. Those two are static steps that depend only on the code, while migrate depends on the database state.

#

But even after that, maybe you don't even need the cache buster for the migrations.

To explain what I meant by "input", let's say before that WithExec you add a file with WithNewFile or something. That file becomes a new input to the WithExec. Any of the steps before it are inputs to it. If any one of those changes, the steps after will re-run.
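To make the "inputs" idea concrete, here is a toy model in Go. It is only a sketch of the general idea, not how Dagger actually computes its cache keys: each step's key is derived from the key of the step before it plus its own definition, so changing one step changes the key of every step after it. The step strings and the `/db-state` file name are made up for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stepKey derives a key for one step from the key of the step before it
// and the step's own definition. Toy model only, not Dagger's real scheme.
func stepKey(parentKey, step string) string {
	sum := sha256.Sum256([]byte(parentKey + "\x00" + step))
	return hex.EncodeToString(sum[:])
}

// chainKeys returns the key of every step in a linear pipeline.
func chainKeys(steps []string) []string {
	keys := make([]string, len(steps))
	parent := ""
	for i, s := range steps {
		parent = stepKey(parent, s)
		keys[i] = parent
	}
	return keys
}

func main() {
	// Two runs that differ only in the contents of a hypothetical
	// /db-state file added right before the migrate step.
	first := chainKeys([]string{"from python", "copy /src", "file /db-state: abc", "exec migrate"})
	second := chainKeys([]string{"from python", "copy /src", "file /db-state: def", "exec migrate"})

	for i := range first {
		fmt.Printf("step %d reusable from cache: %v\n", i, first[i] == second[i])
	}
}
```

In this model the first two steps keep their keys, while the changed file and the migrate step after it get new ones, which is the behavior described above.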

#

You need to think about when the migrations really need to actually run.

Let's assume your migrations only apply schema transformations (no data), just to simplify. Then the migration code is already an input, so if you add migrations to the project, the migrate step will already re-run. But maybe the second or third time you run the pipeline with the same code, where nothing has changed in the code, the database still needs those migrations. The state of the database isn't one of the inputs (unless it's a sqlite file added to the pipeline), so I suppose that's why you added the cache buster env var in the first place.

But what if you could add the state of the database to the pipeline, as a file or something else, before running migrate? Then that command would only run when it really needs to, instead of every time. That could simply be a hash of the database dump, written to a file before that WithExec. It could be the dump itself. Or it could be anything else you can query from the database that tells you something has changed and that migrations need to re-run.
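A minimal sketch of the "hash of the database dump" option, in plain Go. The function name `dumpFingerprint` is hypothetical, and skipping `--` comment lines is an assumption on my part (SQL dump comments can carry volatile details that would change the hash without any schema change). How you obtain the dump, and how you write the result into the container before the WithExec, is left out here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// dumpFingerprint hashes a schema dump line by line, ignoring blank lines
// and SQL comment lines (assumption: those never affect the schema).
func dumpFingerprint(dump string) string {
	h := sha256.New()
	for _, line := range strings.Split(dump, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "--") {
			continue
		}
		h.Write([]byte(line))
		h.Write([]byte{'\n'})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := "-- dumped at 10:00\nCREATE TABLE t (id int);\n"
	later := "-- dumped at 11:00\nCREATE TABLE t (id int);\n"
	changed := "CREATE TABLE t (id int, name text);\n"

	fmt.Println("same schema, same fingerprint:", dumpFingerprint(before) == dumpFingerprint(later))
	fmt.Println("changed schema, same fingerprint:", dumpFingerprint(before) == dumpFingerprint(changed))
}
```

The resulting string is what would go into the file added before migrate: as long as it stays the same, that step has no new input and can be served from cache.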

#

Finally, you can also break these three things (collect static, compile messages, migrations) into three separate sub-pipelines, "forked" from the same base, and combine their results afterwards (like a multi-stage build). That's especially useful if they take a bit of time. That way they can run concurrently, and a change in one of them won't invalidate the cache for the other two.
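The shape of that fork-and-combine idea, sketched with plain goroutines. Nothing here calls Dagger: `branch` and `runBranches` are hypothetical names, and the `run` functions are stand-ins for whatever each sub-pipeline would really do with the shared base.

```go
package main

import (
	"fmt"
	"sync"
)

// branch is a hypothetical stand-in for one sub-pipeline forked from the base.
type branch struct {
	name string
	run  func(base string) (string, error)
}

// runBranches starts every branch from the same base concurrently and
// combines the results by name. It returns the first error in branch order.
func runBranches(base string, branches []branch) (map[string]string, error) {
	outputs := make([]string, len(branches))
	errs := make([]error, len(branches))

	var wg sync.WaitGroup
	for i, b := range branches {
		wg.Add(1)
		go func(i int, b branch) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			outputs[i], errs[i] = b.run(base)
		}(i, b)
	}
	wg.Wait()

	combined := make(map[string]string, len(branches))
	for i, b := range branches {
		if errs[i] != nil {
			return nil, fmt.Errorf("%s: %w", b.name, errs[i])
		}
		combined[b.name] = outputs[i]
	}
	return combined, nil
}

func main() {
	// step builds a fake sub-pipeline that just records what it would run.
	step := func(cmd string) func(string) (string, error) {
		return func(base string) (string, error) { return base + " + " + cmd, nil }
	}

	out, err := runBranches("base image", []branch{
		{"static", step("collectstatic")},
		{"messages", step("compilemessages")},
		{"db", step("migrate")},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out["static"])
	fmt.Println(out["messages"])
	fmt.Println(out["db"])
}
```

Because each branch only depends on the shared base, a change that affects one of them (say, the database state for migrate) leaves the inputs of the other two untouched.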