Cache function calls per-client to avoid... | Dagger | Page 1

crystal gulch Feb 17, 2025, 9:35 PM

#

fwiw, this benchmark is the most intentionally brutal one with a load of modules calling other modules in a quadratic fanout.

#

i also love that we broke it almost immediately after merging while i've been slowly baking discord notifications for failure in the background XD

#

confirmed the oomkill with a passing run on f920f96f and a failing one on 3a28d13c

#

cc @silk mural too here, since i think there are sacrifices here to the make-secrets-work gods and i know we both reviewed that PR... there are a couple ways to proceed:

#

dig on this and find another less memory intensive way of caching across clients to support many-module situations like this
accept that this test is overly brutal, tone it down to be less nested modules (currently 6 layers, gonna try 5)

crystal gulch Feb 17, 2025, 9:47 PM

#

crystal gulch i also love that we broke it almost immediately after merging while i've been sl...

also fyi erik you should definitely be labelling your big perf improvement PRs as "benchmark" in gh lol, i know you reviewed the benchmark PR but i don't see any of the labels XD

crystal gulch Feb 18, 2025, 6:56 PM

#

5 also ooms locally for me, 16g of memory, but 4 works

#

i think it's also visibly slower? tryna put numbers to that now

gilded prawn Feb 18, 2025, 7:03 PM

#

That makes perfect sense in retrospect, I guess there's a pretty obvious fix by just having the function calls cached-per-session but ensuring the secret transfer always runs. It should be easy to add support for that to dagql via some optional callback that just runs once per-client after the actual function call.

The only question is how to do that in a non-kludgy way but worst case a little kludge for now would be fine to get a fix in. I suspect dagql is going to undergo lots of renovation in the near future anyways as part of the caching stuff so can figure out a neater way to incorporate it at that time if needed

#

@crystal gulch can you open an issue for this super quick (doesn't have to be that descriptive), just so if it doesn't make it in v0.16.0 we don't forget about it

crystal gulch Feb 18, 2025, 7:06 PM

#

yep will do

#

shoulda just done that yesterday but i jumped the gun a little bit and posted while my bisect was still running lol

#

and indeed it is significantly slower with layers=4: 2m28s before and 3m39s after... possible there's some cache effect on the inner engine fucking that up

#

https://github.com/dagger/dagger/issues/9619

GitHub

🐞 Memory consumption with large module trees is out of control, fai...

What is the issue? Shortly after we extracted BenchmarkModule to only run on chron-main and labels, BenchmarkModule started failing to cancellation. on further investigation, it turns out engines a...

silk mural Feb 18, 2025, 9:01 PM

#

gilded prawn That makes perfect sense in retrospect, I guess there's a pretty obvious fix by ...

Sounds reasonable, we do the same thing at the buildkit caching level

#

I think that sounds familiar, I spent a long time on that at some point 😛

gilded prawn Feb 19, 2025, 2:21 AM

#

This fix passes secret tests and lets me run BenchmarkLotsOfDeps locally successfully: https://github.com/dagger/dagger/pull/9621

Really ridiculously kludgy atm, gonna see if it has any impact on our test runtimes generally (which would indicate a possible "real-world" improvement outside that benchmark test, which is fairly extreme). If it does I'll try to sneak it into the release, otherwise will let it wait for v0.16.1 to avoid unneeded boat rocking

GitHub

POC fix for session-wide caching of function calls by sipsma · Pull...

Should fix #9619, can run BenchmarkLotsOfDeps locally now without excessive memory usage.
Extremely kludgy and ugly atm, would ideally like to find something a little more elegant but if not will c...

#

Hm the benchmark tests won't even run because of incompatible engine version. I guess the benchmark yamls aren't part of the autogen'd GHA stuff so they didn't pick up this fix: https://github.com/dagger/dagger/pull/9505/commits/8f819110f4ee6cace1296e72d49d3d6d31c72408

#

possible I just got a lucky run since they're noisy but the first run did go between 30-90s faster for some of the module-related test workflows...

pr https://github.com/dagger/dagger/actions/runs/13404181401?pr=9621
latest main (with my module caching fixes that also seem to have shaved some test runtimes) https://github.com/dagger/dagger/actions/runs/13402335855

I'll see what I can do, but I guess worst case v0.16.1 can get another performance bump

gilded prawn Feb 19, 2025, 3:34 AM

#

cleaned it up a bit, still not aesthetically pleasing at all but probably at "good enough" stage, will update the description after dinner

gilded prawn Feb 19, 2025, 3:36 AM

#

crystal gulch and indeed it is significantly slower with layers=4: 2m28s before and 3m39s afte...

I'm getting 67s on empty-cache dev engine and 15s on a re-run now (hard to compare across machines obviously though)

gilded prawn Feb 19, 2025, 4:49 AM

#

gilded prawn cleaned it up a bit, still not aesthetically pleasing at all but probably at "go...

Took it out of draft and updated with a description of what was going on and how the fix works

#

Also, on my second push to the PR I got a run of testdev-modules at 8m40s, which is lowest I think I've seen https://github.com/dagger/dagger/actions/runs/13404861585?pr=9621

#

won't really know until we've done the release and collected test run data for at least a few days, but 🤞

crystal gulch Feb 19, 2025, 5:33 AM

#

gilded prawn I'm getting 67s on empty-cache dev engine and 15s on a re-run now (hard to compa...

Is that with 4 layers or the default 6?

#

Because if that’s with 6 it’s quite good

#

QUITE

crystal gulch Feb 19, 2025, 5:35 AM

#

gilded prawn Hm the benchmark tests won't even run because of incompatible engine version. I ...

Weird, I have not hit this, but will fix on the PR that sends notifications for them

crystal gulch Feb 19, 2025, 5:36 AM

#

gilded prawn This fix passes secret tests and lets me run BenchmarkLotsOfDeps locally success...

Also fwiw it is extreme enough that it doesn’t matter to users, but it is mildly representative of our engine build, so possibly real meaningful gains there

gilded prawn Feb 19, 2025, 6:32 PM

#

crystal gulch Is that with 4 layers or the default 6?

oh I missed that you had lowered layers to 4, yeah I was running w/ 6

crystal gulch Feb 19, 2025, 6:33 PM

#

figured, that's a huge speedup even from before the per-client caching

crystal gulch Feb 20, 2025, 11:57 PM

#

gilded prawn I'm getting 67s on empty-cache dev engine and 15s on a re-run now (hard to compa...

squint damn what are you feeding your machine, freshly rebased 16.1 runner CI run duration is looking similar to mine still?https://v3.dagger.cloud/dagger/traces/02861b7cad7d6dd6d5af76d48dd3c52b?listen=a962026784ecb0a5&listen=a37c01a7ceae9ea4&listen=4c2d1eea9dee8b8f&listen=fc741e342281dda6 timing out and showing cancelled too, which is mysterious

Dagger Cloud

Browse and visualize Dagger traces.

gilded prawn Feb 20, 2025, 11:59 PM

#

crystal gulch <:squint:873986605990441010> damn what are you feeding your machine, freshly reb...

I reverted the fix in 16.1 because main sdk dev mysteriously started erroring out on something related to it. I have a PR out for an improved fix w/ more details https://github.com/dagger/dagger/pull/9653

GitHub

make function calls cached session-wide by sipsma · Pull Request #9...

Giving #9621 another try. I ended up reverting it for v0.16.1 because, for some reason, sdk dev tests on main started consistently failing due to a problem with it seemingly out of nowhere. The pro...

crystal gulch Feb 20, 2025, 11:59 PM

#

ah cool i just didn't see the revert

#

bc it's hiding in the prep commit 🙂

gilded prawn Feb 21, 2025, 12:01 AM

#

crystal gulch bc it's hiding in the prep commit 🙂

Yeah I realized it in the middle of that pr and really didn't want to ping people again for reviews at 8pm PST lol

crystal gulch Feb 21, 2025, 12:02 AM

#

makes sense and no worries, i have to run off to japan now anyways! see you in a week, im already excited to see the metabase graphs with that last lil speedup 🙂

gilded prawn Feb 21, 2025, 12:03 AM

#

crystal gulch makes sense and no worries, i have to run off to japan now anyways! see you in a...

Nice yes catching up on that is in my queue today! Have a good PTO in Japan! Be careful of the raw chicken 😄

crystal gulch Feb 21, 2025, 12:03 AM

#

i'd rather be reckless and suffer the consequences 😈

#Cache function calls per-client to avoid...