#Cache function calls per-client to avoid...
1 messages ยท Page 1 of 1 (latest)
fwiw, this benchmark is the most intentionally brutal one with a load of modules calling other modules in a quadratic fanout.
i also love that we broke it almost immediately after merging while i've been slowly baking discord notifications for failure in the background XD
confirmed the oomkill with a passing run on f920f96f and a failing one on 3a28d13c
cc @silk mural too here, since i think there are sacrifices here to the make-secrets-work gods and i know we both reviewed that PR... there are a couple ways to proceed:
- dig on this and find another less memory intensive way of caching across clients to support many-module situations like this
- accept that this test is overly brutal, tone it down to be less nested modules (currently 6 layers, gonna try 5)
also fyi erik you should definitely be labelling your big perf improvement PRs as "benchmark" in gh lol, i know you reviewed the benchmark PR but i don't see any of the labels XD
5 also ooms locally for me, 16g of memory, but 4 works
i think it's also visibly slower? tryna put numbers to that now
That makes perfect sense in retrospect, I guess there's a pretty obvious fix by just having the function calls cached-per-session but ensuring the secret transfer always runs. It should be easy to add support for that to dagql via some optional callback that just runs once per-client after the actual function call.
The only question is how to do that in a non-kludgy way but worst case a little kludge for now would be fine to get a fix in. I suspect dagql is going to undergo lots of renovation in the near future anyways as part of the caching stuff so can figure out a neater way to incorporate it at that time if needed
@crystal gulch can you open an issue for this super quick (doesn't have to be that descriptive), just so if it doesn't make it in v0.16.0 we don't forget about it
yep will do
shoulda just done that yesterday but i jumped the gun a little bit and posted while my bisect was still running lol
and indeed it is significantly slower with layers=4: 2m28s before and 3m39s after... possible there's some cache effect on the inner engine fucking that up
Sounds reasonable, we do the same thing at the buildkit caching level
I think that sounds familiar, I spent a long time on that at some point ๐
This fix passes secret tests and lets me run BenchmarkLotsOfDeps locally successfully: https://github.com/dagger/dagger/pull/9621
Really ridiculously kludgy atm, gonna see if it has any impact on our test runtimes generally (which would indicate a possible "real-world" improvement outside that benchmark test, which is fairly extreme). If it does I'll try to sneak it into the release, otherwise will let it wait for v0.16.1 to avoid unneeded boat rocking
Hm the benchmark tests won't even run because of incompatible engine version. I guess the benchmark yamls aren't part of the autogen'd GHA stuff so they didn't pick up this fix: https://github.com/dagger/dagger/pull/9505/commits/8f819110f4ee6cace1296e72d49d3d6d31c72408
possible I just got a lucky run since they're noisy but the first run did go between 30-90s faster for some of the module-related test workflows...
- pr https://github.com/dagger/dagger/actions/runs/13404181401?pr=9621
- latest main (with my module caching fixes that also seem to have shaved some test runtimes) https://github.com/dagger/dagger/actions/runs/13402335855
I'll see what I can do, but I guess worst case v0.16.1 can get another performance bump
cleaned it up a bit, still not aesthetically pleasing at all but probably at "good enough" stage, will update the description after dinner
I'm getting 67s on empty-cache dev engine and 15s on a re-run now (hard to compare across machines obviously though)
Took it out of draft and updated with a description of what was going on and how the fix works
Also, on my second push to the PR I got a run of testdev-modules at 8m40s, which is lowest I think I've seen https://github.com/dagger/dagger/actions/runs/13404861585?pr=9621
won't really know until we've done the release and collected test run data for at least a few days, but ๐ค
Is that with 4 layers or the default 6?
Because if thatโs with 6 itโs quite good
QUITE
Weird, I have not hit this, but will fix on the PR that sends notifications for them
Also fwiw it is extreme enough that it doesnโt matter to users, but it is mildly representative of our engine build, so possibly real meaningful gains there
oh I missed that you had lowered layers to 4, yeah I was running w/ 6
figured, that's a huge speedup even from before the per-client caching
damn what are you feeding your machine, freshly rebased 16.1 runner CI run duration is looking similar to mine still?https://v3.dagger.cloud/dagger/traces/02861b7cad7d6dd6d5af76d48dd3c52b?listen=a962026784ecb0a5&listen=a37c01a7ceae9ea4&listen=4c2d1eea9dee8b8f&listen=fc741e342281dda6 timing out and showing cancelled too, which is mysterious
I reverted the fix in 16.1 because main sdk dev mysteriously started erroring out on something related to it. I have a PR out for an improved fix w/ more details https://github.com/dagger/dagger/pull/9653
Yeah I realized it in the middle of that pr and really didn't want to ping people again for reviews at 8pm PST lol
makes sense and no worries, i have to run off to japan now anyways! see you in a week, im already excited to see the metabase graphs with that last lil speedup ๐
Nice yes catching up on that is in my queue today! Have a good PTO in Japan! Be careful of the raw chicken ๐
i'd rather be reckless and suffer the consequences ๐