#Lua inlining discussion


gleaming hazel
#

What I'm envisioning is:

stack ... | locals for outer func | locals for inlined func | args for inlined func
|
v
stack ... | locals for outer func | locals for inlined func | results from inlined func

Promote the locals from the inlined function to become part of the block of locals for the outer function (so they're effectively upvalues), giving them the lifetime of the outer function. Then the call stack can behave like a normal function call because the inlined function's locals aren't in the way.

jovial holly
#

That does make sense but it's unfortunately not the source of the problem. I even thought about moving all the locals of the inlined function as well as the arguments for the inlined function to the absolute front/bottom of the stack such that all the registers used for execution would always be at the top.

Even then, however, we have the problem that in the inlined function there are 2 return statements with differing amounts of return values. That means the CALL to bar would have to call using 1 argument in one case, 2 arguments in the other. Usually this is done through top, however LOADK - or the vast majority of the instructions in general - do not set top from what I can see.

The reason why duplicating the call to bar would solve this issue is because then the amount of arguments for those 2 calls would be known at compile time. I'm already not supporting inlining functions that return up-to-top in this case, so it would work. (there are some ideas to support this, but it's beyond what the current infrastructure is capable of)
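
To make the two-return case concrete, here is a minimal source-level sketch. The names `foo`, `caller` and `caller_inlined` are hypothetical, and the inlining is done by hand in Lua source rather than on bytecode or IL, so it only illustrates why duplicating the call to bar gives each call a fixed argument count.

```lua
-- bar reports how many arguments it received.
local function bar(...)
  return select("#", ...)
end

-- foo has two return statements with different result counts.
local function foo(flag)
  if flag then
    return 1
  end
  return 1, 2
end

-- Original form: bar receives however many values foo happened to return.
local function caller(flag)
  return bar(foo(flag))
end

-- Hand-inlined form: the call to bar is duplicated once per return
-- statement, so each call site has an argument count known at compile time.
local function caller_inlined(flag)
  if flag then
    return bar(1)
  end
  return bar(1, 2)
end
```

Both versions pass one argument to bar on the first path and two on the second; the inlined one simply no longer needs a variable-count CALL.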

gleaming hazel
#

Okay, I think I see the issue now. My confusion was that Lua pre-allocates stack space for temporary values through static analysis, so the top of stack does not vary at all in the middle of a function's execution, unlike some other stack-based runtimes.

jovial holly
#

yea there are technically 2 different kinds of top. The function's top, and the "current" top where the latter is only set and used for instructions interacting with vararg. But it's not limited to vararg so the more general way of saying it was "up-to-top"

#

(for correct terms, it's the CallInfo's top and the lua_State's top)

orchid wigeon
#

It didn't seem to me like you were describing inlining in your initial post. Inlining is done fully at compile time and doesn't involve any calls to the inlined function. It's an operation on AST.
But it doesn't seem like there's any way to set top other than CALL, which is exactly what you are trying to avoid. You'd have to transform AST into a representable state.

jovial holly
#

correct, inlining usually is purely done at compile time. I mean it has to be. The problem with Lua is that calls can take and return variable amounts of values. Take local foo = {bar(...)} for example. So ultimately the problem was that a function can return different amounts of values in a context where regular Lua would call the function with "var results" in the CALL instruction

gleaming hazel
#

So when trying to translate a RETURN into the "equivalent" inline instructions, there aren't any equivalents to be used.

orchid wigeon
#

I get that.

orchid wigeon
#

Just noticed that for some reason I had 5.1 bookmarked. 5.2 doesn't have anything new on top though.

jovial holly
#

Sorry was doing other things - I've looked at this but I'm not sure what you're trying to say. I've also been looking at source for this several times including over an hour today. I got a fairly good grasp on how it works, but if there's something I'm missing let me know

orchid wigeon
#

I'm saying that Therax is trying to explain to me what I already understand. Sorry, wasn't aimed at you.

jovial holly
#

oh I see

dim vault
#

i usually call this thing "the vararg tuple" since that's really the most recognizable use, but yeah it's the bastard-child of the lua type system, no real direct manipulation is possible afaik

jovial holly
#

mhm

dim vault
#

🤔 you might be able to abuse select to do something useful

dim vault
#

@jovial holly if you're at the point of potentially inlining a function though you must know definitively that it is indeed a particular function so you could just limit it to non-varargs functions and only calls that return fixed numbers of returns (so, no {foo()} or foo(bar()) but yes local foo,bar = foo())

#

and just say that using the varargs tuple on either end makes it ineligible for inlining

#

hmm, or maybe using one internally too 🤔

#

but again, if you know which function it is that's a relatively easy test

jovial holly
#

That was my thought too, but not supporting foo(bar()) just feels bad, it's too common (even if it's not super common, still common imo)
So instead of not supporting any vararg tuples, I settled on only supporting cases where, both when "entering" and when "exiting" the function, only one side uses a vararg tuple.
That's possible because then the amount of args and amount of return values are known. The only special case was, well, the case I asked about

dim vault
#

sure if you can make more special case work absolutely

#

but the point was just don't try too hard for the hard ones

jovial holly
#

oh yea definitely not

dim vault
#

because inlining isn't really that killer of an optimization, it's more of a million tiny cases

#

so you'd get most of the benefit of it by only doing the easy ones

#

and tbh, i'm not sure that foo(bar()) is that common of a pattern

#

specifically with bar results being spread by being end-of-list

#

as opposed to foo((bar())) with only one result passed, which you could do just fine

#

or foo(bar(),x)
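
The three call shapes mentioned above behave differently under standard Lua semantics. A small sketch, with `foo` and `bar` as placeholder functions:

```lua
local function bar()
  return "a", "b", "c"
end

-- foo just reports how many arguments it was given.
local function foo(...)
  return select("#", ...)
end

local spread = foo(bar())       -- bar() is last in the list: all results are passed
local single = foo((bar()))     -- extra parentheses truncate to exactly one result
local middle = foo(bar(), "x")  -- bar() is not last: truncated to one result, plus "x"
```

Only the first form needs a variable argument count at the call to foo; the other two have a fixed count that is visible at compile time.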

jovial holly
#

I'll quickly search for how many times I do it in phobos. (Searching all mods would take like 2 hours because I broke that setup, rip)

#

181/3845, so 4.7%

#

of all call nodes

#

That's the check for it, I don't think I'm missing anything

local last_expr = node.args[#node.args]
if last_expr and last_expr.node_type == "call" and not last_expr.force_single_result then
  -- ...
end

(sorry for not putting all that in 1 message. Me tired)
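
For illustration, the check above can be wrapped into a counting pass like this. It is only a sketch: the fields `node_type`, `args` and `force_single_result` come from the snippet, while the helper names, the `"local_ref"` node type and the hand-built nodes are assumptions and not the actual Phobos AST.

```lua
-- True when the last argument is a call whose results would be spread.
local function spreads_last_arg(node)
  local last_expr = node.args[#node.args]
  return last_expr ~= nil
    and last_expr.node_type == "call"
    and not last_expr.force_single_result
end

-- Returns how many call nodes spread their last argument, and the total.
local function count_spreading_calls(call_nodes)
  local count = 0
  for _, node in ipairs(call_nodes) do
    if spreads_last_arg(node) then
      count = count + 1
    end
  end
  return count, #call_nodes
end

-- Hand-built stand-ins for parsed nodes (hypothetical shapes).
local bar_call = { node_type = "call", args = {} }
local bar_single = { node_type = "call", args = {}, force_single_result = true }
local x = { node_type = "local_ref" }

local call_nodes = {
  { node_type = "call", args = { bar_call } },     -- foo(bar())
  { node_type = "call", args = { bar_single } },   -- foo((bar()))
  { node_type = "call", args = { bar_call, x } },  -- foo(bar(), x)
  { node_type = "call", args = {} },               -- foo()
}

local spreading, total = count_spreading_calls(call_nodes)
```

Only the first of the four sample nodes counts, which matches the idea that foo((bar())) and foo(bar(), x) have a fixed argument count.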

#

Now how many of those are c function calls I don't know, but all Lua functions would be known at compile time. That doesn't mean it's going to inline all of those, but I think you get the idea

dim vault
#

yeah

#

but i think as you combine those slices you'll end up with a pretty thin set that you'd actually even want to inline, so probably not enough to be worth it

#

80/20 and such

jovial holly
#

you're probably right

dim vault
#

but of course, if you did find some easy ones, by all means 🤷

#

but i suspect that the functions most people would say they want inlined will be fixed-count on both ends and not much in between

#

"this one expression i didn't want to write a million times" the function

jovial holly
#

I'll make sure to look at the real world cases (by analyzing all mods, probably) before deciding which cases are worth it and aren't. Like there is this case which is definitely not worth it

dim vault
#

also fwiw i feel like i've seen notes about other compilers (gcc i think?) having lots of struggles with deciding when to inline things because it's a classic hard-to-predict-returns optimization

#

beyond the blatantly obvious cases like the functions i just described above anyway

jovial holly
#

Yea I'm not sure about that logic yet at all. It might just be determined by size (since I'll do it in IL that'll be an easy factor). But predicting that inlining the function call would allow for several other optimizations to do something sounds hard

dim vault
#

i mean just predicting whether it produces savings or not and by how much

#

but yeah size is a good first order test, at least i think so 🤷

jovial holly
#

:D we will see

dim vault
#

but then there's cases where you have a big function that's used only once, where it may actually make sense despite size

#

that's the other one i might consider an "obvious" case

jovial holly
#

true that is an obvious case. Well Lua has some weird visibility, but the compiler should be able to figure that out if it knows the entire workspace. That's kind of an unrelated problem actually

dim vault
#

yeah it was mostly just a counter example to size being a good sole indicator :p

jovial holly
#

understandable, and it proved the point

dim vault
#

have you done any timing to see approximately how it compares in some example cases?

jovial holly
#

nope

dim vault
#

might be worth hacking up some by hand to see what it looks like
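
A hand-hacked comparison along those lines could look like the sketch below. Everything in it (`add_one`, `with_call`, `hand_inlined`, `measure`, the iteration count) is made up for illustration, and `os.clock` only gives rough CPU time, so the numbers are indicative at best.

```lua
-- A tiny function of the kind one might want inlined.
local function add_one(x)
  return x + 1
end

-- Version that pays for a function call on every iteration.
local function with_call(n)
  local sum = 0
  for i = 1, n do
    sum = sum + add_one(i)
  end
  return sum
end

-- Same loop with the body of add_one pasted in by hand.
local function hand_inlined(n)
  local sum = 0
  for i = 1, n do
    sum = sum + (i + 1)
  end
  return sum
end

-- Runs fn(n) once and returns its result plus the CPU seconds it took.
local function measure(fn, n)
  local start = os.clock()
  local result = fn(n)
  return result, os.clock() - start
end

local n = 1000000
local call_sum, call_seconds = measure(with_call, n)
local inline_sum, inline_seconds = measure(hand_inlined, n)
print(string.format("call: %.3fs, inlined: %.3fs", call_seconds, inline_seconds))
```

Running each variant several times and comparing the spread would give a better picture than a single run.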

jovial holly
#

I plan on setting up performance testing once I have IL in an actually usable state, because then I can (in theory) quickly modify or create random code and see how it performs

dim vault
#

would be very informative to developing rules for when to attempt inlining i think

jovial holly
#

yea for sure

#

It'll also be interesting in general, because right now all I know is "tiny callback functions are bad for performance" but how bad are they really? My knowledge is similar for several other things

dim vault
#

i suspect you may find that many "x are bad" things are not so bad actually but become notable in some specific case that everyone forgot the details of

jovial holly
#

probably yea

dim vault
#

"y are good" rules seem to be slightly better and "y is better than x" much better :p

jovial holly
#

can't do anything if all you know is what is bad, except tell the programmer "hey your code is bad"

dim vault
#

exactly

#

but also i meant more like the "folk wisdom" perf rules, very often a "y is better than x" measurement then gets quoted forever as "x is bad" even if it's only valid in a narrow context (but i can't think of a good example off hand...)

#

as opposed to, like, the general guidance we often pass around about "local all the things" almost always actually expanding to the explanation that locals are faster than upvals are faster than globals

#

but the "good" versions of such rules seem to stick to their reasons longer 🤷
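
For reference, the three kinds of variable access in that rule look like this in source. The names are placeholders and the sketch makes no claim about how large the difference is; it only shows which loop touches which kind of variable, so each could be timed separately.

```lua
-- Global: looked up by name in the globals table on every access.
counter_global = 0

-- Local of the enclosing chunk: the closures below capture it as an upvalue.
local counter_upvalue = 0

local function bump_global(n)
  for _ = 1, n do
    counter_global = counter_global + 1
  end
  return counter_global
end

local function bump_upvalue(n)
  for _ = 1, n do
    counter_upvalue = counter_upvalue + 1
  end
  return counter_upvalue
end

local function bump_local(n)
  local counter_local = 0  -- lives in a register of this function
  for _ = 1, n do
    counter_local = counter_local + 1
  end
  return counter_local
end

local n = 100000
local from_global = bump_global(n)
local from_upvalue = bump_upvalue(n)
local from_local = bump_local(n)
```

All three loops compute the same value; only the way the counter is reached differs.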

jovial holly
#

oh yea that makes sense. (also upvals being slower than locals makes me sad, I love upvals. But again, need some real world numbers)

dim vault
#

they're only slightly slower iirc

#

but i have no real numbers handy

#

because they're both indexing array-ish things iirc it's just slightly more layers of pointer chasing to get to the upval one

#

if i remember the object chain correctly it's because it has to access them through the closure rather than right off the stack

jovial holly
#

and some instructions requiring temporaries in order to use upvalues, like any of the math or logical ops iirc

dim vault
#

oh yes and that