Least squares: why take partial derivatives of the slope (B0) and y-intercept (B1) to minimize? | Mathematics | Page 1

void cove Oct 3, 2023, 12:40 AM

#

Taking the partial derivatives of the slope (B0) and y-intercept (B1), setting the result equal to 0, and solving for B0 and B1 gives us the values of B1 and B0 that minimize the sum of squared residuals. Why is this valid?

In ordinary least squares regression, we aim to find the values of ( B_0 ) and ( B_1 ) that minimize the sum of squared residuals. To do this, we typically take the partial derivatives of the sum of squared residuals with respect to ( B_0 ) and ( B_1 ), set these derivatives to zero, and solve the resulting equations to find the optimal ( B_0 ) and ( B_1 ).

But why? If we're only taking partial derivatives one variable at a time, couldn't we accidentally optimize a B0 (e.g.) that doesn't "play well" with B1, i.e. a B0 for which B1 is not optimal? I would expect that we'd have to consider both B1 and B0 at the same time in order to optimize correctly.
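
For reference, a sketch of the two conditions in question, written with the thread's labeling (B0 = slope, B1 = intercept, the reverse of the usual textbook convention) and assuming data points (x_i, y_i) for i = 1..n. The symbol S for the sum of squared residuals is introduced here for illustration.

```latex
\begin{aligned}
S(B_0, B_1) &= \sum_{i=1}^{n} (y_i - B_0 x_i - B_1)^2 \\
\frac{\partial S}{\partial B_0} &= -2 \sum_{i=1}^{n} x_i \,(y_i - B_0 x_i - B_1) = 0 \\
\frac{\partial S}{\partial B_1} &= -2 \sum_{i=1}^{n} (y_i - B_0 x_i - B_1) = 0
\end{aligned}
```

Both equations contain both unknowns and are solved as one simultaneous system, so a solution is a point where the slope and the intercept are optimal together, not one at a time.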

void cove Oct 3, 2023, 12:41 AM

#

I expect the answer has something to do with convexity: a linear combination of variables, like in a linear regression, gives a convex function, so these kinds of interactions can't exist

#

Like, clearly there is only one solution. It's convex, a quadratic, and it's also just obvious. But I don't see how the method would actually ensure that we are getting this one solution
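
A sketch of why the method does land on that one solution, assuming the sum of squared residuals is written as S(B0, B1) = sum over i of (y_i - B0 x_i - B1)^2 with B0 the slope and B1 the intercept, as labeled in this thread. The second derivatives of S are constants, so the Hessian H is the same everywhere:

```latex
H = \begin{pmatrix}
\dfrac{\partial^2 S}{\partial B_0^2} & \dfrac{\partial^2 S}{\partial B_0 \,\partial B_1} \\[2ex]
\dfrac{\partial^2 S}{\partial B_1 \,\partial B_0} & \dfrac{\partial^2 S}{\partial B_1^2}
\end{pmatrix}
= 2 \begin{pmatrix} \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i & n \end{pmatrix},
\qquad
\det H = 4\Big(n \sum_i x_i^2 - \big(\sum_i x_i\big)^2\Big) = 4n \sum_i (x_i - \bar{x})^2 \geq 0
```

The diagonal entries are positive and the determinant is strictly positive as soon as at least two x values differ, so H is positive definite. That makes S strictly convex, and the only point where both partial derivatives vanish is the global minimum.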

#

I guess there's no way to pick an optimal B0 that rules out an optimal B1

#

Imagine that a valley leads into a deeper valley. The function is still monotonic everywhere, but you'd have two solutions for a partial derivative: the flat bottom of the first valley, and the flat bottom of the valley that it leads into

#

There is one global minimum in that case but two solutions for the partial derivative

#

Screen_Shot_2023-10-02_at_7.52.25_PM.png

#

Screen_Shot_2023-10-02_at_7.52.35_PM.png

sleek fiber Oct 3, 2023, 1:51 AM

#

If $f:U\subseteq\mathbb{R}^n\to \mathbb{R}$ and if $a\in U$ is a relative extremum, then $(\nabla f)(a)=0$

#

so if a is a max/min for f, then f's gradient at a is 0

#

you then check the Hessian to see the nature

sleek fiber Oct 3, 2023, 1:54 AM

#

void cove Taking the partial derivatives of the slope (B0) and y-intercept (B1), setting t...

The converse need not be true, for example saddle points give 0 gradient, hence why you check the Hessian blahblahblah

#

you set the gradient to be 0 to find the possible places for extrema, then check them with 2nd derivative test/the analog in multivar
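
A minimal sketch of that second derivative test for a function of two variables, assuming the second partials at the critical point are already known. The function name `classify_critical_point` and the two example functions are made up for illustration.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Second derivative test at a point where the gradient is already zero.

    fxx, fxy, fyy are the entries of the 2x2 Hessian at that point.
    """
    det = fxx * fyy - fxy * fxy
    if det > 0:
        # Both eigenvalues share a sign; fxx tells us which one.
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        # Eigenvalues of opposite sign: the surface curves up one way, down another.
        return "saddle point"
    return "inconclusive"


# f(x, y) = x^2 + y^2 has Hessian [[2, 0], [0, 2]] at the origin.
print(classify_critical_point(2.0, 0.0, 2.0))

# f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] at the origin,
# where the gradient is also zero.
print(classify_critical_point(2.0, 0.0, -2.0))
```

The second example is the saddle case: the gradient vanishes there, but the point is neither a minimum nor a maximum.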

void cove Oct 3, 2023, 1:56 AM

#

I don't know what U is. I do know what the gradient of f is but not well. lol.

sleek fiber Oct 3, 2023, 1:56 AM

#

U is the domain of f

#

usually open but you can just take the interior if it's not open blobshrug

void cove Oct 3, 2023, 1:56 AM

#

sleek fiber you set the gradient to be 0 to find the *possible* places for extrema, then che...

right but for the case of a linear regression, we have a sum of terms y = x1a + x2b ... so I don't think you even need to check

#

It should always be convex

sleek fiber Oct 3, 2023, 1:57 AM

#

just seeing an equation doesnt mean anything about the structure of the graph

#

but you asked what the gradient being 0 had to do with it, so that's what it has to do with it

#

gives you the candidates for extrema

void cove Oct 3, 2023, 1:58 AM

#

yeah but it's known that linear sums are convex like for linear regression. so assuming/given this, we wouldn't have to check the Hessian / do anything with gradients, right?

sleek fiber Oct 3, 2023, 1:59 AM

#

there might be shortcuts, I've only ever done least squares via purely LinAlg notions w/ the normal equations
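
A rough sketch of the normal-equations route for the one-predictor case, in plain Python. It follows the thread's labeling (B0 = slope, B1 = intercept); the function names and the sample data are assumptions made for illustration. Setting both partial derivatives of the sum of squared residuals to zero gives a 2x2 linear system, solved here by Cramer's rule.

```python
def solve_normal_equations(xs, ys):
    """Solve  sxx*b0 + sx*b1 = sxy,  sx*b0 + n*b1 = sy  for (slope b0, intercept b1)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the minimizer is not unique")
    b0 = (n * sxy - sx * sy) / det
    b1 = (sxx * sy - sx * sxy) / det
    return b0, b1


def gradient(b0, b1, xs, ys):
    """Both partial derivatives of the sum of squared residuals at (b0, b1)."""
    g0 = -2 * sum(x * (y - b0 * x - b1) for x, y in zip(xs, ys))
    g1 = -2 * sum(y - b0 * x - b1 for x, y in zip(xs, ys))
    return g0, g1


# Made-up sample data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 6.9]

b0, b1 = solve_normal_equations(xs, ys)
print(b0, b1)
print(gradient(b0, b1, xs, ys))  # both components should be zero up to rounding
```

The normal equations and the "set both partials to zero" conditions are the same two equations, just written in matrix form.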

void cove Oct 3, 2023, 1:59 AM

#

Any function $f(\mathbf{x}) = \mathbf{a}^T \mathbf{x} + b$ is convex, where $\mathbf{a}, \mathbf{x} \in \mathbb{R}^n$ and $b \in \mathbb{R}$

sleek fiber Oct 3, 2023, 1:59 AM

#

but you're asking about optimization in multivar calc at the end of it, so that's how you do optimization in multivar

void cove Oct 3, 2023, 2:01 AM

#

Proving the Convexity of Affine Functions ( f(\mathbf{x}) = \mathbf{a}^T \mathbf{x} + b )

Definitions:

Affine Function: A linear function plus a constant offset, in any number of dimensions.
Convex Function: A function with a "bowl-like" shape, where any local minimum is also a global minimum. In graphical terms, the function surface always lies on or below the line segment connecting any two points on it.

Approach:

To prove that ( f(\mathbf{x}) = \mathbf{a}^T \mathbf{x} + b ) is convex, we use the following definition of convexity: for any two points ( \mathbf{x}_1 ) and ( \mathbf{x}_2 ), and any ( \lambda ) between 0 and 1,

[
f(\lambda \mathbf{x}_1 + (1-\lambda) \mathbf{x}_2) \leq \lambda f(\mathbf{x}_1) + (1-\lambda) f(\mathbf{x}_2)
]

void cove Oct 3, 2023, 2:01 AM

#

Argument:

Consider two points ( \mathbf{x}_1 ) and ( \mathbf{x}_2 ), and let ( \lambda ) be a weighting factor that varies between 0 and 1. The point ( \lambda \mathbf{x}_1 + (1-\lambda) \mathbf{x}_2 ) can be thought of as a weighted average and will lie on the line connecting ( \mathbf{x}_1 ) and ( \mathbf{x}_2 ).
Evaluate ( f(\lambda \mathbf{x}_1 + (1-\lambda) \mathbf{x}_2) ), which represents the value of the function at this weighted sum point.
Also evaluate ( \lambda f(\mathbf{x}_1) + (1-\lambda) f(\mathbf{x}_2) ), which is the height at that same point of the straight line (chord) connecting the function values at ( \mathbf{x}_1 ) and ( \mathbf{x}_2 ).

Special Cases:

Flat Function: If the function is flat (affine), then both expressions will be equal. This is the boundary case of convexity, where the inequality holds with equality.
Bowl-shaped Function: If the function is strictly convex, the value of ( f(\lambda \mathbf{x}_1 + (1-\lambda) \mathbf{x}_2) ) will be strictly less than ( \lambda f(\mathbf{x}_1) + (1-\lambda) f(\mathbf{x}_2) ) whenever the two points differ and ( \lambda ) is strictly between 0 and 1.
Function with Local Extrema: If the function has a strict local maximum, or a local minimum that is not a global minimum, then it is not convex.

Algebraic Verification:

The function ( f(\mathbf{x}) = \mathbf{a}^T \mathbf{x} + b ) can be verified to be convex algebraically by substituting into the convexity definition:

[
\begin{aligned}
f(\lambda \mathbf{x}_1 + (1-\lambda) \mathbf{x}_2) &= \mathbf{a}^T (\lambda \mathbf{x}_1 + (1-\lambda) \mathbf{x}_2) + b \\
&= \lambda \mathbf{a}^T \mathbf{x}_1 + (1-\lambda) \mathbf{a}^T \mathbf{x}_2 + \lambda b + (1-\lambda) b \\
&= \lambda (\mathbf{a}^T \mathbf{x}_1 + b) + (1-\lambda) (\mathbf{a}^T \mathbf{x}_2 + b) \\
&= \lambda f(\mathbf{x}_1) + (1-\lambda) f(\mathbf{x}_2)
\end{aligned}
]

Thus, we find that ( f(\lambda \mathbf{x}_1 + (1-\lambda) \mathbf{x}_2) = \lambda f(\mathbf{x}_1) + (1-\lambda) f(\mathbf{x}_2) ) for all ( \lambda ) between 0 and 1, and for all possible ( \mathbf{x}_1 ) and ( \mathbf{x}_2 ).
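
One way to carry this over to least squares, as a sketch: the affine function here is each residual viewed as a function of the parameters, not the sum of squared residuals itself. The notation below is introduced for illustration: r_i is the i-th residual, S is the sum of squared residuals, p and q are any two parameter pairs (B0, B1), and B0 is the slope as labeled in this thread.

```latex
\begin{aligned}
r_i(B_0, B_1) &= y_i - B_0 x_i - B_1 \quad \text{(affine in } (B_0, B_1)\text{)} \\
r_i(\lambda p + (1-\lambda) q) &= \lambda\, r_i(p) + (1-\lambda)\, r_i(q) \\
r_i(\lambda p + (1-\lambda) q)^2 &\leq \lambda\, r_i(p)^2 + (1-\lambda)\, r_i(q)^2 \quad \text{(since } t \mapsto t^2 \text{ is convex)} \\
S(\lambda p + (1-\lambda) q) &\leq \lambda\, S(p) + (1-\lambda)\, S(q), \qquad S = \sum_{i=1}^{n} r_i^2
\end{aligned}
```

So S is convex because it is a sum of squares of affine functions. The affine step alone only gives equality, and it is the squaring that produces the bowl.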

#

lol

sleek fiber Oct 3, 2023, 2:01 AM

#

ok

#

and?

void cove Oct 3, 2023, 2:20 AM

#

so no need to check hessian or gradient

sleek fiber Oct 3, 2023, 2:20 AM

#

then your question is answered ig

#

want me to close the post or shall you?

void cove Oct 3, 2023, 2:21 AM

#

im just trying to learn lol

sleek fiber Oct 3, 2023, 2:21 AM

#

well it looks like you've learned then

void cove Oct 3, 2023, 2:22 AM

#

I know for a fact we can accomplish it with just taking partial derivatives. I don't know why this is the case

sleek fiber Oct 3, 2023, 2:22 AM

#

cause that probably shows for such functions, extrema are already minima

#

idk, I didnt read it

#

I dont read blunderbusses of copy paste blobshrug

#

hence finding extrema is equivalent to finding minima.

#

or if $\Gamma_f$ is a convex set, then $a\in U$ such that $(\nabla f)(a)=0$ means $a$ isnt a saddle point

sleek fiber Oct 3, 2023, 2:24 AM

#

same difference

pliant ridge Oct 3, 2023, 2:43 AM

#

Convex functions don't have saddle points

void cove Oct 3, 2023, 3:00 AM

#

pliant ridge Convex functions don't have saddle points

exactly

void cove Oct 3, 2023, 3:15 AM

#

Can convex functions have a region where x1 temporarily does not change in height (z) for some distance along x1, and where this flat stretch along x1 is the minimum over x2 in that region? Assume the region is not the global minimum

void cove Oct 3, 2023, 3:15 AM

#

sleek fiber I dont read blunderbusses of copy paste blobshrug

I typed that out myself weeks ago without looking anything up or copy-pasting. You need to chill

void cove Oct 3, 2023, 3:16 AM

#

sleek fiber or if $\Gamma_f$ is a convex set, then $a\in U$ such that $(\nabla f)(a)=0$ mean...

idk what gamma is but yeah finding extrema is the same as finding minima

pliant ridge Oct 3, 2023, 4:39 AM

#

void cove Can convex functions have a region where x1 temporarily does not change in heigh...

Yeah they can have that. They can be constant over a line segment or even a multidimensional region as long as that region is a convex set and everything around it is higher
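
A small sketch of that flat-bottom case in the least squares setting. The data set is made up for illustration: every x value is the same, so the slope and intercept cannot be told apart. B0 is the slope and B1 the intercept, following the thread's labeling, and `ssr` is a hypothetical helper name.

```python
def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for slope b0 and intercept b1."""
    return sum((y - b0 * x - b1) ** 2 for x, y in zip(xs, ys))


# Every x is 2, so only the combination 2*b0 + b1 affects the fit.
xs = [2.0, 2.0, 2.0]
ys = [1.0, 2.0, 3.0]

# Three different points on the line 2*b0 + b1 = 2 (the mean of ys).
flat_bottom = [(0.0, 2.0), (0.5, 1.0), (1.0, 0.0)]
values = [ssr(b0, b1, xs, ys) for b0, b1 in flat_bottom]
print(values)  # the same value at every point on the line

# A point off that line is strictly higher.
off_line = ssr(1.0, 1.0, xs, ys)
print(off_line)
```

Here the whole line is the set of global minimizers, and the gradient is zero at every point on it. The flat region sits at the bottom of the function, not partway down a slope.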

void cove Oct 3, 2023, 5:11 AM

#

And that can be true even if that area isn't a local minimum? Like, that valley can eventually drop into a larger pit/valley?

#

Because if that's true, then taking a partial derivative wrt one variable might get me two solutions if I'm interpreting a partial derivative correctly

pliant ridge Oct 3, 2023, 10:25 PM

#

It has to be a local minimum, no?

#

and a global one too

#

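There is one point on a convex function where the gradient is 0 and that is the global minimum (or, if the function is flat at the bottom, a convex set of such points that all share the same minimum value)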

#

There is no global minimum iff there is no such point

void cove Oct 4, 2023, 9:22 PM

#

pliant ridge There is one point on a convex function where the gradient is 0 and that is the ...

wrt the minimum of f(x1,x2), yes

#

But is that true when considering f(x1) alone?

#

Because that's what it seems that the partial derivs are doing

pliant ridge Oct 4, 2023, 9:23 PM

#

?

sleek fiber Oct 4, 2023, 10:56 PM

#

f is a function of 2 variables, f(x_1) is nonsense

void cove Oct 10, 2023, 4:11 AM

#

sleek fiber f is a function of 2 variables, f(x_1) is nonsense

ok bro

#

I'm illustrating what a partial deriv is doing

#

Taking a partial deriv considers change in the function wrt a single var, e.g. x1
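
A hedged sketch of how the one-variable-at-a-time view fits together with the joint solution, in plain Python with made-up data. B0 is the slope and B1 the intercept, as labeled in this thread, and all function names are hypothetical. Each partial-derivative equation on its own gives the best value of one parameter for a fixed value of the other. Requiring both at once picks out the single pair that satisfies both, and alternating between them converges to that same pair.

```python
def best_b0_given_b1(b1, xs, ys):
    """Solve dS/dB0 = 0 for the slope b0 with the intercept b1 held fixed."""
    return sum(x * (y - b1) for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def best_b1_given_b0(b0, xs, ys):
    """Solve dS/dB1 = 0 for the intercept b1 with the slope b0 held fixed."""
    return sum(y - b0 * x for x, y in zip(xs, ys)) / len(xs)


def joint_solution(xs, ys):
    """Closed-form pair (b0, b1) that satisfies both conditions at once."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b0 = sxy / sxx
    return b0, y_bar - b0 * x_bar


# Made-up sample data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 6.9]

# Alternate the two one-variable updates from an arbitrary starting point.
b0, b1 = 0.0, 0.0
for _ in range(200):
    b0 = best_b0_given_b1(b1, xs, ys)
    b1 = best_b1_given_b0(b0, xs, ys)

print(b0, b1)
print(joint_solution(xs, ys))
```

A single update is not enough: the best B0 for the wrong B1 is generally not the final B0. The pair that leaves both updates unchanged is exactly the pair where both partial derivatives are zero.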
