#Help with MLE Estimators
I don't understand what the MLE is in this case. Is this just finding the usual MLE for sigma? And what is r in this case?
Here, you have two random variables V and X
You suppose that there is a nearly linear relationship between them
So there is a slope r such that V = r X + epsilon, where epsilon ~ N(0, sigma²)
So here usually it is easier to provide an estimation for the slope rather than sigma²
This is just a linear regression case
You could think that conditionally to X, V follows a normal distribution N(rX, sigma²)
Hence the conditional log-likelihood of V is given by $\ln f_{V|X,r, \sigma^2}(v|x, r, \sigma^2) = \sum_{i=1}^n \frac{(v-rx_i)^2}{\sigma^2} - \frac{n}{2}(\ln \sigma^2 + \ln 2\pi)$
Rion
Where x denotes the vector of all observations (x1, ..., xn), which are independent
So now you have an optimization problem
With 2 variables
How do you solve optimization problems with 2 variables?
I honestly don't know, this course was done very very very rushed and i kinda lost the thread
I just want to know how to do the things in that exam and thats it ig
Compute the gradient?
And the critical points
solve the partial derivatives of the log-likelihood function?
Well you equate both to zero
You see that the function there is a function of r and sigma²
(instead of sigma² I will use v)
okay
So say $g(r, v)$
Rion
okay
You compute $\nabla g (r, v)$ and equate it to 0
Rion
In other words, it's a system of equations, which are
$$\frac{\partial g}{\partial r} (r,v) = 0$$
$$\frac{\partial g}{\partial v} (r,v) = 0$$
Rion
The first would be something like Summation 2xi (v-rxi) / sigma^2
The second would be something like Summation -2(v- rxi)/sigma^3 + n/sigma?
The second one is wrong
almost correct
also, since the computations are fairly complex, if you want me to check them, please write them in tex
i dont know how to work the bot, let me see if i can figure it out
it's fairly intuitive
take example from my texts
$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$
Rion
When you enclose your math text between dollar signs, tex will render and display your message
And the round d is given by the command \partial
$\frac{\partial g}{\partial r}$
Rion
$\sum_{i=1}^n \frac{2xi(v-rxi)}{sigma^2}$
Jacques
i didnt know how to do sigma
Yeah, don't forget the subscripts
and for greek letters, add a backslash
$\sigma$
Rion
gotcha
$x_i$ too
Rion
$\sum_{i=1}^n \frac{2x_i(v-rx_i)}{\sigma^2}$
Jacques
Also, I apologize
I shouldn't have used v for the variance, it's a symbol already in use
I wanted to replace $\sigma^2$ with $v$ but $v$ is already in use
Rion
yeah, got that, used t in my head
Let's say instead that it's $\nu$
Rion
i think i found my mistake
let me see if i can write it out
second
$\sum_{i=1}^n \frac{-(v-rx_i)^2}{\sigma^4} + frac{-n}{2\sigma^2}$
Jacques
oh shit i fucked the frac up
$\sum_{i=1}^n \frac{-(v-rx_i)^2}{\sigma^4} + \frac{-n}{2\sigma^2}$
Jacques
there
$g(r, \nu) = \sum_{i=1}^{n} \frac{(v_i - r x_i)^2}{\nu} - \frac{n}{2} (\ln(\nu) + \ln(2\pi))$
I think
Rion
is my solution not correct?
I just rewrote the original function to optimize for you
the fraction is not working idk why
What's this supposed to be?
you forgot the backslash
before frac
Yeah, that's good, except for the fact that it's v_i instead of v
What about wrt r?
this, no?
Same remark
$\sum_{i=1}^n \frac{2x_i(v-rx_i)}{\sigma^2}$
Jacques
Also you're missing a -
$\sum_{i=1}^n \frac{-2x_i(v-rx_i)}{\sigma^2}$
Jacques
$\sum_{i=1}^n \frac{2x_i(v_i-rx_i)}{\sigma^2}$
Jacques
Yeah, now the minus
$\sum_{i=1}^n \frac{-2x_i(v_i-rx_i)}{\sigma^2}$
Jacques
$\sum_{i=1}^n \frac{-(v_i-rx_i)^2}{\sigma^4} + \frac{-n}{2\sigma^2}$
Jacques
To recap what you said:
$$\frac{\partial g}{\partial r} (r, \nu) = -\sum_{i=1}^n \frac{2x_i(v_i-rx_i)}{\nu} = 0$$
$$\frac{\partial g}{\partial \nu} (r, \nu) = \sum_{i=1}^n \frac{-(v_i-rx_i)^2}{\nu^2} + \frac{-n}{2\nu} = 0$$
yes
Rion
There we go
Now, 2 unknowns, 2 equations
You need to solve this
hint: start by finding the optimum r from the first equation
it should be fairly ok
then sub in the 2nd
okay, give me a moment, i've never solved equations with summation before
our math courses were seriously lacking, i am finding
for the first one, intuitively $r = \frac{-x_i}{v_i}$
ah i have to write the second one out, second
Jacques
Not quite no
damn it
Don't lose hope
I committed to helping you when I first answered this post
so it's fine
isn't that only 0 when $v_i - rx_i = 0$
Jacques
I will do a demonstration for you with the first one
and you will try with the second one
okay thank you
Here we just split and distribute the sum
$$\frac{\partial g}{\partial r} (r, \nu) = -\sum_{i=1}^n \frac{2x_i(v_i-rx_i)}{\nu} = 0$$
Rion
Assuming that we look for solutions $(r, \nu)$ such that $\nu > 0$, then we can maintain equivalence by multiplying by $\nu$
Rion
$\Longleftrightarrow \sum_{i=1}^n x_i(v_i-rx_i)= 0$
makes total sense to me, yeah
Rion
Rion
then we split on both sides
$\Longleftrightarrow \sum_{i=1}^n x_i v_i - \sum_{i=1}^n rx_i^2= 0$
Rion
$\Longleftrightarrow \sum_{i=1}^n rx_i^2= \sum_{i=1}^n x_i v_i$
Rion
Now, see that the r doesn't depend on the sum index
we can factorize it
$\Longleftrightarrow r \sum_{i=1}^n x_i^2= \sum_{i=1}^n x_i v_i$
Rion
Finally:
$\Longleftrightarrow r = \left( \sum_{i=1}^n x_i^2 \right)^{-1}\sum_{i=1}^n x_i v_i$
Rion
For a critical point $(r, \nu)$ such that $\nabla g(r, \nu) = 0$, it necessarily holds that $r$ has the above expression
Rion
Conversely (since we only used $\Longleftrightarrow$), that expression of $r$ will provide $\frac{\partial g}{\partial r}(r, \nu) = 0$
Rion
Jacques
No
Here it is not clear what is i
so your expression does not make sense
here the i that I used belongs to the summation
$r = \frac{x_1 v_1 + \dots + x_n v_n}{x_1^2 + \dots + x_n^2}$
Rion
It's a long expression yeah
But it's not the most complicated estimator of the slope
MLE estimator is fairly simple
in comparison to other ones
if there are any
Anyway, use my example to try to figure out the expression of $\nu$
Rion
Since now you can just substitute $r$
Rion
considering we covered Monte Carlo Simulation, Data Analysis, PCA and Multilinear Regression in 4 days, i am sure there are more we've missed in our lectures
sounds like data science to me
it is
or stats i guess
Then you gotta up your calc because monte carlo is going to hurt
pca is not too hard
Anyway, linear regression is one of the simplest models that can be demanded of you if you ever do a data scientist interview
so it'd be really beneficial for you to really understand that
Since I don't know if I will be here in a couple of hours, I will give you the answer so that you can check your MLE estimators
$\hat{r} = \left( \sum_{i=1}^n x_i^2 \right)^{-1}\sum_{i=1}^n x_i v_i$
Rion
$\hat{\nu} = \frac{1}{n}\sum_{i=1}^{n} (v_i - \hat{r} x_i)^2$
Rion
shouldn't there be a 2
i trust you with my entire life
Jacques
but i will take your word for it
So bear with me for just 5 minutes, we just found question 1 right
just to confirm (I am losing my mind)
Hmm
Well I will check again just to be sure
Ah, ok, I see the problem
We were asked to estimate $\sigma$, not $\sigma^2$
Rion
so the answer is the square root of everything you had?
Yes, but we also wrote the wrong equations
Though you can keep my answer for r, it's correct
oh fuck
Don't worry
it's a quick fix
$$\frac{\partial g}{\partial r} (r, \sigma) = -\sum_{i=1}^n \frac{2x_i(v_i-rx_i)}{\sigma^2} = 0$$
$$\frac{\partial g}{\partial \sigma} (r, \sigma) = \sum_{i=1}^n \frac{-2(v_i-rx_i)^2}{\sigma^3} - \frac{-n}{\sigma} = 0$$
oh my first solution was correct!
Rion
massive win for me, this is my first win in life
Yeah, my bad, I messed up because I thought it was the estimation of sigma²
Though it still doesn't solve the mystery of the hanging 2
when differentiating 1/sigma²
because the 2 just appeared in the sum instead
Hmmm
That's unlikely
Ah no I know why
It's because I'm a moron
maybe you shouldn't have used sigma squared here?
Here
It's supposed to be, divided by 2 sigma²
so the 2 cancels out
again, it doesn't affect our reasoning with the finding of r
For that, I'll need a bit of backtracking
You know that an observation $V_i$ follows, conditionally to $X_i$, a normal distribution
Rion
Except all $X_i$ are iid in our hypothesis
Rion
yeah
So the distribution of $V = (V_1, ..., V_n)$ is
Rion
okay
question
when given to find sigma^2, shouldn't the MLE be the biased sample variance
just asking, thats what i read online
yes
indeed
Well, the thing is
Since you have a product
okay, so when asked to find sigma^2 in the Gaussian case, it's always the biased sample variance
what will be the result in our case then
The distribution of the $\epsilon_i$ will be both:
Rion
will it just be the square root of the biased sample variance?
$f_{\epsilon}(v - r x) = \prod_{i=1}^{n} f_{\epsilon_{i}}(v_i - rx_i)$
Where $f_{\epsilon_i}$ is the pdf of a normal distribution of mean $0$ and variance $\sigma^2$
Rion
So that explains why when you take the log
Rion
$g(r, \sigma) = \sum_{i=1}^{n} \ln f_{\epsilon_i}(v_i - rx_i)$
Rion
$g(r, \sigma) = \sum_{i=1}^{n} \ln f_{\epsilon_i}(v_i - rx_i) = \sum_{i=1}^{n} \ln \left( \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( \frac{(v_i - r x_i)^2}{2\sigma^2}\right)\right)$
And by using the properties of the log
$g(r, \sigma) = \sum_{i=1}^{n} \left( - \frac{1}{2} \ln(2 \pi) - \ln \sigma + \frac{(v_i - rx_i)^2}{2\sigma^2}\right)$
Rion
Rion
So here, if you take out the terms and group them
you end up with
$g(r, \sigma) = - \frac{n}{2} \ln(2\pi) - n \ln \sigma + \sum_{i=1}^{n} \frac{(v_i - r x_i)^2}{2\sigma^2}$
Rion
So this is the function to optimize
called the log-likelihood
since it's literally the log of the likelihood
now to get sigma, is it just the square root of the biased sample variance?
or is it actually different
I think it's the same thing, let me check
$\frac{\partial g}{\partial \sigma} (r, \sigma) = 0 = -\frac{n}{\sigma} + \sum_{i=1}^{n} \frac{-2 (v_i - rx_i)^2}{2 \sigma^3}$
Rion
thats okay dw
rion can i ask you something else real quick
its still about this exercise
The $f_{\epsilon_i}(t) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{t^2}{2 \sigma^2}\right)$ with a minus in the exponential
Rion
when it says find the confidence interval, how the shit would I approach that?
$g(r, \sigma) = - \frac{n}{2} \ln(2\pi) - n \ln \sigma - \sum_{i=1}^{n} \frac{(v_i - r x_i)^2}{2\sigma^2}$
Rion
For what?
The estimators?
$\hat{r} = \left( \sum_{i=1}^{n} x_i^2 \right)^{-1} \sum_{i=1}^{n} x_i v_i$
Rion
$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (v_i - \hat{r} x_i)^2$
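A minimal Python sketch of these two estimators, in case it helps to check them numerically. It assumes the no-intercept model $v_i = r x_i + \epsilon_i$ from this thread, with $\hat{r}$ taken as the ratio $\sum x_i v_i / \sum x_i^2$. The function name and the simulated data are made up for illustration and are not part of the exercise.

```python
import math
import random


def mle_slope_and_sigma(xs, vs):
    """MLEs for the model v_i = r * x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    sxx = sum(x * x for x in xs)               # sum of x_i^2
    sxv = sum(x * v for x, v in zip(xs, vs))   # sum of x_i * v_i
    r_hat = sxv / sxx
    # Biased (divide by n) mean of the squared residuals.
    sigma2_hat = sum((v - r_hat * x) ** 2 for x, v in zip(xs, vs)) / len(xs)
    return r_hat, math.sqrt(sigma2_hat)


# Hypothetical data, simulated only to exercise the function.
rng = random.Random(0)
true_r, true_sigma = 2.0, 0.5
xs = [rng.uniform(-3.0, 3.0) for _ in range(2000)]
vs = [true_r * x + rng.gauss(0.0, true_sigma) for x in xs]

r_hat, sigma_hat = mle_slope_and_sigma(xs, vs)
print(r_hat, sigma_hat)
```

With enough samples both printed values should land close to the `true_r` and `true_sigma` used in the simulation.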
To think about confidence intervals
You should think of whether $\hat{r}$ is far from $r$
Rion
Considering that your $x_i$ and $v_i$ are random variables
Rion
then $\hat{r}$ and $\hat{\sigma}$ are also random variables
Rion
I'm not exactly sure if $\hat{r}$ is a student rv
Rion
For the $\hat{\sigma}^2$
Rion
I can see it being the case
Rion
Fisher, then?
But then I don't have absolute guarantees
Let me think for a sec
Give a confidence interval for r of significance level α, with σ² = 1.
this is the question
thats sigma^2
For $\hat{r}$, conditionally to the $x_i$'s, maybe you can get away with some things
Rion
So if you consider the x_i constant (and not random)
mhmm
Mainly, here
You can consider the x_i constant
but the v_i are normal
N(r x_i, sigma²)
okay
so divided by the sum of squares
$x_i v_i \sim \mathcal{N} \left( r x_i^2, x_i^2 \sigma^2 \right)$
Rion
ok
And they are all independent
mhmm
so the sum is still a normal distribution
so it follows student then?
Rion
we just gotta fill the blanks
Well, only if the parameters are 0 and 1
but they're probably not
Let's check
anyhow, for a sum of independent normals, it follows a normal
whose mean is the sum of the means
and the variance is the sum of variances
$\sum_{i=1}^{n} x_i v_i \sim \mathcal{N}(r \sum_{i=1}^{n} x_i^2, \sigma^2 \sum_{i=1}^{n} x_i^2)$
Rion
Now, if you divide that by the sum of xi²
$\hat{r} \sim \mathcal{N}\left(r, \sigma^2 \frac{1}{\sum_{i=1}^n x_i^2} \right)$
Rion
So now you know the distribution of $\hat{r}$
Rion
you can do your appropriate tests
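As a rough sketch of what that gives when $\sigma$ is known (the exercise sets $\sigma^2 = 1$) and the $x_i$ are treated as constants: since $\hat{r} \sim \mathcal{N}(r, \sigma^2 / \sum x_i^2)$, a two-sided interval is $\hat{r} \pm z_{1-\alpha/2}\, \sigma / \sqrt{\sum x_i^2}$. The function name and the three data points below are hypothetical, only there to run the code.

```python
import math
from statistics import NormalDist


def slope_confidence_interval(xs, vs, sigma=1.0, alpha=0.05):
    """Two-sided (1 - alpha) interval for r when sigma is known."""
    sxx = sum(x * x for x in xs)
    r_hat = sum(x * v for x, v in zip(xs, vs)) / sxx
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # standard normal quantile
    half_width = z * sigma / math.sqrt(sxx)        # z times the sd of r_hat
    return r_hat - half_width, r_hat + half_width


# Hypothetical observations, not taken from the exam.
lo, hi = slope_confidence_interval([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
print(lo, hi)
```

The interval is centered on $\hat{r}$ and shrinks as $\sum x_i^2$ grows, which matches the variance of $\hat{r}$ found above.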
makes sense
thank you so much!
i already nominated you once, idk if i can twice
you're amazing
I mean, we still haven't done shit about the other one
Which is at least ten times more difficult
because I think we need a theorem or something
No, the distribution of $\hat{\sigma}^2$
Rion
This one is pretty hardcore to prove and to be completely honest I don't have the toolbox to prove it
or at least it needs quite elaborate thinking
honestly, between you and I, I don't think I have the toolbox to understand it even if you did
I am genuinely thankful for your help so far, this is way more than i knw before so
exam is tomorrow anyway, this is a joke
4 day to finish a data science course my uni went crazy
I don't have a good proof
But $\frac{n \hat{\sigma}^2}{\sigma^2}$ follows a $\chi^2_{n-1}$
Rion
Enroll today at Penn State World Campus to earn an accredited degree or certificate in Statistics.
@woeful dune Check the book they mention for the proof
will do, thank you so much
Rion

Anyway, a list of takeaways for you for your exam tomorrow
knowing how to compute a MLE is kind of really important
mhmm
So first takeaway
the likelihood is just a product of likelihoods when you have iid samples
which justifies taking the log likelihood, since you go from product to sum
ok
second takeaway:
the parameters that maximize the log-likelihood also maximize the likelihood
so that also justifies using the log-likelihood instead of the likelihood
third takeaway:
you need multivariate differential calculus to compute optima
just like how you take the derivative of a function to check where it's either max or min
don't
i speak from experience, that doesn't work
i can't explain my losing streak in gacha games with statistics because i'm already outside the interval of confidence

Why
I think they just ask you to sample a bunch of x_i, and use the monte carlo estimator of the mean
i dont know tbh
well you know what a monte carlo estimator is right
Yeah
in this case it's a discrete one, so i was thinking of just answering that with the theoretical thing
like just check h(x) + h(x) + h(x) all over n, until it becomes bigger than the interval, but i dont think thats correct
Well it is possible to compute E[h(X)] exactly using the transfer theorem
but they want an approximation, not an exact value
How did you define monte carlo methods in your lecture
we did answer that algorithm question in this one, massive rip
and then for the continuous rv, we have explanations with the inverse and rejection methods
and thats it
can't you just use that as an example then
Ah ok
Well I mean
You can still do something
First of all, you can estimate $\mathbb{E}[Y]$ with an empirical mean of iid samples that follow the distribution of $Y$
Rion
So $\frac{1}{n} \sum_{i=1}^{n} Y_i$
Rion
So now I guess what you need is to find how to generate $Y_i$
Rion
Rion
And you just compute $Y_i = h(X_i)$
Rion
And here, to generate $X_i$, you use the method they show in the slides
Rion
so basically, just choose the other stuff, so instead of .32, maybe .5
(which should sum up to 1)
and compute different stuff?
you generate a uniform rv
and you see which bin it falls into
and you take the corresponding value
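A small sketch of that scheme in Python, assuming a made-up discrete distribution and a made-up $h$. The values, the probabilities, and `h` below are placeholders; the real ones come from the exam sheet.

```python
import random
from itertools import accumulate

# Hypothetical distribution and h, for illustration only.
values = [-2, -1, 0, 2, 3]
probs = [0.1, 0.2, 0.32, 0.28, 0.1]        # must sum to 1
cumulative = list(accumulate(probs))        # right edges of the bins on [0, 1]


def h(x):
    return x * x


def sample_x(rng):
    """Draw one X: generate U ~ Uniform(0, 1), return the value whose bin contains U."""
    u = rng.random()
    for value, upper in zip(values, cumulative):
        if u < upper:
            return value
    return values[-1]   # guards against floating-point round-off in the last bin


def monte_carlo_mean(n, seed=0):
    """Estimate E[h(X)] by the empirical mean of h(X_1), ..., h(X_n)."""
    rng = random.Random(seed)
    return sum(h(sample_x(rng)) for _ in range(n)) / n


estimate = monte_carlo_mean(100_000)
exact = sum(h(x) * p for x, p in zip(values, probs))
print(estimate, exact)
```

On a written exam only the description of `sample_x` and `monte_carlo_mean` is needed; the code is just a way to see that the estimate approaches the exact sum as n grows.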
so this exam is written
since i can't exactly generate
should i just take random values
over and over
No, here they ask you to create a scheme
so a method
you don't use the method but you explain that you can
so basically write out an explanation?
That's what a scheme is isn't it?
Just a theoretical framework
Though then I don't quite understand the error part
the precision I mean
But I assume that it has to do with the number of samples you create, which is n
something like
rand U
X ~ U(w)
Find h(x)
Keep doing this until E[Summation H(x)] > e
The variance of $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is $\frac{V(Y_1)}{n}$, so its standard deviation is $\sqrt{V(Y_1)/n}$
Rion
so basically I should say
as long as theta < Var [h(x)]/ ne^2, keep generating
got it
Rion
But the question asks for how large
yeah, thats question c
yeah question c is easier since he basically tells us to use a C
Just keep in mind
I think you are expected to compute Var(h(X1))
or rather Var(h(X))
but you can do that by hand
how would i even compute var(h(x)) in that case
It's just E[(h(X) - E[h(X)])²], and given that X is discrete with just 5 or 6 values
it's not a long computation
-2, -1, 0, 2, 3
that would be what, E[h(x) - E[h(x)^2]
let me ask you the dumbest question yet
how would I find the E of it for only one value

wdym
for a discrete rv X, the expectation of X is just the sum of x_i p_i
where p_i is P(X = x_i)
and using the transfer theorem
the expectation of h(X) is the sum of h(x_i) p_i
$E[h(X)] = \sum_{i=1}^{K} h(x_i) P(X = x_i)$
Rion
and here you have like 5 values
it's just a sum of 5 terms
likewise $E[h(X)^2] = \sum_{i=1}^{K} h(x_i)^2 P(X = x_i)$
Rion
then what is X - E[h(X)^2]
The variance of a rv Y is the expectation of (Y - E[Y])²
so E[(Y - E[Y])²]
keep in mind that E[Y] is a number here
There is a formula that says that V(Y) = E[Y²] - E[Y]²
so you can compute those if it's easier for you
Y = h(X)
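To make the two variance formulas concrete, here is a short sketch that computes $V(h(X))$ both ways for a discrete X. The probabilities and `h` are invented for the example (only the five values echo the ones quoted above), so the numbers are not the exam's answer.

```python
# Hypothetical distribution and h, for illustration only.
values = [-2, -1, 0, 2, 3]
probs = [0.1, 0.2, 0.32, 0.28, 0.1]


def h(x):
    return x * x


# Transfer theorem: E[h(X)] and E[h(X)^2] are finite weighted sums.
mean_y = sum(h(x) * p for x, p in zip(values, probs))
mean_y2 = sum(h(x) ** 2 * p for x, p in zip(values, probs))

var_shortcut = mean_y2 - mean_y ** 2                                   # E[Y^2] - E[Y]^2
var_direct = sum((h(x) - mean_y) ** 2 * p for x, p in zip(values, probs))  # E[(Y - E[Y])^2]

print(mean_y, var_shortcut, var_direct)
```

Both variance expressions agree up to floating-point error. Note that the square is applied to $h(x)$, not to $x$ inside $h$.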
and then the mean of that, will be found how?
so its number - number
its E[number] which means what exactly
Y is a random variable
E[Y] is a number (or, I guess, in perfectly accurate terms, a constant random variable that is always equal to the same number)
this formula seems easier
so in my case i would do var (h(x)) = E[h(x^2)) - E[h(x)]^2
No
oh
I will pass on that but I appreciate the gratitude
can I give you a nitro as thanks or is that against the rules
I don't know, I mean I don't think anyone will complain if you message me privately
I complain when people ask me for help in private, because I don't like it and the rules explicitly say not to do so
I didn't help you to receive anything in return though
meh, after days of asking for help in math discords, finally someone who could actually help me (and somehow had fucking infinite patience)
plus, i wouldn't be surprised if i asked another question in the future, so whatever
gratitude is good with words, but i like sending gifts too
shit, my card isn't working
i will DM you it later, thank you so much again @broken plank
@woeful dune has given 1 rep to @broken plank
sure we can talk later
I'm helping someone else with another q
@broken plank shit, whenever you see this, we never actually get the estimators to the end and i am lost on how to do it
if you see this and write it out, thank you
Wdym?
We only took the partial derivative with respect to sigma
But never actually got what sigma is
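A sketch of the remaining step, assuming the log-likelihood with the minus sign in the exponential and looking only at $\sigma > 0$:
$$g(r, \sigma) = - \frac{n}{2} \ln(2\pi) - n \ln \sigma - \sum_{i=1}^{n} \frac{(v_i - r x_i)^2}{2\sigma^2}$$
Differentiating with respect to $\sigma$ and setting the result to zero:
$$\frac{\partial g}{\partial \sigma} (r, \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (v_i - r x_i)^2 = 0$$
Multiplying by $\sigma^3 / n$ gives
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (v_i - r x_i)^2$$
and substituting $\hat{r} = \left( \sum_{i=1}^{n} x_i^2 \right)^{-1} \sum_{i=1}^{n} x_i v_i$ for $r$:
$$\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (v_i - \hat{r} x_i)^2}$$
So under these assumptions $\hat{\sigma}$ is the square root of the biased (divide by $n$) variance of the residuals.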