Some questions about Bayesian Stats

sharp inlet Jun 18, 2024, 2:08 PM

#

E_theta[p(y|x, theta)] - what is this? What is it a function of?
I understand that p(y|x) is the “full” Bayesian Inference for our problem and is the distribution of the label y given a point.


sharp inlet Jun 19, 2024, 3:44 AM

#

bump

open shale Jun 21, 2024, 1:48 PM

#

E_theta[p(y|x, theta)] - what is this thing? Is it related to the LMS estimator in any way? To calculate this expectation wrt theta, we integrate the argument wrt theta, weighted by the updated distribution of theta. That makes sense. I'm unsure why this integral gives us a distribution instead of a number.
E_theta[p(y|x, theta)] is the expected value of p(y|x, theta) where you treat theta like a known constant instead of a variable.
The result is a distribution because E_theta[p(y|x, theta)] still depends on x

cosmic nest Jun 21, 2024, 2:46 PM

#

I feel as though this might be a bit incorrect, one should have that:
$$p(y\vert x) = E_{\Theta}[p(y \vert x, \Theta)] = \int_{\theta} p(y \vert x, \theta) p_\Theta(\theta) d\theta$$

cosmic nest Jun 21, 2024, 2:47 PM

#

Where $\Theta$ is the random variable corresponding to the a priori distribution on the parameters
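To make the formula concrete, here is a minimal Python sketch (not from the thread) that approximates the integral by averaging $p(y \vert x, \theta)$ over draws of $\theta$. The toy model is an assumption chosen only because it has a closed form to compare against: scalar $\theta \sim \mathcal{N}(0, 1)$ and $y = \theta x + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$, which gives $p(y \vert x) = \mathcal{N}(y; 0, x^2 + \sigma^2)$. All function names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def predictive_mc(y, x, sigma2, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E_Theta[p(y | x, Theta)] with Theta ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, 1.0)               # draw theta from the assumed prior
        total += normal_pdf(y, theta * x, sigma2)  # p(y | x, theta) for that draw
    return total / n_samples


def predictive_exact(y, x, sigma2):
    """Closed form for the same toy model: y | x ~ N(0, x^2 + sigma2)."""
    return normal_pdf(y, 0.0, x * x + sigma2)


x, y, sigma2 = 2.0, 1.0, 0.25
mc = predictive_mc(y, x, sigma2)
exact = predictive_exact(y, x, sigma2)
print(mc, exact)  # the two values should agree up to Monte Carlo error
```

Once both x and y are fixed, the result is a single number. With x fixed and y varying, it traces out a density over y. $\theta$ has been averaged out, so the result no longer depends on it.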

sharp inlet Jun 24, 2024, 12:06 PM

#

is it also correct to view this as averaging the likelihoods over the posterior?

sharp inlet Jun 24, 2024, 12:09 PM

#

mint python **Rion**

Here I'm unsure what's a random variable and what's a constant. To my understanding, x is a constant since it's the previously unseen test point that we want to make a prediction on. y is a random variable for the possible label value (this also follows from linear regression being discriminative). Then $\Theta$ is a random variable, distributed according to the posterior. Is this correct?

sharp inlet Jun 24, 2024, 1:45 PM

#

And what would each of these objects look like in ridge regression

cosmic nest Jun 24, 2024, 4:23 PM

#

sharp inlet Here I'm unsure what's a random variable and what's a constant. To my understand...

Well see

#

You have some parametric distributions such as, say, a binomial one

#

or a Bernoulli one

#

or even a normal distribution, that's also parametric

#

So for instance, N(m, s²) has parameters theta = (m, s²)

#

but the prior distribution on the parameters means that theta is random too

cosmic nest Jun 24, 2024, 4:25 PM

#

sharp inlet And what would each of these objects look like in ridge regression

In a Ridge regression, I don't think you have this kind of approach at all, but I might be wrong

#

So essentially, see that a data point $x$ follows a distribution that is parametrized by $\theta$, say $p_{X \vert \theta}$, which is called the likelihood (the sampling distribution of the data). However, $\theta$ is RANDOM, and it follows its own distribution $p_{\Theta}$, which is the prior. So the reasoning here has two steps:

Firstly, guess $\theta$ based on supporting evidence (your dataset $D$)
Secondly, use that to estimate the distribution of $x$, and obtain a predictive distribution
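
In symbols, a sketch of these two steps in the usual notation (assuming the dataset $D$ is what we condition on, and that a new point is independent of $D$ given $\theta$) would be:

$$p(\theta \vert D) = \frac{p_{X \vert \theta}(D \vert \theta)\, p_\Theta(\theta)}{\int p_{X \vert \theta}(D \vert \theta')\, p_\Theta(\theta')\, d\theta'}$$

$$p(y \vert x, D) = \int p(y \vert x, \theta)\, p(\theta \vert D)\, d\theta$$

The first line is Bayes' rule for the parameters. The second averages the per-$\theta$ predictions with the posterior as the weight, which is the sense in which one "averages over the posterior".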

cosmic nest Jun 24, 2024, 4:37 PM

#

Example: linear regression. The data points $(x, y)$ follow a certain distribution, given by:
$$y = \langle w, x \rangle + \epsilon$$

Where $\epsilon \sim \mathcal{N}(0, \sigma^2)$. So here, you have two parameters to guess: $\theta = (w, \sigma^2)$.
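
Written as a conditional density (a restatement of the model above, assuming a scalar label $y$), this is:

$$p(y \vert x, w, \sigma^2) = \mathcal{N}\left(y;\ \langle w, x \rangle,\ \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \langle w, x \rangle)^2}{2\sigma^2}\right)$$

This is the $p(y \vert x, \theta)$ that sits inside the expectation, with $\theta = (w, \sigma^2)$.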

sharp inlet Jun 24, 2024, 4:41 PM

#

👀

cosmic nest Jun 24, 2024, 4:41 PM

#

In the typical linear regression problem without regularization, what you effectively do is to maximize the likelihood, as in:
$$\hat{\theta}_{MLE} = \arg\max_{\theta} p_{X \vert \theta}(D \vert \theta)$$
However, what the MAP does is different, i.e.
$$\hat{\theta}_{MAP} = \arg\max_{\theta} p_{X \vert \theta}(D \vert \theta)p_\Theta(\theta)$$

sharp inlet Jun 24, 2024, 4:44 PM

#

cosmic nest In a Ridge regression, I don't think you have this kind of approach at all, but ...

bayesian ridge regression has the prior weight distribution as a Gaussian, and then we can do MAP to derive the same problem

cosmic nest Jun 24, 2024, 4:44 PM

#

Now, being super lazy, we suppose we also know the prior distribution $p_{\Theta}(\theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} \theta^2 \right)$

cosmic nest Jun 24, 2024, 4:44 PM

#

sharp inlet bayesian ridge regression has the prior weight distribution as a Gaussian, and t...

That's correct

#

So now, with that being given, you maximize just as you did before
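
As a rough illustration of why this gives ridge: taking logs of the likelihood times the standard normal prior, with a scalar weight $w$ and known noise variance $\sigma^2$ (both simplifying assumptions), the quantity to maximize is $-\frac{1}{2\sigma^2}\sum_i (y_i - w x_i)^2 - \frac{1}{2} w^2$ plus a constant. That is the same as minimizing $\sum_i (y_i - w x_i)^2 + \sigma^2 w^2$, a ridge objective with penalty $\lambda = \sigma^2$. The Python sketch below compares the two closed-form estimates on simulated data; the function names and the simulated dataset are hypothetical.

```python
import random


def simulate(n, w_true, sigma, seed=0):
    """Draw n points from y = w_true * x + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def w_mle(xs, ys):
    """Least squares (maximum likelihood) slope, no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def w_map(xs, ys, sigma2):
    """MAP slope under a N(0, 1) prior: ridge with penalty lambda = sigma2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + sigma2)


def log_posterior(w, xs, ys, sigma2):
    """Log of likelihood times prior, up to an additive constant."""
    sq_err = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return -sq_err / (2.0 * sigma2) - 0.5 * w * w


sigma2 = 1.0
xs, ys = simulate(n=20, w_true=2.0, sigma=1.0)
w_hat_mle = w_mle(xs, ys)
w_hat_map = w_map(xs, ys, sigma2)
print(w_hat_mle, w_hat_map)  # the MAP estimate is pulled toward 0
```

The only difference between the two estimators is the extra `sigma2` in the denominator, which is where the shrinkage toward zero comes from.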

sharp inlet Jun 24, 2024, 4:48 PM

#

cosmic nest So essentially, see that a data point $x$ follows a distribution that is paramet...

here, guess theta based on D means finding the posterior distribution p(theta | D) using bayes rule?

cosmic nest Jun 24, 2024, 4:48 PM

#

sharp inlet here, guess theta based on D means finding the posterior distribution p(theta | ...

It means providing an estimate of theta

sharp inlet Jun 24, 2024, 4:48 PM

#

mint python **Rion**

and the point with secondly means to find p(y | x)?

cosmic nest Jun 24, 2024, 4:49 PM

#

like, for instance, in a regression problem, you find the slope, then you use the slope to make a predictor

#

it's not that much different

sharp inlet Jun 24, 2024, 4:49 PM

#

yes. we find this slope by going over all the training data in D

cosmic nest Jun 24, 2024, 4:50 PM

#

Yes, that is correct

#

That is what p(theta | D) is for

#

guessing which theta is more likely to control the distribution, assuming that we have a bunch of data points of that distribution already
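
For the ridge-style setup, these objects can be written down explicitly. The Python sketch below is a minimal example under stated assumptions that are not in the thread: a scalar weight $w$ with prior $\mathcal{N}(0, 1)$, no intercept, and a known noise variance $\sigma^2$. In that case the posterior over $w$ and the predictive distribution for a new $x$ are both Gaussian. The data values and function names are made up for illustration.

```python
def posterior_params(xs, ys, sigma2):
    """Mean and variance of p(w | D) for prior w ~ N(0, 1) and known sigma2."""
    precision = 1.0 + sum(x * x for x in xs) / sigma2  # prior precision + data precision
    var = 1.0 / precision
    mean = var * sum(x * y for x, y in zip(xs, ys)) / sigma2
    return mean, var


def predictive_params(x_new, post_mean, post_var, sigma2):
    """Mean and variance of p(y | x_new, D): noise plus uncertainty about w."""
    return x_new * post_mean, sigma2 + x_new * x_new * post_var


# Made-up training data, roughly y = 2x plus noise.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.1, 1.9, 3.2, 3.9]
sigma2 = 0.25

post_mean, post_var = posterior_params(xs, ys, sigma2)
pred_mean, pred_var = predictive_params(3.0, post_mean, post_var, sigma2)
print(post_mean, post_var, pred_mean, pred_var)
```

In this conjugate case the posterior mean coincides with the MAP (ridge) estimate, and the predictive variance is always larger than the noise variance because it also carries the remaining uncertainty about $w$.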

sharp inlet Jun 24, 2024, 4:51 PM

#

Ok, I see. That answers my question on what is and isn’t a random variable

#

ohh i understand now

#

tysm!

#

+close
