#Some questions about Bayesian Stats
E_theta[p(y|x, theta)] - what is this thing? Is it related to the LMS estimator in any way? To calculate this expectation wrt theta, we integrate the argument wrt theta, weighted by the updated distribution of theta. That makes sense. What I'm unsure about is why this integral gives us a distribution instead of a number.
E_theta[p(y|x, theta)] is the expected value of p(y|x, theta) where you treat theta like a known constant instead of a variable.
The result is a distribution because E_theta[p(y|x, theta)] still depends on x
I feel as though this might be a bit incorrect, one should have that:
$$p(y\vert x) = E_{\Theta}[p(y \vert x, \Theta)] = \int_{\theta} p(y \vert x, \theta) p_\Theta(\theta) d\theta$$
Rion
Where $\Theta$ is the random variable corresponding to the a priori distribution on the parameters
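One way to see why the integral gives a distribution rather than a number: the integration removes $\theta$, but $y$ is still a free argument, so the result is a density in $y$. Below is a minimal Monte Carlo sketch of that average. The scalar model $p(y \vert x, \theta) = \mathcal{N}(\theta x, 1)$, the standard normal draws for $\theta$, and the function names are all assumptions made for illustration, not something stated in the thread.

```python
import numpy as np

def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def predictive_density(y_grid, x, theta_samples, noise_var=1.0):
    """Approximate E_Theta[p(y | x, Theta)] by averaging over theta samples.

    Assumed toy model: y | x, theta ~ N(theta * x, noise_var).
    Returns one density value per entry of y_grid.
    """
    means = theta_samples * x                                  # shape (S,)
    dens = gaussian_pdf(y_grid[:, None], means[None, :], noise_var)  # (G, S)
    return dens.mean(axis=1)                                   # average over theta

rng = np.random.default_rng(0)
theta_samples = rng.normal(0.0, 1.0, size=2000)  # stand-in draws from p_Theta
x = 2.0                                          # fixed test input
y_grid = np.linspace(-15.0, 15.0, 1201)

p_y = predictive_density(y_grid, x, theta_samples)

# The result is a function of y that integrates to (about) one.
area = float(np.sum(p_y) * (y_grid[1] - y_grid[0]))
print(p_y.shape, area)
```

Swapping the draws in `theta_samples` for draws from the posterior $p(\theta \vert D)$ would turn the same average into the posterior predictive.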
Rion
is it also correct to view this as averaging the likelihoods over the posterior?
Here I'm unsure what's a random variable and what's a constant. To my understanding, x is a constant, since it's the previously unseen test point that we want to make a prediction on. y is a random variable for the possible label value (this also follows from linear regression being discriminative). Then $\Theta$ is a random variable, distributed according to the posterior. Is this correct?
And what would each of these objects look like in ridge regression?
Well see
You have some parametric distributions such as, say, a binomial one
or a Bernoulli one
or even a normal distribution, that's also parametric
So for instance, N(m, s²) has parameters theta = (m, s²)
but the prior distribution on the parameters means that theta is random too
In a Ridge regression, I don't think you have this kind of approach at all, but I might be wrong
So essentially, see that a data point $x$ follows a distribution that is parametrized by $\theta$, say $p_{X \vert \theta}$, which is called the likelihood (the sampling distribution of the data). However, $\theta$ is RANDOM, and it follows a distribution $p_{\Theta}$, which is the prior distribution. So the reasoning goes in two steps here:
- Firstly, guess $\theta$ based on supporting evidence (your dataset $D$)
- Secondly, use that to estimate the distribution of $x$, and obtain a predictive distribution
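A small sketch of these two steps for the Bernoulli case mentioned above. It assumes a Beta(1, 1) prior on $\theta$ (chosen here only because the posterior is then available in closed form) and a made-up dataset; the function names are hypothetical.

```python
import numpy as np

def posterior_params(data, a=1.0, b=1.0):
    """Step 1: Beta(a, b) prior plus Bernoulli data gives a Beta posterior."""
    data = np.asarray(data)
    k = int(data.sum())          # number of ones observed
    n = data.size                # total number of observations
    return a + k, b + (n - k)

def predictive_prob_one(a_post, b_post):
    """Step 2: P(x_new = 1 | D) is the posterior mean of theta."""
    return a_post / (a_post + b_post)

D = [1, 0, 1, 1, 0, 1, 1, 1]     # illustrative data: 6 ones, 2 zeros
a_post, b_post = posterior_params(D)
p_one = predictive_prob_one(a_post, b_post)
print(a_post, b_post, p_one)
```

Step 2 here is the same average as before: the Bernoulli probability $\theta$ integrated against the distribution over $\theta$ obtained in step 1.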
Rion
Example: linear regression. The data points $(x, y)$ follow a certain distribution, given by:
$$y = \langle w, x \rangle + \epsilon$$
Where $\epsilon \sim \mathcal{N}(0, \sigma^2)$. So here, you have two parameters to guess: $\theta = (w, \sigma^2)$.
Rion
👀
In the typical linear regression problem without regularization, what you effectively do is to maximize the likelihood, as in:
$$\hat{\theta}_{MLE} = \arg\max_{\theta} p_{X \vert \theta}(D \vert \theta)$$
However, what the MAP does is different, i.e.
$$\hat{\theta}_{MAP} = \arg\max_{\theta} p_{X \vert \theta}(D \vert \theta)p_\Theta(\theta)$$
Rion
bayesian ridge regression has the prior weight distribution as a Gaussian, and then we can do MAP to derive the same problem
Now, being also super lazy, we actually suppose we also know the distribution $p_{\Theta}(\theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} \theta^2 \right)$
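A sketch of why this choice of prior leads to ridge regression. Two assumptions are added here for illustration: the noise variance $\sigma^2$ is treated as known (so only $w$ is estimated), and the standard normal prior above is applied to each component of $w$ independently, i.e. $w \sim \mathcal{N}(0, I)$. Taking the negative log of the MAP objective for a dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$:

$$-\log\left[p_{X \vert \theta}(D \vert w)\, p_\Theta(w)\right] = \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - \langle w, x_i \rangle\right)^2 + \frac{1}{2}\lVert w \rVert^2 + \text{const}$$

Multiplying by $2\sigma^2$ does not change the minimizer, so:

$$\hat{w}_{MAP} = \arg\min_{w} \sum_{i=1}^{n} \left(y_i - \langle w, x_i \rangle\right)^2 + \sigma^2 \lVert w \rVert^2$$

This is the ridge objective with regularization strength $\lambda = \sigma^2$. Dropping the prior term gives back the least-squares problem solved by the MLE.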
Rion
That's correct
So now, with that being given, you maximize just as you did before
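To make the difference between the two maximizations concrete, here is a minimal numerical sketch on synthetic data. It assumes a known noise standard deviation and a $\mathcal{N}(0, I)$ prior on $w$; under those assumptions the MAP estimate has the ridge closed form $(X^\top X + \sigma^2 I)^{-1} X^\top y$. The data and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 3
sigma = 0.5                              # assumed known noise standard deviation
w_true = np.array([1.0, -2.0, 0.5])      # made-up ground-truth weights

X = rng.normal(size=(n, d))
y = X @ w_true + rng.normal(0.0, sigma, size=n)

# MLE: maximizing the Gaussian likelihood is ordinary least squares.
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

# MAP with w ~ N(0, I): ridge regression with lam = sigma**2.
lam = sigma ** 2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("MLE:", w_mle)
print("MAP:", w_map)
```

The prior pulls the MAP weights toward zero, so their norm is smaller than that of the MLE weights; with more data the two estimates get closer.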
here, guess theta based on D means finding the posterior distribution p(theta | D) using bayes rule?
It means providing an estimation of theta
and the "secondly" point means finding p(y | x)?
like, for instance, in a regression problem, you find the slope, then you use the slope to make a predictor
it's not that much different
yes. we find this slope by going over all the training data in D
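For the fully Bayesian version of these two steps in linear regression, a hedged sketch: assuming known noise variance and a $\mathcal{N}(0, I)$ prior on $w$ (both assumptions, as is the synthetic data), the posterior over $w$ is Gaussian, and so is the predictive distribution $p(y \vert x, D)$. The function names are hypothetical.

```python
import numpy as np

def fit_posterior(X, y, noise_var):
    """Step 1: posterior N(m, S) over w, for prior w ~ N(0, I)."""
    d = X.shape[1]
    S = np.linalg.inv(np.eye(d) + X.T @ X / noise_var)   # posterior covariance
    m = S @ X.T @ y / noise_var                          # posterior mean
    return m, S

def predict(x_new, m, S, noise_var):
    """Step 2: mean and variance of the Gaussian p(y | x_new, D)."""
    mean = x_new @ m
    var = noise_var + x_new @ S @ x_new   # noise plus uncertainty about w
    return mean, var

rng = np.random.default_rng(0)
noise_var = 0.25                          # assumed known
w_true = np.array([2.0, -1.0])            # made-up ground-truth weights
X = rng.normal(size=(50, 2))
y = X @ w_true + rng.normal(0.0, np.sqrt(noise_var), size=50)

m, S = fit_posterior(X, y, noise_var)
mean, var = predict(np.array([1.0, 1.0]), m, S, noise_var)
print(mean, var)
```

The posterior mean `m` coincides with the ridge solution for $\lambda = \sigma^2$, which is the sense in which ridge regression is the point-estimate version of this procedure. The predictive variance is always larger than the noise variance because it also carries the remaining uncertainty about $w$.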