#help understanding terms in variational autoencoders elbo and relation to loss function


quick sail
#

so i'll start off by summarising how i understood it:

in the ELBO you have the distributions p and q, along with conditional and joint versions of them. x is the observed input, and z is the latent vector sampled from the distribution q that the encoder outputs.

so instead of mapping one observation to one latent vector, a VAE maps each observation to a distribution q over latent vectors
q(z|x) = probability that z is the latent vector for observation x, according to the encoder
p(z|x) = probability that z is the latent vector for observation x, actually (the true posterior)
p(x) = probability that x is observed among the set of all possible observations
p(x,z) = probability that, among all possible observations and underlying latent vectors, the observation is x with underlying z
q(x) and q(x,z) would technically exist, i suppose? but i can't see how you would sample from them
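
to make "z is sampled from the distribution the encoder outputs" concrete, here is a minimal sketch of how i picture it. it assumes the encoder outputs a mean and a log-variance of a diagonal gaussian (my assumption, not something stated above), and `mu`, `logvar`, `sample_latent` and `log_q` are made-up names for illustration:

```python
import numpy as np

def sample_latent(mu, logvar, rng):
    """Draw z ~ q(z|x) = N(mu, diag(exp(logvar))) as z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)      # noise from the standard normal
    return mu + np.exp(0.5 * logvar) * eps   # shift and scale by the encoder outputs

def log_q(z, mu, logvar):
    """Log-density of z under the diagonal gaussian q(z|x), summed over latent dims."""
    var = np.exp(logvar)
    return -0.5 * np.sum(np.log(2 * np.pi) + logvar + (z - mu) ** 2 / var, axis=-1)

rng = np.random.default_rng(0)
mu = np.array([[0.5, -1.0]])       # hypothetical encoder mean for one observation
logvar = np.array([[0.0, -2.0]])   # hypothetical encoder log-variance
z = sample_latent(mu, logvar, rng)
print(z.shape, log_q(z, mu, logvar))
```

so q(z|x) is something you can both sample from and evaluate, because the encoder hands you its parameters directly.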

i believe these are all the terms that appear in the ELBO, and maximizing the ELBO minimizes the KL divergence of q(z|x) from p(z|x)
so i would have thought the ELBO itself appears in the loss function (which you would then maximize with gradient ascent). but i couldn't see how you would evaluate many of its terms, except maybe q(z|x), since the encoder generates it and you literally know the distribution. for all the p terms, you don't know what p actually is
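
written out, this is the relation i have in mind, as far as i understand it. the notation (ELBO(x), D_KL, the expectation over q) is my own choice here, so treat it as a sketch rather than the definitive form:

```latex
\log p(x) = \mathrm{ELBO}(x) + D_{\mathrm{KL}}\big(q(z \mid x) \,\|\, p(z \mid x)\big)

\mathrm{ELBO}(x) = \mathbb{E}_{q(z \mid x)}\big[\log p(x, z) - \log q(z \mid x)\big]
```

since log p(x) does not depend on q and the KL term is never negative, pushing the ELBO up with respect to q pushes that KL term down.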

but when i look at https://github.com/rasbt/stat453-deep-learning-ss21/blob/2202699c5fd38af398e2682f289a0868b1b91f0e/L17/helper_train.py#L159 or https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73#780e, the loss function is just reconstruction error + KL divergence of the predicted distribution from the standard normal distribution?!
so basically, we are declaring that latent vectors come from sampling the standard normal, and the KL term then forces the model to learn a latent space that is distributed like the standard normal?
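
here is a minimal sketch of the kind of loss i mean, not copied from either link. it assumes a diagonal gaussian encoder output (`mu`, `logvar`), a standard normal target for the latent space, and squared error as the reconstruction term; the function names are made up, and the linked code may weight or reduce the two terms differently:

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), one value per sample."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)

def vae_loss(x, x_recon, mu, logvar):
    """Reconstruction error + KL term, averaged over the batch."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # squared error per sample (assumed choice)
    kl = kl_to_standard_normal(mu, logvar)       # pulls q(z|x) towards N(0, I)
    return np.mean(recon + kl)

x = np.array([[1.0, 0.0, 1.0]])         # one toy observation
x_recon = np.array([[0.9, 0.1, 0.8]])   # hypothetical decoder output
mu = np.array([[0.0, 0.0]])             # hypothetical encoder mean
logvar = np.array([[0.0, 0.0]])         # hypothetical encoder log-variance
print(vae_loss(x, x_recon, mu, logvar))
```

note that the KL term here has a closed form because both distributions are gaussians, so nothing about the true p(z|x) has to be evaluated.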

#

tl;dr: you can probably skip to the last section tbh