🟢 Hidden Beauty Behind Generative AI
Artem Kirsanov
https://www.youtube.com/watch?v=laaBLUxJUMY
To derive the expression on the right from the expression on the left in the image, we transition from an expectation under the true distribution over all possible states (left-hand side) to an approximation based on samples from the dataset (right-hand side). This is a common technique in machine learning when we cannot access the full distribution.
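Since the image itself isn't reproduced in these notes, the two expressions are presumably the standard statement of this approximation (a reconstruction from the definitions below, not a verbatim copy of the image):

$$\sum_x P(x)\,\log P_\theta(x) \;\approx\; \frac{1}{N}\sum_{k=1}^{N}\log P_\theta(x_k), \qquad x_k \sim P(x)$$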
- This is an expectation over all possible states $$x$$, where $$P(x)$$ is the true distribution of the data and $$P_\theta(x)$$ is the model's approximation parameterized by $$\theta$$.
- We sum over all possible states $$x$$, but in most practical cases this is computationally intractable, since there could be an infinite or very large number of states.
- Here, instead of summing over all states $$x$$, we approximate the expectation using a finite set of samples $$\{x_k\}_{k=1}^N$$ drawn from the true distribution $$P(x)$$. This is known as a Monte Carlo (or sampling-based) approximation.
- Expectation as a Sum:
  The left-hand side expression represents the expected value of $$\log P_\theta(x)$$ under the true data distribution $$P(x)$$:
  $$\mathbb{E}_{x \sim P(x)}[\log P_\theta(x)] = \sum_x P(x)\,\log P_\theta(x)$$
  But computing this exact expectation requires summing over all possible states $$x$$, which is usually intractable.
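As a concrete illustration, here is a minimal Python sketch with a made-up five-state toy distribution (the numbers are assumptions for illustration, not from the video), where the state space is small enough to compute the expectation exactly:

```python
import numpy as np

# Toy five-state example (values are illustrative assumptions).
P = np.array([0.1, 0.2, 0.4, 0.2, 0.1])          # true data distribution P(x)
P_theta = np.array([0.15, 0.25, 0.3, 0.2, 0.1])  # model's approximation P_theta(x)

# Exact expectation: sum over ALL states x of P(x) * log P_theta(x).
# Feasible here only because the state space has just five states.
exact = np.sum(P * np.log(P_theta))
print(f"exact E[log P_theta(x)] = {exact:.4f}")  # ~ -1.5007
```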
- Monte Carlo Approximation:
  If we can't sum over all possible states, we can instead approximate the expectation by averaging over a finite set of samples $$\{x_k\}_{k=1}^N$$ drawn from the true distribution $$P(x)$$. This gives the right-hand side expression:
  $$\mathbb{E}_{x \sim P(x)}[\log P_\theta(x)] \approx \frac{1}{N}\sum_{k=1}^{N}\log P_\theta(x_k)$$
Here,
- We can't know the full probability distribution $$P(x)$$ because it might be too complex or high-dimensional, but we do have a dataset that consists of samples from $$P(x)$$.
- Therefore, instead of summing over all possible states, we estimate the sum using the available samples $$\{x_k\}_{k=1}^N$$.
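Continuing the same toy example from above, a sketch of the sample-based estimate; as $$N$$ grows, it converges to the exact value computed earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([0.1, 0.2, 0.4, 0.2, 0.1])          # true distribution (toy)
P_theta = np.array([0.15, 0.25, 0.3, 0.2, 0.1])  # model distribution (toy)

# Draw N samples x_k from P(x); in practice these are just the dataset examples.
for N in (10, 1_000, 100_000):
    x_k = rng.choice(len(P), size=N, p=P)
    estimate = np.log(P_theta[x_k]).mean()  # (1/N) * sum_k log P_theta(x_k)
    print(f"N = {N:>6}: Monte Carlo estimate = {estimate:.4f}")
# The estimates approach the exact value (~ -1.5007) as N increases.
```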
This is the essence of transitioning from the true expectation to a sample-based approximation, a key idea in many machine learning algorithms, particularly maximum likelihood estimation and gradient-based learning.
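The connection to maximum likelihood estimation can be made concrete with one more sketch. Under an assumed Bernoulli model (my choice for illustration, not from the video), gradient ascent on the sample-based average log-likelihood recovers the data-generating parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.7, size=5_000)  # stand-in dataset: samples from Bernoulli(0.7)

theta = 0.5  # model parameter P_theta(x=1), initial guess
lr = 0.1
for _ in range(200):
    # Gradient of (1/N) * sum_k log P_theta(x_k) for a Bernoulli model, where
    # log P_theta(x) = x*log(theta) + (1-x)*log(1-theta):
    grad = np.mean(data / theta - (1 - data) / (1 - theta))
    theta = np.clip(theta + lr * grad, 1e-6, 1 - 1e-6)  # keep theta in (0, 1)

print(f"estimated theta = {theta:.3f} (true parameter: 0.7)")
```

The fixed point of this update is the sample mean of the data, which is exactly the maximum likelihood estimate for a Bernoulli model, so the estimate lands near 0.7.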