Bayesian Inference and Intractable Integrals

Aslesha
4 min read · Apr 12, 2021

Bayesian Inference in Machine Learning

Bayesian Inference is a statistical method, often used in machine learning, for estimating the parameters of a probabilistic model using Bayes’ theorem. Since Bayes’ theorem updates prior beliefs based on observed data, the first step of Bayesian Inference is to choose a probability density p(𝜃), called the prior distribution, to express initial beliefs about the parameter 𝜃 ∊ Θ of the model. There are many ways to choose such priors [1] depending on prior knowledge about the data; Gaussian and uniform distributions are common choices due to their simplicity and wide applicability. Similarly, a statistical model p(x|𝜃), called the likelihood function, is chosen to reflect beliefs about x given 𝜃. Finally, after observing the data D = X_1, …, X_n, the posterior distribution is calculated using Bayes’ theorem as

p(𝜃|D) = p(D|𝜃) p(𝜃) / p(D),  where  p(D) = ∫ p(D|𝜃) p(𝜃) d𝜃.

If the likelihood and prior are chosen carefully (for example, as a conjugate pair), the posterior distribution can be computed efficiently because the normalizing integral p(D) has an analytic solution. However, as often happens in practice, choosing the prior and likelihood solely to make the above equation tractable (efficient to compute) won’t give a good model for the parameters of the data. So, even with simple priors and likelihood functions, the normalizing integral, and hence the exact posterior, becomes intractable in high dimensions. To overcome this issue, instead of calculating the exact posterior, several approximation methods are used, some of which are discussed in the following section.
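As a concrete sketch of the tractable case (my own illustration, not from the original post), the Beta-Bernoulli pair is one of the few settings where the posterior works out exactly: the Beta prior is conjugate to the Bernoulli likelihood, so the posterior is again a Beta distribution and the normalizing integral never has to be evaluated numerically. The coin-flip data and prior parameters below are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical coin-flip data: 1 = heads, 0 = tails.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Beta(alpha, beta) prior over the heads probability theta.
alpha_prior, beta_prior = 2.0, 2.0

# With a Bernoulli likelihood, the Beta prior is conjugate:
# the posterior is Beta(alpha + #heads, beta + #tails), no integral needed.
heads = data.sum()
tails = len(data) - heads
posterior = stats.beta(alpha_prior + heads, beta_prior + tails)

print("Posterior mean of theta:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```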

Ways to get around Intractable Inference

Exact inference methods such as variable elimination often become infeasible in high dimensions, so various approximation methods are used instead. These are broadly divided into sampling-based methods and variational methods [2].

Sampling based methods

In sampling-based methods, the desired probability (or expectation) is estimated from randomly drawn samples. Since the sample space can vary widely depending on the probability distribution, different algorithms exist to sample effectively. Two major categories are Monte Carlo methods and Markov Chain Monte Carlo (MCMC) methods. In Monte Carlo methods such as simple Monte Carlo sampling, importance sampling, and rejection sampling, independent random samples are generated to estimate the probability. In MCMC methods, samples are drawn from a Markov chain, so each sample is random but depends on the previous one. Commonly used MCMC methods include Gibbs sampling and the Metropolis-Hastings algorithm. Sampling-based estimates tend to approach the true value as the number of samples increases, but the computational cost grows with it, so choosing the number of samples involves a time-accuracy trade-off. A detailed explanation of these methods can be found in [4].
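To make this concrete, here is a minimal sketch of the Metropolis-Hastings algorithm (my own illustration; the bimodal target density and the Gaussian proposal scale are assumptions chosen for demonstration). Note that only ratios of the target density are needed, so its normalizing constant never has to be computed.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(theta):
    # Unnormalised log-density we want to sample from (an arbitrary
    # bimodal example; only density ratios matter, so the normalising
    # constant is never needed).
    return np.log(0.3 * np.exp(-0.5 * (theta + 2) ** 2)
                  + 0.7 * np.exp(-0.5 * (theta - 2) ** 2))

def metropolis_hastings(n_samples, proposal_scale=1.0):
    samples = np.empty(n_samples)
    theta = 0.0  # arbitrary starting point
    for i in range(n_samples):
        proposal = theta + proposal_scale * rng.standard_normal()
        # Accept with probability min(1, p(proposal) / p(theta));
        # the symmetric Gaussian proposal cancels in the ratio.
        if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
            theta = proposal
        samples[i] = theta
    return samples

samples = metropolis_hastings(20_000)
print("Estimated mean of the target:", samples.mean())
```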

Variational methods

The main idea behind variational methods is to approximate the intractable posterior p(𝜃|D) with a simpler, tractable distribution q(𝜃). Since we cannot compute the true posterior p directly, we instead minimize the Kullback-Leibler divergence KL(q(𝜃)||p(𝜃|D)) to find a q that is close to p [3]. The Kullback-Leibler divergence KL(q||p) is a non-symmetric measure of the difference between two distributions q and p, defined as

KL(q||p) = ∫ q(𝜃) log [ q(𝜃) / p(𝜃) ] d𝜃.

This way, the approximation problem is turned into an optimization problem. Since the KL divergence itself involves the intractable posterior, in practice one equivalently maximizes the evidence lower bound (ELBO), which only requires the joint density p(D, 𝜃) and can be optimized with iterative methods such as gradient descent.
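As a rough sketch of this idea (my own example, not from the references), the snippet below fits a Gaussian q(𝜃) to the posterior of a simple Gaussian-mean model by stochastic gradient ascent on the ELBO, using the reparameterization 𝜃 = m + s·ε. The data, prior, and learning rate are all made up for illustration; because the model is conjugate, the exact posterior is available to check the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 observations assumed to come from N(theta, 1),
# with a N(0, 10) prior on theta (both choices are illustrative).
data = rng.normal(1.5, 1.0, size=50)

def grad_log_joint(theta):
    # d/d theta of [ sum_i log N(x_i | theta, 1) + log N(theta | 0, 10) ]
    return np.sum(data - theta) - theta / 10.0

# Variational family q(theta) = N(m, s^2); maximise the ELBO over (m, log s)
# by stochastic gradient ascent, using the reparameterisation theta = m + s*eps.
m, log_s = 0.0, 0.0
lr = 1e-3
for _ in range(5000):
    eps = rng.standard_normal()
    s = np.exp(log_s)
    theta = m + s * eps
    g = grad_log_joint(theta)
    m += lr * g                        # gradient of ELBO w.r.t. m
    log_s += lr * (g * s * eps + 1.0)  # gradient w.r.t. log s (+1 from q's entropy)

# Gaussian prior + Gaussian likelihood is conjugate, so the exact posterior
# is available and the variational fit can be checked against it.
post_var = 1.0 / (len(data) + 1.0 / 10.0)
post_mean = post_var * np.sum(data)
print(f"VI approximation: mean={m:.3f}, std={np.exp(log_s):.3f}")
print(f"Exact posterior:  mean={post_mean:.3f}, std={np.sqrt(post_var):.3f}")
```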

In this blog, I briefly introduced the concept of Bayesian inference and summarized the methods used to work around the intractable posterior distribution. Each of the methods mentioned above deserves a blog of its own and can be studied further in [1, 4].

References

[1] Wasserman, L. Bayesian Inference. Retrieved from http://www.stat.cmu.edu/~larry/=sml/Bayes.pdf

[2] Goyal, P. (2017, November 23). Probabilistic Graphical Models Tutorial — Part 2. Retrieved April 12, 2021, from https://blog.statsbot.co/probabilistic-graphical-models-tutorial-d855ba0107d1

[3] Salakhutdinov, R. Approximate Inference. Retrieved April 11, 2021, from https://www.cs.toronto.edu/~rsalakhu/talks/talk_inference.pdf

[4] Lumann, F. (2018, June 11). Inference in Probabilistic Models: Monte Carlo and variational methods. Retrieved April 12, 2021, from https://medium.com/neuralspace/inference-in-probabilistic-models-monte-carlo-and-deterministic-methods-eae8800ee095
