Bayesian Evidence

Until now, we have largely ignored the denominator in Bayes’ theorem, known as the Bayesian evidence. Here, we look at what this term means and why it is useful to compute.

The Bayesian evidence for some data, \(D\), given some model, \(M\), can be expressed as the marginal likelihood,

\[ p(D | M) = \int p[D | M(\Theta)]p[M(\Theta)]\;\text{d}\Theta, \tag{40} \]

where the integral may be multi-dimensional, running over the whole of \(\Theta\) space. The evidence is therefore the likelihood weighted by the prior and integrated over all possible parameter values.
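To make Eqn. (40) concrete, the integral can be approximated numerically for a one-parameter model. The following is a minimal sketch, not part of the original text: it assumes a hypothetical coin-flip model with a binomial likelihood and a uniform prior on the bias \(\Theta\), and the function names and the counts `k=7`, `n=10` are chosen purely for illustration. For this particular choice, the integral has the known analytic value \(1 / (n + 1)\), which gives a check on the numerical result.

```python
from math import comb


def binomial_likelihood(theta, k, n):
    """p(D | M(theta)): probability of k successes in n trials with bias theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def evidence_uniform_prior(k, n, n_grid=10_000):
    """Midpoint-rule approximation of the evidence integral over theta in [0, 1]."""
    width = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = (i + 0.5) * width  # midpoint of the i-th grid cell
        prior = 1.0  # assumed uniform prior density on [0, 1]
        total += binomial_likelihood(theta, k, n) * prior * width
    return total


# Illustrative data (an assumption): 7 successes in 10 trials
evidence = evidence_uniform_prior(k=7, n=10)
print(evidence, 1 / 11)  # numerical estimate next to the analytic value 1 / (n + 1)
```

A simple grid like this is only practical in one or two dimensions; for higher-dimensional \(\Theta\), the integral usually needs a dedicated sampling method.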

A Discrete Example

It is most straightforward to discuss the computation of the Bayesian evidence in the context of a discrete example. Consider a particular type of COVID-19 test. Given a positive test result, \(+\), we want to determine the probability that you have or don’t have COVID-19, \(p(x | +)\), where \(x=1\) indicates that you have COVID-19 and \(x=0\) that you don’t. We can write Bayes’ theorem for this as follows,

\[ p(x=1 | +) = \frac{p(+ | x=1) p(x=1)}{p(+)} \]

where the likelihood, \(p(+ | x=1)\), is the probability of getting a positive test if you have COVID-19; for this example, we can say that the test is 99.5% effective at identifying COVID-19. The prior, \(p(x=1)\), should be based on our intuition: if you have a bad cough, you might want to pick a higher probability, whereas if you feel fine, you may want to use the percentage frequency in the population. Here, we will use a prior of 25%, reflecting the cough we have. From Eqn. (40) we can write,

\[ p(+) = \int p(+ | x) p(x)\;\textrm{d}x. \]

However, there are only two outcomes: either you have COVID-19, or you don’t, i.e., \(x\) can be 1 if you have COVID-19 or 0 if you don’t. So, we should write this as a sum of these two situations,

\[ p(+) = p(+ | x=1) p(x=1) + p(+ | x=0) p(x=0). \tag{41} \]

Given that the test can also have only two outcomes,

\[ p(+ | x=0) + p(- | x=0) = 1, \]

so we can rewrite Eqn. (41) as,

\[ p(+) = p(+ | x=1) p(x=1) + [1 - p(- | x=0)][1 - p(x=1)], \tag{42} \]

where \(p(- | x=0)\) is the probability of a negative test result when you don’t have COVID-19, which the manufacturer says is 98.9%, and \(p(x=0) = 1 - p(x=1)\) because you either have COVID-19 or you don’t. Putting these numbers into Eqn. (42), we get,

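likelihood_positive_covid = 0.995  # p(+ | x=1)
prior_covid = 0.25  # p(x=1)
likelihood_negative_nocovid = 0.989  # p(- | x=0)

# Bayesian evidence p(+), following Eqn. (42)
evidence = likelihood_positive_covid * prior_covid + (
    1 - likelihood_negative_nocovid) * (1 - prior_covid)
print(round(evidence, 3))
0.257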

This is to say that the Bayesian evidence for this type of test is 0.257. This can be used to compute the correctly normalised posterior or, as we will see next, to enable model comparison.
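As an illustration of the first of these uses, the sketch below divides the likelihood-times-prior term for each outcome by the evidence to obtain normalised posteriors. It reuses the numerical values from the code above; the variable names `posterior_covid` and `posterior_nocovid` are introduced here for illustration only.

```python
likelihood_positive_covid = 0.995  # p(+ | x=1)
prior_covid = 0.25  # p(x=1)
likelihood_negative_nocovid = 0.989  # p(- | x=0)

# Unnormalised posterior terms: likelihood times prior for each value of x
joint_covid = likelihood_positive_covid * prior_covid
joint_nocovid = (1 - likelihood_negative_nocovid) * (1 - prior_covid)

# The evidence p(+) is the sum of these terms, as in Eqn. (41)
evidence = joint_covid + joint_nocovid

# Dividing by the evidence normalises the posterior
posterior_covid = joint_covid / evidence  # p(x=1 | +)
posterior_nocovid = joint_nocovid / evidence  # p(x=0 | +)

print(f"p(x=1 | +) = {posterior_covid:.3f}")
print(f"p(x=0 | +) = {posterior_nocovid:.3f}")
```

With these numbers, the posterior probability of having COVID-19 after a positive test is about 0.968, and the two posteriors sum to one because the evidence is exactly the sum of the two numerators.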