We assume, without too much loss of generality, that our priors are discrete. When dealing with Hellinger separable density spaces, it is possible to discretize posterior distributions to study consistency (see this post about it).
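Let $\Pi$ be a prior on a countable space $\mathcal{N} = \{f_1, f_2, f_3, \dots\}$ of probability density functions, with $\Pi(f) > 0$ for all $f \in \mathcal{N}$. The data $X_1, X_2, X_3, \dots$ are drawn independently from some unknown distribution with density $f_0$.
We denote by $D_{KL}(f_0, f) = \int f_0 \log \frac{f_0}{f}$ the Kullback-Leibler divergence and we let $H(f_0, f) = 1 - \int \sqrt{f_0 f}$ be half of the squared Hellinger distance.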
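To make the two discrepancies concrete, here is a minimal Python sketch for densities on a finite set (that is, probability mass functions with respect to counting measure). The function names and the two example distributions are assumptions made for illustration only. The sketch also checks the standard inequality $H(f_0, f) \le D_{KL}(f_0, f)/2$ on this example.

```python
import math

def kl_divergence(f0, f):
    """D_KL(f0, f) = sum of f0 * log(f0 / f), for pmfs on a common finite set."""
    total = 0.0
    for p, q in zip(f0, f):
        if p > 0.0:
            if q == 0.0:
                return math.inf  # f0 puts mass where f does not
            total += p * math.log(p / q)
    return total

def half_sq_hellinger(f0, f):
    """H(f0, f) = 1 - sum of sqrt(f0 * f), half of the squared Hellinger distance."""
    return 1.0 - sum(math.sqrt(p * q) for p, q in zip(f0, f))

# Hypothetical example: true density f0 and a candidate density f.
f0 = [0.3, 0.7]
f = [0.5, 0.5]
d = kl_divergence(f0, f)
h = half_sq_hellinger(f0, f)
print("D_KL =", d, " H =", h, " H <= D_KL / 2:", h <= d / 2)
```

Both quantities vanish when $f = f_0$, and $H$ is always at most $1$ whereas $D_{KL}$ can be infinite.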
The following theorem states that the posterior distribution of $\Pi$ accumulates in Hellinger neighborhoods of $f_0$, assuming the prior is root-summable (i.e. $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty$ for some $\alpha \in (0, 1)$). In the well-specified case (i.e. $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) = 0$), the posterior accumulates in any neighborhood of $f_0$. In the misspecified case, small neighborhoods of $f_0$ could be empty, but the posterior distribution still accumulates in sufficiently large neighborhoods (how large exactly is a function of $\alpha$ and $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f)$).
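As a rough illustration of root-summability, the following Python sketch compares partial sums of $\Pi(f_k)^\alpha$ for two hypothetical priors on an enumeration $f_1, f_2, \dots$: a geometric prior with weights $2^{-k}$, and a polynomial prior with weights proportional to $k^{-2}$. The priors and function names are assumptions chosen for the example. The geometric prior is root-summable for every $\alpha > 0$, while the polynomial one is root-summable only when $2\alpha > 1$.

```python
def power_sum(weights, alpha):
    """Partial sum of w**alpha over a finite list of prior weights."""
    return sum(w ** alpha for w in weights)

def geometric_weights(n_terms, r=0.5):
    """Weights Pi(f_k) = (1 - r) * r**(k - 1) for k = 1, ..., n_terms."""
    return [(1.0 - r) * r ** (k - 1) for k in range(1, n_terms + 1)]

def polynomial_weights(n_terms, p=2.0):
    """Unnormalized weights k**(-p); the normalizing constant does not affect summability."""
    return [k ** (-p) for k in range(1, n_terms + 1)]

for n_terms in (10, 100, 1000):
    geo = power_sum(geometric_weights(n_terms), 0.5)          # converges
    poly_half = power_sum(polynomial_weights(n_terms), 0.5)   # sum of 1/k, diverges
    poly_34 = power_sum(polynomial_weights(n_terms), 0.75)    # sum of k**-1.5, converges
    print(n_terms, geo, poly_half, poly_34)
```

The second column stabilizes quickly, the third keeps growing like $\log n$, and the fourth stays bounded, which matches the summability conditions above.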
The result was essentially stated by Barron (Discussion: On the Consistency of Bayes Estimates, 1986). In the case where $\Pi$ is not necessarily discrete, a similar result was obtained, through a discretization argument, by Walker (Bayesian asymptotics with misspecified models, 2013). See also Xing (Sufficient conditions for Bayesian consistency, 2009) for a thorough treatment of Bayesian consistency using the same method of proof.
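Theorem. Suppose $\beta_0 := \inf_{f \in \mathcal{N}} D_{KL}(f_0, f) < \infty$ and that
$$\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty \quad \text{for some } \alpha \in [1/2, 1).$$
If $\varepsilon > 0$ is such that
$$2(1-\alpha)\,\varepsilon > \alpha\,\beta_0$$
and if $A_\varepsilon := \{f \in \mathcal{N} : H(f_0, f) < \varepsilon\} \neq \emptyset$, then
$$\Pi(A_\varepsilon \mid X_1, \dots, X_n) \to 1$$
almost surely as $n \to \infty$.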
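1 – If $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) = 0$, then any $\varepsilon > 0$ can be used.
2 – $\alpha$ is related to the rate of convergence of $\Pi(f_k)$ towards $0$ as $k \to \infty$. The quantity $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha$ can be thought of as a measure of entropy (its logarithm is, up to the factor $1/(1-\alpha)$, the Rényi entropy of order $\alpha$).
3 – Walker (2013) considered the case $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty$ for some $\alpha < 1/2$. Since $\Pi(f) \le 1$, this implies that $\sum_{f \in \mathcal{N}} \sqrt{\Pi(f)} < \infty$ and the above theorem can also be applied in this case.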
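The accumulation of the posterior in a Hellinger neighborhood of the true density can be watched in a small simulation. The sketch below is only an illustration under assumptions of my choosing: a misspecified family of three Bernoulli densities (a finite family, so root-summability is automatic), a true success probability of $0.4$ that is not in the family, arbitrary prior weights, and a radius `eps = 0.05` picked so that the neighborhood contains only the Kullback-Leibler closest model.

```python
import math
import random

def posterior(thetas, prior, data):
    """Posterior weights over Bernoulli(theta) models, computed on the log scale."""
    ones = sum(data)
    zeros = len(data) - ones
    log_post = [
        math.log(w) + ones * math.log(t) + zeros * math.log(1.0 - t)
        for t, w in zip(thetas, prior)
    ]
    m = max(log_post)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def half_sq_hellinger(t0, t):
    """Half of the squared Hellinger distance between Bernoulli(t0) and Bernoulli(t)."""
    return 1.0 - (math.sqrt(t0 * t) + math.sqrt((1.0 - t0) * (1.0 - t)))

random.seed(0)
theta0 = 0.4                # true parameter, deliberately outside the family
thetas = [0.1, 0.5, 0.9]    # hypothetical misspecified family
prior = [0.5, 0.25, 0.25]   # hypothetical prior weights
eps = 0.05
in_ball = [half_sq_hellinger(theta0, t) < eps for t in thetas]
data = [1 if random.random() < theta0 else 0 for _ in range(2000)]

def mass_in_ball(n):
    """Posterior mass of the Hellinger neighborhood after the first n observations."""
    post = posterior(thetas, prior, data[:n])
    return sum(p for p, inside in zip(post, in_ball) if inside)

for n in (10, 100, 1000, 2000):
    print(n, mass_in_ball(n))
```

Here only the model with success probability $0.5$ lies in the neighborhood, and its posterior mass should approach $1$ as $n$ grows, even though no model in the family is correct.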
The proof is brief. I do not dwell on explanations.
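First let me recall some concepts. The $\alpha$-affinity between two densities $f$ and $g$ is defined as
$$A_\alpha(f, g) = \int g^\alpha f^{1-\alpha}. \qquad (1)$$
Note that $0 \le A_\alpha(f, g) \le 1$ and that, for $0 < \alpha < 1$, $A_\alpha(f, g) = 1$ if and only if $f = g$. Furthermore, when $\alpha \ge 1/2$, Hölder's inequality (or Jensen's inequality), together with $1 - x \le e^{-x}$, yields
$$A_\alpha(f_0, f) \le \left(\int \sqrt{f_0 f}\right)^{2(1-\alpha)} = \left(1 - H(f_0, f)\right)^{2(1-\alpha)} \le e^{-2(1-\alpha) H(f_0, f)}. \qquad (2)$$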
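The bound $\int f^\alpha f_0^{1-\alpha} \le \left(\int \sqrt{f_0 f}\right)^{2(1-\alpha)} \le e^{-2(1-\alpha) H(f_0, f)}$, valid for $\alpha \ge 1/2$, is easy to sanity-check numerically. The sketch below does so on randomly generated probability mass functions. The helper names, the seed and the number of trials are assumptions for illustration, and a numerical check is of course no substitute for the Hölder argument.

```python
import math
import random

def affinity(f0, f, alpha):
    """Sum of f**alpha * f0**(1 - alpha), for pmfs on a common finite set."""
    return sum(q ** alpha * p ** (1.0 - alpha) for p, q in zip(f0, f))

def bhattacharyya(f0, f):
    """Sum of sqrt(f0 * f); half of the squared Hellinger distance is 1 minus this."""
    return sum(math.sqrt(p * q) for p, q in zip(f0, f))

def random_pmf(k, rng):
    """A random probability mass function on k points."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(1)
worst_gap = math.inf
for _ in range(1000):
    f0, f = random_pmf(5, rng), random_pmf(5, rng)
    alpha = rng.uniform(0.5, 1.0)
    bc = bhattacharyya(f0, f)
    lhs = affinity(f0, f, alpha)
    mid = bc ** (2.0 * (1.0 - alpha))
    rhs = math.exp(-2.0 * (1.0 - alpha) * (1.0 - bc))
    # Both gaps should be nonnegative, up to floating point rounding.
    worst_gap = min(worst_gap, mid - lhs, rhs - mid)

print("bound holds on all trials:", worst_gap >= -1e-12)
```

At $\alpha = 1/2$ the first inequality is an equality, which is why the check allows a small rounding tolerance.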
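We can now begin the proof. Let $\tilde f \in \mathcal{N}$ be such that $D_{KL}(f_0, \tilde f) < \infty$. Then, since $\left(\sum_k a_k\right)^\alpha \le \sum_k a_k^\alpha$ for nonnegative $a_k$ and $\alpha \le 1$, we have
$$\Pi(A_\varepsilon^c \mid X_1, \dots, X_n)^\alpha = \left(\frac{\sum_{f \in A_\varepsilon^c} \Pi(f) \prod_{i=1}^n \frac{f(X_i)}{f_0(X_i)}}{\sum_{f \in \mathcal{N}} \Pi(f) \prod_{i=1}^n \frac{f(X_i)}{f_0(X_i)}}\right)^\alpha \le \frac{\sum_{f \in A_\varepsilon^c} \Pi(f)^\alpha \prod_{i=1}^n \left(\frac{f(X_i)}{f_0(X_i)}\right)^\alpha}{\Pi(\tilde f)^\alpha \prod_{i=1}^n \left(\frac{\tilde f(X_i)}{f_0(X_i)}\right)^\alpha}. \qquad (3)$$
If $\beta > 0$ and $\tilde f$ is such that $D_{KL}(f_0, \tilde f) < \beta_0 + \beta$, where $\beta_0 = \inf_{f \in \mathcal{N}} D_{KL}(f_0, f)$, then, by the strong law of large numbers,
$$e^{n(\beta_0 + 2\beta)} \prod_{i=1}^n \frac{\tilde f(X_i)}{f_0(X_i)} \to \infty \qquad (4)$$
almost surely. Furthermore, using (2) and the fact that $H(f_0, f) \ge \varepsilon$ on $A_\varepsilon^c$, we find
$$\mathbb{E}\left[e^{n\alpha(\beta_0 + 2\beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^\alpha \prod_{i=1}^n \left(\frac{f(X_i)}{f_0(X_i)}\right)^\alpha\right] = e^{n\alpha(\beta_0 + 2\beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^\alpha A_\alpha(f_0, f)^n \le C e^{-n\left(2(1-\alpha)\varepsilon - \alpha(\beta_0 + 2\beta)\right)}.$$
Here $C = \sum_{f \in \mathcal{N}} \Pi(f)^\alpha$ is a constant. Since $\beta > 0$ is arbitrary, it can be taken so that $2(1-\alpha)\varepsilon > \alpha(\beta_0 + 2\beta)$, and we obtain that the expectation above converges exponentially fast towards $0$. Hence, by Markov's inequality and the Borel-Cantelli lemma, we have
$$e^{n\alpha(\beta_0 + 2\beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^\alpha \prod_{i=1}^n \left(\frac{f(X_i)}{f_0(X_i)}\right)^\alpha \to 0$$
almost surely. This, together with (3) and (4), implies that $\Pi(A_\varepsilon^c \mid X_1, \dots, X_n) \to 0$ almost surely.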