We assume, without too much loss of generality, that our priors are discrete. When dealing with Hellinger separable density spaces, it is possible to discretize posterior distributions to study consistency (see this post about it).

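Let $\Pi$ be a prior on a countable space $\mathcal{N} = \{f_1, f_2, f_3, \dots\}$ of probability density functions, with $\Pi(f) > 0$ for all $f \in \mathcal{N}$. Data $X_1, X_2, X_3, \dots$ follows (independently) some unknown distribution with density $f_0$.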

We denote by $D_{KL}(f_0, f) = \int f_0 \log \frac{f_0}{f}$ the Kullback-Leibler divergence and we let $H^2(f_0, f) = 1 - \int \sqrt{f_0 f}$ be half of the squared Hellinger distance.
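To make the two discrepancies concrete, here is a minimal sketch for densities on a finite set, assuming the standard definitions $D_{KL}(f_0, f) = \sum f_0 \log(f_0/f)$ and $H^2(f_0, f) = 1 - \sum \sqrt{f_0 f}$. The function names and the two example densities are hypothetical, chosen only for illustration.

```python
import numpy as np

def kl_divergence(f0, f):
    """Kullback-Leibler divergence sum f0 * log(f0 / f) for discrete densities."""
    f0 = np.asarray(f0, dtype=float)
    f = np.asarray(f, dtype=float)
    support = f0 > 0  # terms with f0 = 0 contribute 0 by convention
    if np.any(f[support] == 0):
        return np.inf  # f0 puts mass where f does not
    return float(np.sum(f0[support] * np.log(f0[support] / f[support])))

def half_sq_hellinger(f0, f):
    """Half of the squared Hellinger distance: 1 - sum sqrt(f0 * f)."""
    f0 = np.asarray(f0, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(1.0 - np.sum(np.sqrt(f0 * f)))

# Hypothetical example densities on a three-point space.
f0 = np.array([0.5, 0.3, 0.2])
f = np.array([0.4, 0.4, 0.2])

kl = kl_divergence(f0, f)
h2 = half_sq_hellinger(f0, f)
print(f"KL = {kl:.5f}, H^2 = {h2:.5f}")
```

Both quantities vanish when the two densities coincide, and $H^2$ always lies in $[0, 1]$, whereas the Kullback-Leibler divergence can be infinite.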

The following theorem states that the posterior distribution of $\Pi$ accumulates in Hellinger neighborhoods of $f_0$, assuming the prior is root-summable (i.e. $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty$ for some $\alpha < 1$). In the well-specified case (i.e. $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) = 0$), the posterior accumulates in any neighborhood of $f_0$. In the misspecified case, small neighborhoods of $f_0$ could be empty, but the posterior distribution still accumulates in sufficiently large neighborhoods (how large exactly is a function of $\alpha$ and $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f)$).

The result was essentially stated by Barron (Discussion: On the Consistency of Bayes Estimates, 1986). In the case where is not necessarily discrete, a similar result was obtained, through a discretization argument, by Walker (Bayesian asymptotics with misspecified models, 2013). See also Xing (Sufficient conditions for Bayesian consistency, 2009) for a thorough treatment of Bayesian consistency using the same method of proof.

**Theorem** (Barron).

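*Suppose $\beta_0 = \inf_{f \in \mathcal{N}} D_{KL}(f_0, f) < \infty$ and that*

$$\alpha := \inf\Big\{ p \in [\tfrac{1}{2}, 1] : \sum_{f \in \mathcal{N}} \Pi(f)^p < \infty \Big\} < 1.$$

*If $\varepsilon > 0$ is such that*

$$\varepsilon > 1 - \exp\Big( -\frac{\beta_0 \alpha}{2(1-\alpha)} \Big)$$

*and if $A_\varepsilon = \{ f \in \mathcal{N} : H^2(f_0, f) < \varepsilon \}$, then*

$$\Pi(A_\varepsilon \mid X_1, \dots, X_n) \to 1$$

*almost surely as $n \to \infty$.*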

**Remarks.**

1 – If $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) = 0$, then any $\varepsilon > 0$ can be used.

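2 – $\alpha$ is related to the rate of convergence of the prior masses $\Pi(f_i)$ towards $0$ as $i \to \infty$. The quantity $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha$ can be thought of as a measure of entropy.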

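3 – Walker (2013) considered the case $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty$ for some $\alpha < 1/2$. Since $\Pi(f) \le 1$, this implies that $\sum_{f \in \mathcal{N}} \sqrt{\Pi(f)} < \infty$ and the above theorem can also be applied in this case.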
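As an illustration of the kind of posterior concentration the theorem describes, here is a small simulation sketch. Everything specific in it is an assumption made for the example: a Bernoulli model on a finite grid of success probabilities (a truncation standing in for a countable family), a geometric prior, the true parameter `theta0 = 0.3`, and the radius `eps = 0.01`. The model is well specified here, so any positive radius is admissible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: Bernoulli densities indexed by a grid of probabilities.
thetas = np.linspace(0.05, 0.95, 19)

# Hypothetical prior: Pi(f_i) proportional to 2^{-i}. For a countable family
# such geometric weights stay summable after raising to any power p > 0.
prior = 0.5 ** np.arange(1, len(thetas) + 1)
prior /= prior.sum()

theta0 = 0.3  # true parameter; it lies on the grid (well-specified case)
eps = 0.01    # radius of the Hellinger neighborhood A_eps

def half_sq_hellinger_bernoulli(a, b):
    """1 - sum sqrt(f_a * f_b) for Bernoulli(a) and Bernoulli(b)."""
    return 1.0 - (np.sqrt(a * b) + np.sqrt((1 - a) * (1 - b)))

# Boolean mask of the grid points that belong to A_eps.
in_A = half_sq_hellinger_bernoulli(theta0, thetas) < eps

def posterior(x):
    """Posterior weights over the grid given 0/1 observations x."""
    k, n = x.sum(), x.size
    log_post = np.log(prior) + k * np.log(thetas) + (n - k) * np.log1p(-thetas)
    log_post -= log_post.max()  # stabilize before exponentiating
    w = np.exp(log_post)
    return w / w.sum()

x = rng.binomial(1, theta0, size=2000)
masses = {n: float(posterior(x[:n])[in_A].sum()) for n in (10, 100, 1000, 2000)}
for n, m in masses.items():
    print(f"n = {n:5d}   posterior mass of A_eps = {m:.4f}")
```

The posterior mass of the neighborhood should approach 1 as the sample size grows. Removing `theta0` from the grid gives a misspecified variant in which a small enough `eps` makes the neighborhood empty.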

## Demonstration

**Background.**

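First let me recall some concepts. The $\alpha$-affinity between two densities $f$ and $g$ is defined, for $0 < \alpha < 1$, as

$$A_\alpha(f, g) = \int g^\alpha f^{1-\alpha}. \tag{1}$$

Note that $0 \le A_\alpha(f, g) \le 1$ and that $A_\alpha(f, g) = 1$ if and only if $f = g$. Furthermore, when $\alpha \ge 1/2$, Hölder's inequality and Jensen's inequality yield

$$A_\alpha(f, g) \le \Big( \int \sqrt{f g} \Big)^{2(1-\alpha)} = \big( 1 - H^2(f, g) \big)^{2(1-\alpha)}. \tag{2}$$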
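The Hölder bound $A_\alpha(f, g) \le \big(\int \sqrt{fg}\big)^{2(1-\alpha)}$ for $\alpha \ge 1/2$ can be sanity-checked numerically. The sketch below assumes the convention $A_\alpha(f, g) = \sum g^\alpha f^{1-\alpha}$ for densities on a finite set; the helper names and the random test densities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def affinity(f, g, alpha):
    """alpha-affinity sum g^alpha * f^(1 - alpha) between discrete densities."""
    return float(np.sum(g ** alpha * f ** (1.0 - alpha)))

def random_density(k):
    """Random strictly positive probability vector of length k."""
    w = rng.random(k) + 1e-3
    return w / w.sum()

# Smallest observed value of (bound - affinity); it should never be negative.
worst_gap = np.inf
for _ in range(1000):
    f, g = random_density(6), random_density(6)
    alpha = rng.uniform(0.5, 1.0)
    bound = affinity(f, g, 0.5) ** (2.0 * (1.0 - alpha))
    worst_gap = min(worst_gap, bound - affinity(f, g, alpha))

print(f"smallest value of bound - affinity: {worst_gap:.3e}")
```

A random check like this proves nothing, but it catches sign or exponent mistakes in the inequality. The bound is an equality at $\alpha = 1/2$, so the smallest gap can be close to zero.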

**Proof.**

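We can now begin the proof. Let $\tilde\alpha \in [\alpha, 1)$ be such that $\sum_{f \in \mathcal{N}} \Pi(f)^{\tilde\alpha} < \infty$, and write $R_n(f) = \prod_{i=1}^n f(X_i)/f_0(X_i)$ for the likelihood ratio. Then, since a posterior probability is at most $1$ and since $\big(\sum_i a_i\big)^{\tilde\alpha} \le \sum_i a_i^{\tilde\alpha}$ for nonnegative $a_i$, we have

$$\Pi(A_\varepsilon^c \mid X_1, \dots, X_n) = \frac{\sum_{f \in A_\varepsilon^c} \Pi(f) R_n(f)}{\sum_{f \in \mathcal{N}} \Pi(f) R_n(f)} \le \frac{\sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde\alpha} R_n(f)^{\tilde\alpha}}{\Big( \sum_{f \in \mathcal{N}} \Pi(f) R_n(f) \Big)^{\tilde\alpha}}. \tag{3}$$

If $\beta > \beta_0$ and $g \in \mathcal{N}$ is such that $D_{KL}(f_0, g) < \beta$, then, by the strong law of large numbers,

$$e^{n\beta} \sum_{f \in \mathcal{N}} \Pi(f) R_n(f) \ge \Pi(g) \exp\Big\{ n \Big( \beta - \frac{1}{n} \sum_{i=1}^n \log \frac{f_0(X_i)}{g(X_i)} \Big) \Big\} \to \infty \tag{4}$$

almost surely. Furthermore, using (2), we find

$$\mathbb{E}\Big[ \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde\alpha} R_n(f)^{\tilde\alpha} \Big] = \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde\alpha} A_{\tilde\alpha}(f_0, f)^n \le C (1 - \varepsilon)^{2n(1-\tilde\alpha)}.$$

Here $C = \sum_{f \in \mathcal{N}} \Pi(f)^{\tilde\alpha}$ is a constant. Since $\beta > \beta_0$ is arbitrary and since $\tilde\alpha$ can be taken so that $2(1-\tilde\alpha)\log(1-\varepsilon) + \tilde\alpha \beta_0 < 0$, we may choose $\beta$ so that $e^{n\beta\tilde\alpha} C (1-\varepsilon)^{2n(1-\tilde\alpha)}$ converges exponentially fast towards $0$. Hence, by Markov's inequality and the Borel-Cantelli lemma, we have

$$e^{n\beta\tilde\alpha} \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde\alpha} R_n(f)^{\tilde\alpha} \to 0$$

almost surely. This, together with (3) and (4), implies that $\Pi(A_\varepsilon^c \mid X_1, \dots, X_n) \to 0$ almost surely.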
