# Posterior consistency under (possible) misspecification

We assume, without too much loss of generality, that our priors are discrete. When dealing with Hellinger separable density spaces, it is possible to discretize posterior distributions to study consistency (see this post about it).

Let $\Pi$ be a prior on a countable space $\mathcal{N} = \{f_1, f_2, f_3, \dots\}$ of probability density functions, with $\Pi(f) > 0$ for all $f \in \mathcal{N}$. The data $X_1, X_2, X_3, \dots$ are independent draws from some unknown distribution $P_0$ with density $f_0$.

We denote by $D_{KL}(f_0, f) = \int f_0 \log\frac{f_0}{f}$ the Kullback-Leibler divergence and we let $D_{\frac{1}{2}}(f_0, f) = 1 - \int \sqrt{f_0 f}$ be half of the squared Hellinger distance.
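As a concrete illustration of these two quantities, here is a minimal sketch for densities on a finite set, represented as lists of probabilities. The function names and the example densities are assumptions made for illustration only; integrals become sums in this setting.

```python
import math

def kl_divergence(f0, f):
    """D_KL(f0, f) = sum of f0 * log(f0 / f) over a finite sample space."""
    total = 0.0
    for p, q in zip(f0, f):
        if p == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if q == 0.0:
            return math.inf  # f0 puts mass where f has none
        total += p * math.log(p / q)
    return total

def half_sq_hellinger(f0, f):
    """D_{1/2}(f0, f) = 1 - sum of sqrt(f0 * f): half the squared Hellinger distance."""
    return 1.0 - sum(math.sqrt(p * q) for p, q in zip(f0, f))

# Hypothetical example densities on a two-point space.
f0 = [0.5, 0.5]
f = [0.9, 0.1]
print(kl_divergence(f0, f), half_sq_hellinger(f0, f))
```

Both quantities vanish when $f = f_0$, and $D_{\frac{1}{2}}$ always lies in $[0, 1]$, whereas $D_{KL}$ can be infinite.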

The following theorem states that the posterior distribution of $\Pi$ accumulates in Hellinger neighborhoods of $f_0$, assuming the prior is root-summable (i.e. $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty$ for some $0 < \alpha < 1$). In the well-specified case (i.e. $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) = 0$), the posterior accumulates in any Hellinger neighborhood of $f_0$. In the misspecified case, small neighborhoods of $f_0$ could be empty, but the posterior distribution still accumulates in sufficiently large neighborhoods (how large exactly depends on $\alpha$ and on $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f)$).

The result was essentially stated by Barron (Discussion: On the Consistency of Bayes Estimates, 1986). In the case where $\Pi$ is not necessarily discrete, a similar result was obtained, through a discretization argument, by Walker (Bayesian asymptotics with misspecified models, 2013). See also Xing (Sufficient conditions for Bayesian consistency, 2009) for a thorough treatment of Bayesian consistency using the same method of proof.

Theorem (Barron).
Suppose $\beta_0 :=\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) < \infty$ and that $\alpha := \inf \left\{ p \in [\tfrac{1}{2},1] \,|\, \sum_{f \in \mathcal{N}} \Pi(f)^p < \infty \right\} < 1.$

If $\varepsilon > 0$ is such that $\varepsilon > 1- \exp\left( \frac{-\beta_0 \alpha}{2(1-\alpha)} \right)$

and if $A_\varepsilon := \{f \in \mathcal{N} \,|\, D_{\frac{1}{2}} (f_0, f) < \varepsilon\} \not = \emptyset$, then $\Pi\left(A_\varepsilon \,|\, \{x_i\}_{i=1}^n\right) \rightarrow 1$

almost surely as $n \rightarrow \infty$.
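To get a feel for the condition on $\varepsilon$, the following sketch evaluates the lower bound $1 - \exp\left(\frac{-\beta_0 \alpha}{2(1-\alpha)}\right)$ from the theorem. The function name and the sample values of $\beta_0$ and $\alpha$ are illustrative assumptions.

```python
import math

def epsilon_lower_bound(beta0, alpha):
    """Smallest radius allowed by the theorem: 1 - exp(-beta0 * alpha / (2 * (1 - alpha)))."""
    if not (0.5 <= alpha < 1.0):
        raise ValueError("the theorem assumes 1/2 <= alpha < 1")
    if beta0 < 0.0 or math.isinf(beta0):
        raise ValueError("the theorem assumes 0 <= beta0 < infinity")
    return 1.0 - math.exp(-beta0 * alpha / (2.0 * (1.0 - alpha)))

# Hypothetical values: the bound grows with both beta0 and alpha.
for beta0 in (0.0, 0.1, 1.0):
    for alpha in (0.5, 0.75, 0.9):
        print(beta0, alpha, epsilon_lower_bound(beta0, alpha))
```

The bound is $0$ when $\beta_0 = 0$ and approaches $1$ as $\alpha \rightarrow 1$ with $\beta_0 > 0$ fixed, so a prior that is barely root-summable yields very little information about where the posterior accumulates.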

Remarks.

1 – If $\inf_{f \in \mathcal{N}} D_{KL}(f_0, f) = 0$, then any $\varepsilon > 0$ can be used.

2 – $\alpha$ is related to the rate at which $\rho(n) := \Pi(f_n)$ converges to $0$. The quantity $H_\alpha (\Pi) = \log \sum_{f \in \mathcal{N}} \Pi(f)^\alpha$ can be thought of as a measure of entropy.

3 – Walker (2013) considered the case $\sum_{f \in \mathcal{N}} \Pi(f)^\alpha < \infty$ for some $\alpha < \frac{1}{2}$. Since $\Pi(f) \le 1$, this implies that $\sum_{f \in \mathcal{N}} \sqrt{\Pi(f)} < \infty$, so the above theorem also applies in this case (with $\alpha = \frac{1}{2}$ in the notation of the theorem).
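As an illustration of the second remark, consider a prior shape that is an assumption of this example and not taken from the references: $\Pi(f_n) \propto n^{-s}$ with $s > 1$. Then $\sum_n \Pi(f_n)^p \propto \sum_n n^{-sp}$ is finite exactly when $p > 1/s$, so the $\alpha$ of the theorem equals $\max(\tfrac{1}{2}, 1/s)$. A sketch:

```python
def alpha_for_power_law_prior(s):
    """alpha of the theorem for the hypothetical prior Pi(f_n) proportional to n**(-s).

    sum_n n**(-s * p) converges iff p > 1 / s, and the theorem takes the
    infimum of such p over [1/2, 1], which gives max(1/2, 1 / s).
    """
    if s <= 1.0:
        raise ValueError("n**(-s) is only summable for s > 1")
    return max(0.5, 1.0 / s)

for s in (1.25, 2.0, 4.0):
    print(s, alpha_for_power_law_prior(s))
```

Heavier tails ($s$ close to $1$) push $\alpha$ towards $1$, while any prior decaying at least as fast as $n^{-2}$, including geometrically decaying weights, gives $\alpha = \frac{1}{2}$.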

## Demonstration

Background.
First let me recall some concepts. The $\alpha$-affinity between two densities $f$ and $g$ is defined as $A_\alpha(f, g) = \int g^\alpha f^{1-\alpha}. \qquad (1)$

Note that $0 \le A_{\alpha}(f, g) \le 1$ and that, for $0 < \alpha < 1$, $A_\alpha(f, g) = 1$ if and only if $f = g$. Furthermore, when $\alpha \geq \frac{1}{2}$, Hölder’s inequality yields $A_{\alpha}(f, g) \le \left(A_{\frac{1}{2}}(f,g)\right)^{2(1-\alpha)}. \qquad (2)$
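The upper bound in (2), which is the part used in the proof, can be checked numerically on randomly generated densities over a finite set. This is only a sanity check under assumed names (`affinity`, `random_density`), not a proof.

```python
import random

def affinity(alpha, f, g):
    """A_alpha(f, g) = sum of g**alpha * f**(1 - alpha) over a finite sample space."""
    return sum((q ** alpha) * (p ** (1.0 - alpha)) for p, q in zip(f, g))

def random_density(k, rng):
    """A random strictly positive probability vector of length k."""
    weights = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)  # fixed seed so the check is reproducible
violations = 0
for _ in range(1000):
    f = random_density(5, rng)
    g = random_density(5, rng)
    alpha = rng.uniform(0.5, 1.0)
    upper = affinity(0.5, f, g) ** (2.0 * (1.0 - alpha))
    if affinity(alpha, f, g) > upper + 1e-12:  # small tolerance for rounding
        violations += 1
print("violations of the upper bound:", violations)
```

Since the bound follows from Hölder’s inequality, the count of violations should be zero up to floating-point rounding.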

Proof.
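We can now begin the proof. Let $\tilde{\alpha}$ be such that $1 > \tilde{\alpha} > \alpha$, so that $\sum_{f \in \mathcal{N}} \Pi(f)^{\tilde \alpha} < \infty$, and write $R_n(f) = \prod_{i=1}^n \frac{f(x_i)}{f_0(x_i)}$ for the likelihood ratio. Since a posterior probability is at most $1$ and since $\left(\sum_k a_k\right)^{\tilde \alpha} \le \sum_k a_k^{\tilde \alpha}$ for nonnegative $a_k$, we have, for any $\beta > 0$,

$\Pi\left(A_\varepsilon^c \,|\, \{x_i\}_{i=1}^n\right) \le \Pi\left(A_\varepsilon^c \,|\, \{x_i\}_{i=1}^n\right)^{\tilde \alpha} \le \frac{e^{n \tilde \alpha (\beta_0 + \beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde \alpha} R_n(f)^{\tilde \alpha}}{e^{n \tilde \alpha (\beta_0 + \beta)} \left( \sum_{f \in \mathcal{N}} \Pi(f) R_n(f) \right)^{\tilde \alpha}}. \qquad (*)$

If $\beta > 0$ and $g \in \mathcal{N}$ is such that $D_{KL}(f_0, g) < \beta_0 + \beta$, then, by the strong law of large numbers, the denominator satisfies

$e^{n \tilde \alpha (\beta_0 + \beta)} \left( \sum_{f \in \mathcal{N}} \Pi(f) R_n(f) \right)^{\tilde \alpha} \ge \Pi(g)^{\tilde \alpha} \exp\left( n \tilde \alpha \left( \beta_0 + \beta - \frac{1}{n} \sum_{i=1}^n \log \frac{f_0(x_i)}{g(x_i)} \right) \right) \rightarrow \infty \qquad (3)$

almost surely. Furthermore, writing $\mathbb{E}$ for the expectation under $P_0$ and using (2), we find

$\mathbb{E}\left[ e^{n \tilde \alpha (\beta_0 + \beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde \alpha} R_n(f)^{\tilde \alpha} \right] = e^{n \tilde \alpha (\beta_0 + \beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde \alpha} A_{\tilde \alpha}(f_0, f)^n \le e^{n \tilde \alpha (\beta_0 + \beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde \alpha} A_{\frac{1}{2}}(f_0, f)^{2(1 - \tilde \alpha) n} \le \text{cst.} \exp\left( n \left\{ 2(1 - \tilde \alpha) \log(1 - \varepsilon) + \tilde \alpha (\beta_0 + \beta) \right\} \right). \qquad (**)$

Here $\text{cst.}$ is a constant (one can take $\sum_{f \in \mathcal{N}} \Pi(f)^{\tilde \alpha}$), and we used that $A_{\frac{1}{2}}(f_0, f) = 1 - D_{\frac{1}{2}}(f_0, f) \le 1 - \varepsilon$ for $f \in A_\varepsilon^c$. Since $\beta > 0$ is arbitrary and since $\tilde \alpha$ can be taken so that $2(1-\tilde \alpha) \log (1-\varepsilon) + \tilde \alpha \beta_0 < 0$, we obtain that $(**)$ converges exponentially fast towards $0$. Hence, by Markov's inequality and the Borel-Cantelli lemma, we have

$e^{n \tilde \alpha (\beta_0 + \beta)} \sum_{f \in A_\varepsilon^c} \Pi(f)^{\tilde \alpha} R_n(f)^{\tilde \alpha} \rightarrow 0$

almost surely. This, together with $(3)$, implies that $(*) \rightarrow 0$ almost surely. $\Box$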
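The statement of the theorem can also be explored by simulation. The sketch below uses a hypothetical misspecified setup chosen for this example: data are Bernoulli with success probability $0.3$, while the model contains only the Bernoulli densities with parameters $0.5$, $0.7$ and $0.9$, under a uniform prior. The family is finite, so $\alpha = \frac{1}{2}$; here $\beta_0 \approx 0.082$, the lower bound on $\varepsilon$ is roughly $0.04$, and $\varepsilon = 0.05$ gives a neighborhood containing only the density with parameter $0.5$.

```python
import math
import random

def half_sq_hellinger(f0, f):
    """D_{1/2}(f0, f) = 1 - sum of sqrt(f0 * f)."""
    return 1.0 - sum(math.sqrt(p * q) for p, q in zip(f0, f))

def bernoulli(theta):
    """Density on {0, 1} with P(1) = theta."""
    return [1.0 - theta, theta]

def posterior_mass_near_f0(thetas, prior, data, theta0, eps):
    """Posterior mass of {f : D_{1/2}(f0, f) < eps} for a finite Bernoulli family."""
    ones = sum(data)
    zeros = len(data) - ones
    # Unnormalized log-posterior of each model; theta0 is not used here.
    log_post = [math.log(w) + ones * math.log(t) + zeros * math.log(1.0 - t)
                for t, w in zip(thetas, prior)]
    top = max(log_post)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in log_post]
    f0 = bernoulli(theta0)
    inside = sum(w for t, w in zip(thetas, weights)
                 if half_sq_hellinger(f0, bernoulli(t)) < eps)
    return inside / sum(weights)

# Hypothetical setup: all values below are assumptions of this example.
theta0 = 0.3
thetas = [0.5, 0.7, 0.9]
prior = [1.0 / 3.0] * 3
eps = 0.05
rng = random.Random(0)
data = [1 if rng.random() < theta0 else 0 for _ in range(500)]
for n in (0, 10, 100, 500):
    print(n, posterior_mass_near_f0(thetas, prior, data[:n], theta0, eps))
```

The printed posterior mass starts at the prior mass $\frac{1}{3}$ and moves towards $1$ as $n$ grows, in line with the theorem.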
