# Fractional Posteriors and Hausdorff alpha-entropy

Bhattacharya, Pati & Yang (2016) wrote an interesting paper on Bayesian fractional posteriors. These are based on fractional likelihoods – likelihoods raised to a fractional power – and provide robustness to misspecification. One of their results shows that fractional posterior contraction follows from a prior mass condition alone: it suffices that the prior assigns enough mass to Kullback-Leibler-type neighborhoods of the parameter corresponding to the true data-generating distribution (or the one closest to it in the Kullback-Leibler sense). With regular posteriors, on the other hand, a complexity constraint on the prior distribution is usually also required to show posterior contraction.

Their result made me think of the approach of Xing & Ranneby (2008) to posterior consistency. There, a prior complexity constraint specified through the so-called Hausdorff $\alpha$-entropy is used to bound the regular posterior distribution by something similar to a fractional posterior distribution. As it turns out, the proof of Theorem 3.2 of Bhattacharya et al. (2016) can almost directly be adapted to regular posteriors in certain cases, using the Hausdorff $\alpha$-entropy to bridge the gap. Let me explain this in some more detail.

Let me consider well-specified discrete priors for simplicity. More generally, the discretization trick could possibly yield similar results for non-discrete priors.

I will follow as closely as possible the notation of Bhattacharya et al. (2016). Let $\{p_{\theta}^{(n)} \mid \theta \in \Theta\}$ be a dominated statistical model, where $\Theta = \{\theta_1, \theta_2, \theta_3, \dots\}$ is discrete. Assume $X^{(n)} \sim p_{\theta_0}^{(n)}$ for some $\theta_0 \in \Theta$, let

$B_n(\varepsilon, \theta_0) = \left\{ \theta \in \Theta \mid \int p_{\theta_0}^{(n)}\log\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2,\, \int p_{\theta_0}^{(n)}\log^2\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2 \right\}$

and define the Rényi divergence of order $\alpha$ as

$D^{(n)}_{\alpha}(\theta, \theta_0) = \frac{1}{\alpha-1}\log\int\{p_{\theta}^{(n)}\}^\alpha \{p_{\theta_0}^{(n)}\}^{1-\alpha}.$
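
As a quick sanity check on these definitions, the closed-form values in a toy Gaussian case (my own choice, not from the paper) can be recovered by numerical integration: with $n = 1$, $p_\theta = N(\theta, 1)$ and $\theta_0 = 0$, the KL term in $B_n$ equals $\theta^2/2$, the squared-log term equals $\theta^2 + \theta^4/4$, and $D_\alpha(\theta, 0) = \alpha\theta^2/2$.

```python
import math

# Quadrature sanity checks of the quantities just defined, in a toy case
# (my own choice): n = 1, p_theta = N(theta, 1), theta0 = 0. Closed forms:
#   KL term of B_n:          int p_0 log(p_0/p_theta)   = theta^2 / 2
#   squared-log term of B_n: int p_0 log^2(p_0/p_theta) = theta^2 + theta^4 / 4
#   Renyi divergence:        D_alpha(theta, 0)          = alpha * theta^2 / 2
theta, alpha = 0.7, 0.3

def density(x, mu):
    """Density of N(mu, 1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

h = 1e-3
grid = [-12.0 + h * i for i in range(24001)]   # covers [-12, 12]
p0 = [density(x, 0.0) for x in grid]
logratio = [math.log(density(x, 0.0) / density(x, theta)) for x in grid]

kl = h * sum(p * l for p, l in zip(p0, logratio))
sq = h * sum(p * l * l for p, l in zip(p0, logratio))
renyi_int = h * sum(density(x, theta) ** alpha * density(x, 0.0) ** (1.0 - alpha)
                    for x in grid)
d_alpha = math.log(renyi_int) / (alpha - 1.0)

print(kl, sq, d_alpha)
```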

We let $\Pi_n$ be a prior on $\Theta$; its fractional posterior distribution of order $\alpha$ is defined as

$\Pi_{n, \alpha}(A \mid X^{(n)}) \propto \int_{A}\left\{p_{\theta}^{(n)}\left(X^{(n)}\right)\right\}^\alpha\Pi_n(d\theta).$
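
To make this concrete, here is a minimal numerical sketch, assuming a toy Gaussian location model on a finite grid with a uniform prior (my setup, not from the paper); raising the likelihood to a power $\alpha < 1$ tempers, i.e. flattens, the regular posterior:

```python
import math
import random

# A toy sketch of a fractional posterior on a discrete parameter space:
# X_i iid N(theta, 1) with theta restricted to a finite grid, uniform prior.
# (This Gaussian-grid setup is my own illustration, not from the paper.)
random.seed(0)
thetas = [-2.0 + 0.5 * k for k in range(9)]   # Theta = {-2, -1.5, ..., 2}
theta0 = 0.0
n = 50
x = [random.gauss(theta0, 1.0) for _ in range(n)]

def fractional_posterior(alpha):
    """Fractional posterior of order alpha over the grid, uniform prior."""
    # log p_theta^{(n)}(x), dropping theta-free constants.
    loglik = [-0.5 * sum((xi - t) ** 2 for xi in x) for t in thetas]
    logw = [alpha * ll for ll in loglik]      # uniform prior mass cancels
    m = max(logw)                             # stabilize before exponentiating
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    return [wi / z for wi in w]

post_full = fractional_posterior(1.0)   # alpha = 1: the regular posterior
post_half = fractional_posterior(0.5)   # alpha = 1/2: tempered, flatter
print(max(post_full), max(post_half))
```

Under a uniform prior the fractional posterior is just the regular posterior raised to the power $\alpha$ and renormalized, which is why its maximum mass can only decrease.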

In this well-specified case, one of their results is the following:

Theorem 3.2 of Bhattacharya et al. (particular case)
Fix $\alpha \in (0,1)$ and assume that $\varepsilon_n$ satisfies $n\varepsilon_n^2 \geq 2$ and

$\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.$

Then, for any $D \geq 2$ and $t > 0$,

$\Pi_{n,\alpha}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right) \leq e^{-tn\varepsilon_n^2}$

holds with probability at least $1-2/\{(D-1+t)^2n\varepsilon_n^2\}$.

Let us define the $\alpha$-entropy of the prior $\Pi_n$ as

$H_\alpha(\Pi_n) = \sum_{\theta \in \Theta} \Pi_n(\{\theta\})^\alpha.$
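
Whether $H_\alpha(\Pi_n)$ is finite depends on the tail of the prior. As a hypothetical illustration (mine, not from the papers), a geometric prior $\Pi_n(\{\theta_k\}) = 2^{-k}$ has finite $\alpha$-entropy for every $\alpha \in (0,1)$, equal to $r/(1-r)$ with $r = 2^{-\alpha}$, while a prior with polynomial tails $\Pi_n(\{\theta_k\}) \propto k^{-2}$ has $H_\alpha(\Pi_n) = \infty$ whenever $\alpha \leq 1/2$:

```python
import math

# Two hypothetical priors on Theta = {theta_1, theta_2, ...} (my examples):
# the alpha-entropy is finite or infinite depending on the tail of the prior.

def alpha_entropy(masses, alpha):
    """Partial sum of H_alpha(Pi) = sum_k Pi({theta_k})^alpha."""
    return sum(m ** alpha for m in masses)

alpha = 0.5

# Geometric prior Pi({theta_k}) = 2^{-k}: H_alpha is a geometric series with
# ratio r = 2^{-alpha}, hence finite for every alpha in (0, 1), equal to r/(1-r).
geom_entropy = alpha_entropy([2.0 ** -k for k in range(1, 200)], alpha)
r = 2.0 ** -alpha
closed_form = r / (1.0 - r)

# Polynomial prior Pi({theta_k}) = (6/pi^2) k^{-2}: H_alpha behaves like
# sum_k k^{-2 alpha}, which diverges for alpha <= 1/2; partial sums keep growing.
c = 6.0 / math.pi ** 2
poly_small = alpha_entropy([c / k ** 2 for k in range(1, 10 ** 3)], alpha)
poly_large = alpha_entropy([c / k ** 2 for k in range(1, 10 ** 5)], alpha)

print(geom_entropy, closed_form, poly_small, poly_large)
```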

An adaptation of the proof of the previous theorem to our case, where $\Pi_n$ is discrete, yields the following.

Proposition (Regular posteriors)
Fix $\alpha \in (0,1)$ and assume that $\varepsilon_n$ satisfies $n\varepsilon_n^2 \geq 2$ and

$\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.$

Then, for any $D \geq 2$ and $t > 0$,

$\Pi_{n}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right)^\alpha \leq H_\alpha(\Pi_n)\, e^{-tn\varepsilon_n^2}$

holds with probability at least $1-2/\{(D-1+t)^2n\varepsilon_n^2\}$.

Note that $H_\alpha(\Pi_n)$ may be infinite, in which case the upper bound on the tails of $\frac{1}{n}D_\alpha^{(n)}$ is trivial. When the prior is not discrete, my guess is that the complexity term $H_\alpha(\Pi_n)$ should be replaced by a discretization entropy $H_\alpha(\Pi_n; \varepsilon_n)$, the $\alpha$-entropy of a discretized version of $\Pi_n$ whose resolution (in the Hellinger sense) is some function of $\varepsilon_n$.

Proof of the proposition.
Assume $H_\alpha(\Pi_n) < \infty$, as otherwise the result is trivial. I will follow as closely as possible the proof of Bhattacharya et al. (2016), while skipping some details. Let $r_n(\theta, \theta_0) = \log\{p_{\theta_0}^{(n)}(X^{(n)}) / p_{\theta}^{(n)}(X^{(n)}) \}$ and

$U_n = \left\{ \theta\in \Theta \mid D_{\alpha}^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha}n\varepsilon_n^2 \right\}.$

Then using the subadditivity of $x \mapsto x^\alpha$ and the definition of the posterior,

$\Pi_n(U_n \mid X^{(n)})^\alpha \leq \sum_{\theta \in U_n}\Pi_n(\{\theta\}\mid X^{(n)})^\alpha = \frac{\sum_{\theta \in U_n} e^{-\alpha r_n(\theta, \theta_0)}\Pi_n(\{\theta\})^\alpha}{\left(\sum_{\theta \in \Theta}e^{- r_n(\theta, \theta_0)}\Pi_n(\{\theta\})\right)^\alpha}.$
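
The first inequality relies on the fact that $x \mapsto x^\alpha$ is subadditive on $[0, \infty)$ for $\alpha \in (0,1)$, i.e. $(\sum_i x_i)^\alpha \leq \sum_i x_i^\alpha$; a trivial numerical check with made-up masses:

```python
import random

# Sanity check of the subadditivity used in the first inequality above:
# for alpha in (0, 1) and nonnegative x_i, (sum_i x_i)^alpha <= sum_i x_i^alpha.
# The x_i stand in for made-up posterior masses Pi_n({theta} | X^(n)), theta in U_n.
random.seed(1)
alpha = 0.5
masses = [random.random() for _ in range(20)]
lhs = sum(masses) ** alpha
rhs = sum(m ** alpha for m in masses)
print(lhs, rhs)   # lhs <= rhs
```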

Now to bound the numerator, we apply Markov’s inequality and get

$\mathbb{P}\left( \sum_{\theta \in U_n} e^{-\alpha r_n(\theta, \theta_0)}\Pi_n(\{\theta\})^\alpha \geq \varepsilon \right) \leq \varepsilon^{-1}\sum_{\theta \in U_n} \mathbb{E}[e^{-\alpha r_n(\theta, \theta_0)}]\Pi_n(\{\theta\})^\alpha.$

Using the fact that $\mathbb{E}[e^{-\alpha r_n(\theta, \theta_0)}] = e^{-(1-\alpha)D^{(n)}_\alpha(\theta, \theta_0)}$ is bounded by $e^{-(D+3t)n\varepsilon_n^2}$ over $\theta \in U_n$, and choosing $\varepsilon = H_\alpha(\Pi_n) e^{-(D+2t)n\varepsilon_n^2}$, the probability above is bounded by

$\varepsilon^{-1}H_\alpha(\Pi_n)e^{-(D+3t) n\varepsilon_n^2} = e^{-tn\varepsilon_n^2} \leq \frac{1}{(D-1+t)^2n\varepsilon_n^2}.$

For the denominator, Lemma 8.1 of Ghosal et al. (2000), together with the prior mass condition, yields

$\mathbb{P}\left( \left\{\sum_{\theta \in \Theta}e^{- r_n(\theta, \theta_0)}\Pi_n(\{\theta\})\right\}^\alpha \leq e^{-\alpha(D+t)n\varepsilon_n^2} \right)\leq \frac{1}{(D-1+t)^2n\varepsilon_n^2}.$

Putting the bounds on the numerator and denominator together yields the result. $\Box$