Fractional Posteriors and Hausdorff alpha-entropy

Bhattacharya, Pati & Yang (2016) wrote an interesting paper on Bayesian fractional posteriors. These are based on fractional likelihoods, that is, likelihoods raised to a fractional power, and provide robustness to misspecification. One of their results shows that fractional posterior contraction can be controlled solely through the prior mass assigned to Kullback-Leibler-type neighborhoods of the parameter corresponding to the true data-generating distribution (or the one closest to it in the Kullback-Leibler sense). With regular posteriors, on the other hand, a complexity constraint on the prior distribution is usually also required in order to show posterior contraction.

Their result made me think of the approach of Xing & Ranneby (2008) to posterior consistency. Therein, a prior complexity constraint specified through the so-called Hausdorff \alpha-entropy is used to bound the regular posterior distribution by something that is similar to a fractional posterior distribution. As it turns out, the proof of Theorem 3.2 of Bhattacharya & al. (2016) can almost directly be adapted to regular posteriors in certain cases, using the Hausdorff \alpha-entropy to bridge the gap. Let me explain this in some more detail.

Let me consider the well-specified case with discrete priors, for simplicity. More generally, the discretization trick could possibly yield similar results for non-discrete priors.

I will follow as closely as possible the notation of Bhattacharya & al. (2016). Let \{p_{\theta}^{(n)} \mid \theta \in \Theta\} be a dominated statistical model, where \Theta = \{\theta_1, \theta_2, \theta_3, \dots\} is discrete. Assume X^{(n)} \sim p_{\theta_0}^{(n)} for some \theta_0 \in \Theta, let

B_n(\varepsilon, \theta_0) = \left\{ \theta \in \Theta \mid \int p_{\theta_0}^{(n)}\log\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2,\, \int p_{\theta_0}^{(n)}\log^2\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2 \right\}

and define the Rényi divergence of order \alpha as

D^{(n)}_{\alpha}(\theta, \theta_0) = \frac{1}{\alpha-1}\log\int\{p_{\theta}^{(n)}\}^\alpha \{p_{\theta_0}^{(n)}\}^{1-\alpha}.
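To make the definition concrete, here is a minimal sketch for an i.i.d. Bernoulli model, which is my own illustrative choice and not an example taken from the paper. The function name `renyi_bernoulli` is hypothetical. For i.i.d. data the integral factorizes, so the divergence of the n-fold product is n times the single-observation divergence.

```python
import math

def renyi_bernoulli(p, p0, alpha, n=1):
    """Renyi divergence of order alpha between Bernoulli(p)^n and Bernoulli(p0)^n.

    Follows the convention used above:
    D_alpha(theta, theta0) = log( int p_theta^alpha p_theta0^(1-alpha) ) / (alpha - 1).
    Assumes 0 < p, p0 < 1 and 0 < alpha < 1.
    """
    # Single-observation integral: sum over the two outcomes {1, 0}.
    affinity = p**alpha * p0**(1 - alpha) + (1 - p)**alpha * (1 - p0)**(1 - alpha)
    # The n-fold product integral is affinity**n, hence the factor n.
    return n * math.log(affinity) / (alpha - 1)

d_same = renyi_bernoulli(0.3, 0.3, 0.5)       # identical parameters
d_one = renyi_bernoulli(0.3, 0.6, 0.5)        # one observation
d_ten = renyi_bernoulli(0.3, 0.6, 0.5, n=10)  # ten i.i.d. observations
```

The divergence vanishes when the two parameters coincide and is positive otherwise, which is what makes it usable as a discrepancy in the contraction statements that follow.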

Let \Pi_n be a prior on \Theta. Its fractional posterior distribution of order \alpha is defined as

\Pi_{n, \alpha}(A \mid X^{(n)}) \propto \int_{A}p_{\theta}^{(n)}\left(X^{(n)}\right)^\alpha\Pi_n(d\theta).
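As an illustration, the fractional posterior on a finite parameter set can be computed directly from this definition. The sketch below assumes a three-point Bernoulli model with a uniform prior and made-up data (7 successes in 10 trials); the helper names are hypothetical.

```python
import math

def fractional_posterior(prior, log_lik, alpha):
    """Weights proportional to likelihood**alpha * prior on a finite parameter set.

    Assumes every prior mass is strictly positive.
    """
    log_w = [alpha * ll + math.log(pr) for ll, pr in zip(log_lik, prior)]
    m = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]

def bernoulli_log_lik(theta, successes, n):
    """Log-likelihood of a fixed 0/1 sequence with the given number of successes."""
    return successes * math.log(theta) + (n - successes) * math.log(1 - theta)

thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]
log_lik = [bernoulli_log_lik(th, successes=7, n=10) for th in thetas]

regular = fractional_posterior(prior, log_lik, alpha=1.0)   # usual posterior
tempered = fractional_posterior(prior, log_lik, alpha=0.5)  # fractional posterior
flat = fractional_posterior(prior, log_lik, alpha=0.0)      # limiting case: the prior
```

Taking \alpha = 1 recovers the regular posterior and \alpha = 0 returns the prior; values in between temper the likelihood, so the fractional posterior is less concentrated than the regular one in this example.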

In this well-specified case, one of their results is the following:

Theorem 3.2 of Bhattacharya & al. (particular case)
Fix \alpha \in (0,1) and assume that \varepsilon_n satisfies n\varepsilon_n^2 \geq 2 and

\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.

Then, for any D \geq 2 and t > 0,

\Pi_{n,\alpha}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right) \leq e^{-tn\varepsilon_n^2}

holds with probability at least 1-2/\{(D-1+t)^2n\varepsilon_n^2\}.

What about regular posteriors?

Let us define the \alpha-entropy of the prior \Pi_n as

H_\alpha(\Pi_n) = \sum_{\theta \in \Theta} \Pi_n(\theta)^\alpha.
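To get a feel for when this quantity is finite, here is a small sketch with two priors of my own choosing (neither comes from the papers cited): a geometric prior \Pi(\theta_k) = 2^{-k}, for which the series has the closed form 1/(2^\alpha - 1), and a polynomial prior \Pi(\theta_k) = (6/\pi^2)k^{-2}, for which the series diverges when \alpha \leq 1/2.

```python
import math

def alpha_entropy(masses, alpha):
    """Sum of prior point masses raised to the power alpha."""
    return sum(m**alpha for m in masses)

alpha = 0.5

# Geometric prior Pi(theta_k) = 2^{-k}, k = 1, 2, ...: the series converges.
# Truncating at 200 terms leaves a remainder of order 2^{-100}.
geometric = [2.0**-k for k in range(1, 201)]
h_geometric = alpha_entropy(geometric, alpha)
closed_form = 1.0 / (2.0**alpha - 1.0)

# Polynomial prior Pi(theta_k) = (6 / pi^2) k^{-2}: with alpha = 1/2 the series is
# a multiple of the harmonic series, so the partial sums grow without bound.
c = 6.0 / math.pi**2

def polynomial_partial(K):
    """Partial sum of the alpha-entropy over the first K atoms."""
    return alpha_entropy((c * k**-2.0 for k in range(1, K + 1)), alpha)

h_poly_small = polynomial_partial(1_000)
h_poly_large = polynomial_partial(100_000)
```

Since each mass is at most 1 and \alpha < 1, every term satisfies \Pi_n(\theta)^\alpha \geq \Pi_n(\theta), so H_\alpha(\Pi_n) \geq 1 always; the quantity penalizes priors whose mass is spread over many atoms.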

An adaptation of the proof of the previous Theorem, in our case where \Pi_n is discrete, yields the following.

Proposition (Regular posteriors)
Fix \alpha \in (0,1) and assume that \varepsilon_n satisfies n\varepsilon_n^2 \geq 2 and

\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.

Then, for any D \geq 2 and t > 0,

\Pi_{n}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right)^\alpha \leq H_\alpha(\Pi_n) e^{-tn\varepsilon_n^2}

holds with probability at least 1-2/\{(D-1+t)^2n\varepsilon_n^2\}.

Note that H_\alpha(\Pi_n) may be infinite, in which case the upper bound on the tails of \frac{1}{n}D_\alpha^{(n)} is trivial. When the prior is not discrete, my guess is that the complexity term H_\alpha(\Pi_n) should be replaced by a discretization entropy H_\alpha(\Pi_n; \varepsilon_n), namely the \alpha-entropy of a discretized version of \Pi_n whose resolution (in the Hellinger sense) is some function of \varepsilon_n.

Proof of the proposition.
Assume H_\alpha(\Pi_n) < \infty, as otherwise the result is trivial. I will follow as closely as possible the proof of Bhattacharya & al., while skipping some details. Let r_n(\theta, \theta_0) = \log\{p_{\theta_0}^{(n)}(X^{(n)}) / p_{\theta}^{(n)}(X^{(n)}) \} and

U_n = \left\{ \theta\in \Theta \mid D_{\alpha}^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha}n\varepsilon_n^2 \right\}.

Then, using the subadditivity of x \mapsto x^\alpha and the definition of the posterior,

\Pi_n(U_n \mid X^{(n)})^\alpha \leq \sum_{\theta \in U_n}\Pi_n(\{\theta\}\mid X^{(n)})^\alpha = \frac{\sum_{\theta \in U_n} e^{-\alpha r_n(\theta, \theta_0)}\Pi_n(\{\theta\})^\alpha}{\left(\sum_{\theta \in \Theta}e^{- r_n(\theta, \theta_0)}\Pi_n(\{\theta\})\right)^\alpha}.
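For completeness, the subadditivity step can be justified in one line. The sketch below uses generic nonnegative numbers a_i (my notation), to be read as a_i = \Pi_n(\{\theta_i\} \mid X^{(n)}) for \theta_i \in U_n; the case where all a_i vanish is trivial.

```latex
% Let a_1, a_2, \dots \geq 0 with S = \sum_i a_i \in (0, \infty) and \alpha \in (0,1).
% Each ratio a_i / S lies in [0,1], and x^\alpha \geq x for x \in [0,1].
\sum_i a_i^\alpha
  = S^\alpha \sum_i \left(\frac{a_i}{S}\right)^\alpha
  \geq S^\alpha \sum_i \frac{a_i}{S}
  = S^\alpha
  = \Bigl(\sum_i a_i\Bigr)^\alpha .
```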

Now to bound the numerator, we apply Markov’s inequality and get

\mathbb{P}\left( \sum_{\theta \in U_n} e^{-\alpha r_n(\theta, \theta_0)}\Pi_n(\{\theta\})^\alpha \geq \varepsilon \right) \leq \varepsilon^{-1}\sum_{\theta \in U_n} \mathbb{E}[e^{-\alpha r_n(\theta, \theta_0)}]\Pi_n(\{\theta\})^\alpha.

Using the fact that \mathbb{E}[e^{-\alpha r_n(\theta, \theta_0)}] = e^{-(1-\alpha)D^{(n)}_\alpha(\theta, \theta_0)} is bounded by e^{-(D+3t)n\varepsilon_n^2} over \theta \in U_n and letting \varepsilon = H_\alpha(\Pi_n) e^{-(D+2t)n\varepsilon_n^2}, this shows that the above is also bounded by

\varepsilon^{-1}H_\alpha(\Pi_n)e^{-(D+3t) n\varepsilon_n^2} = e^{-tn\varepsilon_n^2} \leq \frac{1}{(D-1+t)^2n\varepsilon_n^2}.
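The identity \mathbb{E}[e^{-\alpha r_n(\theta, \theta_0)}] = e^{-(1-\alpha)D^{(n)}_\alpha(\theta, \theta_0)} used in this step can be checked numerically. The sketch below does so by exact enumeration in an i.i.d. Bernoulli model, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math

def expected_fractional_ratio(theta, theta0, alpha, n):
    """E_{theta0}[(p_theta / p_theta0)^alpha] for n i.i.d. Bernoulli draws.

    Computed exactly by grouping the 2^n binary sequences by their number of successes.
    """
    total = 0.0
    for s in range(n + 1):
        seq_prob0 = theta0**s * (1 - theta0)**(n - s)  # sequence probability under theta0
        seq_prob = theta**s * (1 - theta)**(n - s)     # sequence probability under theta
        total += math.comb(n, s) * seq_prob0 * (seq_prob / seq_prob0)**alpha
    return total

def renyi_bernoulli(p, p0, alpha, n=1):
    """Renyi divergence of order alpha between Bernoulli(p)^n and Bernoulli(p0)^n."""
    affinity = p**alpha * p0**(1 - alpha) + (1 - p)**alpha * (1 - p0)**(1 - alpha)
    return n * math.log(affinity) / (alpha - 1)

theta, theta0, alpha, n = 0.3, 0.6, 0.5, 8
lhs = expected_fractional_ratio(theta, theta0, alpha, n)
rhs = math.exp(-(1 - alpha) * renyi_bernoulli(theta, theta0, alpha, n))
```

Both sides equal the n-th power of the single-observation integral \int p_\theta^\alpha p_{\theta_0}^{1-\alpha}, so they agree up to floating-point error.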

For the denominator, Lemma 8.1 of Ghosal & al. (2000) yields

\mathbb{P}\left( \left\{\sum_{\theta \in \Theta}e^{- r_n(\theta, \theta_0)}\Pi_n(\{\theta\})\right\}^\alpha \leq e^{-\alpha(D+t)n\varepsilon_n^2} \right)\leq \frac{1}{(D-1+t)^2n\varepsilon_n^2}.

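Putting the bounds on the numerator and denominator together yields the result: outside of two events of probability at most 1/\{(D-1+t)^2n\varepsilon_n^2\} each, the numerator is at most H_\alpha(\Pi_n) e^{-(D+2t)n\varepsilon_n^2} and the denominator is at least e^{-\alpha(D+t)n\varepsilon_n^2}, so that the ratio is at most H_\alpha(\Pi_n) e^{-\{(1-\alpha)D + (2-\alpha)t\}n\varepsilon_n^2} \leq H_\alpha(\Pi_n) e^{-tn\varepsilon_n^2}. \Box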
