What this blog is about

My professional page is at olivierbinette.ca.

Bayesian theory and exposition

Problem solving, short proofs and notes

Surface regression and reconstruction from a topological perspective


Mathematica

I often experiment with Mathematica when I need to conjecture the convergence rate of a complicated sequence. These numerical experiments also help me find which quantities I can neglect while still being able to prove the type of convergence that I'm looking for. Today, unfortunately, there's pretty much nothing I can neglect, and I'll have to deal with a series-of-series expression for the sequence illustrated below.
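As a rough illustration of the workflow (a made-up example in R rather than Mathematica, using a sequence whose rate is already known), one can conjecture a polynomial convergence rate by regressing the log-error on log n:

# Hypothetical example: guess the rate at which a_n approaches its limit
# by fitting the slope of log|a_n - L| against log(n).
n <- 2^(5:15)
a <- (1 + 1/n)^n                  # a_n -> e, with error of order 1/n
err <- abs(a - exp(1))
fit <- lm(log(err) ~ log(n))
coef(fit)[["log(n)"]]             # slope near -1 suggests a 1/n rate

The fitted slope estimates the exponent: a slope near -1 here correctly suggests that the sequence converges at rate 1/n.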

Fractional Posteriors and Hausdorff alpha-entropy

Bhattacharya, Pati & Yang (2016) wrote an interesting paper on Bayesian fractional posteriors. These are based on fractional likelihoods – likelihoods raised to a fractional power – and provide robustness to misspecification. One of their results shows that fractional posterior contraction can be obtained using only the prior mass attributed to Kullback-Leibler-type neighborhoods of the parameter corresponding to the true data-generating distribution (or to the one closest to it in the Kullback-Leibler sense). With regular posteriors, on the other hand, a complexity constraint on the prior distribution is usually also required to show posterior contraction.

Their result made me think of the approach of Xing & Ranneby (2008) to posterior consistency. Therein, a prior complexity constraint specified through the so-called Hausdorff \alpha-entropy is used to bound the regular posterior distribution by something similar to a fractional posterior distribution. As it turns out, the proof of Theorem 3.2 of Bhattacharya et al. (2016) can almost directly be adapted to regular posteriors in certain cases, using the Hausdorff \alpha-entropy to bridge the gap. Let me explain this in some more detail.

Let me consider well-specified discrete priors for simplicity. More generally, the discretization trick could possibly yield similar results for non-discrete priors.

I will follow as closely as possible the notation of Bhattacharya et al. (2016). Let \{p_{\theta}^{(n)} \mid \theta \in \Theta\} be a dominated statistical model, where \Theta = \{\theta_1, \theta_2, \theta_3, \dots\} is discrete. Assume X^{(n)} \sim p_{\theta_0}^{(n)} for some \theta_0 \in \Theta, let

B_n(\varepsilon, \theta_0) = \left\{ \theta \in \Theta : \int p_{\theta_0}^{(n)}\log\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2,\, \int p_{\theta_0}^{(n)}\log^2\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2 \right\}

and define the Rényi divergence of order \alpha as

D^{(n)}_{\alpha}(\theta, \theta_0) = \frac{1}{\alpha-1}\log\int\{p_{\theta}^{(n)}\}^\alpha \{p_{\theta_0}^{(n)}\}^{1-\alpha}.
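For instance (a standard calculation, not from the paper), if p_{\theta}^{(n)} is the joint density of n i.i.d. Bernoulli(\theta) observations, then the integral factorizes across coordinates and

D^{(n)}_{\alpha}(\theta, \theta_0) = \frac{n}{\alpha-1}\log\left\{\theta^\alpha \theta_0^{1-\alpha} + (1-\theta)^\alpha (1-\theta_0)^{1-\alpha}\right\},

so that \frac{1}{n}D^{(n)}_{\alpha}(\theta, \theta_0) is constant in n; this explains the \frac{1}{n} scaling appearing in the results below.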

We let \Pi_n be a prior on \Theta; its fractional posterior distribution of order \alpha is defined as

\Pi_{n, \alpha}(A \mid X^{(n)}) \propto \int_{A}\left\{p_{\theta}^{(n)}\left(X^{(n)}\right)\right\}^\alpha\,\Pi_n(d\theta).
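To make the definition concrete, here is a toy computation of my own (not from the paper), with a Bernoulli likelihood on a four-point \Theta:

# Toy fractional posterior of order alpha on a discrete parameter set.
set.seed(1)
alpha <- 0.5
theta <- c(0.2, 0.4, 0.6, 0.8)        # discrete Theta
prior <- rep(1, 4) / 4                # Pi_n, uniform
x <- rbinom(50, 1, 0.6)               # data drawn with theta_0 = 0.6
loglik <- sapply(theta, function(t) sum(dbinom(x, 1, t, log = TRUE)))
# Temper the log-likelihood by alpha and normalize; subtracting the
# maximum log-weight avoids numerical underflow.
logw <- alpha * loglik + log(prior)
frac_post <- exp(logw - max(logw)) / sum(exp(logw - max(logw)))

Setting alpha <- 1 recovers the usual posterior distribution.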

In this well-specified case, one of their results is the following:

Theorem 3.2 of Bhattacharya et al. (particular case)
Fix \alpha \in (0,1) and assume that \varepsilon_n satisfies n\varepsilon_n^2 \geq 2 and

\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.

Then, for any D \geq 2 and t > 0,

\Pi_{n,\alpha}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right) \leq e^{-tn\varepsilon_n^2}

holds with probability at least 1-2/\{(D-1+t)^2n\varepsilon_n^2\}.

What about regular posteriors?

Let us define the \alpha-entropy of the prior \Pi_n as

H_\alpha(\Pi_n) = \sum_{\theta \in \Theta} \Pi_n(\theta)^\alpha.
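As a quick example (mine, not from the papers): if \Pi_n(\theta_k) \propto k^{-2}, then H_\alpha(\Pi_n) \propto \sum_{k \geq 1} k^{-2\alpha}, which is finite if and only if \alpha > 1/2, while a lighter-tailed prior such as \Pi_n(\theta_k) \propto e^{-k} has finite \alpha-entropy for every \alpha \in (0,1).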

An adaptation of the proof of the previous theorem, in our case where \Pi_n is discrete, yields the following.

Proposition (Regular posteriors)
Fix \alpha \in (0,1) and assume that \varepsilon_n satisfies n\varepsilon_n^2 \geq 2 and

\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.

Then, for any D \geq 2 and t > 0,

\Pi_{n}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right)^\alpha \leq H_\alpha(\Pi_n) e^{-tn\varepsilon_n^2}

holds with probability at least 1-2/\{(D-1+t)^2n\varepsilon_n^2\}.
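The bridge to fractional posteriors, as I use it, rests on the elementary inequality \left(\sum_i a_i\right)^\alpha \leq \sum_i a_i^\alpha for a_i \geq 0 and \alpha \in (0,1). Applying it to the numerator of the (discrete) regular posterior gives

\Pi_n(A \mid X^{(n)})^\alpha \leq \frac{\sum_{\theta \in A} \{p_{\theta}^{(n)}(X^{(n)})\}^\alpha\, \Pi_n(\theta)^\alpha}{\left\{\sum_{\theta \in \Theta} p_{\theta}^{(n)}(X^{(n)})\,\Pi_n(\theta)\right\}^\alpha},

whose numerator is a sum of fractional likelihoods weighted by \Pi_n(\theta)^\alpha (this is where H_\alpha(\Pi_n) enters), and whose denominator can be lower-bounded using the prior-mass condition, as in Bhattacharya et al. (2016).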

Note that H_\alpha(\Pi_n) may be infinite, in which case the upper bound on the tails of \frac{1}{n}D_\alpha^{(n)} is trivial. When the prior is not discrete, my guess is that the complexity term H_\alpha(\Pi_n) should be replaced by a discretization entropy H_\alpha(\Pi_n; \varepsilon_n), namely the \alpha-entropy of a discretized version of \Pi_n whose resolution (in the Hellinger sense) is some function of \varepsilon_n.


Prettier base plots in R

R’s base graphics system is notable for the minimal design of its plots. Basic usage is very simple, although more complex customization is not user-friendly. Hence I wrapped the plot and hist functions to improve their default behavior.

Any argument usually passed to plot or hist can also be passed to the two wrapper functions pretty_plot and pretty_hist. A comparison is shown below; “prettified” functions are on the right (obviously!).

par(mfcol=c(1,2))
plot(cars); pretty_plot(cars)

[Figure: side-by-side comparison of plot(cars) and pretty_plot(cars).]
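For the curious, here is a minimal sketch of what such a wrapper might look like (hypothetical; the actual pretty_plot does more):

# Hypothetical sketch of a plot wrapper in the spirit of pretty_plot;
# the real implementation likely differs.
pretty_plot <- function(..., pch = 16, col = "#356B8C") {
  plot(..., pch = pch, col = col,
       bty = "n",   # drop the box around the plot region
       las = 1)     # horizontal axis tick labels
}
pretty_plot(cars)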