# Some Comparison Inequalities for Off-Centered Rényi Divergences

Divergences between probability distributions $P$ and $Q$ with, say, $P \ll Q$, capture distributional characteristics of the likelihood ratio $\frac{dP}{dQ}(x)$ when $x \sim Q$. This post is about simple properties of what I call “off-centered” divergences, which instead concern distributional characteristics of $\frac{dP}{dQ}(x)$ in the misspecified case $x \sim Q_0$, where possibly $Q_0 \neq Q$. The need arises in the study of likelihood-based inference under misspecified models (Kleijn and van der Vaart (2006); Bhattacharya et al. (2019)).

So here’s the framework in which we work. Let $\mathcal{X}$ be a complete and separable metric space equipped with its Borel $\sigma$-algebra $\mathcal{B}_{\mathcal{X}}$ and a $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$. We denote by $\mathbb{F}$ the set of all probability distributions that are absolutely continuous with respect to $\mu$, and we identify every element $f \in \mathbb{F}$ with (a chosen version of) its probability density function, satisfying $f \geq 0$ and necessarily $\int f\, d\mu = 1$. Our basic metric structure on $\mathbb{F}$ is provided by the Hellinger distance

$H(f,g) = \left(\int (\sqrt{f} - \sqrt{g})^2\, d\mu\right)^{1/2}.$

Additionally, we make use of the Rényi divergence of order $\alpha \in (0,1]$ here given by

$d_\alpha(f, g) = -\alpha^{-1}\log A_{\alpha}(f,g),\quad A_\alpha(f, g) = \int_{\{g > 0\}} f^{\alpha}g^{1-\alpha}\,d\mu,$

where $A_\alpha(f, g)$ is referred to as the $\alpha$-affinity between $f$ and $g$. In the case where $\alpha = 0$, we let $d_0$ be the Kullback-Leibler divergence (or relative entropy) defined as

$d_0(f, g) = D(g | f) = \int_{\{g > 0\}} \log(g/f)\, g\,d\mu.$

Furthermore, we note the following standard inequalities relating $d_\alpha$ and $H$ across different orders $\alpha \in (0,1]$ (van Erven and Harremoës (2014); Bhattacharya et al. (2019)):

• (i) $d_{1/2}(f,g) = -2\log\left(1 - \tfrac{1}{2}H(f,g)^2\right)$;
• (ii) if $0 < \alpha \leq \beta < 1$, then $d_\beta \leq d_\alpha \leq \frac{1-\alpha}{\alpha}\frac{\beta}{1-\beta} d_\beta$;
• (iii) if $D(g|f) < \infty$, then $d_\alpha(f, g) \rightarrow d_0(f,g) = D(g|f)$ as $\alpha \rightarrow 0$.

Point (ii) can be improved when $\alpha = 1/2$. In this case, Proposition 3.1 of Zhang (2006) implies that for $\beta \in (0, 1/2]$, $d_{1/2} \geq 2\beta d_\beta$ and for $\beta \in [1/2, 1)$, $d_{1/2} \leq 2\beta d_\beta$.
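The identity (i) and the two-sided bound (ii) are easy to sanity-check numerically. Below is a quick standalone check on a pair of discrete distributions; the distributions, helper names, and tolerances are our own choices, purely for illustration.

```python
# Numerical sanity check of (i) and (ii) on a small discrete example.
import math

f = [0.2, 0.3, 0.5]
g = [0.5, 0.25, 0.25]

def affinity(f, g, a):
    """A_alpha(f, g) = sum over {g > 0} of f^a * g^(1-a)."""
    return sum(fi**a * gi**(1 - a) for fi, gi in zip(f, g) if gi > 0)

def d(f, g, a):
    """Renyi divergence d_alpha(f, g) = -log(A_alpha(f, g)) / alpha."""
    return -math.log(affinity(f, g, a)) / a

# Squared Hellinger distance H(f, g)^2 = sum (sqrt(f) - sqrt(g))^2.
H2 = sum((math.sqrt(fi) - math.sqrt(gi))**2 for fi, gi in zip(f, g))

# (i): with this normalization, d_{1/2}(f, g) = -2 log(1 - H(f, g)^2 / 2).
assert abs(d(f, g, 0.5) - (-2 * math.log(1 - H2 / 2))) < 1e-9

# (ii): for 0 < alpha <= beta < 1,
#   d_beta <= d_alpha <= (1-alpha)/alpha * beta/(1-beta) * d_beta.
alpha, beta = 0.3, 0.7
assert d(f, g, beta) <= d(f, g, alpha)
assert d(f, g, alpha) <= (1 - alpha) / alpha * beta / (1 - beta) * d(f, g, beta)
```
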

## 1. Off-centered divergences

Fix any probability measure $Q_0$ on $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ and let $f, g \in \mathbb{F}$. In order to study the behaviour of $\frac{f}{g}(X)$ when $X \sim Q_0$, assuming this ratio is well-defined, we consider

$A_\alpha^{Q_0}(f, g) = \mathbb{E}_{X \sim Q_0}\left[ \left(\frac{f(X)}{g(X)}\right)^{\alpha} \right] = \int_{\{f > 0\}} \left(f/g\right)^{\alpha}\,d Q_0$

and similarly we define the off-centered Rényi divergence

$d_\alpha^{Q_0}(f,g) = -\alpha^{-1}\log\left( A_\alpha^{Q_0}(f,g) \right).$

Finally, we make use of

$d_0^{Q_0}(f,g) = D^{Q_0}(g|f) = \int_{\{g > 0\}} \log\left(g/f\right)\,d Q_0.$

Note that unless we assume $Q_0 \ll \mu$, the definition of $d_\alpha^{Q_0}$ depends on the choice of the density representatives $f$ and $g$: they must be measurable functions that are well-defined pointwise, not merely up to $\mu$-equivalence.

Furthermore, $d_\alpha^{Q_0}$ will typically take negative values. Consider $d_\alpha^{Q_0}(f,g)$ over $f \in \mathcal{P}$, where $\mathcal{P} \subset \mathbb{F}$ is some fixed convex subset, and suppose there exists $h \in \mathcal{P}$ such that $D(Q_0|h) < \infty$ (which implies in particular that $Q_0 \ll h \ll \mu$). Then $d_\alpha^{Q_0}(f,g)\geq 0$ for every $f \in \mathcal{P}$ if and only if $g \in \arg\min_{h \in \mathcal{P}}D(Q_0| h)$. Sufficiency follows from Kleijn and van der Vaart (2006), while necessity is a consequence of Proposition 2 below.
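To illustrate the sign behaviour on a toy example of our own construction: take $f$ equal to the density of $Q_0$ (hence a Kullback-Leibler minimizer over any class containing it) and $g$ any other density; the off-centered divergence then comes out negative at every order we try.

```python
# Toy example: with f equal to the density q0 of Q_0 and g a different
# density, the off-centered Renyi divergence d_alpha^{Q_0}(f, g) is negative.
import math

q0 = [0.5, 0.3, 0.2]          # density of Q_0 on a three-point space
f = q0                        # f minimizes h -> D(Q_0 | h)
g = [1 / 3, 1 / 3, 1 / 3]     # any other density

def d_off(f, g, q0, a):
    """d_alpha^{Q_0}(f, g) = -log( E_{Q_0} (f/g)^a ) / a, over {f > 0}."""
    A = sum(q * (fi / gi)**a for fi, gi, q in zip(f, g, q0) if fi > 0)
    return -math.log(A) / a

for a in (0.1, 0.5, 0.9):
    assert d_off(f, g, q0, a) < 0
```
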

## 2. Comparison inequalities

Our first inequalities provide results analogous to $d_{\beta} \leq d_\alpha \leq \frac{1-\alpha}{\alpha}\frac{\beta}{1-\beta}d_\beta$ for $0 < \alpha \leq \beta < 1$: the off-centered divergence $d_\alpha^{Q_0}$ is also decreasing in $\alpha$, and the reverse inequality holds up to an additional term involving $d_1^{Q_0}$.

Proposition 1.
Let $d_\alpha^{Q_0}$ be defined as before in terms of a probability measure $Q_0$ on $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$. For any $0 < \alpha \leq \beta < 1$, we have

$d_{\beta}^{Q_0} \leq d_{\alpha}^{Q_0} \leq \frac{1-\alpha}{\alpha}\frac{\beta}{1-\beta} d_{\beta}^{Q_0} + \frac{\alpha-\beta}{\alpha(1-\beta)} d_1^{Q_0}.$

Proof.
These are straightforward applications of Jensen’s inequality. For the first inequality, since $\beta/\alpha \geq 1$, the map $x \mapsto x^{\beta/\alpha}$ is convex, and thus

$A_\beta^{Q_0}(f, g) = \int_{\{f > 0\}} \left(\frac{f}{g} \right)^{\beta}\, dQ_0 \geq \left(\int_{\{f > 0\}} \left(\frac{f}{g}\right)^{\alpha}\,dQ_0 \right)^{\beta / \alpha}.$

Applying the decreasing function $-\beta^{-1} \log(\cdot)$ yields the result. For the second inequality, first assume that $Q_0(\{f > 0,\, g = 0\}) = 0$ and that $A_1^{Q_0}(f,g) < \infty$ (otherwise $d_1^{Q_0} = -\infty$ and, for $\alpha < \beta$, the upper bound is trivial). Then, using the fact that $\frac{1-\alpha}{1-\beta} \geq 1$, so that $x \mapsto x^{(1-\alpha)/(1-\beta)}$ is convex, Jensen’s inequality under the probability measure $A_1^{Q_0}(f,g)^{-1}(f/g)\,dQ_0$ gives

$A_\alpha^{Q_0}(f, g) = A_1^{Q_0}(f,g)\int_{\{f > 0\}} \left[\left(\frac{f}{g}\right)^{\beta-1}\right]^{\frac{1-\alpha}{1-\beta}}\, \frac{f/g}{A_1^{Q_0}(f,g)}\,dQ_0 \geq A_\beta^{Q_0}(f,g)^{\frac{1-\alpha}{1-\beta}}\, A_1^{Q_0}(f,g)^{\frac{\alpha-\beta}{1-\beta}}.$

Applying the function $-\alpha^{-1}\log(\cdot)$ then yields the result. When $Q_0(\{f>0,\, g = 0\}) > 0$, both $d_\alpha^{Q_0}$ and $d_\beta^{Q_0}$ are infinite and the inequality also holds. //
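Proposition 1 can likewise be checked numerically. In the sketch below, the discrete $Q_0$, $f$, $g$ and the orders $\alpha$, $\beta$ are arbitrary choices of ours.

```python
# Numerical check of Proposition 1 on a small discrete example.
import math

q0 = [0.5, 0.3, 0.2]           # density of Q_0
f = [0.2, 0.5, 0.3]
g = [0.25, 0.25, 0.5]

def d_off(f, g, q0, a):
    """Off-centered Renyi divergence; a = 1 gives d_1^{Q_0}(f, g)."""
    A = sum(q * (fi / gi)**a for fi, gi, q in zip(f, g, q0) if fi > 0)
    return -math.log(A) / a

alpha, beta = 0.3, 0.7
# d_beta^{Q_0} <= d_alpha^{Q_0}
#   <= (1-a)/a * b/(1-b) * d_beta^{Q_0} + (a-b)/(a(1-b)) * d_1^{Q_0}
lower = d_off(f, g, q0, beta)
upper = ((1 - alpha) / alpha * beta / (1 - beta) * d_off(f, g, q0, beta)
         + (alpha - beta) / (alpha * (1 - beta)) * d_off(f, g, q0, 1.0))
assert lower <= d_off(f, g, q0, alpha) <= upper
```
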

The following proposition shows how $d_\alpha^{Q_0}$-neighborhoods of the form $\{f \in \mathbb{F} \mid d_\alpha^{Q_0}(f,g) \leq \epsilon\}$ around $g\in \mathbb{F}$ relate to $d_\alpha$-neighborhoods around $Q_0$. It also provides a converse to the non-negativity result $d_\alpha^{Q_0}(f, g) \geq 0$ that holds when $g$ is a point of minimal Kullback-Leibler divergence: when $D(Q_0| f) < D(Q_0|g)$, then necessarily $d_\alpha^{Q_0}(f, g) < 0$.

Proposition 2.
Let $Q_0$ be a probability measure on $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ that is absolutely continuous with respect to $\mu$ with density $q_0 \in \mathbb{F}$, and let $f, g \in \mathbb{F}$ be such that $Q_0(\{f > 0,\, g = 0\}) = 0$ and $\int_{\{f > 0,\, q_0 = 0\}} g\,d\mu = 0$. Then

$d_\alpha(f, Q_0) \geq (1-\alpha) d_\alpha^{Q_0}(f,g) + \alpha d_\alpha(f,g)$

and

$d_\alpha^{Q_0}(f,g) \leq d_0^{Q_0}(f,g).$

Proof.
Applying Jensen’s inequality with the concave function $x \mapsto x^{1-\alpha}$ under the probability measure $A_\alpha(f,g)^{-1} f^{\alpha} g^{1-\alpha}\,d\mu$, we find

$A_\alpha(f, q_0) = A_\alpha(f,g) \int_{\{g > 0\}} \left(\frac{q_0}{g}\right)^{1-\alpha} \frac{f^{\alpha} g^{1-\alpha}}{A_\alpha(f,g)}\,d\mu \leq A_\alpha(f,g) \left(\frac{A_\alpha^{Q_0}(f,g)}{A_\alpha(f,g)}\right)^{1-\alpha} = A_\alpha^{Q_0}(f,g)^{1-\alpha}\, A_\alpha(f,g)^{\alpha}.$

Applying the decreasing function $-\alpha^{-1}\log(\cdot)$ then yields the result. For the second inequality, note that by Jensen’s inequality applied to the convex exponential function,

$A_\alpha^{Q_0}(f,g) = \int_{\{f > 0\}} \exp\left(\alpha \log(f/g)\right)\,dQ_0 \geq \exp\left(\alpha \int_{\{f > 0\}} \log(f/g)\,dQ_0\right),$

so that applying $-\alpha^{-1}\log(\cdot)$ gives $d_\alpha^{Q_0}(f,g) \leq \int_{\{f > 0\}} \log(g/f)\,dQ_0 \leq d_0^{Q_0}(f,g)$. //
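Both inequalities of Proposition 2 can also be checked numerically; again the discrete $q_0$, $f$, $g$ and the order $\alpha$ below are arbitrary choices of ours.

```python
# Numerical check of the two inequalities in Proposition 2 (toy example).
import math

q0 = [0.5, 0.3, 0.2]           # density of Q_0
f = [0.2, 0.5, 0.3]
g = [0.25, 0.25, 0.5]
alpha = 0.5

def d_renyi(f, g, a):
    """Centered divergence d_alpha(f, g)."""
    A = sum(fi**a * gi**(1 - a) for fi, gi in zip(f, g) if gi > 0)
    return -math.log(A) / a

def d_off(f, g, q0, a):
    """Off-centered divergence d_alpha^{Q_0}(f, g)."""
    A = sum(q * (fi / gi)**a for fi, gi, q in zip(f, g, q0) if fi > 0)
    return -math.log(A) / a

# d_alpha(f, Q_0) >= (1-alpha) d_alpha^{Q_0}(f, g) + alpha d_alpha(f, g)
assert (d_renyi(f, q0, alpha)
        >= (1 - alpha) * d_off(f, g, q0, alpha)
        + alpha * d_renyi(f, g, alpha))

# d_alpha^{Q_0}(f, g) <= d_0^{Q_0}(f, g)
d0_off = sum(q * math.log(gi / fi) for fi, gi, q in zip(f, g, q0) if gi > 0)
assert d_off(f, g, q0, alpha) <= d0_off
```
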

References:

• Bhattacharya, A., D. Pati, and Y. Yang (2019). Bayesian fractional posteriors. The Annals of Statistics 47(1), 39–66.
• Kleijn, B. J. K. and A. W. van der Vaart (2006). Misspecification in infinite-dimensional Bayesian statistics. The Annals of Statistics 34(2), 837–877.
• Zhang, T. (2006). From ε-entropy to KL-entropy: analysis of minimum information complexity density estimation. The Annals of Statistics 34(5), 2180–2210.
• van Erven, T. and P. Harremoës (2014). Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory 60(7), 3797–3820.