Let $p$, $q$ and $g$ be three densities and suppose that $X_i \sim g$, $i = 1, 2, 3, \dots$, independently. What happens to the likelihood ratio
$$R_n = \prod_{i=1}^{n} \frac{q(X_i)}{p(X_i)}$$
as $n \to \infty$?
Clearly, it depends. If $g = p \neq q$, then
$$R_n \to 0$$
almost surely at an exponential rate. More generally, if $g$ is closer to $q$ than to $p$, in some sense, we'd expect that $R_n \to \infty$. Such a measure of "closeness" or "divergence" between probability distributions is given by the Kullback-Leibler divergence
$$K(g, p) = \int g \log\left(\frac{g}{p}\right).$$
It can be verified that $K(g, p) \geq 0$ with equality if and only if $g = p$, and that if $K(g, q) < K(g, p)$, then
$$R_n \to \infty$$
almost surely at an exponential rate: by the strong law of large numbers, $\frac{1}{n} \log R_n \to K(g, p) - K(g, q) > 0$ almost surely. Thus the Kullback-Leibler divergence can be used to solve our problem.
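This limit is easy to check by simulation. The sketch below uses invented Gaussian choices for illustration: it draws $X_i$ from $g = N(0,1)$ and computes the normalized log-likelihood ratio of $q = N(1/2, 1)$ against $p = N(2, 1)$. For unit-variance normals, $K(N(\mu_1, 1), N(\mu_2, 1)) = (\mu_1 - \mu_2)^2/2$, so the average should settle near $K(g,p) - K(g,q) = 2 - 1/8 = 1.875 > 0$, confirming exponential growth of the likelihood ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented illustration: g = N(0,1), p = N(2,1), q = N(1/2,1).
def logpdf(x, mu):
    # log-density of N(mu, 1)
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

n = 200_000
x = rng.normal(0.0, 1.0, size=n)  # X_i drawn from g

# (1/n) * log of prod_i q(X_i)/p(X_i)
rate = np.mean(logpdf(x, 0.5) - logpdf(x, 2.0))
print(rate)  # close to K(g,p) - K(g,q) = 1.875
```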
Better measures of divergence?
There are other measures of divergence that can determine the asymptotic behavior of the likelihood ratio as above (e.g. the discrete distance). However, in this note, I give conditions under which the Kullback-Leibler divergence is, up to topological equivalence, the "best" measure of divergence.
This is of interest in Bayesian nonparametrics. The hypothesis that a density $f_0$ is in the Kullback-Leibler support of a prior $\Pi$ on a space of densities $\mathcal{F}$ is used to ensure that the integrated likelihood ratio
$$\int_{\mathcal{F}} \prod_{i=1}^{n} \frac{f(X_i)}{f_0(X_i)} \, \Pi(df)$$
does not converge to $0$ exponentially fast as $n \to \infty$ and $X_i \sim f_0$ independently. Under the conditions we specify, our remark implies that the hypothesis that "$f_0$ is in the Kullback-Leibler support of $\Pi$" may not be replaced by a weaker one.
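To see why prior mass at (or near) $f_0$ prevents exponential decay, here is a toy sketch with an invented two-point prior, not a realistic nonparametric one: one component is $f_0$ itself, so $f_0$ trivially lies in the prior's Kullback-Leibler support, and the integrated likelihood ratio is then bounded below by the prior mass at $f_0$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented two-point prior: mass 1/2 on f0 = N(0,1) itself, mass 1/2 on N(3,1).
def logpdf(x, mu):
    # log-density of N(mu, 1)
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

n = 2000
x = rng.normal(0.0, 1.0, size=n)  # X_i drawn from f0

# log of prod_i f(X_i)/f0(X_i) for each prior component
log_r0 = 0.0                                        # f = f0: the ratio is identically 1
log_r1 = np.sum(logpdf(x, 3.0) - logpdf(x, 0.0))    # f = N(3,1): decays exponentially

# log of the integrated likelihood ratio, via log-sum-exp for numerical stability
m = max(log_r0, log_r1)
log_marginal = m + np.log(0.5 * np.exp(log_r0 - m) + 0.5 * np.exp(log_r1 - m))

print(log_marginal / n)  # close to 0: bounded below by log(1/2)/n, no exponential decay
```

A genuine nonparametric prior replaces the point mass at $f_0$ by positive mass in every Kullback-Leibler neighborhood of $f_0$, which is exactly the Kullback-Leibler support condition.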
Statement of the remark
Some notation. Let $\mathcal{P}$ be the space of all probability measures on some measurable space $(\mathfrak{X}, \mathcal{A})$. If $P, Q \in \mathcal{P}$, then both are absolutely continuous with respect to $\mu = \frac{1}{2}(P + Q)$ and possess densities $p$ and $q$ such that $dP = p \, d\mu$ and $dQ = q \, d\mu$. We denote by $\frac{q}{p}$ the ratio of these densities, which in fact does not depend on the choice of the dominating measure $\mu$. The likelihood ratio $\prod_{i=1}^{n} \frac{q}{p}(X_i)$ is abbreviated to $R_n$, depending implicitly on the sequence $\{X_i\}$. The Kullback-Leibler divergence between $P$ and $Q$ is
$$K(P, Q) = \int \log\left(\frac{p}{q}\right) dP.$$
We let $d : \mathcal{P} \times \mathcal{P} \to [0, \infty]$ be any other function such that $d(P, Q) = 0$ iff $P = Q$, and such that if $d(P, Q) < \epsilon$, then there exists a $\beta \leq \epsilon$ with
$$e^{n\beta} R_n \to \infty$$
almost surely as $n \to \infty$. The topology on a subset $\mathcal{F} \subset \mathcal{P}$ induced by such a function $d$ is the topology generated by the sets
$$\{Q \in \mathcal{F} : d(P, Q) < \epsilon\}, \qquad P \in \mathcal{F}, \; \epsilon > 0.$$
The remark below shows that any exponential rate of convergence of the likelihood ratio is picked up by the KL divergence. It is rather obvious (albeit a bit technical), but I thought it was worth writing it up properly.
Let $X_i \sim P$, $i = 1, 2, 3, \dots$, independently, and let $R_n = \prod_{i=1}^{n} \frac{q}{p}(X_i)$.
- We have that $K(P, Q) < \infty$ if and only if $R_n$ does not converge to $0$ more than exponentially fast (i.e. there exists $\beta > 0$ such that $e^{n\beta} R_n \to \infty$ almost surely).
- If $\{Q_k\} \subset \mathcal{P}$ is such that $d(P, Q_k) \to 0$ for a function $d$ as above, then
$$K(P, Q_k) \to 0,$$
and the topology on $\mathcal{P}$ induced by $K$ is weaker than the topology induced by any other function $d$ defined as above.
Proof of 1.
Suppose that $K(P, Q) < \infty$. Then, since by the strong law of large numbers $\frac{1}{n} \log R_n \to -K(P, Q)$ almost surely, we find that
$$e^{n\beta} R_n = e^{n \left( \beta + \frac{1}{n} \log R_n \right)} \to \infty$$
for all $\beta > K(P, Q)$.

If $K(P, Q) = \infty$, then $\frac{1}{n} \log R_n \to -\infty$ almost surely, so that for all $\beta > 0$ we have
$$e^{n\beta} R_n \to 0.$$
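The dichotomy in this proof can be checked numerically. In the sketch below (Gaussian choices invented for illustration), $P = N(0, 1)$ and $Q = N(1, 1)$, so that $K(P, Q) = 1/2$; the quantity $\log\big(e^{n\beta} R_n\big) = n\beta + \log R_n$ is then large and negative for $\beta = 0.3 < K(P, Q)$, and large and positive for $\beta = 0.7 > K(P, Q)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented example: P = N(0,1), Q = N(1,1), so K(P,Q) = 1/2.
def logpdf(x, mu):
    # log-density of N(mu, 1)
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

n = 5000
x = rng.normal(0.0, 1.0, size=n)  # X_i drawn from P

log_Rn = np.sum(logpdf(x, 1.0) - logpdf(x, 0.0))  # log R_n, roughly -n * K(P,Q)

# log(e^{n beta} R_n) = n*beta + log R_n
below = n * 0.3 + log_Rn  # beta = 0.3 < K(P,Q): large and negative
above = n * 0.7 + log_Rn  # beta = 0.7 > K(P,Q): large and positive
print(below, above)
```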
Proof of 2.
Suppose that there exists a sequence $\{Q_k\}$ such that $d(P, Q_k) \to 0$ but $K(P, Q_k) \not\to 0$. Then, there is an $\epsilon > 0$ and a subsequence $\{Q_{k_j}\}$ such that
$$K(P, Q_{k_j}) > \epsilon \quad \text{for all } j.$$
But $d(P, Q_{k_j}) < \epsilon$ for all $j$ large enough, so that there exists a $\beta \leq \epsilon$ with $e^{n\beta} R_n^{(k_j)} \to \infty$ almost surely, where $R_n^{(k_j)} = \prod_{i=1}^{n} \frac{q_{k_j}}{p}(X_i)$. Since $\frac{1}{n} \log\big(e^{n\beta} R_n^{(k_j)}\big) \to \beta - K(P, Q_{k_j})$ almost surely, this forces $K(P, Q_{k_j}) \leq \beta \leq \epsilon$, which yields the contradiction.
Since $d(P, Q) = 0$ iff $P = Q$ iff $K(P, Q) = 0$, this implies that the topology on $\mathcal{P}$ induced by $K$ is weaker than the one induced by $d$.