This post continues the series on posterior concentration under misspecification. Here I introduce a unifying point of view on the subject through the separation α-entropy. We use this notion of prior entropy to bridge the gap between Bayesian fractional posteriors and regular posterior distributions: when this entropy is finite, we recover direct analogues of some of the concentration results for fractional posteriors (Bhattacharya et al., 2019).
This post is going to be quite abstract, just like last week's. In a future post I'll explain how the separation α-entropy generalizes the covering numbers for testing under misspecification of Kleijn and van der Vaart (2006), as well as the prior summability conditions of De Blasi and Walker (2013).
Quick word of warning: this is not the definitive version of the results I’m working on, but I still had to get them out somewhere.
Another word of warning: WordPress has gotten significantly worse at dealing with math recently. I will find a new platform, but for now expect to find typos and some rendering issues.
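We continue in the same theoretical framework as before: $\mathbb{F}$ is a set of densities on a complete and separable metric space $\mathbb{X}$ with respect to a σ-finite measure $\mu$ defined on the Borel σ-algebra $\mathcal{B}$ of $\mathbb{X}$, and $H$ is the Hellinger distance defined by

$$H(f, g) = \left( \int \left( \sqrt{f} - \sqrt{g} \right)^2 d\mu \right)^{1/2}.$$

We also make use of the Rényi divergences, defined for $\alpha \in (0, 1)$ by

$$D_\alpha(f, g) = \frac{1}{\alpha - 1} \log \int f^{\alpha} g^{1 - \alpha} \, d\mu.$$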
Here we assume that the data are generated following a distribution having a density in our model (this assumption could be weakened), and we therefore define the off-centered Rényi divergence
assuming that all this is well defined.
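To make the Hellinger distance and the Rényi divergence concrete, here is a minimal sketch on a finite sample space. It assumes $\mu$ is counting measure (so integrals become sums) and the normalizations $H(f,g)^2 = \sum (\sqrt f - \sqrt g)^2$ and $D_\alpha(f,g) = \frac{1}{\alpha-1}\log\sum f^\alpha g^{1-\alpha}$; other normalizations differ by constants. The function names `hellinger` and `renyi` are hypothetical. Under these conventions the two quantities are linked at $\alpha = 1/2$ by $D_{1/2}(f,g) = -2\log(1 - H(f,g)^2/2)$, which the example checks numerically.

```python
import math


def hellinger(f, g):
    """Hellinger distance (sum_x (sqrt f(x) - sqrt g(x))^2)^(1/2) between two
    densities on a finite set, w.r.t. counting measure (assumed normalization)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g)))


def renyi(f, g, alpha):
    """Renyi divergence of order alpha in (0, 1):
    log(sum_x f(x)^alpha g(x)^(1 - alpha)) / (alpha - 1)."""
    affinity = sum(a ** alpha * b ** (1 - alpha) for a, b in zip(f, g))
    return math.log(affinity) / (alpha - 1)


f = [0.5, 0.3, 0.2]
g = [0.2, 0.3, 0.5]
h = hellinger(f, g)

# Under these normalizations, D_{1/2}(f, g) = -2 log(1 - H(f, g)^2 / 2).
print(renyi(f, g, 0.5), -2 * math.log(1 - h ** 2 / 2))
```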
Prior and posterior distributions
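Now let $\Pi$ be a prior on $\mathbb{F}$. Given either a single data point $X$ or a sequence $X_1, \dots, X_n$ of independent variables with common probability density function $f_0$, the posterior probability of a measurable set $A \subset \mathbb{F}$ given $X$ is the random quantity defined by

$$\Pi(A \mid X) = \frac{\int_A f(X) \, \Pi(df)}{\int_{\mathbb{F}} f(X) \, \Pi(df)},$$

and, similarly,

$$\Pi(A \mid X_1, \dots, X_n) = \frac{\int_A \prod_{i=1}^n f(X_i) \, \Pi(df)}{\int_{\mathbb{F}} \prod_{i=1}^n f(X_i) \, \Pi(df)}.$$

This may not always be well defined, but I don't want to get into technicalities for now.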
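For a prior supported on finitely many densities, the integrals in the posterior reduce to sums. The following sketch (hypothetical names `posterior` and `bernoulli`) computes $\Pi(\{f_j\} \mid X_1, \dots, X_n) \propto \Pi(\{f_j\}) \prod_i f_j(X_i)$ in log space, which avoids numerical underflow when $n$ is large. It assumes every $f_j(X_i)$ is positive.

```python
import math


def posterior(prior, densities, data):
    """Posterior weights Pi({f_j} | data), proportional to
    Pi({f_j}) * prod_i f_j(x_i), for a prior on finitely many densities.

    prior[j] > 0 is the prior mass of densities[j]; densities[j] is a
    function x -> f_j(x), assumed positive at every data point."""
    log_w = [math.log(p) + sum(math.log(f(x)) for x in data)
             for p, f in zip(prior, densities)]
    m = max(log_w)  # log-sum-exp trick: subtract the maximum before exp
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def bernoulli(p):
    """Bernoulli(p) density w.r.t. counting measure on {0, 1}."""
    return lambda x: p if x == 1 else 1 - p


post = posterior([0.5, 0.5], [bernoulli(0.3), bernoulli(0.7)], [1, 1, 0])
print(post)
```

With a uniform prior on Bernoulli(0.3) and Bernoulli(0.7) and data $(1, 1, 0)$, the likelihoods are $0.063$ and $0.147$, so the posterior weights are $0.3$ and $0.7$.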
We state our concentration results in terms of the separation α-entropy. It is inspired by the Hausdorff α-entropy introduced in Xing and Ranneby (2009), although the separation α-entropy has no relationship with the Hausdorff measure; it instead builds upon the concept of δ-separation of Choi and Ramamoorthi (2008), defined below.
Given a set $A \subset \mathbb{F}$, we denote by $\operatorname{conv}(A)$ the convex hull of $A$: it is the set of all densities of the form $\int_A f \, \nu(df)$, where $\nu$ is a probability measure on $A$.
Let $f_0$ be fixed as above. A set of densities $A \subset \mathbb{F}$ is said to be δ-separated from $f_0$ with respect to the divergence $d$ if for every $f \in \operatorname{conv}(A)$,
A collection of sets $\{A_i\}$ is said to be δ-separated from $f_0$ if every $A_i$ is δ-separated from $f_0$.
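One reason the separation condition is imposed on $\operatorname{conv}(A)$ rather than on $A$ itself: in arguments of the type used by Walker (2004), the relevant object is a density of the form $\int_A f \, \nu(df)$, with $\nu$ the prior or posterior restricted to $A$ and renormalized. Such a density is a mixture of elements of $A$ and is generally not in $A$. The small sketch below (hypothetical helper `mixture`, finitely many densities on a three-point space) builds such a mixture.

```python
def mixture(weights, densities):
    """Convex combination sum_j w_j f_j of densities on a finite set.

    weights are non-negative masses (e.g. prior or posterior masses of the
    f_j restricted to A); they are renormalized to sum to one, so the
    result is an element of conv(A)."""
    total = sum(weights)
    w = [wj / total for wj in weights]
    return [sum(wj * f[k] for wj, f in zip(w, densities))
            for k in range(len(densities[0]))]


# A = {f_1, f_2}: two densities on the three-point set {0, 1, 2}.
A = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
f_A = mixture([0.2, 0.6], A)
print(f_A)  # a density in conv(A), but not an element of A
```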
An important property of δ-separation, first noted by Walker (2004) and used in the study of posterior consistency, is that it scales with product densities. The general result is stated in the following lemma.
Lemma (Separation of product densities).
Let , , be a sequence of σ-finite measure spaces where each is a complete and separable locally compact metric space and is the corresponding Borel σ-algebra. Denote by the set of probability density functions on , fix and let be -separated from with respect to for some . Let where is the product density on defined by . Then is -separated from with respect to where .
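The proof of the lemma is not reproduced here. As a hedged illustration of why separation behaves well under products, note that the Rényi affinity $\int f^\alpha g^{1-\alpha} \, d\mu$ of two product densities factorizes over coordinates, so the (centered) Rényi divergence $D_\alpha$ is additive over independent coordinates. The sketch below checks this numerically on finite spaces; it only concerns product densities, not the mixtures in the convex hull that the lemma also has to handle. The names `renyi` and `product_density` are hypothetical.

```python
import math
from itertools import product


def renyi(f, g, alpha):
    """Renyi divergence of order alpha in (0, 1) on a finite set:
    log(sum f^alpha g^(1 - alpha)) / (alpha - 1)."""
    affinity = sum(a ** alpha * b ** (1 - alpha) for a, b in zip(f, g))
    return math.log(affinity) / (alpha - 1)


def product_density(f1, f2):
    """Product density (x1, x2) -> f1(x1) f2(x2) on the product of two finite
    sets, flattened in lexicographic order of (x1, x2)."""
    return [a * b for a, b in product(f1, f2)]


alpha = 0.4
f1, g1 = [0.7, 0.3], [0.4, 0.6]
f2, g2 = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]

# The affinity factorizes over coordinates, so D_alpha adds up.
lhs = renyi(product_density(f1, f2), product_density(g1, g2), alpha)
rhs = renyi(f1, g1, alpha) + renyi(f2, g2, alpha)
print(lhs, rhs)
```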
We can now define the separation α-entropy of a set $A$ with parameter δ as the minimal α-entropy of a δ-separated covering of $A$. When this entropy is finite, we can study the concentration properties of the posterior distribution using simple information-theoretic techniques similar to those used by Bhattacharya et al. (2019) for the study of Bayesian fractional posteriors.
Definition (Separation α-entropy).
Fix , and let be a subset of . Recall , and fixed as previously. The separation -entropy of is defined as
where the infimum is taken over all (measurable) families , , satisfying and which are -separated from with respect to the divergence . When no such covering exists we let , and when we define .
When , so that , we drop the indicator and denote , to emphasize the fact.
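The defining display of the separation α-entropy is not reproduced above. Assuming, for illustration only, a covering functional of the form $\log \sum_i \Pi(A_i)^\alpha$ in the spirit of the Hausdorff α-entropy of Xing and Ranneby (2009), the sketch below shows why the infimum is restricted to separated coverings: since $(a + b)^\alpha \le a^\alpha + b^\alpha$ for $\alpha \in (0, 1)$, merging cells never increases the sum, so without a constraint the trivial covering by a single set would always minimize it. The function name `covering_entropy` is hypothetical.

```python
import math


def covering_entropy(masses, alpha):
    """log sum_i Pi(A_i)^alpha for a covering {A_i} with prior masses
    Pi(A_i). Hypothetical stand-in for the alpha-entropy of a covering."""
    return math.log(sum(m ** alpha for m in masses))


alpha = 0.5
fine = [0.1, 0.2, 0.3, 0.4]        # prior masses of a partition of F
coarse = [0.1 + 0.2, 0.3 + 0.4]    # merge the cells pairwise
trivial = [1.0]                    # the whole space as a single cell

# Since (a + b)^alpha <= a^alpha + b^alpha, merging cells never increases
# the value; the trivial covering gives log(1) = 0.
for cover in (fine, coarse, trivial):
    print(cover, covering_entropy(cover, alpha))
```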
Proposition (Properties of the separation α-entropy).
The separation -entropy of a set is non-negative and if is -separated from with respect to the divergence . Furthermore, if and , then
and if also , then
For a subset with , we have
and, more generally, if for subsets , then
Theorem (Posterior consistency).
Let and let be a sequence of independent random variables with common probability density . Suppose there exists such that
If satisfies for some , then almost surely as .
The condition implies in particular that .
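As a toy illustration of the kind of statement the theorem makes (not a check of its hypotheses, whose display is not reproduced above), consider the finite misspecified model $\{\text{Bernoulli}(0.2), \text{Bernoulli}(0.4), \text{Bernoulli}(0.9)\}$ with data drawn from Bernoulli(0.5), which is not in the model. Taking $A = \{\text{Bernoulli}(0.2), \text{Bernoulli}(0.9)\}$, the densities that are not Kullback-Leibler closest to the truth, the posterior mass of $A$ is driven to zero as $n$ grows. All names and parameter values are hypothetical.

```python
import math
import random


def posterior(prior, ps, data):
    """Posterior weights over Bernoulli(p) models, p in ps, for 0/1 data."""
    k, n = sum(data), len(data)
    log_w = [math.log(w) + k * math.log(p) + (n - k) * math.log(1 - p)
             for w, p in zip(prior, ps)]
    m = max(log_w)  # log-sum-exp trick
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def kl_bernoulli(q, p):
    """Kullback-Leibler divergence KL(Bernoulli(q) || Bernoulli(p))."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


random.seed(0)
q = 0.5                          # true density f_0 = Bernoulli(0.5), outside the model
ps = [0.2, 0.4, 0.9]             # misspecified model
prior = [1 / 3, 1 / 3, 1 / 3]
data = [1 if random.random() < q else 0 for _ in range(2000)]

post = posterior(prior, ps, data)
best = min(range(len(ps)), key=lambda j: kl_bernoulli(q, ps[j]))
print(best, post)  # posterior mass of A is 1 - post[best]
```

For $n = 2000$ the expected log-likelihood gap between Bernoulli(0.4) and Bernoulli(0.2) is about $n\,(0.223 - 0.020) \approx 400$, so the posterior mass of $A$ is numerically negligible.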
Corollary (Well-specified consistency).
Suppose that is in the Kullback-Leibler support of . If satisfies for some and for some , then almost surely as .
Corollary (Well-specified Hellinger consistency).
Suppose that is in the Kullback-Leibler support of and fix . If there exists a covering of by Hellinger balls of diameter at most satisfying for some , then almost surely as .
Following Kleijn and van der Vaart (2006) and Bhattacharya et al. (2019), we let
be a Kullback-Leibler type neighborhood of (relatively to ) where the second moment of the log likelihood ratio is also controlled.
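The defining display of this neighborhood is not reproduced above. As an assumption, the sketch below uses the common well-specified form $\{f : \int f_0 \log(f_0/f) \, d\mu \le \varepsilon^2, \ \int f_0 \log^2(f_0/f) \, d\mu \le \varepsilon^2\}$ on a finite space with counting measure; the post's version is taken relative to a reference density under misspecification, so treat this only as an illustration of the two moment conditions. The names `kl_stats` and `in_kl_neighborhood` are hypothetical.

```python
import math


def kl_stats(f0, f):
    """Return (K, V) with K = sum f0 log(f0 / f) and V = sum f0 log(f0 / f)^2
    for densities on a finite set (counting measure). Assumes f > 0 wherever
    f0 > 0; points with f0 = 0 contribute nothing."""
    K = sum(a * math.log(a / b) for a, b in zip(f0, f) if a > 0)
    V = sum(a * math.log(a / b) ** 2 for a, b in zip(f0, f) if a > 0)
    return K, V


def in_kl_neighborhood(f0, f, eps):
    """Hypothetical membership test for {f : K(f0, f) <= eps^2, V(f0, f) <= eps^2}."""
    K, V = kl_stats(f0, f)
    return K <= eps ** 2 and V <= eps ** 2


f0 = [0.5, 0.5]
f = [0.45, 0.55]
print(kl_stats(f0, f))
print(in_kl_neighborhood(f0, f, 0.2), in_kl_neighborhood(f0, f, 0.1))
```

Here $K \approx 0.0050$ and $V \approx 0.0101$, so $f$ lies in the neighborhood for $\varepsilon = 0.2$ but not for $\varepsilon = 0.1$, where the second-moment condition is the binding one.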
Theorem (Posterior concentration bound).
Let and let . For any and we have that
holds with probability at least .
Corollary (Posterior concentration bound, i.i.d. case).
Let and let be a sequence of independent random variables with common probability density . For any and we have that
holds with probability at least .
- Bhattacharya, A., D. Pati, and Y. Yang (2019). Bayesian fractional posteriors. Ann. Statist. 47(1), 39–66.
- Choi, T. and R. V. Ramamoorthi (2008). Remarks on consistency of posterior distributions. Volume 3, pp. 170–186. Beachwood, Ohio, USA: Institute of Mathematical Statistics.
- De Blasi, P. and S. G. Walker (2013). Bayesian asymptotics with misspecified models. Statistica Sinica, 169–187.
- Grünwald, P. and T. van Ommen (2017). Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. Bayesian Anal. 12(4), 1069–1103.
- Kleijn, B. J. and A. W. van der Vaart (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34(2), 837–877.
- Ramamoorthi, R. V., K. Sriram, and R. Martin (2015). On posterior concentration in misspecified models. Bayesian Anal. 10(4), 759–789.
- Walker, S. (2004). New approaches to Bayesian consistency. Ann. Statist. 32(5), 2028–2043.
- Xing, Y. and B. Ranneby (2009). Sufficient conditions for Bayesian consistency. J. Stat. Plan. Inference 139(7), 2479–2489.