We discuss the approximation properties of sequences of linear operators mapping densities to densities. We give conditions for their convergence, explicit their general form, obtain rates of convergences and generalise the index parameter to obtain nets .
Notations. Let be a compact metric space, equipped with a finite measure defined on its Borel -algebra, and denote by the set of all essentially bounded probability densities on . The set is then a complete separable metric space under the total variation distance proportional to .
In bayesian statistics, it is of interest to specify a probability measure on , representing uncertainty about which distribution of is generating independent observations . The problem is that is usually rather big: by Baire’s category theorem, if is not a finite set of points, then cannot be written as a countable union of finite dimensional subspaces. To help in prior elicitation, that is to help a statistician specify , we may decompose in simpler parts.
Here, I discuss how to obtain a sequence of approximating finite dimensional sieves , such that is dense in . A prior on may then be specified as the countable mixture
where is a prior on for all .
Let me emphasize that the following ideas are elementary. Some may be found, with more or less generality, in analysis and approximation theory textbooks. It is, however, interesting to recollect the facts relevant in statistical applications.
1. The basics
The finite dimensional sieves take the form
where the are densities and the coefficients range through some set which we assume contains the simplex .
The following lemma gives sufficient conditions for to be dense in , with the total variation distance.
Lemma 1. Suppose that there exists a measurable partition of , with , such that:
- for all , , uniformly in ; and
- , uniformly in .
Then, is dense in . More precisely, the linear operator maps densities to densities and is such that for all integrable , and for all continuous , .
Proof: We first show that , for all continuous . The method of proof is well-known.
The fact that is linear and maps densities to densities is easily verified. It follows that monotonous (). Now, let . By hypothesis (2), we can suppose that . Thus for all and by the monotonicity of , . Since is compact, is absolutely continuous and there exists such that . Take sufficienly large so that . We have
The first sum is bounded above by , independently of , and the second sum goes uniformly to . Therefore .
We now show that for all integrable . Let . The space of continuous functions is dense in ; there exists a continuous with . Therefore, . Because maps densities to densities it is of norm and . Thus for sufficiently large so that , we obtain .
Finally, and the preceding implies is dense in . QED.
- On the unit interval with the Lebesgue measure, the densities , with , obviously satisfy the hypotheses of the preceding lemma. Here, is the indicator function of the set .
- The indicator functions above may be replaced by the Bernstein polynomial densities . The conditions of the lemma are also satisfied and the proof is relatively straightforward.
Note that any operator of the form may be decomposed as , where is the histogram operator and . In other words, calculating is the process of reducing to an associated histogram and then smoothing it.
A dual approximation process.
Consider the histogram operator and suppose that . Instead of replacing the densities on the outside of the integral by , as we did before, we may replace the partition of unity inside the integral by the partition of unity . This yields the following histogram operator :
It can also be extended to act on measures, by letting
The preceding is of interest in kernel density estimation: given the empirical distribution of observed data , a possible density estimate is . Binning the data through the integral rather than can reduce the sensitivity of the density estimate to the choice of bins .
1.2 The general form of linear operators.
Let be a sequence of positive linear operators mapping densities to densities and such that . Then for each and , there exists a random variable such that . This is a direct consequence of the Riesz representation theorem for positive functionals on . In particular, if admits a density , then
Note that for general random variables , the function may not be a density.
A sufficient condition so that , uniformly in and for all continuous , is the following:
- for all , , uniformly in , meaning that the random variables satisfy the weak law of large numbers uniformly in .
Indeed, for all and continuous ,
The first mean on the RHS is bounded by any when is sufficiently small. The second mean is bounded by a constant multiple of , which goes to zero as .
Rates of convergence
Let be a continuous density and be a modulus of continuity, for instance
For all , we have
Therefore, for any sequence , a calculation yields
In euclidean space, for example, when and exists, for all , we find
1.3 Introducing other parameters
We may index our operators by general parameters , whenever is a directed set (i.e. for all , there exists such that and ). The sequence then becomes the net . We say that if for all there exists such that implies .
For example, we can consider the tensor product operator acting on the space of product densities, where is ordered as iff and .
These extensions are straightforward; the point is that many cases can be treated under the same formalism.