# Approximation

Présentation (20 minutes) au séminaire du 5e.

Je présente le théorème d’approximation de Weierstrass pour les fonctions périodiques, en utilisant une base des polynômes trigonométriques récemment suggérée par Róth et al. (2009). Celle-ci se prête naturellement bien à notre application.

Théorème d’approximation de Weierstrass.
Soit $f : \mathbb{R} \rightarrow \mathbb{R}$ une fonction $2\pi$-périodique. Si $f$ est continue, alors on peut construire des polynômes trigonométriques $f_1, f_2, f_3, \dots$ tels que

$f(x) = \sum_{i=1}^{\infty} f_i(x)$

et tels que la convergence de la série ci-dessus est uniforme.

Ce théorème intervient dans plusieurs domaines: en topologie pour démontrer le théorème du point fixe de Brouwer, en géométrie pour l’inégalité isopérimétrique et en géométrie algébrique pour le théorème de Nash-Tognoli. Il implique que $\{1, \cos(x), \sin(x), \cos(2x), \sin(2x), \dots\}$, en tant que système orthonormal, est complèt dans $L^2(\mathbb{S}^1)$. Plus généralement, on s’en sert pour ramener un problème sur les fonctions continues à un problème sur les polynômes, où le calcul différentiel et l’algèbre linéaire s’appliquent. Les démonstrations constructives du théorème fournissent de plus des outils permettant d’effectuer la régression ou la reconstruction de courbes et de surfaces.Read More »

# Constructive approximation of compact hypersurfaces

PDF text.

Let $M \subset \mathbb{R}^k$ be a compact manifold of codimension $1$. We show that $M$ can be well approximated by a part of an algebraic manifold.

Theorem 1.
For all $\varepsilon > 0$, there exists $c > 0$ and a polynomial function $P$ defined on $\mathbb{R}^k$ such that $N := P^{-1}(0) \cap (-c,c)^k$ is diffeomorphic to $M$ and

$\sup_{x \in N} \inf_{y \in M} \|x - y\| < \varepsilon.$

This was proved by Seifert in a 1936 german language paper. It was later generalized to the Nash-Tognoli theorem which implies that non-singular real algebraic sets have precisely the same topological invariants as compact manifolds [1].

Here, however, our motivations are more elementary and practical. Our proof of theorem 1 points towards constructive approximation processes and shows the separability of the space of all compact hypersurfaces, under an appropriate topology. This is relevant in statistics and computer graphics, for smooth hypersurface regression and reconstruction. Growing sequences of finite dimensional search spaces of smooth manifolds are required in these applications.

In preparation for our proof, in section 2, we also discuss the Jordan-Brouwer theorem, the orientability of $M$ and that $M = f^{-1}(0)$ for some function $f$ having regular value $0$. We show that these three facts are essentially equivalent, in the sense that any one can be quite easily obtained from another.

# Constructive approximation on compact manifolds

I presented this (pdf, in french) in a short talk for a differential topology course.

I also dabbled with (pdf, in french) the approximation of compact hypersurfaces. I wasn’t able to get a constructive result in time, so I left it as a very rough draft. [I posted a much improved follow up in April.] In the document, I sketch a proof of the following.

Theorem. Let $M$ be a compact hypersuface of $\mathbb{R}^n$. There exists a sequence of polynomials $\{P_n\}$ defined on a compact of $\mathbb{R}^n$ such that for $n$ sufficiently large, $P_n^{-1}(0)$ is a hypersurface and

$\text{dist}(P_n^{-1}(0), M) \rightarrow 0$.

# Linear approximation operators and statistical models

We discuss the approximation properties of sequences of linear operators $T_n$ mapping densities to densities. We give conditions for their convergence, explicit their general form, obtain rates of convergences and generalise the index parameter to obtain nets $\{T_n\}_{n \in N}$.

Notations. Let $(\mathbb{M}, d)$ be a compact metric space, equipped with a finite measure $\mu$ defined on its Borel $\sigma$-algebra, and denote by $\mathcal{F} \subset L^1$ the set of all essentially bounded probability densities on $\mathbb{M}$. The set $\mathcal{F}$ is then a complete separable metric space under the total variation distance proportional to $|| f-g ||_1 = \int |f-g| d\mu$.

In bayesian statistics, it is of interest to specify a probability measure $P$ on $\mathcal{F}$, representing uncertainty about which distribution of $\mathcal{F}$ is generating independent observations $x_i \in \mathbb{M}$. The problem is that $\mathcal{F}$ is usually rather big: by Baire’s category theorem, if $\mathbb{M}$ is not a finite set of points, then $\mathcal{F}$ cannot be written as a countable union of finite dimensional subspaces. To help in prior elicitation, that is to help a statistician specify $P$, we may decompose $\mathcal{F}$ in simpler parts.

Here, I discuss how to obtain a sequence of approximating finite dimensional sieves $\mathcal{S}_n \subset \mathcal{F}$, such that $\cup_n \mathcal{S}_n$ is dense in $\mathcal{F}$. A prior $P$ on $\mathcal{F}$ may then be specified as the countable mixture

$P = \sum _{n \geq 1} \alpha_n P_{\mathcal{S}_n}, \quad \alpha_n \geq 0,\, \sum_n \alpha_n = 1,$

where $P_{\mathcal{S}_n}$ is a prior on $\mathcal{S}_n$ for all $n$.

Let me emphasize that the following ideas are elementary.  Some may be found, with more or less generality, in analysis and approximation theory textbooks. It is, however, interesting to recollect the facts relevant in statistical applications.

# 1. The basics

The finite dimensional sieves $\mathcal{S}_n$ take the form

$\mathcal{S}_n = \left\{ \sum_{i=0}^{m_n} c_i \phi_{i,n} \right\}, \quad m_n \in \mathbb{N}$

where the $\phi_{i,n}$ are densities and the coefficients $c_i$ range through some set which we assume contains the simplex $\Delta_n = \left\{ (c_i) : \sum c_i = 1,\, c_i \geq 0 \right\}$.

The following lemma gives sufficient conditions for $\cup_n \mathcal{S}_n$ to be dense in $\mathcal{F}$, with the total variation distance.

Lemma 1. Suppose that there exists a measurable partition $\{R_{i,n}\} _{i=0}^{m_n}$ of $\mathbb{M}$, with $\max_i \text{diam}(R_{i,n}) \rightarrow 0$, such that:

1. for all $\delta > 0$, $\sum_{i: d(x, R_{i,n}) > \delta} \mu(R_{i,n}) \phi_{i,n}(x) \rightarrow 0$, uniformly in $x$; and
2. $\sum_i \mu(R_{i,n}) \phi_{i,n} (x) \rightarrow 1$, uniformly in $x$.

Then, $\cup_n \mathcal{S}_n$ is dense in $(\mathbb{F},||\cdot||_1)$. More precisely, the linear operator $T_n : f \mapsto \sum_i \int_{R_{i,n}}f d\mu \, \phi_{i,n}$ maps densities to densities and is such that for all integrable $h$, $||T_n h - h||_1 \rightarrow 0$ and for all continuous $g$, $||T_n g - g||_\infty \rightarrow 0$.

Proof: We first show that $||T_n g - g||_\infty \rightarrow 0$, for all continuous $g$. The method of proof is well-known.

The fact that $T_n$ is linear and maps densities to densities is easily verified. It follows that $T_n$ monotonous ($f < h \Rightarrow T_n f < T_n h$). Now, let $\varepsilon > 0$. By hypothesis (2), we can suppose that $T_n 1 = 1$. Thus for all $x$ and by the monotonicity of $T_n$, $|T_n g\, (x) -g(x)| = |T_n(g-g(x))\, (x)| \le T_n|g-g(x)|\,(x)$. Since $\mathbb{M}$ is compact, $g$ is absolutely continuous and there exists $\delta > 0$ such that $d(x, t) < \delta \Rightarrow |g(x)-g(t)| < \varepsilon$. Take $n$ sufficienly large so that $\max_i \text{diam}(R_{i,n}) < \delta / 2$. We have

$T_n|g-g(x)|\,(x) = \sum_{i : d(x, R_{i,n}) < \delta/2} \int_{R_{i,n}} |g(t) - g(x)|\mu(dt) \phi_{i,n}(x) + \sum_{i : d(x, R_{i,n} \geq \delta/2} \int_{R_{i,n}} |g(t) - g(x)|\mu(dt) \phi_{i,n}(x).$

The first sum is bounded above by $\varepsilon$, independently of $x$, and the second sum goes uniformly to $0$. Therefore $||T_n g - g||_\infty \rightarrow 0$.

We now show that $||T_n h - h||_1 \rightarrow 0$ for all integrable $h$. Let $\varepsilon > 0$. The space of continuous functions is dense in $L^1$; there exists a continuous $g$ with $||g-h||_1 < \varepsilon$. Therefore, $||T_n h - h||_1 \le ||T_n h -T_n g||_1 + ||T_ng - g||_1 + ||g-h||_1$. Because $T_n$ maps densities to densities it is of norm $1$ and $||T_n h - T_ng||_1 \le ||h-g||_1$. Thus for $n$ sufficiently large so that $||T_n g - g||_\infty < \varepsilon / \mu(\mathbb{M})$, we obtain ${}||T_n h - h||_1 \le 3\varepsilon$.

Finally, $T_n(\mathcal{F}) \subset \mathcal{S}_n$ and the preceding implies $\cup_n T_n(\mathcal{F})$ is dense in $\mathcal{F}$. QED.

## 1.1 Examples.

1. On the unit interval $[0,1]$ with the Lebesgue measure, the densities ${}\phi_{i,n} = (n+1)\mathbb{I}_{\left[\frac{i}{n+1}, \frac{i+1}{n+1}\right)}$, with ${}\phi_{n,n} = (n+1)\mathbb{I}_{\left[\frac{n}{n+1}, 1 \right]}$, obviously satisfy the hypotheses of the preceding lemma. Here, $\mathbb{I}_A$ is the indicator function of the set $A$.
2. The indicator functions above may be replaced by the Bernstein polynomial densities $\phi_{i,n}(x) = (n+1){n \choose i}x^i(1-x)^{n-i}$. The conditions of the lemma are also satisfied and the proof is relatively straightforward.

Note that any operator of the form $T_n : f \mapsto \sum_i \int_{R_{i,n}} f\, d\mu \phi_{i,n}$ may be decomposed as $T_n = S_n \circ H_n$, where $H_n$ is the histogram operator $H_n f = \sum_i \int_{R_{i,n}} f d\mu\, \mu(R_{i,n})^{-1}\mathbb{I}_{R_{i,n}}$ and $S_n\left( \sum_i c_i \, \mu(R_{i,n})^{-1}\mathbb{I}_{R_{i,n}} \right) = \sum_i c_i \phi_{i,n}$. In other words, calculating $T_n f$ is the process of reducing $f$ to an associated histogram and then smoothing it.

### A dual approximation process.

Consider the histogram operator $H_n : f \mapsto \sum_i \int \mathbb{I}_{R_{i,n}} f d\mu \,\mu(R_{i,n})^{-1}\mathbb{I}_{R_{i,n}}$ and suppose that $\sum_i \mu(R_{i,n})\phi_{i,n} = 1$. Instead of replacing the densities $\mu(R_{i,n})^{-1}\mathbb{I}_{R_{i,n}}$ on the outside of the integral by $\phi_{i,n}$, as we did before, we may replace the partition of unity ${}\mathbb{I}_{R_{i,n}}$ inside the integral by the partition of unity ${}\mu(R_{i,n})\phi_{i,n}$. This yields the following histogram operator $\tilde{H}_n$:

$\tilde{H}_n f = \sum_i \int f \phi_{i,n} d\mu \, \mathbb{I}_{R_{i,n}}.$

It can also be extended to act on measures, by letting ${}\tilde{H}_n \lambda = \sum_i \int \phi_{i,n} d\lambda \, \mathbb{I}_{R_{i,n}}.$

The preceding is of interest in kernel density estimation: given the empirical distribution $\lambda_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$ of observed data $(x_i)$, a possible density estimate is $\tilde{H}_n \lambda_n$. Binning the data through the integral $\int \phi_{i,n} d\lambda_n$ rather than $\int_{R_{i,n}} d\lambda_n = \#\{x_i | x_i \in R_{i,n}\}/n$ can reduce the sensitivity of the density estimate to the choice of bins $R_{i,n}$.

## 1.2 The general form of linear operators.

Let $T_n: L^1 \rightarrow L^1$ be a sequence of positive linear operators mapping densities to densities and such that $T_n 1 = 1$. Then for each $x$ and $n$, there exists a random variable $Y_n(x)$ such that $T_n f (x) = \text{E}[f(Y_n(x))]$. This is a direct consequence of the Riesz representation theorem for positive functionals on $\mathcal{C}(\mathbb{M})$. In particular, if $Y_n(x)$ admits a density $K_n(x, \cdot)$, then

$T_n f (x) = \int f(t) K_n(x,t) \mu(dt).$

Note that for general random variables $Y_n(x)$, the function $x \mapsto \text{E}[f(Y_n(x))]$ may not be a density.

A sufficient condition so that $\text{E}[f(Y_n(x))] \rightarrow f(x)$, uniformly in $x$ and for all continuous $g$, is the following:

• for all $\delta > 0$, $P(|Y_n(x) - x| > \delta) \rightarrow 0$, uniformly in $x$, meaning that the random variables $Y_n(x)$ satisfy the weak law of large numbers uniformly in $x$.

Indeed, for all $\delta > 0$ and continuous $f$,

$\text{E}[|f(Y_n(x)) - f(x)|] = \text{E}[|f(Y_n(x)) - f(x)| \mathbb{I}(|Y_n(x) - x| < \delta)] + \text{E}[|f(Y_n(x)) - f(x)| \mathbb{I}(|Y_n(x) - x| \geq \delta)].$

The first mean on the RHS is bounded by any $\varepsilon > 0$ when $\delta$ is sufficiently small. The second mean is bounded by a constant multiple of $P(|Y_n(x) - x| > \delta)$, which goes to zero as $n\rightarrow \infty$.

### Rates of convergence

Let $f$ be a continuous density and $w_f: \mathbb{R}_{>0} \rightarrow \mathbb{R}_{\geq 0}$ be a modulus of continuity, for instance

$w_f(\delta) = \sup_{d(x,y) < \delta} |f(x) - f(y)|.$

For all $\delta > 0$, we have

$w_f(d(x, t)) \le w_f(\delta)(1+\delta^{-1}d(x,t)).$

Therefore, for any sequence $\delta_n > 0$, a calculation yields

${}|\text{E}[f(Y_n(x))] - f(x)| \le w_f(\delta_n) \left( 2 + \text{E}\left[ \frac{d(Y_n(x), x)}{\delta_n} \mathbb{I}\left(d(Y_n(x), x) \geq \delta_n\right) \right] \right).$

In euclidean space, for example, when $\text{E}[Y_n(x)] = x$ and $\text{Var}(Y_n(x))$ exists, $\sigma_n^2 \geq\text{Var}(Y_n(x))$ for all $x$,  we find

$||\text{E}[f(Y_n(\cdot))] - f||_\infty = \mathcal{O}(w_f(\sigma_n)).$

## 1.3 Introducing other parameters

We may index our operators by general parameters $\theta \in \Theta$, whenever $\Theta$ is a directed set (i.e. for all $\theta_1, \theta_2 \in \Theta$, there exists $\theta_3 \in \Theta$ such that $\theta_1 \le \theta_3$ and $\theta_2 \le \theta_3$). The sequence $\{T_n\}_{n \in \mathbb{N}}$ then becomes the net $\{T_\theta\}_{\theta \in \Theta}$. We say that $\lim ||T_\theta f - f||_\infty = 0$ if for all $\varepsilon > 0$ there exists $\theta_\varepsilon$ such that $\theta \geq \theta_\varepsilon$ implies $||T_\theta f - f||_\infty < \varepsilon$.

For example, we can consider the tensor product operator $T_{n,m} = T_n \otimes T_m$ acting on the space of product densities, where $\mathbb{N} \times \mathbb{N}$ is ordered as $(n,m) \le (n', m')$ iff $n\le n'$ and $m \le m'$.

These extensions are straightforward; the point is that many cases can be treated under the same formalism.