# Bayesian learning

Friday, July 28 at 17:00
Rutherford Physics Building, Room 118, McGill

Next week, I’ll be talking about Bayesian learning at the Mathematical Congress of the Americas and at the Canadian Undergraduate Mathematics Conference. These are somewhat challenging talks: I need to sell the idea of Bayesian statistics to a general mathematical audience (which knows nothing about it), interest them in some tough problems of Bayesian nonparametrics, and then present some of our research results. This must be done in under 20 minutes.

To make the presentation more intuitive and accessible, I borrowed some language from machine learning. I’m talking about learning rather than inference, uncertain knowledge rather than subjective belief, and “asymptotic correctness” rather than consistency. These are essentially synonymous, although some authors might use them in different ways. This should not cause problems for this introductory talk.

# Explanations

Angular data arises in many scientific fields, such as in experimental biology for the study of animal orientation, and in bioinformatics in relation to the protein structure prediction problem.

The statistical analysis of such data requires adapted tools such as $2\pi$-periodic density models. Fernández-Durán (Biometrics, 60(2), 2004) proposed non-negative trigonometric sums (i.e. non-negative trigonometric polynomials) as a flexible family of circular distributions. However, the coefficients of trigonometric polynomials expressed in the standard basis $1, \cos(x), \sin(x), \dots$ are difficult to interpret, and we do not see how an informative prior could be specified through this parametrization. Moreover, the use of this basis was criticized by Ferreira et al. (Bayesian Analysis, 3(2), 2008) as resulting in a “wiggly approximation, unlikely to be useful in most real applications”.

### Trigonometric density basis

Here, we suggest the use of a density basis of the trigonometric polynomials and argue it is well suited to statistical applications. In particular, coefficients of trigonometric densities expressed in this basis possess an intuitive geometric interpretation. Furthermore, we show how “wiggliness” can be precisely controlled using this basis and how another geometric constraint, periodic unimodality, can be enforced [first proposition on the poster]. To ensure that nothing is lost by using this basis, we also show that the whole model consists of precisely all positive trigonometric densities, together with the basis functions [first theorem on the poster].

### Prior specification

Priors can be specified on the coefficients of mixtures in our basis and on the degree of the trigonometric polynomials to be used. Through the interpretability of the coefficients and the shape-preserving properties of the basis, different types of prior knowledge may be incorporated. Together with an approximate understanding of mass allocation, these include:

• periodic unimodality;
• bounds on total variation; and
• knowledge of the marginal distributions (in the multivariate case).

The priors obtained this way are part of a well-studied family called sieve priors, which includes the well-known Bernstein-Dirichlet prior; they are finite mixtures with an unknown number of components. Most results and interpretations about the Bernstein-Dirichlet prior (see Petrone & Wasserman (J. R. Stat. Soc. B, 64(1), 2002), Kruijer & Van der Vaart (J. Stat. Plan. Inference, 138(7), 2008), McVinish et al. (Scand. J. Statist., 36(2), 2009)) can carry over to the priors we consider, but we do not discuss them further.

### Approximation-theoretic framework

Our density models arise as the image of “shape-preserving” linear approximation operators. This approximation-theoretic relationship is used to obtain a notably large prior Kullback-Leibler support and ensures strong posterior consistency at all bounded (not necessarily continuous) densities. The result partly relies on known properties of sieve priors, as well as general consistency results (Walker (Ann. Statist., 32(5), 2004)), but extends known results by removing a usual continuity hypothesis on the densities at which consistency is achieved (see Wu & Ghosal (Electron. J. Stat., 2, 2008), Petrone & Veronese (Statistica Sinica, 20, 2010)). For contraction rates, higher order smoothness conditions are usually required (see Shen & Ghosal (Scand. J. Statist., 42(4), 2015)).

For example, consider the prior induced by the random density

$T_n \mathcal{D} := \sum_i \mathcal{D}(R_{i,n}) C_{i,n},\qquad (1)$

where $\mathcal{D}$ is a Dirichlet process, $n$ is distributed on $\mathbb{N}$ and the sets $R_{i,n}$ form a partition of the circle. This prior is strongly posterior consistent at every bounded density, provided that the associated operator

$T_n : f \mapsto \sum_i C_{i,n} \int_{R_{i,n}} f$

is such that $\|T_n f - f\|_\infty \rightarrow 0$ for all continuous $f$.
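The convergence condition can be checked numerically for a concrete choice of $C_{i,n}$ and $R_{i,n}$. The sketch below is only an illustration under assumptions that are not stated in the text: it takes $C_{i,n}(x) \propto (1 + \cos(x - c_i))^n$ with $c_i = 2\pi i/(2n+1)$, takes $R_{i,n}$ to be the arc of length $2\pi/(2n+1)$ centered at $c_i$, and uses an arbitrary test density `f`. The function names are hypothetical.

```python
import numpy as np

def basis(n, x):
    """Evaluate 2n+1 trigonometric densities of degree n at the points x.

    Assumed form: C_{i,n}(x) proportional to (1 + cos(x - c_i))^n with
    c_i = 2*pi*i/(2n+1); the text does not spell the basis out.
    """
    m = 2 * n + 1
    centers = 2 * np.pi * np.arange(m) / m
    # (1 + cos t)^n = 2^n cos(t/2)^(2n); the constant 2^n cancels below.
    unnorm = np.cos((x[None, :] - centers[:, None]) / 2) ** (2 * n)
    # The m shifted copies sum to a constant, so this scaling makes every
    # row integrate to one over the circle.
    return unnorm * m / (2 * np.pi * unnorm.sum(axis=0, keepdims=True))

def arc_masses(f, n, sub=200):
    """Midpoint-rule approximation of the integral of f over each arc R_{i,n}."""
    m = 2 * n + 1
    h = 2 * np.pi / m  # arc length
    offsets = (np.arange(sub) + 0.5) / sub - 0.5
    pts = h * np.arange(m)[:, None] + h * offsets[None, :]
    return f(pts).mean(axis=1) * h

def apply_T(f, n, x):
    """T_n f = sum_i C_{i,n} * (integral of f over R_{i,n}), evaluated at x."""
    return arc_masses(f, n) @ basis(n, x)

def f(x):
    # Arbitrary smooth test density on the circle (an assumption for the demo).
    return (1 + 0.5 * np.cos(x)) / (2 * np.pi)

x = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
degrees = (5, 20, 80)
errors = [np.max(np.abs(apply_T(f, n, x) - f(x))) for n in degrees]
for n, err in zip(degrees, errors):
    print(f"n = {n:3d}   sup-norm error = {err:.5f}")
```

With these choices the printed sup-norm errors shrink as $n$ grows, which is the behaviour the condition asks for; the sketch does not, of course, prove it for all continuous $f$.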

More generally, let $\mathbb{F}$ be a set of bounded densities on some compact metric space $\mathbb{M}$, let $T_n : L^1(\mathbb{M}) \rightarrow L^1(\mathbb{M})$, $n \in \mathbb{N}$, be a sequence of operators that are:

• shape preserving: $T_n$ maps densities to densities and $T_n(\mathbb{F}) \subset \mathbb{F}$; and
• approximating: $\|T_n f - f\|_\infty \rightarrow 0$ for all continuous $f$;

and finally let $\Pi_n$ be priors on $T_n(\mathbb{F})$ with full support. A sieve prior on $\mathbb{F}$ is defined by

$\Pi : A \mapsto \sum_n \rho(n) \Pi_n(A \cap T_n(\mathbb{F}))$.

Theorem.
If $0 < \rho(n) < Ce^{-c d_n}$ for some increasing sequence $d_n$ bounding the dimensions of $T_n (\mathbb{F})$, then the posterior distribution of $\Pi$ is strongly consistent at each density of $\mathbb{F}$.
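As a quick check of the hypothesis, take $n \sim \text{Poiss}(\lambda)$, as in the example that follows, and assume that $T_n(\mathbb{F})$ is spanned by $2n+1$ basis functions, so that one may take $d_n = 2n+1$ (this count is an assumption, not something stated above). Then $\rho(n) = e^{-\lambda}\lambda^n/n! > 0$ and, for any $c > 0$,

$\rho(n)\, e^{c d_n} = e^{c-\lambda}\, \frac{(\lambda e^{2c})^n}{n!} < e^{c-\lambda} \exp(\lambda e^{2c}) =: C,$

since each term of the exponential series is smaller than its sum. Hence $0 < \rho(n) < C e^{-c d_n}$, and a Poisson prior on the degree satisfies the condition of the theorem under this assumption.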

The approximation theory literature is rich in such operators. The theorem shows that they provide strongly consistent priors on arbitrary density spaces simply given priors $\Pi_n$ on $T_n(\mathbb{F})$.

Basic density estimation:

A thousand samples (grey histogram) were drawn from the density in orange. The prior is defined by (1) with the Dirichlet process centered on the uniform density and with a precision parameter of 2. The degree $n$ is distributed as a $\text{Poiss}(15)$. The blue line is the posterior mean, the dark blue shaded region is a 50% pointwise credible region around the median, and the light blue shaded region is a 90% credible region.
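To make the prior in this example concrete, here is a minimal sketch of how one draw from (1) could be simulated. It relies on the standard fact that a Dirichlet process with precision $\alpha$ and uniform base measure assigns $\text{Dirichlet}(\alpha/m, \dots, \alpha/m)$ masses to $m$ arcs of equal length. The form of $C_{i,n}$, the choice of $2n+1$ equal arcs and the function names are assumptions made for illustration; posterior computation is not shown.

```python
import numpy as np

def basis(n, x):
    """Assumed basis: C_{i,n}(x) proportional to (1 + cos(x - c_i))^n,
    with c_i = 2*pi*i/(2n+1), normalised so each row integrates to one."""
    m = 2 * n + 1
    centers = 2 * np.pi * np.arange(m) / m
    unnorm = np.cos((x[None, :] - centers[:, None]) / 2) ** (2 * n)
    return unnorm * m / (2 * np.pi * unnorm.sum(axis=0, keepdims=True))

def sample_prior_density(rng, x, mean_degree=15, precision=2.0):
    """Draw one random density T_n D evaluated on the grid x."""
    n = rng.poisson(mean_degree)  # degree n ~ Poiss(15)
    m = 2 * n + 1                 # number of arcs (assumed)
    # Masses D(R_{i,n}) of m equal arcs under a DP centered on the uniform
    # density: a symmetric Dirichlet with total concentration `precision`.
    weights = rng.dirichlet(np.full(m, precision / m))
    return n, weights @ basis(n, x)

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
n, density = sample_prior_density(rng, x)
print("degree:", n, " integral:", density.mean() * 2 * np.pi)
```

Each draw is a non-negative mixture of the basis densities, so it is itself a density on the circle; with such a small precision parameter most of the mass tends to sit on a few components.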

# Approximation

Talk (20 minutes) at the séminaire du 5e.

I present the Weierstrass approximation theorem for periodic functions, using a basis of the trigonometric polynomials recently suggested by Róth et al. (2009). This basis lends itself naturally to our application.

Weierstrass approximation theorem.
Let $f : \mathbb{R} \rightarrow \mathbb{R}$ be a $2\pi$-periodic function. If $f$ is continuous, then one can construct trigonometric polynomials $f_1, f_2, f_3, \dots$ such that

$f(x) = \sum_{i=1}^{\infty} f_i(x)$

and such that the series above converges uniformly.
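A constructive reading of the statement can be sketched numerically. Given trigonometric polynomials $p_1, p_2, \dots$ converging uniformly to $f$, the series is obtained by telescoping: $f_1 = p_1$ and $f_i = p_i - p_{i-1}$. The code below is a hedged illustration, not the construction used in the talk: it assumes blending functions proportional to $(1 + \cos(x - c_i))^n$ with $c_i = 2\pi i/(2n+1)$, builds $p_n(x) = \sum_i f(c_i) B_{i,n}(x)$, and uses $f(x) = |\sin x|$ as an arbitrary continuous test function.

```python
import numpy as np

def approximant(f, n, x):
    """Trigonometric polynomial of degree n: sum_i f(c_i) * B_{i,n}(x)."""
    m = 2 * n + 1
    centers = 2 * np.pi * np.arange(m) / m
    bumps = np.cos((x[None, :] - centers[:, None]) / 2) ** (2 * n)
    # Normalise so the blending functions sum to one at every point x.
    blending = bumps / bumps.sum(axis=0, keepdims=True)
    return f(centers) @ blending

def f(x):
    # Continuous, 2*pi-periodic, but not differentiable at multiples of pi.
    return np.abs(np.sin(x))

x = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
degrees = (10, 40, 160)
approximants = [approximant(f, n, x) for n in degrees]

# Telescoping terms: the partial sums of the series are the approximants.
terms = [approximants[0]] + [b - a for a, b in zip(approximants, approximants[1:])]

errors = [np.max(np.abs(p - f(x))) for p in approximants]
for n, err in zip(degrees, errors):
    print(f"degree {n:3d}   sup-norm error = {err:.4f}")
```

The largest errors occur near the kinks of $|\sin x|$, and they decrease as the degree increases.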

This theorem comes up in several areas: in topology to prove Brouwer’s fixed-point theorem, in geometry for the isoperimetric inequality, and in algebraic geometry for the Nash-Tognoli theorem. It implies that $\{1, \cos(x), \sin(x), \cos(2x), \sin(2x), \dots\}$, as an orthonormal system (after normalization), is complete in $L^2(\mathbb{S}^1)$. More generally, it is used to reduce a problem about continuous functions to a problem about polynomials, where differential calculus and linear algebra apply. Constructive proofs of the theorem also provide tools for regression and for the reconstruction of curves and surfaces.