# Bayesian binary classification using partitions of unity

Sunday afternoon project. I think it is possible to get topological garanties for the reconstruction of the classification boundary using the Nash-Tognoli theorem.

# 1. The problem

Points $x_i$, $i=1\dots, N$ are distributed on some space $\mathbb{M}$ and associated a label $\ell_i\in \{0,1\}$. It is assumed that

$\ell_i \sim \text{Ber}(p(x_i)), \qquad (1)$

where $p: \mathbb{M} \rightarrow [0,1]$ is some unknown integrable function. Estimating $p$ from the data $\{(x_i,\ell_i)\,|\, i=1,2,\dots, N\}$ allows us to predict $\ell_{n+1}$ given $x_{n+1}$.

It is usually assumed that $p$ is a binary classification rule, i.e. that $p(\mathbb{M}) \subset \{0,1\}$. Letting $p$ be a probabilistic classification rule, taking values ranging between $0$ and $1$, accounts for class overlapping and provides a form of uncertainty quantification.

## 1.1 Alternative formulation

The problem may be alternatively formulated as one of estimating the densities $f_0$ and $f_1$ which are such that

$\ell_i \sim \frac{\sum_{j=0}^1 \alpha_j f_j(x_i)\delta_j}{\sum_{j=0}^1\alpha_j f_j(x_i)},$

for some $\alpha_j \geq 0$, $\sum \alpha_j = 1$. This is the approach pursued in Battacharya and Dunson (2012, Nonparametric Bayes Classification and Hypothesis Testing on Manifolds). It makes available general theory and standard tools, while also relating the classification problem to testing of difference between groups.

For the purpose of this sunday afternoon project, however, we prefer the simpler model $(1)$. We also let $\mathbb{M} = [0,1]^2$ to facilitate visualizations. The approach considered below would be relatively straightforward to extend to other spaces.

# 2. Bayesian solution

We model $p$ in $(1)$ by linear combinations of basis functions. That is, we assume

$p = \sum_{j=0}^n a_j \phi_j^n$

for some set $\{\phi_j^n : \mathbb{M} \rightarrow \mathbb{R}\,|\, j=0,1,\dots, n\}$ of functions. In order to make the statistical analysis independent of class relabellings, the functions $\phi_j$ must form a partition of unity.

Definition 1.
A partition of unity on a set $\mathbb{M}$ is a finite set of functions $\{\phi_i^n: \mathbb{M} \rightarrow [0,1]\,|\, i=0,1,\dots, n\}$ such that $\sum_{i=0}^n \phi_{i}^n(x) = 1$ for all $x \in \mathbb{M}$.

Here, we consider $\mathbb{M} = [0,1]^2$ and use the partition of unity

$(x^1, x^2) \mapsto B_{j}^n(x^1) B_k^n(x^2),\quad B_j^n(u) = {n \choose j} u^j (1-u)^{n-j},\quad i,j \in \{0,\dots, n\}$

obtained by taking tensor products of the Bernstein polynomial densities. The model for $p$ is then

$p(x) = \sum_{j,k=0}^n a_{j,k} B_j^n(x^1) B_k^n(x^2),\quad a_{j,k} \in [0,1].$

## 2.1 Prior on $p$

A prior on $p$ is obtained by letting

$a_{j,k} \sim^{ind.} \text{Beta}(\alpha, \alpha).$

Here $n$, the degree of the Bernstein polynomials, is assumed to be known. It is not too complicated to have $n$ adapting to the data, letting it follow a prior distribution. This reduces the effiency of the posterior sampling algorithm, however, and can lead to computational difficulties.

## 2.2 Posterior computation

I used a simple independent Metropolis-Hastings algorithm to produce the figures above. The log-likelihood of $p = \sum_{j,k} a_{j,k} B_j^nB_k^n$ given $(x_i, \ell_i)$, $i=1,\dots, N$, is a translate of

$\sum_{i=1}^N\left\{ (1-\ell_i) \log \left(\sum_{j,k=0}^n a_{j,k}B_j^n(x_i^1)B_k^n(x_i^2) \right) + \ell_i\log \left(\sum_{j,k=0}^n (1-a_{j,k})B_j^n(x_i^1)B_k^n(x_i^2) \right)\right\}.$

New coefficients are proposed by

$a_{j,k}' \sim \text{Beta}\left(\alpha + \kappa N_0(j,k), \alpha + \kappa N_1(j,k)\right),$

where $\kappa$ is a tuning parameter and where $N_0$, $N_1$ are data-dependent functions used to give a very rough approximation of the posterior.

Calculations for the images shown here take less than 1-2 minutes in Python.

# 3. Examples

The thick black line is the level set of the posterior mean of $p$ corresponding to the value $1/2$. The transparent grey lines are the preimage of $1/2$ under posterior samples. I hand-picked different values of $n$ without giving it too much thought. The last picture is $n=15$ while the others are $n=8$ or $n = 10$.

Source code available here.

## One thought on “Bayesian binary classification using partitions of unity”

1. […] Bayesian binary classification using partitions of unity […]