Sunday afternoon project. I think it is possible to get topological garanties for the reconstruction of the classification boundary using the Nash-Tognoli theorem.

# 1. The problem

Points , are distributed on some space and associated a label . It is assumed that

where is some unknown integrable function. Estimating from the data allows us to predict given .

It is usually assumed that is a binary classification rule, i.e. that . Letting be a probabilistic classification rule, taking values ranging between and , accounts for class overlapping and provides a form of uncertainty quantification.

## 1.1 Alternative formulation

The problem may be alternatively formulated as one of estimating the densities and which are such that

for some , . This is the approach pursued in Battacharya and Dunson (2012, *Nonparametric Bayes Classification and Hypothesis Testing on Manifolds*). It makes available general theory and standard tools, while also relating the classification problem to testing of difference between groups.

For the purpose of this sunday afternoon project, however, we prefer the simpler model . We also let to facilitate visualizations. The approach considered below would be relatively straightforward to extend to other spaces.

# 2. Bayesian solution

We model in by linear combinations of basis functions. That is, we assume

for some set of functions. In order to make the statistical analysis independent of class relabellings, the functions must form a partition of unity.

**Definition 1.**

*A partition of unity on a set is a finite set of functions such that for all .*

Here, we consider and use the partition of unity

obtained by taking tensor products of the Bernstein polynomial densities. The model for is then

## 2.1 Prior on

A prior on is obtained by letting

Here , the degree of the Bernstein polynomials, is assumed to be known. It is not too complicated to have adapting to the data, letting it follow a prior distribution. This reduces the effiency of the posterior sampling algorithm, however, and can lead to computational difficulties.

## 2.2 Posterior computation

I used a simple independent Metropolis-Hastings algorithm to produce the figures above. The log-likelihood of given , , is a translate of

New coefficients are proposed by

where is a tuning parameter and where , are data-dependent functions used to give a very rough approximation of the posterior.

Calculations for the images shown here take less than 1-2 minutes in Python.

# 3. Examples

The thick black line is the level set of the posterior mean of corresponding to the value . The transparent grey lines are the preimage of under posterior samples. I hand-picked different values of without giving it too much thought. The last picture is while the others are or .

Source code available here.

[…] Bayesian binary classification using partitions of unity […]