Sunday afternoon project. I think it is possible to get topological garanties for the reconstruction of the classification boundary using the Nash-Tognoli theorem.
1. The problem
Points , are distributed on some space and associated a label . It is assumed that
where is some unknown integrable function. Estimating from the data allows us to predict given .
It is usually assumed that is a binary classification rule, i.e. that . Letting be a probabilistic classification rule, taking values ranging between and , accounts for class overlapping and provides a form of uncertainty quantification.
1.1 Alternative formulation
The problem may be alternatively formulated as one of estimating the densities and which are such that
for some , . This is the approach pursued in Battacharya and Dunson (2012, Nonparametric Bayes Classification and Hypothesis Testing on Manifolds). It makes available general theory and standard tools, while also relating the classification problem to testing of difference between groups.
For the purpose of this sunday afternoon project, however, we prefer the simpler model . We also let to facilitate visualizations. The approach considered below would be relatively straightforward to extend to other spaces.
2. Bayesian solution
We model in by linear combinations of basis functions. That is, we assume
for some set of functions. In order to make the statistical analysis independent of class relabellings, the functions must form a partition of unity.
A partition of unity on a set is a finite set of functions such that for all .
Here, we consider and use the partition of unity
obtained by taking tensor products of the Bernstein polynomial densities. The model for is then
2.1 Prior on
A prior on is obtained by letting
Here , the degree of the Bernstein polynomials, is assumed to be known. It is not too complicated to have adapting to the data, letting it follow a prior distribution. This reduces the effiency of the posterior sampling algorithm, however, and can lead to computational difficulties.
2.2 Posterior computation
I used a simple independent Metropolis-Hastings algorithm to produce the figures above. The log-likelihood of given , , is a translate of
New coefficients are proposed by
where is a tuning parameter and where , are data-dependent functions used to give a very rough approximation of the posterior.
Calculations for the images shown here take less than 1-2 minutes in Python.
The thick black line is the level set of the posterior mean of corresponding to the value . The transparent grey lines are the preimage of under posterior samples. I hand-picked different values of without giving it too much thought. The last picture is while the others are or .
Source code available here.