Bayesian binary classification using partitions of unity

Sunday afternoon project. I think it is possible to get topological garanties for the reconstruction of the classification boundary using the Nash-Tognoli theorem.

1. The problem

Points x_i, i=1\dots, N are distributed on some space \mathbb{M} and associated a label \ell_i\in \{0,1\}. It is assumed that

\ell_i \sim \text{Ber}(p(x_i)), \qquad (1)

where p: \mathbb{M} \rightarrow [0,1] is some unknown integrable function. Estimating p from the data \{(x_i,\ell_i)\,|\, i=1,2,\dots, N\} allows us to predict \ell_{n+1} given x_{n+1}.

It is usually assumed that p is a binary classification rule, i.e. that p(\mathbb{M}) \subset \{0,1\}. Letting p be a probabilistic classification rule, taking values ranging between 0 and 1, accounts for class overlapping and provides a form of uncertainty quantification.

1.1 Alternative formulation

The problem may be alternatively formulated as one of estimating the densities f_0 and f_1 which are such that

\ell_i \sim \frac{\sum_{j=0}^1 \alpha_j f_j(x_i)\delta_j}{\sum_{j=0}^1\alpha_j f_j(x_i)},

for some \alpha_j \geq 0, \sum \alpha_j = 1. This is the approach pursued in Battacharya and Dunson (2012, Nonparametric Bayes Classification and Hypothesis Testing on Manifolds). It makes available general theory and standard tools, while also relating the classification problem to testing of difference between groups.

For the purpose of this sunday afternoon project, however, we prefer the simpler model (1). We also let \mathbb{M} = [0,1]^2 to facilitate visualizations. The approach considered below would be relatively straightforward to extend to other spaces.

2. Bayesian solution

We model p in (1) by linear combinations of basis functions. That is, we assume

p = \sum_{j=0}^n a_j \phi_j^n

for some set \{\phi_j^n : \mathbb{M} \rightarrow \mathbb{R}\,|\, j=0,1,\dots, n\} of functions. In order to make the statistical analysis independent of class relabellings, the functions \phi_j must form a partition of unity.

Definition 1.
A partition of unity on a set \mathbb{M} is a finite set of functions \{\phi_i^n: \mathbb{M} \rightarrow [0,1]\,|\, i=0,1,\dots, n\} such that \sum_{i=0}^n \phi_{i}^n(x) = 1 for all x \in \mathbb{M}.

Here, we consider \mathbb{M} = [0,1]^2 and use the partition of unity

(x^1, x^2) \mapsto B_{j}^n(x^1) B_k^n(x^2),\quad B_j^n(u) = {n \choose j} u^j (1-u)^{n-j},\quad i,j \in \{0,\dots, n\}

obtained by taking tensor products of the Bernstein polynomial densities. The model for p is then

p(x) = \sum_{j,k=0}^n a_{j,k} B_j^n(x^1) B_k^n(x^2),\quad a_{j,k} \in [0,1].

2.1 Prior on p

A prior on p is obtained by letting

a_{j,k} \sim^{ind.} \text{Beta}(\alpha, \alpha).

Here n, the degree of the Bernstein polynomials, is assumed to be known. It is not too complicated to have n adapting to the data, letting it follow a prior distribution. This reduces the effiency of the posterior sampling algorithm, however, and can lead to computational difficulties.

2.2 Posterior computation

I used a simple independent Metropolis-Hastings algorithm to produce the figures above. The log-likelihood of p = \sum_{j,k} a_{j,k} B_j^nB_k^n given (x_i, \ell_i), i=1,\dots, N, is a translate of

\sum_{i=1}^N\left\{ (1-\ell_i) \log \left(\sum_{j,k=0}^n a_{j,k}B_j^n(x_i^1)B_k^n(x_i^2) \right) + \ell_i\log \left(\sum_{j,k=0}^n (1-a_{j,k})B_j^n(x_i^1)B_k^n(x_i^2) \right)\right\}.

New coefficients are proposed by

a_{j,k}' \sim \text{Beta}\left(\alpha + \kappa N_0(j,k), \alpha + \kappa N_1(j,k)\right),

where \kappa is a tuning parameter and where N_0, N_1 are data-dependent functions used to give a very rough approximation of the posterior.

Calculations for the images shown here take less than 1-2 minutes in Python.

3. Examples

The thick black line is the level set of the posterior mean of p corresponding to the value 1/2. The transparent grey lines are the preimage of 1/2 under posterior samples. I hand-picked different values of n without giving it too much thought. The last picture is n=15 while the others are n=8 or n = 10.

Source code available here.

One thought on “Bayesian binary classification using partitions of unity

Leave a comment

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s