# What this blog is about

My professional page is at olivierbinette.ca.

# Counting cells in microscopic images

The Statistical Society of Canada has posted a few weeks ago its Case Studies (a grad data science competition) for the 2019 annual meeting held in Calgary on May 26 to 29. One of the case study is about counting cells in microscopic images which look like this:

Unfortunately, the organizers forgot to remove from the test set of images the actual cell counts.

Ok, that’s not quite fair. Truth is that they tried to remove the true cell counts, but didn’t quite manage to do so.

So here’s what’s going on. The file names of the images in the training set take forms such as A01_C1_F1_s01_w2, and the number following the letter “C” in the name indicates the true cell count in the image. While they removed this number, they forgot the remove the number following the letter “A”, which is in a simple bijection with the true cell count… The file names in the test set look like this: A01_F1_s01_w1.

Now even if that number following the letter “A” was removed, there would still be other problems: the number following the letter “s” in the file name also carries quite a bit of information… I don’t know why they left all that in.

I’ve contacted the organizers about this, but they don’t see it as being an important problem for the competition, even when 60% of the team’s scoring will be based on a RMSE prediction score.

Another fun fact about this case study: it is possible to get a root mean square error (RMSE) of about 1-2 cells through linear regression with only one covariate. Try to guess what predictor I used (hint: it’s roughly invariant under the type of blurring that they applied to some of the images.)

# 3D Data Visualization with WebGL/three.js

I wanted to make a web tool for high-dimensional data exploration through spherical multidimensional scaling (S-MDS). The basic idea of S-MDS is to map a possibly high-dimensional dataset on the sphere while approximately preserving a matrix of pairwise distances (or divergences). An interactive visualization tool could help explore the mapped dataset and translate observations back to the original data domain. I’m not quite finished, but I made a frontend prototype. The next step would be to implement the multidimensional scaling algorithm in Javascript. I may get to this if I find the time.

In the current applet, you can visualize the positions and depths of earthquakes of magnitude greater than 6 from January 1st 2014 up to January 1st 2019. Data is from the US Geological Survey (usgs.gov). Code is on GitHub.

# Fractional Posteriors and Hausdorff alpha-entropy

Bhattacharya, Pati & Yan (2016) wrote an interesting paper on Bayesian fractional posteriors. These are based on fractional likelihoods – likelihoods raised to a fractional power – and provide robustness to misspecification. One of their results shows that fractional posterior contraction can be obtained as only a function of prior mass attributed to neighborhoods, in a sort of Kullback-Leibler sense, of the parameter corresponding to the true data generating distribution (or the one closest to it in the Kullback-Leibler sense). With regular posteriors, on the other hand, a complexity constraint on the prior distribution is usually also required in order to show posterior contraction.

Their result made me think of the approach of Xing & Ranneby (2008) to posterior consistency. Therein, a prior complexity constraint specified through the so-called Hausdorff $\alpha$-entropy is used to allow bounding the regular posterior distribution by something that is similar to a fractional posterior distribution. As it turns out, the proof of Theorem 3.2 of of Battacharya & al. (2016) can almost directly be adapted to regular posteriors in certain cases, using the Hausdorff $\alpha$-entropy to bridge the gap. Let me explain this in some more detail.

Le me consider well-specified discrete priors for simplicity. More generally, the discretization trick could possibly yield similar results for non-discrete priors.

I will follow as closely as possible the notations of Battacharya & al. (2016). Let $\{p_{\theta}^{(n)} \mid \theta \in \Theta\}$ be a dominated statistical model, where $\Theta = \{\theta_1, \theta_2, \theta_3, \dots\}$ is discrete. Assume $X^{(n)} \sim p_{\theta_0}^{(n)}$ for some $\theta_0 \in \Theta$, let

$B_n(\varepsilon, \theta_0) = \left\{ \int p_{\theta_0}^{(n)}\log\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2,\, \int p_{\theta_0}^{(n)}\log^2\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2 \right\}$

and define the Renyi divergence of order $\alpha$ as

$D^{(n)}_{\alpha}(\theta, \theta_0) = \frac{1}{\alpha-1}\log\int\{p_{\theta}^{(n)}\}^\alpha \{p_{\theta_0}^{(n)}\}^{1-\alpha}.$

We let $\Pi_n$ be a prior on $\Theta$ and its fractional posterior distribution of order $\alpha$ is defined as

$\Pi_{n, \alpha}(A \mid X^{(n)}) \propto \int_{A}p_{\theta}^{(n)}\left(X^{(n)}\right)^\alpha\Pi_n(d\theta)$

In this well-specified case, one of their result is the following:

Theorem 3.2 of Bhattacharya & al. (particular case)
Fix $\alpha \in (0,1)$ and assume that $\varepsilon_n$ satisfies $n\varepsilon_n^2 \geq 2$ and

$\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.$

Then, for any $D \geq 2$ and $t > 0$,

$\Pi_{n,\alpha}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right) \leq e^{-tn\varepsilon_n^2}.$

holds with probability at least $1-2/\{(D-1+t)^2n\varepsilon_n^2\}$.

Let us define the $\alpha$-entropy of the prior $\Pi_n$ as

$H_\alpha(\Pi_n) = \sum_{\theta \in \Theta} \Pi_n(\theta)^\alpha.$

An adaptation of the proof of the previous Theorem, in our case where $\Pi_n$ is discrete, yields the following.

Proposition (Regular posteriors)
Fix $\alpha \in (0,1)$ and assume that $\varepsilon_n$ satisfies $n\varepsilon_n^2 \geq 2$ and

$\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.$

Then, for any $D \geq 2$ and $t > 0$,

$\Pi_{n}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right)^\alpha \leq H_\alpha(\Pi_n) e^{-tn\varepsilon_n^2}.$

holds with probability at least $1-2/\{(D-1+t)^2n\varepsilon_n^2\}$.

Note that $H_\alpha(\Pi_n)$ may be infinite, in which case the upper bound on the tails of $\frac{1}{n}D_\alpha^{(n)}$ is trivial. When the prior is not discrete, my guess is that the complexity term $H_\alpha(\Pi_n)$ should be replaced by a discretization entropy ${}H_\alpha(\Pi_n; \varepsilon_n)$ which is the $\alpha$-entropy of a discretized version of $\Pi_n$ whose resolution (in the Hellinger sense) is some function of $\varepsilon_n$.

# ISM at the Eureka! science festival

We’ve been hard at work getting ready for the Eureka! science festival held this weekend at the Montreal Science Centre. Come check it out!

At the festival:

# Topology and Machine Learning

Won best poster award at the CSSC.

par(mfcol=c(1,2)) plot(cars); pretty_plot(cars)