What this blog is about

My professional page is at olivierbinette.ca.

Bayesian theory and exposition

Problem solving, short proofs and notes

Surface regression and reconstruction from a topological perspective


Counting cells in microscopic images

A few weeks ago, the Statistical Society of Canada posted its Case Studies (a graduate student data science competition) for the 2019 annual meeting, held in Calgary on May 26 to 29. One of the case studies is about counting cells in microscopic images.

Unfortunately, the organizers forgot to remove the actual cell counts from the test set of images.

Ok, that’s not quite fair. Truth is that they tried to remove the true cell counts, but didn’t quite manage to do so.

So here’s what’s going on. The file names of the images in the training set take forms such as A01_C1_F1_s01_w2, and the number following the letter “C” in the name indicates the true cell count in the image. While they removed this number from the test set, they forgot to remove the number following the letter “A”, which is in a simple bijection with the true cell count… The file names in the test set look like this: A01_F1_s01_w1.
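
For concreteness, here is a quick sketch (in R) of reading those numbers off file names of the form shown above:

# Sketch: extract the leaked "A" number (and, for training names, the "C" count)
# from file names such as those quoted above.
files <- c("A01_C1_F1_s01_w2", "A01_F1_s01_w1")                    # training and test examples
a_number <- as.integer(sub("^A([0-9]+)_.*$", "\\1", files))        # present in both sets
c_count  <- as.integer(sub("^.*_C([0-9]+)_.*$", "\\1", files[1]))  # true count, training names only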

Now, even if that number following the letter “A” were removed, there would still be other problems: the number following the letter “s” in the file name also carries quite a bit of information… I don’t know why they left all that in.

I’ve contacted the organizers about this, but they don’t see it as an important problem for the competition, even though 60% of each team’s score will be based on an RMSE prediction score.

Another fun fact about this case study: it is possible to get a root mean square error (RMSE) of about 1-2 cells through linear regression with only one covariate. Try to guess what predictor I used (hint: it’s roughly invariant under the type of blurring that they applied to some of the images).
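
To make the RMSE claim concrete, here is a small sketch of that kind of one-covariate fit on simulated placeholder data; the covariate hinted at above is deliberately not revealed, so a generic stand-in x is used instead.

# Sketch on simulated data: a single highly informative covariate is enough
# for a one-covariate linear regression to reach an RMSE of about 1-2 cells.
set.seed(1)
count <- rpois(200, lambda = 50)        # stand-in for true cell counts
x     <- count + rnorm(200, sd = 1.5)   # stand-in for a strong image-level summary
fit   <- lm(count ~ x)
sqrt(mean(residuals(fit)^2))            # in-sample RMSE, roughly 1.5 here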

3D Data Visualization with WebGL/three.js

I wanted to make a web tool for high-dimensional data exploration through spherical multidimensional scaling (S-MDS). The basic idea of S-MDS is to map a possibly high-dimensional dataset onto the sphere while approximately preserving a matrix of pairwise distances (or divergences). An interactive visualization tool could help explore the mapped dataset and translate observations back to the original data domain. I’m not quite finished, but I made a frontend prototype. The next step would be to implement the multidimensional scaling algorithm in JavaScript. I may get to this if I find the time.
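
As a rough illustration of the idea, here is a naive sketch of spherical MDS in R; it is only meant to convey the principle, not the implementation planned for the tool.

# Naive spherical MDS sketch: parameterize points by spherical angles and
# minimize squared stress between rescaled target distances and geodesic
# (great-circle) distances on the unit sphere, using optim().
spherical_mds <- function(D) {
  n  <- nrow(D)
  Ds <- D / max(D) * pi                      # rescale targets to the geodesic range [0, pi]
  to_xyz <- function(par) {                  # angles -> points on the unit sphere
    theta <- par[1:n]; phi <- par[(n + 1):(2 * n)]
    cbind(sin(theta) * cos(phi), sin(theta) * sin(phi), cos(theta))
  }
  stress <- function(par) {
    X <- to_xyz(par)
    G <- acos(pmin(pmax(tcrossprod(X), -1), 1))   # geodesic distances between all pairs
    sum((G[upper.tri(G)] - Ds[upper.tri(Ds)])^2)
  }
  fit <- optim(runif(2 * n, 0, pi), stress, method = "BFGS", control = list(maxit = 500))
  to_xyz(fit$par)
}

# Example: embed the built-in eurodist road-distance matrix on the sphere.
head(spherical_mds(as.matrix(eurodist)))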

In the current applet, you can visualize the positions and depths of earthquakes of magnitude greater than 6 from January 1st 2014 up to January 1st 2019. Data is from the US Geological Survey (usgs.gov). Code is on GitHub.

olivierbinette.ca/earthquakes
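
For reference, the same kind of data can be pulled from the USGS event service; the query below is an assumption about what a suitable request would look like, not necessarily the one used for the applet.

# Sketch: earthquakes of magnitude >= 6 between 2014-01-01 and 2019-01-01,
# downloaded as CSV from the USGS FDSN event web service.
url <- paste0("https://earthquake.usgs.gov/fdsnws/event/1/query?",
              "format=csv&starttime=2014-01-01&endtime=2019-01-01&minmagnitude=6")
quakes6 <- read.csv(url)
head(quakes6[, c("time", "latitude", "longitude", "depth", "mag")])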

Fractional Posteriors and Hausdorff alpha-entropy

Bhattacharya, Pati & Yang (2016) wrote an interesting paper on Bayesian fractional posteriors. These are based on fractional likelihoods – likelihoods raised to a fractional power – and provide robustness to misspecification. One of their results shows that fractional posterior contraction can be obtained from a prior mass condition alone: the prior must put enough mass on Kullback-Leibler-type neighborhoods of the parameter corresponding to the true data-generating distribution (or to the one closest to it in the Kullback-Leibler sense). With regular posteriors, on the other hand, a complexity constraint on the prior distribution is usually also required in order to show posterior contraction.

Their result made me think of the approach of Xing & Ranneby (2008) to posterior consistency. Therein, a prior complexity constraint specified through the so-called Hausdorff \alpha-entropy is used to bound the regular posterior distribution by something similar to a fractional posterior distribution. As it turns out, the proof of Theorem 3.2 of Bhattacharya & al. (2016) can almost directly be adapted to regular posteriors in certain cases, using the Hausdorff \alpha-entropy to bridge the gap. Let me explain this in some more detail.

Let me consider well-specified discrete priors for simplicity. More generally, the discretization trick could possibly yield similar results for non-discrete priors.

I will follow the notation of Bhattacharya & al. (2016) as closely as possible. Let \{p_{\theta}^{(n)} \mid \theta \in \Theta\} be a dominated statistical model, where \Theta = \{\theta_1, \theta_2, \theta_3, \dots\} is discrete. Assume X^{(n)} \sim p_{\theta_0}^{(n)} for some \theta_0 \in \Theta, let

B_n(\varepsilon, \theta_0) = \left\{ \theta \in \Theta : \int p_{\theta_0}^{(n)}\log\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2,\, \int p_{\theta_0}^{(n)}\log^2\frac{p_{\theta_0}^{(n)}}{p_{\theta}^{(n)}} < n\varepsilon^2 \right\}

and define the Rényi divergence of order \alpha as

D^{(n)}_{\alpha}(\theta, \theta_0) = \frac{1}{\alpha-1}\log\int\{p_{\theta}^{(n)}\}^\alpha \{p_{\theta_0}^{(n)}\}^{1-\alpha}.

We let \Pi_n be a prior on \Theta; its fractional posterior distribution of order \alpha is then defined by

\Pi_{n, \alpha}(A \mid X^{(n)}) \propto \int_{A}\left\{p_{\theta}^{(n)}\left(X^{(n)}\right)\right\}^\alpha\,\Pi_n(d\theta).
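
Written out with its normalizing constant, this is

\Pi_{n, \alpha}(A \mid X^{(n)}) = \frac{\int_{A}\left\{p_{\theta}^{(n)}(X^{(n)})\right\}^\alpha\,\Pi_n(d\theta)}{\int_{\Theta}\left\{p_{\theta}^{(n)}(X^{(n)})\right\}^\alpha\,\Pi_n(d\theta)},

so that taking \alpha = 1 recovers the usual posterior distribution.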

In this well-specified case, one of their results is the following:

Theorem 3.2 of Bhattacharya & al. (particular case)
Fix \alpha \in (0,1) and assume that \varepsilon_n satisfies n\varepsilon_n^2 \geq 2 and

\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.

Then, for any D \geq 2 and t > 0,

\Pi_{n,\alpha}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right) \leq e^{-tn\varepsilon_n^2}

holds with probability at least 1-2/\{(D-1+t)^2n\varepsilon_n^2\}.

What about regular posteriors?

Let us define the \alpha-entropy of the prior \Pi_n as

H_\alpha(\Pi_n) = \sum_{\theta \in \Theta} \Pi_n(\theta)^\alpha.
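
For instance, if \Pi_n(\theta_k) \propto k^{-2}, then H_\alpha(\Pi_n) \propto \sum_{k \geq 1} k^{-2\alpha}, which is finite precisely when \alpha > 1/2; a lighter-tailed prior such as \Pi_n(\theta_k) \propto 2^{-k} has H_\alpha(\Pi_n) < \infty for every \alpha \in (0,1).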

An adaptation of the proof of the previous Theorem, in our case where \Pi_n is discrete, yields the following.

Proposition (Regular posteriors)
Fix \alpha \in (0,1) and assume that \varepsilon_n satisfies n\varepsilon_n^2 \geq 2 and

\Pi_n(B_n(\varepsilon_n, \theta_0)) \geq e^{-n\varepsilon_n^2}.

Then, for any D \geq 2 and t > 0,

\Pi_{n}\left( \frac{1}{n}D_\alpha^{(n)}(\theta, \theta_0) \geq \frac{D+3t}{1-\alpha} \varepsilon_n^2 \mid X^{(n)} \right)^\alpha \leq H_\alpha(\Pi_n) e^{-tn\varepsilon_n^2}

holds with probability at least 1-2/\{(D-1+t)^2n\varepsilon_n^2\}.

Note that H_\alpha(\Pi_n) may be infinite, in which case the upper bound on the tails of \frac{1}{n}D_\alpha^{(n)} is trivial. When the prior is not discrete, my guess is that the complexity term H_\alpha(\Pi_n) should be replaced by a discretization entropy H_\alpha(\Pi_n; \varepsilon_n), which is the \alpha-entropy of a discretized version of \Pi_n whose resolution (in the Hellinger sense) is some function of \varepsilon_n.


Prettier base plots in R

R’s base graphics system is notable for the minimal design of its plots. Basic usage is very simple, although more complex customization is not user-friendly. Hence I wrapped the plot and hist functions to improve their default behavior.
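
To give an idea of the approach, here is a simplified sketch of such a wrapper; it is not the actual pretty_plot implementation, only an illustration of the wrapping idea.

# Simplified sketch of the wrapping idea (not the actual pretty_plot code):
# draw the points without the default box, then add lighter axes and a subtle grid.
pretty_plot <- function(..., col = "steelblue", pch = 16) {
  plot(..., col = col, pch = pch, axes = FALSE)
  axis(1, col = "grey60", col.axis = "grey30")
  axis(2, col = "grey60", col.axis = "grey30", las = 1)
  grid(col = "grey90", lty = 1)
}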

Any argument usually passed to plot or hist can also be passed to the two wrapper functions pretty_plot and pretty_hist. A comparison is shown below; “prettified” functions are on the right (obviously!).

par(mfcol=c(1,2))
plot(cars); pretty_plot(cars)
