My short paper A Note on Reverse Pinsker Inequalities was published in early access in the IEEE Transactions on Information Theory a few days ago (arXiv version here). I thought I would try to explain to non-specialists what this paper is about and why I got interested.
So, I am broadly interested in understanding how we can use data to learn about stuff. The data may be noisy images and we’re trying to figure out what they contain; it could be instances of 3D protein structures and we’re trying to learn about the mechanisms of protein folding; or it could be answers from a survey and we’re trying to understand what’s going on in the general population.
Our hypotheses about how the data could have been generated, and how it relates to the thing we want to learn about, are encoded in what’s called a statistical model. Typically, this statistical model will also specify an information function (related to the likelihood) which roughly tells us how much “information”, in some sense, a data point brings us about competing hypotheses.
Different characteristics of the information function, such as its mean, its variance or its expected curvature, can tell us how easily we’ll be able to learn from the data. Some of these characteristics may be difficult to compute, others less so. Information inequalities relate different characteristics of the information function to one another. Hence we may use the characteristics we’re able to compute to get a grip on the more complex and informative ones.
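To make this concrete, here is a toy sketch (my own illustration, not something from the paper): if we take the information function to be the log-likelihood ratio between two competing hypotheses, then its mean under the true distribution is the Kullback–Leibler divergence, and its variance is another of the “characteristics” mentioned above.

```python
import math

# Toy illustration (not from the paper): data comes from Bernoulli(p),
# and we compare against a competing hypothesis Bernoulli(q). The
# "information function" is taken to be the log-likelihood ratio
# log(P(x)/Q(x)); its mean under P is the Kullback-Leibler divergence.

def log_likelihood_ratio(x, p, q):
    """Log-likelihood ratio log(P(x)/Q(x)) for a Bernoulli observation x."""
    px = p if x == 1 else 1 - p
    qx = q if x == 1 else 1 - q
    return math.log(px / qx)

def mean_and_variance(p, q):
    """Mean (= KL divergence) and variance of the information function under P."""
    values = [log_likelihood_ratio(x, p, q) for x in (0, 1)]
    probs = [1 - p, p]  # probabilities of x = 0 and x = 1 under P
    mean = sum(w * v for w, v in zip(probs, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(probs, values))
    return mean, var

kl, var = mean_and_variance(0.6, 0.5)
print(kl, var)  # mean is small but positive: the hypotheses are close
```

Here the mean (the KL divergence) measures how quickly, on average, the data distinguishes the two hypotheses.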
A number of papers have appeared in the information inequalities literature discussing “reverse Pinsker” inequalities and related problems. This kind of inequality comes up in a variety of fields, including in Bayesian nonparametrics, where my main research is. However, a few of these papers were very specific, sometimes requiring unnecessarily hard work and yielding results that are not as sharp as they could be. My short article complements these papers by showing how to get best possible “reverse Pinsker” inequalities in complete generality: it boils down to a neat little formula.
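For context, the classical Pinsker inequality bounds the total variation distance by the Kullback–Leibler divergence: TV(P, Q) ≤ √(KL(P‖Q)/2). A “reverse” inequality goes the other way, bounding KL by TV, which requires extra conditions since KL can blow up while TV stays small. Here is a small numerical check of the classical direction on random discrete distributions (the paper’s own formula is not reproduced here):

```python
import math
import random

# Numerical check of the classical Pinsker inequality
#   TV(P, Q) <= sqrt(KL(P || Q) / 2)
# on random discrete distributions. "Reverse Pinsker" inequalities bound
# KL in terms of TV instead, and need extra assumptions: e.g. if Q puts
# tiny mass where P does not, KL can be huge while TV stays small.

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl(p, q):
    """Kullback-Leibler divergence KL(P || Q); assumes q[i] > 0 everywhere."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

random.seed(0)
for _ in range(1000):
    p = normalize([random.random() for _ in range(5)])
    q = normalize([random.random() for _ in range(5)])
    assert tv(p, q) <= math.sqrt(kl(p, q) / 2) + 1e-12

print("Pinsker's inequality held on all random trials")
```

The forward direction holds unconditionally, which is exactly why the reverse direction is the interesting (and more delicate) one.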
It’s quite a specific issue but, in the end, the goal is to improve the tools that we have at our disposal for studying the behaviour of statistical learning (and other things!). I often use these tools, which is why I want to make sure they work as well as possible. :^)