I summarize and comment on part of The Sample Size Required in Importance Sampling (Chatterjee and Diaconis, 2015). One innovative idea is to bound the mean estimation error in terms of the tail behavior of $\rho = \mathrm{d}\mu/\mathrm{d}\nu$, where $\mu$ and $\nu$ are the importance sampling target and proposal distributions, respectively.
The problem is to evaluate
$$I(f) = \int_{\mathcal{X}} f \, \mathrm{d}\mu,$$
where $\mu$ is a probability measure on a space $\mathcal{X}$ and where $f : \mathcal{X} \to \mathbb{R}$ is measurable. The Monte-Carlo estimate of $I(f)$ is
$$I_n(f) = \frac{1}{n}\sum_{i=1}^{n} f(X_i), \qquad X_i \overset{\text{iid}}{\sim} \mu.$$
When it is too difficult to sample from $\mu$, for instance, other estimates can be obtained. Suppose that $\mu$ is absolutely continuous with respect to another probability measure $\nu$, and that the density of $\mu$ with respect to $\nu$ is given by $\rho = \mathrm{d}\mu/\mathrm{d}\nu$. Another unbiased estimate of $I(f)$ is then
$$I_n(f) = \frac{1}{n}\sum_{i=1}^{n} f(Y_i)\,\rho(Y_i), \qquad Y_i \overset{\text{iid}}{\sim} \nu.$$
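This estimator is straightforward to implement; here is a minimal sketch in Python with NumPy (the function names and the toy distributions below are mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_sampling_estimate(f, sample_nu, rho, n):
    """Estimate I(f), the integral of f with respect to mu,
    from n iid draws of the proposal nu.

    f         : the integrand
    sample_nu : function drawing n iid samples from the proposal nu
    rho       : the density d(mu)/d(nu)
    """
    y = sample_nu(n)
    return np.mean(f(y) * rho(y))

# Toy check (my own example): estimate the mean of mu = Exp(mean 1)
# using the proposal nu = Exp(mean 2); then rho(x) = 2 exp(-x/2).
estimate = importance_sampling_estimate(
    f=lambda x: x,
    sample_nu=lambda n: rng.exponential(scale=2.0, size=n),
    rho=lambda x: 2.0 * np.exp(-x / 2.0),
    n=100_000,
)
print(estimate)  # should be close to I(f) = 1
```

Plain Monte Carlo corresponds to a proposal `sample_nu` drawing from $\mu$ itself, with `rho` identically $1$.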
This is the general framework of importance sampling, with the Monte-Carlo estimate recovered by taking $\nu = \mu$. An important question is the following.
How large should $n$ be for $I_n(f)$ to be close to $I(f)$?
An answer is given, under certain conditions, by Chatterjee and Diaconis (2015). Their main result can be interpreted as follows. Let $X \sim \mu$ and let $L = \mathbb{E}[\log\rho(X)]$, the Kullback–Leibler divergence $D(\mu\|\nu)$. If $\log\rho(X)$ is concentrated around its expected value $L$, then a sample size of approximately $e^L$ is both necessary and sufficient for $I_n(f)$ to be close to $I(f)$. The exact sample size needed depends on $f$ and on the tail behavior of $\log\rho(X)$. I state below their theorem with a small modification.
Theorem 1. (Chatterjee and Diaconis, 2015)
As above, let $L = \mathbb{E}[\log\rho(X)]$ with $X \sim \mu$, and let $\|f\| = \left(\int f^2 \, \mathrm{d}\mu\right)^{1/2}$. For any $t \ge 0$ and $n \ge e^{L+t}$,
$$\mathbb{E}\bigl|I_n(f) - I(f)\bigr| \le \|f\|\left(e^{-t/4} + 2\sqrt{\mathbb{P}\bigl(\log\rho(X) > L + t/2\bigr)}\right).$$
Conversely, for any $t \ge 0$, $\delta \in (0,1)$ and $n \le e^{L-t}$,
$$\mathbb{P}\bigl(I_n(1) \ge 1 - \delta\bigr) \le e^{-t/2} + \frac{1}{1-\delta}\,\mathbb{P}\bigl(\log\rho(X) \le L - t/2\bigr).$$
Suppose that $\|f\| \le 1$ and that $\log\rho(X)$ is concentrated around $L$, meaning that for some small $\epsilon > 0$ we have that $\mathbb{P}(\log\rho(X) > L + \epsilon)$ and $\mathbb{P}(\log\rho(X) \le L - \epsilon)$ are both less than an arbitrarily small $\delta > 0$. Then, taking $n \ge e^{L + 2\epsilon}$ we find
$$\mathbb{E}\bigl|I_n(f) - I(f)\bigr| \le e^{-\epsilon/2} + 2\sqrt{\delta}.$$
However, if $n \le e^{L - 2\epsilon}$, we obtain, for any $\delta' \in (0,1)$,
$$\mathbb{P}\bigl(I_n(1) \ge 1 - \delta'\bigr) \le e^{-\epsilon} + \frac{\delta}{1 - \delta'},$$
meaning that there can be a high probability that $I_n(1)$ and $I(1) = 1$ are not close.
Let $\nu = \mu$, so that $\rho \equiv 1$. In that case, $\log\rho(X)$ only takes its expected value $L = 0$. The theorem yields
$$\mathbb{E}\bigl|I_n(f) - I(f)\bigr| \le \|f\|\, n^{-1/4},$$
and no useful bound on $\mathbb{P}(I_n(1) \ge 1 - \delta)$.
For the theorem to yield a sharp cutoff, it is necessary that $L$ be relatively large and that $\log\rho(X)$ be highly concentrated around $L$. The first condition is the opposite of what is aimed at in the practice of importance sampling, where the proposal $\nu$ is chosen to keep $L$ small. This difficulty contrasts with the broad claim that “a sample of size approximately $e^L$ is necessary and sufficient for accurate estimation by importance sampling”. The result is conceptually interesting, but I’m not convinced that a sharp cutoff is common.
I consider their example 1.4. Here $\mu$ is the exponential distribution of mean 1, $\nu$ is the exponential distribution of mean 2, and $f(x) = x$. Thus $I(f) = 1$, $\rho(x) = 2e^{-x/2}$ and $L = \log 2 - 1/2 \approx 0.19$. We have $e^L \approx 1.2$, meaning that the theorem yields no useful cutoff. Furthermore, $\|f\| = \sqrt{2}$ and $\mathbb{P}(\log\rho(X) > L + t/2) = \mathbb{P}(X < 1 - t)$, which vanishes when $t \ge 1$. Optimizing the bound given by the theorem yields
$$\mathbb{E}\bigl|I_n(f) - I(f)\bigr| \le \sqrt{2}\, e^{L/4}\, n^{-1/4}, \qquad n \ge e^{L+1}.$$
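This example is easy to simulate. A quick sketch, under my reading of the constants ($\mu$ exponential with mean 1, $\nu$ exponential with mean 2, $f(x) = x$, so that $\rho(x) = 2e^{-x/2}$ and $L = \log 2 - 1/2$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Example 1.4 (my reading): mu = Exp(mean 1), nu = Exp(mean 2), f(x) = x,
# so I(f) = 1, rho(x) = 2 exp(-x/2), log rho(X) = log 2 - X/2 and
# L = E[log rho(X)] = log 2 - 1/2.
L = np.log(2) - 0.5
print("e^L =", np.exp(L))  # about 1.2: the cutoff e^L is too small to be useful

def mean_abs_error(n, reps=2000):
    """Empirical estimate of E|I_n(f) - I(f)| over `reps` independent runs."""
    y = rng.exponential(scale=2.0, size=(reps, n))      # draws from nu
    est = np.mean(y * 2.0 * np.exp(-y / 2.0), axis=1)   # I_n(f) per run
    return np.mean(np.abs(est - 1.0))

for n in [10, 100, 1000]:
    bound = np.sqrt(2) * np.exp(L / 4) * n ** (-0.25)   # optimized theorem bound
    print(n, mean_abs_error(n), bound)
```

In this run the empirical error sits well below the $n^{-1/4}$ bound, consistent with the bound being conservative in this example.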
The figure below shows trajectories of $I_n(f)$. The shaded area bounds the expected error.
This next figure shows trajectories for the Monte-Carlo estimate of $I(f)$, taking $\nu = \mu$ and $\rho \equiv 1$. Here the theorem yields
$$\mathbb{E}\bigl|I_n(f) - I(f)\bigr| \le \sqrt{2}\, n^{-1/4}.$$
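The Monte-Carlo case can also be checked empirically; a sketch assuming, as in my reading of the example, $f(x) = x$ and $\mu$ the exponential distribution of mean 1 (so $\|f\| = \sqrt{2}$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Plain Monte Carlo: nu = mu = Exp(mean 1), rho = 1, L = 0, f(x) = x.
# The theorem then gives E|I_n(f) - I(f)| <= sqrt(2) * n^{-1/4}.
def mc_mean_abs_error(n, reps=2000):
    """Empirical E|I_n(f) - I(f)| for the plain Monte-Carlo estimate."""
    x = rng.exponential(scale=1.0, size=(reps, n))  # draws from mu
    return np.mean(np.abs(x.mean(axis=1) - 1.0))

for n in [10, 100, 1000]:
    print(n, mc_mean_abs_error(n), np.sqrt(2) * n ** (-0.25))
```

Empirically the error decays at the central-limit rate $n^{-1/2}$, so the theorem's $n^{-1/4}$ bound is quite conservative here.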
Chatterjee, S. and Diaconis, P. (2015). The Sample Size Required in Importance Sampling. https://arxiv.org/abs/1511.01437v2