Importance sampling

[Figure: importance sampling comparison]

Introduction

The problem is to compute I(f) = \int f \, d\lambda, where \lambda is a probability measure on a space X and f : X \rightarrow \mathbb{R} is integrable. If \{X_n\} is a sequence of independent random variables distributed according to \lambda, then I(f) can be approximated by

I_n(f) = \frac{1}{n}\sum_{i=1}^n f(X_i),

which is called a Monte Carlo estimator.
In practice, it can be difficult to generate X_n \sim \lambda. One may then prefer to introduce a probability measure \mu, with \lambda absolutely continuous with respect to \mu, so that

I_n(f;\mu) = \frac{1}{n}\sum_{i=1}^n f(Y_i) \tfrac{d\lambda}{d\mu}(Y_i), \quad Y_i \sim^{ind.} \mu,

is an estimate of I(f) that is more convenient to compute. This technique, called importance sampling, can also be used to improve the quality of the estimator I_n, for example by reducing its variance.
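As a minimal sketch of the two estimators, here is a small Python example. The specific choices are assumptions made for illustration only and are not taken from the post: \lambda is the standard normal distribution, f is the indicator of (4, \infty), and \mu is a normal distribution centred at 4, for which d\lambda/d\mu has a closed form. The function names are hypothetical.

```python
import math
import random

def monte_carlo(f, sample_lam, n, rng):
    """Plain estimator I_n(f): average of f(X_i), with X_i drawn from lambda."""
    return sum(f(sample_lam(rng)) for _ in range(n)) / n

def importance_sampling(f, sample_mu, density_ratio, n, rng):
    """Estimator I_n(f; mu): average of f(Y_i) * (d lambda / d mu)(Y_i), Y_i ~ mu."""
    total = 0.0
    for _ in range(n):
        y = sample_mu(rng)
        total += f(y) * density_ratio(y)
    return total / n

# Illustrative setting (an assumption): lambda = N(0, 1), mu = N(A, 1),
# and f = indicator of (A, infinity), so that I(f) = P(Z > A) is a small tail probability.
A = 4.0

def f(x):
    return 1.0 if x > A else 0.0

def sample_lam(rng):
    return rng.gauss(0.0, 1.0)

def sample_mu(rng):
    return rng.gauss(A, 1.0)

def density_ratio(y):
    # Ratio of the N(0, 1) density to the N(A, 1) density at y:
    # exp(-y^2 / 2) / exp(-(y - A)^2 / 2) = exp(-A * y + A^2 / 2).
    return math.exp(-A * y + A * A / 2.0)

exact = 0.5 * math.erfc(A / math.sqrt(2.0))  # P(Z > A) for a standard normal Z
rng = random.Random(0)
n = 100_000
plain_estimate = monte_carlo(f, sample_lam, n, rng)
is_estimate = importance_sampling(f, sample_mu, density_ratio, n, rng)
print(f"exact = {exact:.3e}, plain = {plain_estimate:.3e}, importance = {is_estimate:.3e}")
```

In this setting only a handful of the plain samples land in the region where f is non-zero, whereas about half of the samples from \mu do, which is why the weighted estimator is much less variable here. A poorly chosen \mu can just as well increase the variance.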

Statistical aspects of protein structure prediction

The basics

Amino acids are small molecules of the form

[Figure: general structure of an amino acid, with side chain R]

where R is a side chain called the R-group. There are 20 different amino acids found in proteins, each characterized by its R-group.

Peptides and proteins are chains of amino acids. Proteins are long such chains, whereas peptides and polypeptides are shorter ones. The amino acids are linked together by peptide bonds:

[Figure: amino acids linked by peptide bonds]


Combinatorics of Phylogenetic Trees

The following is based on a weekend project that I also presented as a short talk in an undergraduate combinatorics seminar. The project is self-contained and mostly based on independent work. Ideas and inspiration came from discussions with my teacher and from the introduction of Diaconis and Holmes (1998). Theorem 2 is from Semple and Steel (2003). Tree pictures were produced with SageMath and LaTeX.

French pdf.

1. Introduction

A phylogenetic tree is a rooted binary tree with labeled leaves.

[Figure: example of a phylogenetic tree]

These trees are used in biology to represent the evolutionary history of species. The leaves are the identified species, the root is a common ancestor, and branching represents speciation.
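To make the definition concrete, here is a small Python sketch. The representation is an assumption chosen for illustration: a leaf is a string label and an internal node is a pair of subtrees. All trees on a given set of leaves are built by attaching one new leaf at a time, either above the root or on an existing edge. The function names are hypothetical, and the count is compared with the classical double factorial (2n-3)!! for rooted binary trees with n labeled leaves.

```python
def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` above the root or on an edge of `tree`."""
    yield (tree, leaf)  # attach above the current root
    if isinstance(tree, tuple):  # internal node: recurse into both subtrees
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)

def all_trees(labels):
    """List all rooted binary trees whose leaves are exactly `labels` (strings)."""
    if len(labels) == 1:
        return [labels[0]]
    trees = []
    for smaller in all_trees(labels[:-1]):
        trees.extend(insert_leaf(smaller, labels[-1]))
    return trees

def double_factorial_count(n):
    """Return (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3), with the value 1 for n = 1."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

for n in range(1, 7):
    labels = [chr(ord("a") + i) for i in range(n)]
    print(n, len(all_trees(labels)), double_factorial_count(n))
```

With three species a, b, c this produces the three trees (('a', 'b'), 'c'), (('a', 'c'), 'b') and ('a', ('b', 'c')). The rapid growth of the count is one reason exhaustive search over trees becomes impractical for more than a few species.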

An interesting problem is that of reconstructing the phylogenetic tree that best explains the observed biological characteristics of a set of species. A naive mathematical formulation of this problem is proposed in Section 4 and used to implement a tree reconstruction algorithm.

[Figure: fig1]

Loomis-Whitney type inequality for quasi-balls?

Consider the problem of estimating the volume of a tumor, given X-ray scans along orthogonal axes. It may be known that the tumor has a somewhat spherical shape. To formalise this idea, let T \subset \mathbb{R}^3 be the tumor, s the area of its surface, m its volume and C = s^3/m^2. From the isoperimetric inequality, we have C \geq 6^2 \pi, with equality iff T is a ball. Correspondingly, we say that T is a quasi-ball if C \approx 6^2 \pi. In reality, C is unknown but its distribution may be determined.

We are now given the areas m_1, m_2 and m_3 of the projections of T along orthogonal axes. From the Loomis-Whitney inequality (or Cauchy-Schwarz in this case), we have the following estimate of the volume m of T.

Theorem.
We have

\max_i \sqrt{\frac{2^3 m_i^3}{C}} \le m \le \sqrt{m_1 m_2 m_3}.

Problem.
Can we find such an estimate of m that is close to sharp when T is close to a ball?
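To see how far the bounds of the theorem are from sharp in the case of a ball, here is a short numerical check in Python. The function name `volume_bounds` and the choice of a unit ball are assumptions made for illustration. For a ball of radius r, each projection is a disc of area \pi r^2 and C = 6^2 \pi.

```python
import math

def volume_bounds(m1, m2, m3, C):
    """Return (lower, upper) bounds on the volume m from the theorem.

    lower = max_i sqrt(2^3 * m_i^3 / C), upper = sqrt(m1 * m2 * m3).
    """
    lower = max(math.sqrt(8.0 * a ** 3 / C) for a in (m1, m2, m3))
    upper = math.sqrt(m1 * m2 * m3)
    return lower, upper

# Illustrative case (an assumption): T is the unit ball.
r = 1.0
shadow = math.pi * r ** 2                    # area of each projection
C_ball = 36.0 * math.pi                      # s^3 / m^2 for a ball
true_volume = 4.0 / 3.0 * math.pi * r ** 3

lower, upper = volume_bounds(shadow, shadow, shadow, C_ball)
print(f"lower = {lower:.3f}, true volume = {true_volume:.3f}, upper = {upper:.3f}")
```

For the unit ball this gives roughly 1.48 \le 4.19 \le 5.57, so neither bound is close to the true volume even when T is exactly a ball.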

References.
Loomis, L. H.; Whitney, H. An inequality related to the isoperimetric inequality. Bull. Amer. Math. Soc. 55 (1949), no. 10, 961–962. http://projecteuclid.org/euclid.bams/1183514163.