Start of the chapter on LOOCV for ridge.

2021-11-02 17:03:04 +01:00 · 2021-11-02 17:03:04 +01:00 · 05b5daf904
commit 05b5daf904
parent 540967c3dd
3 changed files with 52 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -16,3 +16,4 @@
 12_factorisation_qr.pdf
 13_puissance_iteree_par_blocs.pdf
 14_geometrie_ridge_svd.pdf
+15_loocv.pdf
--- a/15_loocv.R
+++ b/15_loocv.R
@@ -0,0 +1 @@
+#loocv
--- a/15_loocv.Rmd
+++ b/15_loocv.Rmd
@@ -0,0 +1,50 @@
+---
+title: "15 Leave-one-out cross-validation"
+output:
+  bookdown::pdf_document2:
+    number_sections: yes
+toc: false
+classoption: fleqn
+---
+
+```{r, include=FALSE}
+source("15_loocv.R", local = knitr::knit_global())
+```
+
+Leave-one-out cross-validation (LOOCV) is an extreme case of k-fold cross-validation: for a dataset of N observations, it is N-fold cross-validation, where each fold holds out a single observation. We want to use LOOCV to estimate the prediction error of a ridge regression model for a given value of the hyperparameter $\lambda$.
+
+We denote by $\mathbf{\hat{\beta}_\lambda}^{(-i)}$ the ridge regression coefficients learned from the remaining $N-1$ observations, that is, after removing the pair $(\mathbf{x_i},y_i)$. The LOOCV estimate of the (mean squared) prediction error is then:
+
+\[
+LOO_\lambda = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \mathbf{x_i}^T \mathbf{\hat{\beta}_\lambda}^{(-i)} \right)^2
+\]
+
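+We now write out $\mathbf{\hat{\beta}_\lambda}^{(-i)}$ explicitly, where $\mathbf{X^{(-i)}}$ is $\mathbf{X}$ with its $i$-th row removed and $y^{(-i)}$ is $y$ with its $i$-th entry removed.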
+
+\[
+\mathbf{\hat{\beta}_\lambda}^{(-i)} = \left( \mathbf{X^{(-i)}}^T\mathbf{X^{(-i)}} + \lambda\mathbf{I}  \right)^{-1} \mathbf{X^{(-i)}}^T y^{(-i)}
+\]
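+
+The definitions above translate directly into a brute-force computation of $LOO_\lambda$: refit the ridge model $N$ times, each time without one observation, and average the squared errors on the held-out observations. The chunk below is a minimal sketch under our own assumptions: simulated data, no intercept (as in the formulas above), and hypothetical helper names `ridge_coef` and `loo_naive` that do not come from an existing package. It performs $N$ linear solves, which becomes costly when $N$ is large.
+
+```{r}
+# Hypothetical helper: closed-form ridge coefficients without intercept,
+# (X^T X + lambda I)^{-1} X^T y, computed with a linear solve
+ridge_coef <- function(X, y, lambda) {
+  p <- ncol(X)
+  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
+}
+
+# Hypothetical helper: LOO_lambda obtained by refitting the model N times
+loo_naive <- function(X, y, lambda) {
+  N <- nrow(X)
+  errors <- numeric(N)
+  for (i in seq_len(N)) {
+    # coefficients learned without the pair (x_i, y_i)
+    beta_minus_i <- ridge_coef(X[-i, , drop = FALSE], y[-i], lambda)
+    # squared error on the held-out observation
+    errors[i] <- (y[i] - sum(X[i, ] * beta_minus_i))^2
+  }
+  mean(errors)
+}
+
+# Simulated data, for illustration only
+set.seed(1)
+N <- 50; p <- 5
+X <- matrix(rnorm(N * p), N, p)
+y <- drop(X %*% rnorm(p)) + rnorm(N)
+loo_naive(X, y, lambda = 1)
+```
+
+The same function can be evaluated on a grid of values, for instance `sapply(c(0.1, 1, 10), function(l) loo_naive(X, y, l))`, to retain the $\lambda$ with the smallest estimated error.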
+
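+We now express $\mathbf{X^{(-i)}}^T\mathbf{X^{(-i)}} + \lambda\mathbf{I}$ in terms of the full matrix $\mathbf{X}$. Since $\mathbf{X}^T\mathbf{X} = \sum_{j=1}^{N} \mathbf{x_j}\mathbf{x_j}^T$, removing the $i$-th row of $\mathbf{X}$ amounts to subtracting the single term $\mathbf{x_i}\mathbf{x_i}^T$: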
+
+\[
+\mathbf{X^{(-i)}}^T\mathbf{X^{(-i)}} + \lambda\mathbf{I} = \mathbf{X^T}\mathbf{X} + \lambda\mathbf{I} - \mathbf{x_i}\mathbf{x_i}^T
+\]
+
+Note that we subtract the matrix $\mathbf{x_i}\mathbf{x_i}^T$ (an outer product, of the same size as $\mathbf{X}^T\mathbf{X}$), not the scalar $\mathbf{x_i}^T\mathbf{x_i}$ (an inner product).
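+
+A quick numerical check of this identity helps avoid confusing the two products. The chunk below is a sketch on simulated data (our assumption, with arbitrary values of $\lambda$ and $i$); in R, `tcrossprod(x_i)` returns the outer product $\mathbf{x_i}\mathbf{x_i}^T$ and `crossprod(x_i)` the inner product $\mathbf{x_i}^T\mathbf{x_i}$.
+
+```{r}
+set.seed(2)
+N <- 20; p <- 4; lambda <- 0.5; i <- 3
+X <- matrix(rnorm(N * p), N, p)
+x_i <- X[i, ]  # i-th observation, as a plain vector of length p
+
+# Left-hand side: built from X with its i-th row removed
+lhs <- crossprod(X[-i, , drop = FALSE]) + lambda * diag(p)
+# Right-hand side: full X^T X + lambda I minus the outer product x_i x_i^T
+rhs <- crossprod(X) + lambda * diag(p) - tcrossprod(x_i)
+
+all.equal(lhs, rhs)    # equal up to floating-point rounding
+dim(tcrossprod(x_i))   # p x p matrix (outer product)
+dim(crossprod(x_i))    # 1 x 1 matrix (inner product)
+```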
+
+As an intermediate step, we now recall the definition of the hat matrix $\mathbf{H}$:
+
+\[
+\hat{y}_\lambda = \mathbf{X}\mathbf{\hat{\beta}_\lambda} = \mathbf{X} \left( \mathbf{X}^T \mathbf{X} + \lambda \mathbf{I}\right)^{-1} \mathbf{X}^T y = \mathbf{H} y
+\]
+
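+Remark: $\mathbf{H}$ is called the hat matrix because it puts a hat on $y$. When $\lambda = 0$ and $\mathbf{X}$ has full column rank, it is the orthogonal projection onto the column space of $\mathbf{X}$, hence the alternative name projection matrix; for $\lambda > 0$, however, $\mathbf{H}$ is no longer idempotent ($\mathbf{H}^2 \neq \mathbf{H}$ as soon as $\mathbf{X} \neq 0$), so strictly speaking it is not a projection.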
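+
+For a small example, the hat matrix can be formed explicitly to check that $\hat{y}_\lambda = \mathbf{H} y$ and that $\mathbf{H}$ is idempotent for $\lambda = 0$ but not for $\lambda > 0$. This is a sketch on simulated data; `hat_ridge` is a hypothetical helper, and forming an $N \times N$ matrix is only reasonable for small $N$.
+
+```{r}
+# Hypothetical helper: ridge hat matrix H = X (X^T X + lambda I)^{-1} X^T
+hat_ridge <- function(X, lambda) {
+  p <- ncol(X)
+  X %*% solve(crossprod(X) + lambda * diag(p), t(X))
+}
+
+set.seed(3)
+N <- 30; p <- 3; lambda <- 2
+X <- matrix(rnorm(N * p), N, p)
+y <- drop(X %*% rnorm(p)) + rnorm(N)
+
+H <- hat_ridge(X, lambda)
+beta_hat <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
+all.equal(drop(X %*% beta_hat), drop(H %*% y))  # fitted values agree
+
+H0 <- hat_ridge(X, 0)                            # ordinary least squares
+all.equal(H0 %*% H0, H0)                         # idempotent: a projection
+isTRUE(all.equal(H %*% H, H))                    # FALSE for lambda > 0
+```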
+
+We denote by $h_i$ the diagonal entries of $\mathbf{H}$:
+
+\[
+\mathbf{x_i}^T \left( \mathbf{X}^T \mathbf{X} + \lambda \mathbf{I}\right)^{-1} \mathbf{x_i} = \mathbf{H}_{ii} = h_i
+\]
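+
+The diagonal entries can be computed for every $i$ at once without forming the $N \times N$ matrix $\mathbf{H}$. The chunk below is a sketch on simulated data (variable names are ours), checked against `diag()` of the explicit hat matrix; it also checks that $0 \le h_i < 1$, which holds for $\lambda > 0$ because the eigenvalues of $\mathbf{H}$ then lie in $[0, 1)$.
+
+```{r}
+set.seed(4)
+N <- 25; p <- 3; lambda <- 1
+X <- matrix(rnorm(N * p), N, p)
+A_inv <- solve(crossprod(X) + lambda * diag(p))  # (X^T X + lambda I)^{-1}
+
+# h_i = x_i^T A_inv x_i for each row x_i, without the N x N matrix H
+h <- rowSums((X %*% A_inv) * X)
+
+# Reference: diagonal of the explicit hat matrix
+H <- X %*% A_inv %*% t(X)
+all.equal(h, diag(H))
+all(h >= 0 & h < 1)
+```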
+
+We also recall the Sherman-Morrison formula, which updates the inverse of a matrix after a rank-one matrix has been added to it.
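+
+For reference, the Sherman-Morrison formula states that, for an invertible matrix $\mathbf{A}$ and vectors $\mathbf{u}$, $\mathbf{v}$ with $1 + \mathbf{v}^T\mathbf{A}^{-1}\mathbf{u} \neq 0$, $(\mathbf{A} + \mathbf{u}\mathbf{v}^T)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{u}\mathbf{v}^T\mathbf{A}^{-1}}{1 + \mathbf{v}^T\mathbf{A}^{-1}\mathbf{u}}$. With $\mathbf{A} = \mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}$, $\mathbf{u} = -\mathbf{x_i}$ and $\mathbf{v} = \mathbf{x_i}$, the denominator becomes $1 - h_i$. The chunk below is only a numerical sanity check of this rank-one downdate on simulated data, with arbitrary $\lambda$ and $i$, not part of the derivation.
+
+```{r}
+set.seed(5)
+N <- 25; p <- 3; lambda <- 1; i <- 7
+X <- matrix(rnorm(N * p), N, p)
+x_i <- X[i, ]
+A_inv <- solve(crossprod(X) + lambda * diag(p))  # (X^T X + lambda I)^{-1}
+h_i <- drop(crossprod(x_i, A_inv %*% x_i))        # i-th diagonal entry of H
+
+# Sherman-Morrison downdate: inverse after removing observation i
+v <- drop(A_inv %*% x_i)
+A_minus_i_inv <- A_inv + tcrossprod(v) / (1 - h_i)
+
+# Reference: direct inverse computed without the i-th row
+direct <- solve(crossprod(X[-i, , drop = FALSE]) + lambda * diag(p))
+all.equal(A_minus_i_inv, direct)
+```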