intro_to_ml/19_nystroem_approximation.Rmd

# Approximation de Nyström

```{r, include=FALSE}
source("01_intro_code.R", local = knitr::knit_global())
source("04_validation_croisee_code.R", local = knitr::knit_global())
source("15_loocv_code.R", local = knitr::knit_global())
source("19_nystroem_approximation_code.R", local = knitr::knit_global())
```

## Approximation de Nyström d'un noyau

Soit un jeu de données.
$$
(\mathbf{x_i},y_i) \quad \text{avec} \quad i=1,\dots,n \quad ; \quad y_i \in \mathbb{R} \quad ; \quad \mathbf{x_i} \in \mathbb{R}^d
$$
Soit un noyau $\mathbf{K}$ (par ex. gaussien).
$$
\mathbf{K} =
\left[
\begin{array}{ccc}
k(\mathbf{x_1},\mathbf{x_1}) & \dots & k(\mathbf{x_1},\mathbf{x_n}) \\
\dots & \dots & \dots \\
k(\mathbf{x_n},\mathbf{x_1}) & \dots & k(\mathbf{x_n},\mathbf{x_n}) \\
\end{array}
\right]
$$

Soit une approximation de rang $m$ de $\mathbf{K}$ (exprimée en fonction de la décomposition en valeurs propres).
$$
\mathbf{K} \approx \mathbf{U}\mathbf{\Lambda}\mathbf{U}^T \quad ; \quad \mathbf{U} \in \mathbb{R}^{n \times m} \quad ; \quad \mathbf{\Lambda} \in \mathbb{R}^{m \times m}
$$

Lorsque le noyau original est utilisé, chacune des $n$ observations du jeu de données d'apprentissage sert de centre à partir duquel mesurer la proximité d'une nouvelle observation. Le principe de l'approximation de rang $m$, selon l'approche dite de Nyström, est de ne conserver que $m$ des $n$ observations comme centres à partir desquels mesurer la proximité de nouvelles observations. Nous sélectionnons ainsi $m$ observations ($m$ lignes de $\mathbf{X}$). Plusieurs approches sont possibles (par ex., de façon aléatoire, par l'algorithme des $k$-médoïdes, etc.).
$$
\mathbf{K} =
\left[
\begin{array}{cc}
\mathbf{K_{11}} & \mathbf{K_{21}^T}  \\
\mathbf{K_{21}} & \mathbf{K_{22}}  \\
\end{array}
\right]
\quad ; \quad
\mathbf{K_{11}} \in \mathbb{R}^{m \times m}
\quad ; \quad
\mathbf{K_{21}} \in \mathbb{R}^{(n-m) \times m}
$$

L'approximation de Nyström construit une approximation de rang $m$ de $\mathbf{K}$ sans évaluer $\mathbf{K_{22}}$. Commençons par exprimer l'approximation de rang $m$ sous forme de matrices blocs.
$$
\mathbf{K} \approx
\left[
\begin{array}{c}
\mathbf{U_1}  \\
\mathbf{U_2}  \\
\end{array}
\right]
\mathbf{\Lambda}
\left[
\begin{array}{c}
\mathbf{U_1}  \\
\mathbf{U_2}  \\
\end{array}
\right]^T
=
\left[
\begin{array}{cc}
\mathbf{U_1}\mathbf{\Lambda}\mathbf{U_1}^T & \mathbf{U_1}\mathbf{\Lambda}\mathbf{U_2}^T \\
\mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T & \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_2}^T \\
\end{array}
\right]
\quad ; \quad
\mathbf{U_1} \in \mathbb{R}^{m \times m}
\quad ; \quad
\mathbf{U_2} \in \mathbb{R}^{(n-m) \times m}
$$

Nous remarquons que $\mathbf{K_{11}} \approx \mathbf{U_1}\mathbf{\Lambda}\mathbf{U_1}^T$ et $\mathbf{K_{21}} \approx \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T$.

Nous souhaitons éviter d'avoir à évaluer $\mathbf{U_2}$ dans l'approximation de $\mathbf{K_{22}}$. Nous cherchons donc une expression de $\mathbf{U_2}$ en fonction d'autres termes.
$$
\begin{aligned}
 & \mathbf{K_{21}} \approx \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T \\
= \{& \text{Multiplication des deux membres de l'égalité par un même terme non nul.} \} \\
 & \mathbf{K_{21}} \left( \mathbf{\Lambda}\mathbf{U_1}^T \right)^{-1} \approx \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T \left( \mathbf{\Lambda}\mathbf{U_1}^T \right)^{-1} \\
= \{& \text{Algèbre linéaire. $\mathbf{U_1}$ est orthogonale donc $\mathbf{U_1}^{-1}=\mathbf{U_1}^T$} \} \\
 & \mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1} \approx \mathbf{U_2}
\end{aligned}
$$

Nous pouvons maintenant découvrir une approximation de $\mathbf{K_{22}}$ en fonction de celles de $\mathbf{K_{21}}$ et $\mathbf{K_{11}}$.
$$
\begin{aligned}
 & \mathbf{K_{22}} \\
\approx \{& \text{Définition de la décomposition en valeurs propres pour l'approximation de rang $m$ de $\mathbf{K}$} \} \\
 & \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_2}^T \\
= \{& \text{Expression de $\mathbf{U_2}$ découverte ci-dessus.} \} \\
 & \left(\mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1}\right)\mathbf{\Lambda}\left(\mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1}\right)^T \\
= \{& \text{Algèbre linéaire.} \} \\
 & \mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1} \mathbf{U_1}^T \mathbf{K_{21}}^T \\
= \{& \text{Définition de $\mathbf{K_{11}}$ dans la décomposition en valeurs propres pour l'approximation de rang $m$ de $\mathbf{K}$} \} \\
 & \mathbf{K_{21}} \mathbf{K_{11}}^{-1} \mathbf{K_{21}}^T \\
= \{& \text{Algèbre linéaire.} \} \\
 & \left(\mathbf{K_{21}} \mathbf{K_{11}}^{-1/2}\right) \left(\mathbf{K_{21}} \mathbf{K_{11}}^{-1/2}\right)^T
\end{aligned}
$$

Ainsi, nous pouvons introduire une matrice $\boldsymbol\Phi$ de rang $m$ telle que $\mathbf{K} \approx \boldsymbol\Phi \boldsymbol\Phi^T$.
$$
\boldsymbol\Phi =
\left[
\begin{array}{c}
\mathbf{K_{11}}^{1/2}  \\
\mathbf{K_{21}} \mathbf{K_{11}}^{-1/2}  \\
\end{array}
\right]
$$

## Calcul de l'approximation de Nyström

Notons $\mathbf{C}$ la sélection des $m$ colonnes de $\mathbf{K}$.
$$
\mathbf{C} =
\left[
\begin{array}{c}
\mathbf{K_{11}}  \\
\mathbf{K_{21}}  \\
\end{array}
\right]
$$

Nous avons :
$$
\mathbf{C} \mathbf{K_{11}}^{-1} \mathbf{C}^T =
\left[
\begin{array}{c}
\mathbf{I}  \\
\mathbf{K_{21}} \mathbf{K_{11}}^{-1}  \\
\end{array}
\right]
\left[
\begin{array}{cc}
\mathbf{K_{11}}^T & \mathbf{K_{21}}^T  \\
\end{array}
\right]
=
\left[
\begin{array}{cc}
\mathbf{K_{11}}^T & \mathbf{K_{21}}^T  \\
\mathbf{K_{21}} & \mathbf{K_{21}} \mathbf{K_{11}}^{-1} \mathbf{K_{21}}^T \\
\end{array}
\right]
\approx \mathbf{K}
$$

Nous pouvons donc calculer une approximation de rang $m$ de $\mathbf{K} \in \mathbb{R}^{n \times n}$ en ne calculant que $\mathbf{C} \in \mathbb{R}^{n \times m}$ (c'est-à-dire, $m$ colonnes de $\mathbf{K}$) et la décomposition en valeurs propres de $\mathbf{K_{11}} \in \mathbb{R}^{m \times m}$.
$$
\begin{aligned}
 & \mathbf{K} \approx \mathbf{C} \mathbf{K_{11}}^{-1} \mathbf{C}^T \\
= \{& \mathbf{K_{11}} = \mathbf{U}\boldsymbol\Sigma\mathbf{U}^T \} \\
 & \mathbf{K} \approx \mathbf{C} \mathbf{U}\boldsymbol\Sigma^{-1}\mathbf{U}^T \mathbf{C}^T \\
= \{& \mathbf{L} \triangleq \mathbf{C} \mathbf{U}\boldsymbol\Sigma^{-1/2} \} \\
 & \mathbf{K} \approx \mathbf{L}\mathbf{L}^T
\end{aligned}
$$

Nous voyons que la matrice $\mathbf{L}$ correspond exactement à la matrice $\boldsymbol\Phi$ dont les lignes sont les nouvelles variables $m$-dimensionnelles telles que $\boldsymbol\Phi \boldsymbol\Phi^T$ soit une approximation de rang-$m$ de $\mathbf{K}$. Ainsi, pour opérer une régression ridge à noyau avec approximation de Nyström, il suffit de faire une régression ridge sur ces variables transformées.

## Exemple sur un jeu de données synthétique

Nous reprenons le jeu de données synthétique utilisé depuis le premier module pour tester l'implémentation de la régression ridge à noyau gaussien.

```{r}
  set.seed(1123)
  n <- 100
  data = gendat(n,0.2)
  splitres <- splitdata(data,0.8)
  entr <- splitres$entr
  test <- splitres$test

  nakrm <- nakr(entr$X,entr$Y,nb.landmarks=25)
  yh <- predict(nakrm,test$X)
  plt(test,f)
  points(test$X, yh, pch=4)
  testmae <- mean(abs(yh-test$Y))
```

Ce modèle atteint une erreur absolue moyenne de `r testmae` sur le jeu de test.

## Annexe code source

```{r, code=readLines("19_nystroem_approximation_code.R"), eval=FALSE}
```
ajout d'un chapitre sur l'approximation de Nystroem 2022-04-16 05:58:20 -04:00			`# Approximation de Nyström`

			```{r, include=FALSE}
ajout suffixe _code aux fichier R ; restructuration des chapitres 2023-01-02 15:45:54 -05:00			`source("01_intro_code.R", local = knitr::knit_global())`
			`source("04_validation_croisee_code.R", local = knitr::knit_global())`
Nystroem approximation is now correct (and simplified). 2023-03-19 13:48:47 -04:00			`source("15_loocv_code.R", local = knitr::knit_global())`
implementation of nystroem approx 2023-01-16 18:17:52 -05:00			`source("19_nystroem_approximation_code.R", local = knitr::knit_global())`
ajout d'un chapitre sur l'approximation de Nystroem 2022-04-16 05:58:20 -04:00			```

			`## Approximation de Nyström d'un noyau`

			`Soit un jeu de données.`
			`$$`
			`(\mathbf{x_i},y_i) \quad \text{avec} \quad i=1,\dots,n \quad ; \quad y_i \in \mathbb{R} \quad ; \quad \mathbf{x_i} \in \mathbb{R}^d`
			`$$`
			`Soit un noyau $\mathbf{K}$ (par ex. gaussien).`
			`$$`
			`\mathbf{K} =`
			`\left[`
			`\begin{array}{ccc}`
			`k(\mathbf{x_1},\mathbf{x_1}) & \dots & k(\mathbf{x_1},\mathbf{x_n}) \\`
			`\dots & \dots & \dots \\`
			`k(\mathbf{x_n},\mathbf{x_1}) & \dots & k(\mathbf{x_n},\mathbf{x_n}) \\`
			`\end{array}`
			`\right]`
			`$$`

			`Soit une approximation de rang $m$ de $\mathbf{K}$ (exprimée en fonction de la décomposition en valeurs propres).`
			`$$`
			`\mathbf{K} \approx \mathbf{U}\mathbf{\Lambda}\mathbf{U}^T \quad ; \quad \mathbf{U} \in \mathbb{R}^{n \times m} \quad ; \quad \mathbf{\Lambda} \in \mathbb{R}^{m \times m}`
			`$$`

			Lorsque le noyau original est utilisé, chacune des $n$ observations du jeu de données d'apprentissage sert de centre à partir duquel mesurer la proximité d'une nouvelle observation. Le principe de l'approximation de rang $m$, selon l'approche dite de Nyström, est de ne conserver que $m$ des $n$ observations comme centres à partir desquels mesurer la proximité de nouvelles observations. Nous sélectionnons ainsi $m$ observations ($m$ lignes de $\mathbf{X}$). Plusieurs approches sont possibles (par ex., de façon aléatoire, par l'algorithme des $k$-médoïdes, etc.).
			`$$`
			`\mathbf{K} =`
			`\left[`
			`\begin{array}{cc}`
			`\mathbf{K_{11}} & \mathbf{K_{21}^T} \\`
			`\mathbf{K_{21}} & \mathbf{K_{22}} \\`
			`\end{array}`
			`\right]`
			`\quad ; \quad`
			`\mathbf{K_{11}} \in \mathbb{R}^{m \times m}`
			`\quad ; \quad`
			`\mathbf{K_{21}} \in \mathbb{R}^{(n-m) \times m}`
			`$$`

			`L'approximation de Nyström construit une approximation de rang $m$ de $\mathbf{K}$ sans évaluer $\mathbf{K_{22}}$. Commençons par exprimer l'approximation de rang $m$ sous forme de matrices blocs.`
			`$$`
			`\mathbf{K} \approx`
			`\left[`
			`\begin{array}{c}`
			`\mathbf{U_1} \\`
			`\mathbf{U_2} \\`
			`\end{array}`
			`\right]`
			`\mathbf{\Lambda}`
			`\left[`
			`\begin{array}{c}`
			`\mathbf{U_1} \\`
			`\mathbf{U_2} \\`
			`\end{array}`
			`\right]^T`
			`=`
			`\left[`
			`\begin{array}{cc}`
			`\mathbf{U_1}\mathbf{\Lambda}\mathbf{U_1}^T & \mathbf{U_1}\mathbf{\Lambda}\mathbf{U_2}^T \\`
			`\mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T & \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_2}^T \\`
			`\end{array}`
			`\right]`
			`\quad ; \quad`
			`\mathbf{U_1} \in \mathbb{R}^{m \times m}`
			`\quad ; \quad`
			`\mathbf{U_2} \in \mathbb{R}^{(n-m) \times m}`
			`$$`

			`Nous remarquons que $\mathbf{K_{11}} \approx \mathbf{U_1}\mathbf{\Lambda}\mathbf{U_1}^T$ et $\mathbf{K_{21}} \approx \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T$.`

			`Nous souhaitons éviter d'avoir à évaluer $\mathbf{U_2}$ dans l'approximation de $\mathbf{K_{22}}$. Nous cherchons donc une expression de $\mathbf{U_2}$ en fonction d'autres termes.`
			`$$`
			`\begin{aligned}`
			`& \mathbf{K_{21}} \approx \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T \\`
			`= \{& \text{Multiplication des deux membres de l'égalité par un même terme non nul.} \} \\`
			`& \mathbf{K_{21}} \left( \mathbf{\Lambda}\mathbf{U_1}^T \right)^{-1} \approx \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_1}^T \left( \mathbf{\Lambda}\mathbf{U_1}^T \right)^{-1} \\`
			`= \{& \text{Algèbre linéaire. $\mathbf{U_1}$ est orthogonale donc $\mathbf{U_1}^{-1}=\mathbf{U_1}^T$} \} \\`
			`& \mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1} \approx \mathbf{U_2}`
			`\end{aligned}`
			`$$`

			`Nous pouvons maintenant découvrir une approximation de $\mathbf{K_{22}}$ en fonction de celles de $\mathbf{K_{21}}$ et $\mathbf{K_{11}}$.`
			`$$`
			`\begin{aligned}`
			`& \mathbf{K_{22}} \\`
			`\approx \{& \text{Définition de la décomposition en valeurs propres pour l'approximation de rang $m$ de $\mathbf{K}$} \} \\`
			`& \mathbf{U_2}\mathbf{\Lambda}\mathbf{U_2}^T \\`
			`= \{& \text{Expression de $\mathbf{U_2}$ découverte ci-dessus.} \} \\`
			`& \left(\mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1}\right)\mathbf{\Lambda}\left(\mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1}\right)^T \\`
			`= \{& \text{Algèbre linéaire.} \} \\`
			`& \mathbf{K_{21}} \mathbf{U_1} \mathbf{\Lambda}^{-1} \mathbf{U_1}^T \mathbf{K_{21}}^T \\`
			`= \{& \text{Définition de $\mathbf{K_{11}}$ dans la décomposition en valeurs propres pour l'approximation de rang $m$ de $\mathbf{K}$} \} \\`
			`& \mathbf{K_{21}} \mathbf{K_{11}}^{-1} \mathbf{K_{21}}^T \\`
			`= \{& \text{Algèbre linéaire.} \} \\`
			`& \left(\mathbf{K_{21}} \mathbf{K_{11}}^{-1/2}\right) \left(\mathbf{K_{21}} \mathbf{K_{11}}^{-1/2}\right)^T`
			`\end{aligned}`
			`$$`

Nystroem approximation is now correct (and simplified). 2023-03-19 13:48:47 -04:00			`Ainsi, nous pouvons introduire une matrice $\boldsymbol\Phi$ de rang $m$ telle que $\mathbf{K} \approx \boldsymbol\Phi \boldsymbol\Phi^T$.`
ajout d'un chapitre sur l'approximation de Nystroem 2022-04-16 05:58:20 -04:00			`$$`
			`\boldsymbol\Phi =`
			`\left[`
			`\begin{array}{c}`
			`\mathbf{K_{11}}^{1/2} \\`
			`\mathbf{K_{21}} \mathbf{K_{11}}^{-1/2} \\`
			`\end{array}`
			`\right]`
			`$$`

			`## Calcul de l'approximation de Nyström`

			`Notons $\mathbf{C}$ la sélection des $m$ colonnes de $\mathbf{K}$.`
			`$$`
			`\mathbf{C} =`
			`\left[`
			`\begin{array}{c}`
			`\mathbf{K_{11}} \\`
			`\mathbf{K_{21}} \\`
			`\end{array}`
			`\right]`
			`$$`

			`Nous avons :`
			`$$`
			`\mathbf{C} \mathbf{K_{11}}^{-1} \mathbf{C}^T =`
			`\left[`
			`\begin{array}{c}`
			`\mathbf{I} \\`
			`\mathbf{K_{21}} \mathbf{K_{11}}^{-1} \\`
			`\end{array}`
			`\right]`
			`\left[`
			`\begin{array}{cc}`
			`\mathbf{K_{11}}^T & \mathbf{K_{21}}^T \\`
			`\end{array}`
			`\right]`
			`=`
			`\left[`
			`\begin{array}{cc}`
			`\mathbf{K_{11}}^T & \mathbf{K_{21}}^T \\`
			`\mathbf{K_{21}} & \mathbf{K_{21}} \mathbf{K_{11}}^{-1} \mathbf{K_{21}}^T \\`
			`\end{array}`
			`\right]`
			`\approx \mathbf{K}`
			`$$`

			`Nous pouvons donc calculer une approximation de rang $m$ de $\mathbf{K} \in \mathbb{R}^{n \times n}$ en ne calculant que $\mathbf{C} \in \mathbb{R}^{n \times m}$ (c'est-à-dire, $m$ colonnes de $\mathbf{K}$) et la décomposition en valeurs propres de $\mathbf{K_{11}} \in \mathbb{R}^{m \times m}$.`
			`$$`
			`\begin{aligned}`
			`& \mathbf{K} \approx \mathbf{C} \mathbf{K_{11}}^{-1} \mathbf{C}^T \\`
			`= \{& \mathbf{K_{11}} = \mathbf{U}\boldsymbol\Sigma\mathbf{U}^T \} \\`
			`& \mathbf{K} \approx \mathbf{C} \mathbf{U}\boldsymbol\Sigma^{-1}\mathbf{U}^T \mathbf{C}^T \\`
			`= \{& \mathbf{L} \triangleq \mathbf{C} \mathbf{U}\boldsymbol\Sigma^{-1/2} \} \\`
			`& \mathbf{K} \approx \mathbf{L}\mathbf{L}^T`
			`\end{aligned}`
			`$$`

Nystroem approximation is now correct (and simplified). 2023-03-19 13:48:47 -04:00			`Nous voyons que la matrice $\mathbf{L}$ correspond exactement à la matrice $\boldsymbol\Phi$ dont les lignes sont les nouvelles variables $m$-dimensionnelles telles que $\boldsymbol\Phi \boldsymbol\Phi^T$ soit une approximation de rang-$m$ de $\mathbf{K}$. Ainsi, pour opérer une régression ridge à noyau avec approximation de Nyström, il suffit de faire une régression ridge sur ces variables transformées.`
more complete explanation of how to apply nystroem approx to ridge reg 2023-02-20 03:40:13 -05:00
implementation of nystroem approx 2023-01-16 18:17:52 -05:00			`## Exemple sur un jeu de données synthétique`

			`Nous reprenons le jeu de données synthétique utilisé depuis le premier module pour tester l'implémentation de la régression ridge à noyau gaussien.`

			```{r}
			`set.seed(1123)`
			`n <- 100`
			`data = gendat(n,0.2)`
			`splitres <- splitdata(data,0.8)`
			`entr <- splitres$entr`
			`test <- splitres$test`

Nystroem approximation is now correct (and simplified). 2023-03-19 13:48:47 -04:00			`nakrm <- nakr(entr$X,entr$Y,nb.landmarks=25)`
implementation of nystroem approx 2023-01-16 18:17:52 -05:00			`yh <- predict(nakrm,test$X)`
			`plt(test,f)`
			`points(test$X, yh, pch=4)`
			`testmae <- mean(abs(yh-test$Y))`
			```

			Ce modèle atteint une erreur absolue moyenne de `r testmae` sur le jeu de test.

			`## Annexe code source`

			```{r, code=readLines("19_nystroem_approximation_code.R"), eval=FALSE}
			```