1. Matrix Foundations of Kernel Principal Component Analysis

1.1 How is traditional PCA done?

Given $$N$$ centered observations $$x_i$$, collected as the columns of a matrix $$X$$, the covariance matrix is

$$C=\frac{1}{N} \sum_i x_i x_i^T = \frac{1}{N} X X^T$$

Diagonalizing $$C$$ gives the principal directions $$u_a$$ and the variances $$\lambda_a$$ along them:

$$CU = U \Lambda \Rightarrow C = U \Lambda U^T = \sum_a \lambda_a u_a u_a^T$$
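
The eigendecomposition view above can be sketched in a few lines of R. This is a minimal illustration on random 2-D data; all names here (`X`, `C`, `scores`) are illustrative, not from the original post.

```r
# Classical PCA via eigendecomposition of the covariance matrix.
set.seed(1)
N <- 100
X <- matrix(rnorm(2 * N), nrow = 2)   # 2 x N matrix; columns are the x_i
X <- X - rowMeans(X)                  # center the data
C <- X %*% t(X) / N                   # C = (1/N) X X^T
eig <- eigen(C)                       # C = U Lambda U^T
U <- eig$vectors                      # columns are the principal directions u_a
scores <- t(U) %*% X                  # coordinates of each x_i along the u_a
```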

1.2 How should PCA be done in a high-dimensional space?

Map each point through a feature map $$\Phi$$ into the high-dimensional space; the covariance matrix there is

$$\bar{C}=\frac{1}{N} \sum_i \Phi(x_i) \Phi(x_i)^T = \frac{1}{N} \Phi(X) \Phi(X)^T$$

1.3 How can the kernel trick perform PCA in the high-dimensional space?

Each eigenvector of $$C$$ lies in the span of the data points. Starting from the eigenvalue equation

$$C u_a = \lambda_a u_a$$
\begin{align} u_a &= \frac{1}{\lambda_a} C u_a \\ &= \frac{1}{N \lambda_a} \Big( \sum_i x_i x_i^T \Big) u_a \\ &= \frac{1}{N \lambda_a} \sum_i x_i (x_i^T u_a) \\ &= \sum_i \frac{x_i^T u_a}{N \lambda_a} x_i \\ &= \sum_i \alpha_i^a x_i \end{align}

Substituting this expansion back into the eigenvalue equation and left-multiplying by $$x_i^T$$:

\begin{align} x_i^T C u_a &= \lambda_a x_i^T u_a \\ x_i^T \frac{1}{N} \sum_j x_j x_j^T \sum_k \alpha_k^a x_k &= \lambda_a x_i^T \sum_k \alpha_k^a x_k \\ \sum_j \sum_k \alpha_k^a (x_i^T x_j) (x_j^T x_k) &= N \lambda_a \sum_k \alpha_k^a (x_i^T x_k) \end{align}

(Here $$\alpha^a$$ denotes the vector $$[\alpha_1^a, \alpha_2^a, \ldots, \alpha_N^a]^T$$, and $$K_{ij} = x_i^T x_j$$ is the Gram matrix.) In matrix form the last line reads $$K^2 \alpha^a = N \lambda_a K \alpha^a$$; dropping one factor of $$K$$ leaves the eigenproblem

$$K \alpha^a = \tilde{\lambda}_a \alpha^a \quad \mathrm{with} \quad \tilde{\lambda}_a = N \lambda_a$$
Solving this eigenproblem for $$K$$ gives the eigenvalues $$\tilde{\lambda}_a$$ and eigenvectors $$\alpha^a$$, from which $$u_a$$ can be recovered. Requiring $$u_a$$ to have unit norm fixes the scale of $$\alpha^a$$:

\begin{align} 1 &= u_a^T u_a \\ &= \Big( \sum_i \alpha_i^a x_i \Big)^T \Big( \sum_j \alpha_j^a x_j \Big) \\ &= \sum_i \sum_j \alpha_i^a \alpha_j^a x_i^T x_j \\ &= (\alpha^a)^T K \alpha^a \\ &= (\alpha^a)^T ( N \lambda_a \alpha^a) \\ &= N \lambda_a (\alpha^a)^T \alpha^a \\ &\Rightarrow \quad \lVert \alpha^a \rVert = 1/\sqrt{N \lambda_a} = 1/\sqrt{\tilde{\lambda}_a} \end{align}
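
These two relations can be checked numerically with a linear kernel, where everything is computable both ways. A small sketch (variable names are illustrative): the Gram matrix $$K = X^T X$$ should have eigenvalues $$\tilde{\lambda}_a = N \lambda_a$$, and rescaling its unit eigenvectors by $$1/\sqrt{\tilde{\lambda}_a}$$ should make $$u_a = \sum_i \alpha_i^a x_i$$ a unit vector.

```r
# Check lambda_tilde = N * lambda and the normalization of alpha (linear kernel).
set.seed(2)
N <- 50
X <- matrix(rnorm(2 * N), nrow = 2)
X <- X - rowMeans(X)                            # centered data
C <- X %*% t(X) / N                             # eigenvalues lambda_a
K <- t(X) %*% X                                 # eigenvalues lambda_tilde_a
lam <- eigen(C)$values
ek  <- eigen(K)
alpha1 <- ek$vectors[, 1] / sqrt(ek$values[1])  # ||alpha^1|| = 1/sqrt(lambda_tilde_1)
u1 <- as.vector(X %*% alpha1)                   # u_1 = sum_i alpha_i^1 x_i
```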

1.4 How do we project onto a principal direction?

For a new point $$t$$, the projection onto $$u_a$$ requires only inner products with the training points:

$$u_a^T t = \sum_i \alpha_i^a (x_i^T t)$$

In feature space these inner products become kernel evaluations, so the projection never needs $$\Phi$$ explicitly:

$$u_a^T \Phi(t) = \sum_i \alpha_i^a \Phi(x_i)^T \Phi(t) = \sum_i \alpha_i^a K(x_i, t)$$
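
A quick sanity check of this projection formula, again with a linear kernel $$K(x, t) = x^T t$$ so the kernel expansion can be compared against the direct projection $$u_a^T t$$ (all names illustrative):

```r
# Kernel-expansion projection of a new point t vs. the direct projection.
set.seed(3)
N <- 50
X <- matrix(rnorm(2 * N), nrow = 2)
X <- X - rowMeans(X)
K <- t(X) %*% X
ek <- eigen(K)
alpha1 <- ek$vectors[, 1] / sqrt(ek$values[1])         # normalized alpha^1
u1 <- as.vector(X %*% alpha1)                          # u_1 = sum_i alpha_i^1 x_i
tnew <- c(0.5, -1.0)                                   # a new point t
proj_kernel <- sum(alpha1 * as.vector(t(X) %*% tnew))  # sum_i alpha_i^1 K(x_i, t)
proj_direct <- sum(u1 * tnew)                          # u_1^T t
```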

1.5 How do we center the data in the high-dimensional space?

\begin{align} K_{ij}^C &= \langle \Phi_i^C, \Phi_j^C \rangle \\ &= \Big(\Phi_i - \frac{1}{N}\sum_k \Phi_k\Big)^T \Big(\Phi_j - \frac{1}{N}\sum_l \Phi_l\Big) \\ &= \Phi_i^T\Phi_j - \frac{1}{N}\sum_l \Phi_i^T \Phi_l - \frac{1}{N}\sum_k \Phi_k^T \Phi_j + \frac{1}{N^2}\sum_k \sum_l \Phi_k^T \Phi_l \\ &= K_{ij} - \frac{1}{N}\sum_l K_{il} - \frac{1}{N}\sum_k K_{kj} + \frac{1}{N^2}\sum_k \sum_l K_{kl} \end{align}

In matrix form, where $$1_N$$ denotes the $$N \times N$$ matrix with every entry equal to $$1/N$$:

$$K^C = K - 1_N K - K 1_N + 1_N K 1_N$$
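
This matrix identity can be verified directly: for a linear kernel, the centered Gram matrix must equal the Gram matrix of explicitly centered features. A sketch (illustrative names, data deliberately not pre-centered):

```r
# Centering K via the matrix formula vs. centering the features directly.
set.seed(4)
N <- 30
X <- matrix(rnorm(2 * N), nrow = 2)   # NOT centered
K <- t(X) %*% X
ones <- matrix(1 / N, N, N)           # 1_N: every entry equal to 1/N
K_C <- K - ones %*% K - K %*% ones + ones %*% K %*% ones
Xc <- X - rowMeans(X)                 # center the features directly
K_direct <- t(Xc) %*% Xc
```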

The same centering applies to kernel evaluations against a new point $$t$$:

\begin{align} K(x_i, t)^C &= \langle \Phi_i^C, \Phi_t^C \rangle \\ &= \Big(\Phi_i - \frac{1}{N}\sum_k \Phi_k\Big)^T \Big(\Phi_t - \frac{1}{N}\sum_l \Phi_l\Big) \\ &= \Phi_i^T\Phi_t - \frac{1}{N}\sum_l \Phi_i^T \Phi_l - \frac{1}{N}\sum_k \Phi_k^T \Phi_t + \frac{1}{N^2}\sum_k \sum_l \Phi_k^T \Phi_l \\ &= K(x_i,t) - \frac{1}{N}\sum_l K_{il} - \frac{1}{N}\sum_k K(x_k,t) + \frac{1}{N^2}\sum_k \sum_l K_{kl} \end{align}
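
And a matching check for the test-point formula: with a linear kernel, centering the kernel values this way must agree with centering both $$x_i$$ and $$t$$ by the training mean (illustrative names; `tnew` is an arbitrary test point):

```r
# Centered kernel values K(x_i, t)^C vs. direct centering by the training mean.
set.seed(5)
N <- 30
X <- matrix(rnorm(2 * N), nrow = 2)
tnew <- c(1.0, 2.0)
K  <- t(X) %*% X                  # K_ij
kt <- as.vector(t(X) %*% tnew)    # K(x_i, t)
kt_C <- kt - rowMeans(K) - mean(kt) + mean(K)     # the formula above
mu <- rowMeans(X)                                 # training mean
kt_direct <- as.vector(t(X - mu) %*% (tnew - mu)) # center features directly
```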

2. Demonstration (R code)

KPCA figure: (plot of the kernel PCA projection; image not reproduced here)

R source code: the complete code is linked as KernelPCA.

Kernel PCA code excerpt:

# Kernel PCA
# Polynomial kernel of degree 2: k(x, y) = (t(x) %*% y + 1)^2
# X (an N_total x 2 data matrix) and label are defined earlier in the full script.
k1 = function (x, y) { (x[1] * y[1] + x[2] * y[2] + 1)^2 }

# Gram matrix
K = matrix(0, ncol = N_total, nrow = N_total)
for (i in 1:N_total) {
  for (j in 1:N_total) {
    K[i, j] = k1(X[i, ], X[j, ])
  }
}

# Center the kernel matrix: K^C = K - 1_N K - K 1_N + 1_N K 1_N
ones = matrix(1 / N_total, N_total, N_total)
K_norm = K - ones %*% K - K %*% ones + ones %*% K %*% ones

# Eigendecomposition of the centered kernel matrix
res = eigen(K_norm)
V = res$vectors
D = diag(res$values)

# Scale each eigenvector alpha^a by 1/sqrt(lambda_tilde_a),
# stopping at numerically zero eigenvalues
rank = 0
for (i in 1:N_total) {
  if (D[i, i] < 1e-6) { break }
  V[, i] = V[, i] / sqrt(D[i, i])
  rank = rank + 1
}

# Project the training data onto the principal components: Y = K^C alpha
Y = K_norm %*% V[, 1:rank]
plot(Y[, 1], Y[, 2], col = rainbow(3)[label], main = "Kernel PCA (Poly)",
     xlab = "First component", ylab = "Second component")


3. Main References

[1] Jonathon Shlens, "A Tutorial on Principal Component Analysis" (Shlens03).

[2] Wikipedia: http://en.wikipedia.org/wiki/Kernel_principal_component_analysis

[3] Original KPCA paper: Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller, "Kernel principal component analysis". http://www.springerlink.com/content/w0t1756772h41872/fulltext.pdf

[4] Max Welling's machine learning class notes, "Kernel Principal Component Analysis". http://www.ics.uci.edu/~welling/classnotes/papers_class/Kernel-PCA.pdf