Singular Value Decomposition in Machine Learning

June 19, 2025 | 9 min read

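SVD (singular value decomposition) is a linear algebra tool that matters in many machine learning and data science applications. It handles large matrices efficiently, removes noise from data, and reduces the dimensionality of the data. As a result, it is widely used in natural language processing, image compression, and recommender systems.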

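SVD answers the question of which low-rank matrix best approximates a given matrix in the Frobenius norm. The best approximation can be written as the product of two orthogonal matrices and one diagonal matrix. Geometrically, these factors rotate, scale, and rotate the data points again to produce new points in a transformed space. The diagonal entries are the singular values of the original matrix, which are the square roots of the eigenvalues of A^T A rather than the eigenvalues of A itself. Truncating the SVD keeps all or most of the information in the original data, depending on how large the discarded singular values are. The result approximates the original matrix at a lower rank and a smaller storage size, which saves computation while keeping predictive accuracy high. This works because the singular values are arranged in descending order along the main diagonal, so the leading terms carry the most weight.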

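At its most basic, SVD is a matrix factorization that decomposes an m x n matrix A into three other matrices:

A = U Σ V^T

where the symbols are defined as follows:

U is an m x m orthogonal matrix whose columns are the left singular vectors.
Σ is an m x n diagonal matrix whose diagonal entries are non-negative and are called the singular values.
V^T is the transpose of the n x n orthogonal matrix V, whose columns are the right singular vectors.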

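Each singular value in Σ is the scaling factor applied along the corresponding pair of singular vectors in U and V. Keeping only the largest singular values gives a low-rank approximation of A; that is, the data is simplified for faster computation and less redundancy.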
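The factorization can be checked numerically. The following is a minimal sketch that uses NumPy's built-in np.linalg.svd on a small example matrix; the matrix values are made up purely for illustration.

```python
import numpy as np

# Hypothetical 4 x 3 example matrix (values chosen only for illustration).
A = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 0.0],
              [2.0, 0.0, 5.0],
              [0.0, 1.0, 1.0]])

# full_matrices=True returns U as m x m and Vt as n x n.
# s is a 1-D array of singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Embed the singular values in an m x n diagonal matrix Sigma.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Multiplying the three factors should recover A up to round-off.
A_rebuilt = U @ Sigma @ Vt

print(U.shape, Sigma.shape, Vt.shape)
print(np.allclose(A, A_rebuilt))
# U and V are orthogonal: their transposes are their inverses.
print(np.allclose(U.T @ U, np.eye(4)), np.allclose(Vt @ Vt.T, np.eye(3)))
```

Note that NumPy returns V^T directly (here named Vt), not V.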

Mathematical Intuition of SVD

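SVD builds on eigenvalues and eigenvectors, the concepts commonly used to capture the essence of high-dimensional data. For any matrix A, SVD finds patterns by expressing the data along orthogonal directions scaled by the singular values. The largest singular values mark the most important features of the data, so keeping them tends to retain structure rather than noise.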
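The link to eigenvalues can be made concrete: the squared singular values of A are the eigenvalues of A^T A, and the right singular vectors are the matching eigenvectors. A small sketch on a random matrix (the matrix and seed are arbitrary assumptions) illustrates this.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
A = rng.normal(size=(6, 4))      # hypothetical data matrix

# Singular values of A, returned in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of the symmetric matrix A^T A; eigvalsh returns them in
# ascending order, so reverse them to match the singular values.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

# The squared singular values match the eigenvalues of A^T A.
print(np.allclose(s**2, eigvals))

# The first right singular vector is an eigenvector of A^T A
# with eigenvalue s[0]**2.
v1 = Vt[0]
print(np.allclose(A.T @ A @ v1, s[0]**2 * v1))
```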

Applications of SVD in Machine Learning

SVD is used in machine learning in several ways.

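Dimensionality reduction: For very large datasets, SVD can reduce the number of features by keeping only the first k singular values and their corresponding vectors. This reduction, usually called truncated SVD, can improve model performance because it lowers computational complexity without a significant loss of accuracy.
Latent semantic analysis: SVD plays a key role in natural language processing, where latent semantic analysis uses it to identify relationships between words and documents. It captures the hidden semantic structure of large term-document matrices, which supports tasks such as document clustering and topic modeling.
Image compression: SVD is used for image compression, where a high-resolution image is approximated by a low-rank matrix. Because only the most important singular values are kept, SVD can compress the image and reduce storage requirements without losing its overall structure.
Recommender systems: SVD performs well in collaborative filtering, where a recommender system predicts user preferences. SVD factorizes the user-item interaction matrix and captures latent relationships between users and items, which lets platforms such as Netflix and Amazon make personalized recommendations.
Denoising: SVD denoises data by discarding the smallest singular values. Reconstructing the matrix from the larger components tends to filter out fine-grained noise, leaving cleaner data for interpretation.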
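The denoising idea in the list above can be sketched as follows. The example builds a synthetic rank-2 matrix, adds Gaussian noise, and reconstructs it from the two largest singular values. The sizes, noise level, and the choice k = 2 are assumptions made for illustration; with real data, k is not known in advance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical clean signal: a 50 x 40 matrix of exact rank 2.
clean = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))
noisy = clean + 0.1 * rng.normal(size=clean.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)

# k is assumed known here; in practice it is chosen by inspecting
# how quickly the singular values decay.
k = 2
# Scaling the columns of U[:, :k] by s[:k] is the same as U_k @ diag(s_k).
denoised = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Frobenius-norm distance to the clean matrix before and after truncation.
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(err_noisy, err_denoised)
```

In this synthetic setting the truncated reconstruction lies closer to the clean matrix than the noisy input does, because most of the noise energy is spread over the discarded components.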

Now we will demonstrate SVD by compressing and analyzing an image.

Importing Libraries

 
import numpy as np
import matplotlib.pyplot as plt
from numpy import linalg
import sys
import matplotlib.animation as animation
from IPython.display import HTML

import cv2   

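The SVD implementation presented here first uses NumPy's linalg.eig function to compute the eigenvalues and eigenvectors of the matrix product A^T A. The eigenvectors give the right singular vectors in V, and the square roots of the eigenvalues give the singular values. It then sorts the singular values and their corresponding vectors in V in descending order so that the largest components come first. Singular values below a tolerance are dropped, which avoids dividing by zero in the next step. Finally, the left singular vectors in U are obtained from A, V, and the singular values as U = A V Σ^-1, which completes the SVD.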

 
def svd(A, tol=1e-5):
    # Singular values and right singular vectors are derived from the eigenvalues and eigenvectors of the matrix ( A^T A ).
    eigs, V = linalg.eig(A.T.dot(A))

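    # Singular values are the square roots of the eigenvalues. Round-off can make
    # near-zero eigenvalues slightly negative (or complex), so take the real part
    # and clip at zero before the square root to avoid NaNs.
    vals_sing = np.sqrt(np.clip(eigs.real, 0, None))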

    # Both singular values and their corresponding right singular vectors are sorted.
    idx = np.argsort(vals_sing)

    vals_sing = vals_sing[idx[::-1]]
    V = V[:, idx[::-1]]

    # Zero singular values below the specified tolerance level are removed.
    vals_sing_trunc = vals_sing[vals_sing>tol]
    V = V[:, vals_sing>tol]

    # It is not required to store the entire sigma matrix; therefore, only the diagonal elements are returned.
    sigma = vals_sing_trunc

    # The next step involves calculating the U matrix.
    U = A @ V /vals_sing_trunc
    
    return U.real, sigma.real, V.T.real   

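The singular values in Σ vary widely in magnitude, reflecting the relative importance of each one in representing the data. To reduce dimensionality, we keep only the k largest singular values. In other words, we truncate Σ and trim U and V to match. The product of the truncated factors still has the shape of the original matrix. When SVD is used for dimensionality reduction, keeping only U and Σ is enough, because V is needed only to project the reduced matrix U Σ back to the original feature space.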

 
def truncate(U, S, V, k):
    trunc_U = U[:, :k]
    trunc_S = S[:k]
    trunc_V = V[:k, :]
    return trunc_U, trunc_S, trunc_V   
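Two properties of truncation are worth checking numerically. The sketch below uses np.linalg.svd on a random matrix (an arbitrary stand-in for real data) to show that the Frobenius error of the rank-k approximation equals the root of the sum of the squared discarded singular values, and that the reduced representation U_k Σ_k is the projection of A onto the first k right singular vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 5))   # hypothetical data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Rank-k approximation; it has the same shape as A.
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# The Frobenius error is determined by the discarded singular values.
err = np.linalg.norm(A - A_k, "fro")
expected = np.sqrt(np.sum(s[k:] ** 2))
print(np.isclose(err, expected))

# Reduced coordinates: U_k Sigma_k equals A projected onto the
# first k right singular vectors (the first k rows of Vt).
reduced = U[:, :k] * s[:k]
projected = A @ Vt[:k, :].T
print(np.allclose(reduced, projected))
```

The second check is why V is not needed to compute the reduced features, only to map them back to the original space.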

Dimensionality Reduction

Now we will show how to apply SVD to the well-known Iris dataset from the scikit-learn library to reduce its dimensionality.

 
from sklearn.datasets import load_iris
import seaborn as sns
import pandas as pd

iris = load_iris()
iris.keys()   

[Output: the keys of the loaded Iris dataset object.]

 
data = pd.DataFrame(iris.data)
name_features = iris["feature_names"]
data.columns = name_features
data["labels"] = iris.target   

This is a custom version of `sns.pairplot` that uses violin plots instead of histograms on the diagonal and colors the scatter plots by class label.

 
def custom_pairplot(data, name_features, labels):
    plt.figure(figsize=(10, 10))
    plt.subplots_adjust(left = 0, right=1.5, bottom=0, top=1.5)
    n_features = len(name_features)
    
    for i in range(len(name_features)):
        for j in range(len(name_features)):
            plt.subplot(n_features, n_features, i*n_features+j+1)
            if i==j:
                sns.violinplot(data=data, x=labels, y=name_features[i])
            else:
                plt.scatter(data[name_features[i]], data[name_features[j]], c=data[labels])
                plt.xlabel(name_features[i])
                plt.ylabel(name_features[j])

custom_pairplot(data, name_features=name_features, labels="labels")   

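[Output: a 4 x 4 grid of plots, with per-class violin plots on the diagonal and class-colored scatter plots elsewhere.]

Let's apply SVD to reduce the data to two dimensions (k=2).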

 
k = 2

A = data[name_features].values

U, S, Vt = svd(A)
trunc_U, trunc_S, trunc_Vt = truncate(U, S, Vt, k)

trunc_A = trunc_U @ np.diag(trunc_S)
data_reduced = pd.DataFrame(trunc_A)
plt.figure(figsize=(5, 5))
plt.barh(name_features[::-1], S[::-1])
plt.title(f"Singular values, (first {k} are kept)")
plt.gca().xaxis.grid(True)   

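[Output: horizontal bar chart of the four singular values.]

As we can see, keeping only two dimensions still separates the classes well. Using all four features would probably be fine in this case, but for high-dimensional data that would otherwise hinder analysis, dimensionality reduction becomes almost essential.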

 
plt.figure(figsize=(5, 5))
plt.scatter(data_reduced[0], data_reduced[1], c = iris.target)
plt.xlabel("First feature")
plt.ylabel("Second feature");   

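[Output: scatter plot of the two reduced features, colored by class.]

Image Compression with SVD

Now let's see how SVD can be used for image compression.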

 
def im2double(im):
    # Scale an integer image to floats in [0, 1] using the maximum value of its dtype.
    info = np.iinfo(im.dtype)
    return im.astype(np.float64) / info.max

# The grayscale image is stored with three channels, but we will only use the first channel for processing.
img = plt.imread("../input/grayscale_image.jpg")[:,:,0]   

Besides my own SVD implementation, the numpy.linalg SVD routine is included for comparison.

 
channel_gray = im2double(img)

# Custom implementation
U, S, V = svd(channel_gray)

# Reference implementation from numpy.linalg
U_, S_, V_ = np.linalg.svd(channel_gray)

# Number of singular values to keep
k = 20

fig = plt.figure(figsize=(15,15))

ax1 = plt.subplot(1, 3, 1)
ax2 = plt.subplot(1, 3, 2)
ax3 = plt.subplot(1, 3, 3)

plt.ion()

fig.canvas.draw()

trunc_U, trunc_S, trunc_Vt = truncate(U, S, V, k)
_trunc_U, _trunc_S, _trunc_Vt = truncate(U_, S_, V_, k)

my_channel = 255 * trunc_U @ np.diag(trunc_S) @ trunc_Vt
linalg_channel = 255 * _trunc_U @ np.diag(_trunc_S) @ _trunc_Vt

ax1.title.set_text(f"Original image")
ax1.imshow(channel_gray, cmap='gray')
    
ax2.title.set_text(f"Custom svd implementation, k={k}")
ax2.imshow(my_channel, cmap='gray')


ax3.title.set_text(f"Numpy linalg svd implementation, k={k}")
ax3.imshow(linalg_channel, cmap='gray')   

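[Output: three panels showing the original image, the custom SVD reconstruction, and the NumPy reconstruction, both at k=20.]

This animation shows how the image evolves as we adjust the value of k.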

 
plt.rcParams['animation.embed_limit'] = 2**128
fps = 30
step = 5

fig, (ax1, ax2) = plt.subplots(1, 2)

ax2.axis("off")

# Set the figure size
fig.set_size_inches(10, 5)

# First frame of the left plot (singular values)
ax1.set_yscale("log")
ax1.plot(S)
ax1.grid()


# First frame of the right plot (image)
im = ax2.imshow(channel_gray, interpolation='none', vmin=0, vmax=1, cmap='gray');
plt.tight_layout()

def animate_func(i):
    # Number of singular values kept in this frame
    k = len(S)-i*step
    ax1.clear()
    ax1.set_yscale("log")
    ax1.axvline(x=k, ymin=0, ymax=1, c="gray", linestyle="--")
    ax1.plot(S)
    ax1.legend([f"Truncation k={k}", "Log10 of singular value"])
    
    ax1.grid()
        
    # We will truncate the SVD decomposition and update the frame accordingly.
    trunc_U, trunc_S, trunc_Vt = truncate(U, S, V, k)
    new_channel = trunc_U @ np.diag(trunc_S) @ trunc_Vt
    im.set_array(new_channel)
    return [fig]

ani= animation.FuncAnimation(
                               fig, 
                               animate_func, 
                               frames = len(S)//step+1,
                               interval = 1000 / fps, # in ms
                               );   

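[Output: animation with the singular values on a log scale (left) and the reconstructed grayscale image (right).]

The technique extends to RGB images by applying SVD separately to the red, green, and blue color channels.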

 
# Load the RGB image.
rgb_img = plt.imread("../input/rgb_image.jpg")

# Split the image into its three color channels.
red_channel = im2double(rgb_img[:, :, 0])
green_channel = im2double(rgb_img[:, :, 1])
blue_channel = im2double(rgb_img[:, :, 2])

# Compute the SVD of each channel.
r_U, r_S, r_V = svd(red_channel)
g_U, g_S, g_V = svd(green_channel)
b_U, b_S, b_V = svd(blue_channel)   

 
fig, (ax1, ax2) = plt.subplots(1, 2)

ax2.axis("off")

# Set the figure size.
fig.set_size_inches(10, 5)

# First frame of the left plot: singular values of each color channel.
ax1.set_yscale("log")
ax1.plot(r_S, c="red", linewidth=1)
ax1.plot(g_S, c="green", linewidth=1)
ax1.plot(b_S, c="blue", linewidth=1)
ax1.grid()

# First frame of the right plot: the original RGB image.
im = ax2.imshow(cv2.merge((red_channel, green_channel, blue_channel)).clip(0.0, 1.0),
                interpolation='none', vmin=0, vmax=1)

plt.tight_layout()

def animate_func(i):
    # Number of singular values kept in this frame (counted from the red channel).
    k = len(r_S)-i*step
    ax1.clear()
    ax1.set_yscale("log")
    ax1.axvline(x=k, ymin=0, ymax=1, linestyle='--', c="gray")
    
    ax1.plot(r_S, c="red", linewidth=1)
    ax1.plot(g_S, c="green", linewidth=1)
    ax1.plot(b_S, c="blue", linewidth=1)

    ax1.legend([f"Truncation k={k}", "Log10 of RED singular value",
                "Log10 of GREEN singular value", "Log10 of BLUE singular value"])
    ax1.grid()
    
    # Truncate each channel's SVD and update the displayed image.

    r_trunc_U, r_trunc_S, r_trunc_Vt = truncate(r_U, r_S, r_V, k)
    g_trunc_U, g_trunc_S, g_trunc_Vt = truncate(g_U, g_S, g_V, k)
    b_trunc_U, b_trunc_S, b_trunc_Vt = truncate(b_U, b_S, b_V, k)

    r_new_channel = r_trunc_U @ np.diag(r_trunc_S) @ r_trunc_Vt
    g_new_channel = g_trunc_U @ np.diag(g_trunc_S) @ g_trunc_Vt
    b_new_channel = b_trunc_U @ np.diag(b_trunc_S) @ b_trunc_Vt

    im.set_array(cv2.merge((r_new_channel, g_new_channel, b_new_channel)).clip(0.0, 1.0))
    return [fig]

ani= animation.FuncAnimation(
                               fig, 
                               animate_func, 
                               frames = len(r_S)//step+1,
                               interval = 1000 / fps, # in ms
                               )   

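[Output: animation with the singular values of the three color channels on a log scale (left) and the reconstructed RGB image (right).]

The compressed image can be represented by storing the three truncated matrices, which lets us define the compression ratio as the ratio of the original uncompressed size to the compressed size. We can also compute the percentage of explained variance, where each squared singular value is proportional to the variance it explains. A common guideline is to keep enough singular values to explain at least 85% of the total variance.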

 
size_original = np.prod(rgb_img.shape) #evaluated in float numbers (dimension on disk is different)

size_compressed = []

dim_start = np.linalg.matrix_rank(red_channel)

for k in range(1, dim_start+1):
    # The size is computed for the red channel and multiplied by three for the three color channels.
    r_trunc_U, r_trunc_S, r_trunc_Vt = truncate(r_U, r_S, r_V, k)
    size_compressed.append((np.prod(r_trunc_U.shape)+k+np.prod(r_trunc_Vt.shape)) * 3)

var_total = sum(r_S**2)+sum(g_S**2)+sum(b_S**2)

padded_r_S = np.pad(r_S, (0, dim_start-len(r_S)))
padded_g_S = np.pad(g_S, (0, dim_start-len(g_S)))
padded_b_S = np.pad(b_S, (0, dim_start-len(b_S)))


explained = np.cumsum(padded_r_S**2)+np.cumsum(padded_g_S**2)+np.cumsum(padded_b_S**2)
explained /= var_total   

 
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(size_original/size_compressed)
plt.yscale("log")
plt.grid()
plt.xlabel("Number of singular values")
plt.ylabel("Compression ratio")

plt.subplot(1, 2, 2)
plt.plot(explained)
plt.xscale("log")
plt.grid()
plt.xlabel("Number of singular values")
plt.ylabel("Explained variability");   

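[Output: compression ratio (left) and explained variability (right) plotted against the number of singular values.]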
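The two quantities above can also be computed for a single choice of k. The following is a rough sketch; the helper names smallest_k_for_variance and compression_ratio are hypothetical and not part of the original code, and the singular values are made up for illustration. The storage count k * (m + n + 1) mirrors the size calculation used above: m * k values for U, k singular values, and k * n values for V^T.

```python
import numpy as np

def smallest_k_for_variance(s, threshold=0.85):
    """Smallest k whose leading squared singular values reach `threshold` of the total.

    Hypothetical helper: `s` is a 1-D array of singular values in descending order.
    """
    explained = np.cumsum(s**2) / np.sum(s**2)
    # searchsorted finds the first index where explained >= threshold.
    k = int(np.searchsorted(explained, threshold)) + 1
    return min(k, len(s))

def compression_ratio(m, n, k):
    """Ratio of the m*n original values to the k*(m + n + 1) values of a rank-k factorization."""
    return (m * n) / (k * (m + n + 1))

# Made-up singular values, used only to exercise the helpers.
s = np.array([10.0, 5.0, 2.0, 1.0])
k = smallest_k_for_variance(s)
print(k)

# Compression ratio for a hypothetical 512 x 512 channel kept at rank 20.
print(compression_ratio(512, 512, 20))
```

Here the first squared singular value accounts for 100/130 of the total, which is below 85%, so two singular values are needed.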
