Analytical vs Numerical Solutions in Machine Learning

17 Jun 2025 | 9 min read

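Modern artificial intelligence depends heavily on methods for solving complex optimization problems. In machine learning, the parameters of a model can be found in two broad ways: with an analytical solution or with a numerical solution. Each has its own strengths, weaknesses, and applications, so practitioners need to understand how they differ.

Whether to use an analytical or a numerical solution depends mainly on the type of problem at hand. For very simple linear models, the analytical solution is preferred for its exactness and efficiency. For more complex models, such as deep learning and nonlinear regression, a numerical solution is usually the only option.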

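Aspect	Analytical Solution	Numerical Solution
Definition	An exact, closed-form mathematical expression.	A solution approximated through iteration.
Accuracy	Exact, up to floating-point rounding.	Approximate; accuracy depends on convergence.
Applicability	Limited to relatively simple, well-defined problems.	Applies to a wide range of problems, including complex models.
Efficiency	Usually faster for small, simple models, since it is computed in a single step.	Needs many iterations, so it can be computationally expensive and slow.
Computational complexity	No iterations, but the cost grows quickly with the number of features (for example, when inverting a matrix).	Each step is cheap; the total cost depends on how many iterations are needed to converge.
Scalability	Scales poorly to large datasets.	Scales well; it can handle large datasets efficiently.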

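Analytical Solution

An analytical solution is an exact mathematical expression or formula that solves the problem directly. In machine learning, this means the parameter values that minimize a given cost function can be computed exactly. Such solutions are typically derived using calculus, linear algebra, and optimization theory.

A very common example in machine learning is the closed-form solution for linear regression. The optimal parameters of a simple linear regression can be computed directly from the formula derived by the method of least squares. This gives an exact solution without running any iterative numerical procedure.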
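
As a minimal sketch of such a closed-form fit, the slope and intercept of a one-feature regression can be computed directly from sample means. The function name `fit_simple_ols` and the toy data are hypothetical, introduced only for this illustration.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y ≈ slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 2.0 * x_demo + 1.0

slope, intercept = fit_simple_ols(x_demo, y_demo)
print(slope, intercept)
```

No loop over iterations and no learning rate is involved: the answer comes from a single pass over the data.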

Importing Libraries

Code

 
import numpy as np 
import pandas as pd

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

from sklearn.model_selection import train_test_split 
from numpy import linalg 
import matplotlib.pyplot as plt 
from time import process_time    

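As we know, gradient descent is an iterative algorithm that has to run for many steps before it converges. When the number of features is small, we can use a faster method called the "normal equation". The idea is as follows.

For a differentiable function f(θ), any maximum or minimum occurs at a value of θ where the derivative f'(θ) equals zero. Our squared-error cost function is convex, so the point where its derivative is zero is its minimum. The plot below uses j(θ) = (θ - 5)² as a stand-in for such a cost function.

Code

 
x = np.arange(-20, 20, 0.25)
y = (x-5)**2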

plt.figure(figsize=(5, 5))

plt.xlabel(r'$\theta$')
plt.ylabel(r'j($\theta$)')
plt.title(r'j($\theta$)')

plt.xlim(0, 10)
plt.ylim(-1, 10)

plt.plot(x, y, label='j(θ)')

p1 = [4.5, 5.5]
p2 = [0, 0]

plt.plot(p1, p2, label=r'$\frac{\partial j(θ)}{\partial θ}$')

plt.legend()

plt.show()   

Output

[Figure: plot of j(θ) = (θ - 5)² with a short horizontal line marking the zero derivative at its minimum]

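The squared-error cost function measures the difference between the actual values and the values predicted by the linear regression model. To find the parameters that minimize this error, we take the derivative of the cost function and set it equal to zero. In practice, this requires inverting a matrix built from the input data (XᵀX). If that matrix has no inverse, the method breaks down; this can happen when features are redundant (linearly dependent) or when there are fewer observations than features.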
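
The step from the cost function to a matrix inverse can be written out as follows. This is a sketch that assumes the 1/(2m) scaling used by the cost function code in this article, with X the design matrix (one row per example, including the column of ones), y the target vector, and m the number of examples.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,(X\theta - y)^{\top}(X\theta - y) \\
\nabla_{\theta} J(\theta) &= \frac{1}{m}\,X^{\top}(X\theta - y) = 0 \\
X^{\top}X\,\theta &= X^{\top}y \\
\theta &= (X^{\top}X)^{-1}X^{\top}y \qquad \text{(when } X^{\top}X \text{ is invertible)}
\end{aligned}
```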

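To handle this situation, you can reduce the dimensionality of the feature set or collect more data. Alternatively, you can use the pseudoinverse, which can be computed even when the matrix is not invertible; libraries such as NumPy provide functions that compute it directly. With the pseudoinverse a set of parameters can always be computed, which makes the approach robust for any linear regression problem. When several parameter vectors fit equally well, it returns the one with the smallest norm.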
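
The redundant-feature case can be illustrated with a small sketch. The matrix `X_demo` below is hypothetical: its third column duplicates the second, so XᵀX is singular. `np.linalg.pinv` and `np.linalg.lstsq` are standard NumPy functions; applying them directly to X rather than to XᵀX is a choice made for this example.

```python
import numpy as np

# Hypothetical design matrix: the third column duplicates the second,
# so X^T X is singular and has no ordinary inverse.
X_demo = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 3.0, 3.0],
    [1.0, 4.0, 4.0],
])
y_demo = np.array([3.0, 5.0, 7.0, 9.0])  # generated as y = 1 + 2 * x

gram = X_demo.T @ X_demo
print("rank of X^T X:", np.linalg.matrix_rank(gram))  # lower than its size of 3

# The pseudoinverse of X gives the minimum-norm least-squares solution
theta_pinv = np.linalg.pinv(X_demo) @ y_demo

# np.linalg.lstsq solves the same problem without forming X^T X
theta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print("theta (pinv): ", theta_pinv)
print("theta (lstsq):", theta_lstsq)
print("predictions:  ", X_demo @ theta_pinv)  # still reproduce y_demo
```

The weight of the duplicated feature is split across the two identical columns, but the predictions are unaffected.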

Code

 
# The normal equation

def calculate_theta(X,y):
    """
     Calculates the theta vector using the normal equation.
    
    :param X: inputs (feature values) - data frame of floats
    :param y: outputs (actual target values) - Numpy array of floats
    
    :return: new theta - Numpy array of floats
    
    """
    # Calculate transpose of X
    X_transpose = X.transpose()
    
    # Calculate the dot product between X_transpose and X
    temp_0 = np.dot(X_transpose, X)
        
    # Calculate the inverse of temp_0
    try:
        temp_1 = linalg.inv(temp_0)
     
    except linalg.LinAlgError:
        print("\033[93mWarning: Non-invertible Matrix! pinv() will be used\033[0m")
        temp_1 = linalg.pinv(temp_0)

    # Calculate the dot product between temp_1 and X_transpose
    temp_2 = np.dot(temp_1, X_transpose)

    # Calculate the dot product between temp_2 and y
    theta = np.dot(temp_2, y) 

    return  theta.reshape(-1)   

The vectorized form of the linear regression hypothesis is written h_θ(x) = Xθ. The cost function below is built on this vectorized form.

Code

 
# The hypothesis
def h(x, theta):
    """
     Calculates the predicted values (or predicted targets) for a given set of input and theta vectors.
    
    :param x: inputs (feature values) - data frame of floats 
    :param theta: theta vector (weights) - Numpy array of floats
    
    :return: predicted targets - Numpy array of floats
    
    """
    # The hypothesis is a column vector of m x 1
    return np.dot(x, theta)   

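We use the vectorized form of the linear regression cost function to evaluate the model. The cost function takes the difference between the actual and predicted values and measures, as a squared error, how far the predictions are off. It serves as a check on the model's accuracy.

Code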

 
# The cost function

def J(X,y,theta):
    """
     Calculates the total error using the squared error function.
    
    :param X: inputs (feature values) - data frame of floats
    :param y: outputs (actual target values) - Numpy array of floats
    :param theta: theta vector (weights) - Numpy array of floats
    
    :return: total error - float
    
    """
    # Calculate the number of examples
    m = len(X)
    
    # Calculate the constant
    c = 1/(2 * m)
       
    # Calculate the array of errors
    temp_0 = h(X, theta) - y.reshape(-1)

    # Calculate the transpose of an array of errors
    temp_1 = temp_0.transpose()

    # Calculate the dot product 
    temp_2 = np.dot(temp_1, temp_0) 

    return  c * temp_2   
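
To see that the vectorized cost matches the per-example definition, here is a small self-contained sketch. The function names `cost_vectorized` and `cost_loop` and the toy data are hypothetical, introduced only for this comparison.

```python
import numpy as np

def cost_vectorized(X, y, theta):
    """Squared-error cost J = (1 / 2m) * (X theta - y)^T (X theta - y)."""
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

def cost_loop(X, y, theta):
    """The same cost, written as an explicit sum over the examples."""
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(w * v for w, v in zip(theta, row))
        total += (prediction - target) ** 2
    return total / (2 * len(y))

# Three examples with an intercept column of ones and one feature
X_demo = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_demo = np.array([2.0, 4.0, 7.0])
theta_demo = np.array([0.0, 2.0])  # predicts 2, 4, 6: only the last example is off, by 1

print(cost_vectorized(X_demo, y_demo, theta_demo))
print(cost_loop(X_demo, y_demo, theta_demo))
```

Both functions return the same value; the vectorized version simply replaces the Python loop with a dot product.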

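In this section, we load our CSV file. The dataset comes in two versions, so we load the larger one. We then create the feature DataFrame and the target vector, and split the data into a training set and a validation set so that we can validate our results.

Code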

 
# Get the data. Note that there are two versions. We will use the one
# with the most rows.

train_data = pd.read_csv("/kaggle/input/graduate-admissions/Admission_Predict_Ver1.1.csv")

# Set X and y
X = train_data.drop(['Chance of Admit ', 'Serial No.'], axis=1) # 'Chance of Admit ' is the target variable and 'Serial No.' is only a row index, so we drop both.
y = pd.DataFrame(data = train_data['Chance of Admit ']).to_numpy()

# Instead of finding probabilities, we want to calculate the percentages.
y = y * 100

# Break off validation set from training data
X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state = 0)

X_train.head()   

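Output

In the vectorized solution, we need to define a feature X0 that equals 1 for every training example. So our first task is to add the X0 column to both the training and the validation data. After that, we call calculate_theta on the training data. No iteration is needed at all.

Code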

 
# Initialize

# Calculate elapsed CPU time
start = process_time()

# Calculate the number of examples
m_train = len(X_train)
m_valid = len(X_valid)

# Calculate the number of features
# including X_0
n = len(X_train.axes[1]) + 1

# Create a list of ones
ones_train = [1] * m_train
ones_valid = [1] * m_valid

# Insert ones to the first column since
# X_0 for all training examples should
# be one.
X_train.insert(0, "X_0", ones_train, True)
X_valid.insert(0, "X_0", ones_valid, True)

# Find the theta vector using the normal equation
theta_train = calculate_theta(X_train,y_train)

# Calculate elapsed CPU time
end = process_time()
execution_time = (end - start)*1000

# display theta and cpu execution time of training
print("\nExecution time: {} milliseconds".format(execution_time))
print("\nCalculated\033[1m θ\033[0m: {}\n".format(theta_train))

# Calculate and display the cost value on the training dataset
cost_train = J(X_train, y_train, theta_train)
print("The training cost is: {}".format(cost_train))   

Output

Now we use the validation dataset to validate our model.

Code

 
print("\nCalculated\033[1m θ\033[0m: {}\n".format(theta_train))

cost_valid = J(X_valid, y_valid, theta_train)
print("The validation cost is: {}".format(cost_valid))   

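Output

The cost on the validation dataset appears slightly higher (worse) than on the training dataset, which is to be expected. We can also inspect the validation results below.

Code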

 
# Compare actual results with predicted results
result = pd.DataFrame(index=X_valid.index)
result['Actual CoA'] = y_valid
result['Predicted CoA'] = h(X_valid, theta_train)
result.head()   

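Output

Numerical Solution

Numerical solutions use iterative methods and computational techniques to approximate a solution. They are used when a model or problem is too complex for an analytical solution. In machine learning, numerical methods typically optimize a cost function by adjusting the model parameters with techniques such as gradient descent.

Gradient Descent for Linear Regression

Gradient descent is a general name for a family of computer algorithms that minimize a function. It starts from an initial set of parameter values and iteratively moves toward a set of values that minimizes some cost function or metric; this is the "descent" part. Each move is made in the direction of steepest decrease, which is found by taking the derivative of the cost with respect to the parameters and stepping against it; this is the "gradient" part.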
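
The idea can be sketched on a one-parameter cost before turning to regression. The example below reuses j(θ) = (θ - 5)² from earlier in this article; the function name `gradient_descent_1d`, the learning rate, and the iteration count are assumptions chosen for illustration.

```python
def gradient_descent_1d(grad, theta0, learning_rate=0.1, iterations=100):
    """Repeatedly step against the derivative, starting from theta0."""
    theta = theta0
    for _ in range(iterations):
        theta = theta - learning_rate * grad(theta)
    return theta

# j(theta) = (theta - 5)^2 has derivative 2 * (theta - 5) and its minimum at theta = 5
theta_min = gradient_descent_1d(lambda t: 2 * (t - 5), theta0=0.0)
print(theta_min)
```

Each step shrinks the distance to the minimum by a constant factor, so the result approaches 5 without ever solving j'(θ) = 0 algebraically.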

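Gradient descent is an important idea from computer science, and it can be seen as one reason CS has overtaken statistics in prominence within machine learning: it is a general-purpose tool that can "brute force" a solution in many settings, without the elegance of a closed-form solution and without needing the mathematical conveniences that statistical solutions rely on. Ordinary linear regression is a good, simple example of how gradient descent works. We start from some error function. The method works with many metrics, but for OLS the obvious choice is the residual sum of squares.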
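
For a line with slope m and intercept b, the error function and the derivatives that gradient descent needs can be written as below. This sketch divides the residual sum of squares by the number of points n, which only rescales the gradient; the symbol α for the learning rate is notation introduced here.

```latex
\begin{aligned}
E(m, b) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial E}{\partial m} &= -\frac{2}{n}\sum_{i=1}^{n} x_i\,\bigl(y_i - (m x_i + b)\bigr) \\
\frac{\partial E}{\partial b} &= -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr) \\
m &\leftarrow m - \alpha\,\frac{\partial E}{\partial m}, \qquad
b \leftarrow b - \alpha\,\frac{\partial E}{\partial b}
\end{aligned}
```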

Code

 
import numpy as np

class GradientDescentLinearRegression:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.learning_rate, self.iterations = learning_rate, iterations
    
    def fit(self, X, y):
        b = 0
        m = 5
        n = X.shape[0]
        for _ in range(self.iterations):
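            # Gradients of the mean squared error with respect to b and m
            b_gradient = -2 * np.sum(y - (m*X + b)) / n
            m_gradient = -2 * np.sum(X*(y - (m*X + b))) / n
            # Step against the gradient for both parameters
            b = b - (self.learning_rate * b_gradient)
            m = m - (self.learning_rate * m_gradient)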
        self.m, self.b = m, b
        
    def predict(self, X):
        return self.m*X + self.b   

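That is all we need. Let's see how it does on some sample data. We generate a cloud of normally distributed points around the line y = x and then see what our algorithm comes up with.

Code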

 
np.random.seed(42)
X = np.array(sorted(list(range(5))*20)) + np.random.normal(size=100, scale=0.5)
y = np.array(sorted(list(range(5))*20)) + np.random.normal(size=100, scale=0.25)

clf = GradientDescentLinearRegression()
clf.fit(X, y)

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

plt.scatter(X, y, color='black')
plt.plot(X, clf.predict(X))
plt.gca().set_title("Gradient Descent Linear Regression")   

Output

Our model's solution is very close to the ideal solution of m = 1 and b = 0.

Code

 
clf.b

Output

 
-0.067377115297355974

Code

 
clf.m

Output

 
0.99492950681643999
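
As a rough cross-check, the gradient descent estimate can be compared with a closed-form least-squares fit on the same kind of data. The sketch below is self-contained and writes the update loop inline; using `np.polyfit` as the closed-form reference is an assumption of this example, not something the article itself does.

```python
import numpy as np

# Same data recipe as above: a noisy cloud of points around y = x
np.random.seed(42)
base = np.array(sorted(list(range(5)) * 20))
X_demo = base + np.random.normal(size=100, scale=0.5)
y_demo = base + np.random.normal(size=100, scale=0.25)

# Closed-form least-squares line; for deg=1, polyfit returns [slope, intercept]
slope_exact, intercept_exact = np.polyfit(X_demo, y_demo, deg=1)

# Plain gradient descent on the mean squared error, started from m = 5, b = 0
m, b = 5.0, 0.0
learning_rate, n = 0.01, len(X_demo)
for _ in range(1000):
    residual = y_demo - (m * X_demo + b)
    m_grad = -2 * np.sum(X_demo * residual) / n
    b_grad = -2 * np.sum(residual) / n
    m -= learning_rate * m_grad
    b -= learning_rate * b_grad

print("closed form:     ", slope_exact, intercept_exact)
print("gradient descent:", m, b)
```

The two pairs of numbers should agree closely but not exactly: the closed form is exact for this sample, while gradient descent is still converging after 1000 steps.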

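The biggest advantage of gradient descent is that we need to know very little about the model underneath. We can use the model we just built without knowing the fundamentals of linear regression: we do not need to know that linear regression admits a closed-form solution, what that solution looks like, or how to derive it. We only need to pick a metric, compute its derivative, and let the computer solve the problem by brute force.

For plain ordinary least squares this is somewhat wasteful, since we already have a direct way of solving it. It becomes a real benefit once we start working with arbitrary metrics. Gradient descent can be applied to any model metric that satisfies two properties: it is differentiable (in practice most are) and it is convex, that is, bowl-shaped. On a convex surface, wherever you stand, stepping against the derivative leads to a better fit, until you reach the bottom. Bowl-shaped surfaces include funnels, contact lenses, and, as it happens, the parameter space of linear regression.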
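
The role of the bowl shape can be illustrated with a small sketch. Both functions below are hypothetical examples chosen for this illustration: one is convex, while the other has two separate valleys, so the result of gradient descent depends on where it starts.

```python
def descend(grad, theta0, learning_rate=0.01, iterations=2000):
    """Plain gradient descent on a one-parameter function."""
    theta = theta0
    for _ in range(iterations):
        theta -= learning_rate * grad(theta)
    return theta

# Convex bowl: f(t) = (t - 5)^2, derivative 2 * (t - 5), single minimum at t = 5
def convex_grad(t):
    return 2 * (t - 5)

# Non-convex: f(t) = t^4 - 4 t^2 + t, derivative 4 t^3 - 8 t + 1, two valleys
def nonconvex_grad(t):
    return 4 * t**3 - 8 * t + 1

for start in (-2.0, 2.0):
    print("start:", start,
          "| convex ->", descend(convex_grad, start),
          "| non-convex ->", descend(nonconvex_grad, start))
```

On the convex function both starting points end at the same minimum; on the non-convex one each start settles in a different valley, which is why the convexity condition matters.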
