Multivariate Linear Regression in Python

March 17, 2025 | 3 min read

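In linear regression modeling, the terms "multivariate linear regression" and "multiple linear regression" are often used to refer to the same concept: a linear regression model that uses more than one independent variable (feature) to predict a single dependent variable (target). Strictly speaking, "multivariate" regression describes models with more than one dependent variable, but this article follows the common, looser usage.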

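Linear regression is a fundamental machine learning method for predicting a continuous target variable from one or more independent variables. When there is more than one independent variable, the model is called multivariate (or multiple) linear regression. This article explains the model and implements it in Python.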

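Understanding Multivariate Linear Regression

Multivariate linear regression extends simple linear regression to several independent variables. Instead of relying on a single feature (X) to predict the target (Y), we have several features (X₁, X₂, …, Xₙ). The goal is unchanged: find the linear relationship between the independent variables and the target variable that best fits the data.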

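The general formula for multivariate linear regression is:

Y = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + … + bₙXₙ + ε

Here Y is the target variable, X₁, X₂, X₃, …, Xₙ are the independent variables, b₀ is the intercept, b₁, b₂, b₃, …, bₙ are the coefficients, and ε is the error term.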
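To make the formula concrete, the following minimal sketch evaluates it for a single observation. The coefficient and feature values are hypothetical, chosen only for illustration, and the error term is left out because it is unknown at prediction time.

```python
import numpy as np

# Hypothetical coefficients for illustration: intercept b0, then b1, b2, b3
b0 = 1.0
b = np.array([2.0, -0.5, 3.0])

# One observation with three feature values X1, X2, X3 (also hypothetical)
x = np.array([4.0, 2.0, 1.0])

# Y = b0 + b1*X1 + b2*X2 + b3*X3 (error term omitted)
y_hat = b0 + b @ x
print(y_hat)  # 1 + 8 - 1 + 3 = 11.0
```

Writing the weighted sum as a dot product is what allows the whole data set to be handled at once with a design matrix, as the code later in this article does.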

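Assumptions of the Regression Model

Homoscedasticity: the variance of the errors must be constant across all values of the independent variables.
Linearity: the relationship between the dependent variable and the independent variables should be linear.
Lack of multicollinearity: the independent variables are assumed to have little or no correlation with one another.
Multivariate normality: the residuals are assumed to be normally distributed.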
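As one illustration of how the multicollinearity assumption might be checked, the sketch below computes a variance inflation factor (VIF) with numpy alone. The VIF is a common diagnostic that the article itself does not use; the `vif` helper, the synthetic features, and the thresholds mentioned afterwards are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 5, n)            # generated independently of X1
X3 = X1 + rng.normal(0, 0.1, n)      # nearly a copy of X1, so strongly collinear

def vif(features, j):
    """Variance inflation factor of column j: 1 / (1 - R^2), where R^2 comes
    from regressing column j on the remaining columns plus an intercept."""
    target = features[:, j]
    others = np.delete(features, j, axis=1)
    A = np.column_stack((np.ones(len(target)), others))
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

independent = np.column_stack((X1, X2))
collinear = np.column_stack((X1, X2, X3))

print("VIF of X1 with independent features:", vif(independent, 0))
print("VIF of X1 with a near-duplicate feature:", vif(collinear, 0))
```

A VIF close to 1 suggests the feature is not explained by the others, while very large values indicate multicollinearity. Rules of thumb such as flagging values above 5 or 10 are conventions rather than strict limits.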

Code

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data (the noise is uniform on [0, 1), so its mean is 0.5)
np.random.seed(0)
n_samples = 100
X1 = np.random.rand(n_samples) * 10
X2 = np.random.rand(n_samples) * 5
y = 2 * X1 + 3 * X2 + np.random.rand(n_samples)

# Create a design matrix X by stacking X1 and X2 horizontally
X = np.column_stack((X1, X2))

# Add a column of ones for the intercept term
X = np.column_stack((np.ones(n_samples), X))

# Split the data into training and testing sets
split_ratio = 0.8
split_index = int(n_samples * split_ratio)
X_train, y_train = X[:split_index], y[:split_index]
X_test, y_test = X[split_index:], y[split_index:]

# Calculate the coefficients using the normal equation
coefficients = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y_train

# Make predictions on the test set
y_pred = X_test @ coefficients

# Calculate Mean Squared Error (MSE)
mse = np.mean((y_test - y_pred) ** 2)

# Calculate R-squared (R^2)
y_mean = np.mean(y_test)
ss_total = np.sum((y_test - y_mean) ** 2)
ss_residual = np.sum((y_test - y_pred) ** 2)
r2 = 1 - (ss_residual / ss_total)

# Print coefficients, MSE, and R-squared
print("Coefficients:", coefficients)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

# Visualize actual vs. predicted values on the test set (3-D scatter plot)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_test[:, 1], X_test[:, 2], y_test, label='Actual', color='blue')
ax.scatter(X_test[:, 1], X_test[:, 2], y_pred, label='Predicted', color='red', marker='x')
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_zlabel('y')
ax.legend()
plt.show()

Output

Coefficients: [0.58539281 1.99996142 2.97306189]
Mean Squared Error: 0.09732995265403607
R-squared: 0.9976043742393531


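In this code:

We generate synthetic data with two independent variables, X1 and X2, and one target variable, y.
We build the design matrix X by stacking X1 and X2 column-wise and prepending a column of ones for the intercept term.
We split the data into a training set and a test set (the first 80% and the last 20% of the samples).
We compute the coefficients with the normal equation for linear regression, b = (XᵀX)⁻¹Xᵀy.
We make predictions on the test set and compute the mean squared error (MSE) and R-squared (R²).
We use a matplotlib 3-D scatter plot to compare the actual and predicted values on the test set.

The code uses numpy to fit the regression and matplotlib to visualize the results. The estimated coefficients are close to the values used to generate the data (2 for X1 and 3 for X2), and the intercept is near 0.5 because the added noise is uniform on [0, 1) and therefore has a mean of 0.5.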
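The normal equation can also be solved without forming an explicit matrix inverse, which is generally considered more numerically robust. The sketch below regenerates the same synthetic data and compares three numpy approaches; the names `beta_inv`, `beta_solve`, and `beta_lstsq` are introduced here for illustration and do not appear in the article's code.

```python
import numpy as np

# Same synthetic data as in the article's code
np.random.seed(0)
n_samples = 100
X1 = np.random.rand(n_samples) * 10
X2 = np.random.rand(n_samples) * 5
y = 2 * X1 + 3 * X2 + np.random.rand(n_samples)

# Design matrix with a leading column of ones, then the first 80% as training data
X = np.column_stack((np.ones(n_samples), X1, X2))
X_train, y_train = X[:80], y[:80]

# 1) Normal equation with an explicit inverse, as in the article
beta_inv = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y_train

# 2) The same linear system solved directly, without computing the inverse
beta_solve = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)

# 3) A least-squares solver applied to X_train itself
beta_lstsq, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

print(beta_inv)
print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))
```

On a well-conditioned problem like this one, all three approaches should agree to within floating-point tolerance. The difference matters mainly when features are nearly collinear, where inverting XᵀX can amplify rounding errors.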

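Understanding the Results

Mean squared error (MSE): the average squared difference between predicted and actual values. A lower MSE indicates a better fit.
R-squared (R²): the proportion of the variance in the dependent variable (target) that is explained by the independent variables. Higher values indicate a better fit, and 1.0 indicates a perfect fit.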
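The two metrics can be wrapped in small reusable functions. The sketch below mirrors the calculations in the article's code; the function names and the four-point example are hypothetical and chosen so the arithmetic is easy to verify by hand.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r_squared(y_true, y_pred):
    """1 - SS_residual / SS_total, with SS_total measured around the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_residual = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_residual / ss_total

# Hand-checkable example: squared errors are 0.25, 0, 0, 0.25
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]

print(mean_squared_error(y_true, y_pred))  # 0.5 / 4 = 0.125
print(r_squared(y_true, y_pred))           # 1 - 0.5 / 5 = 0.9
```

Note that R² computed on held-out data can be negative when the model predicts worse than simply using the mean of the actual values.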
