How to Plot Multiple Linear Regression in Python

17 Mar 2025 | 5 min read

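Linear regression is a technique for modeling the relationship between a dependent (target) variable and either a single independent variable (simple regression) or several independent variables (multiple regression). The algorithm rests on the assumption that the dependent variable is linearly related to the independent variables. If that relationship holds, we can estimate the coefficients the model needs and use them to make predictions on new, unseen data.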
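To make the idea of "estimating the coefficients" concrete, here is a minimal sketch that fits a two-variable linear model by ordinary least squares using only NumPy. The data is synthetic and noise-free, and the names X_demo, y_demo and design are made up for this illustration; they are not part of the tutorial's data set.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration):
# y = 2 + 3*x1 - 1*x2
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]

# Prepend a column of ones so that the first coefficient is the intercept
design = np.column_stack([np.ones(len(X_demo)), X_demo])

# Solve the least-squares problem: minimize ||design @ beta - y_demo||^2
beta, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

# Predict the target for a new, unseen observation (x1 = 4, x2 = 1)
new_point = np.array([1.0, 4.0, 1.0])
prediction = new_point @ beta

print('Estimated [intercept, b1, b2]:', beta)
print('Prediction for the new point:', prediction)
```

Because the synthetic data contains no noise, the estimated coefficients should match the ones used to generate it up to floating-point error. With real data, the fit only approximates the underlying relationship.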

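Descriptive Analysis of the Variables

Before applying multiple linear regression, it is usually a good idea to visualize the data in order to understand it better and to check whether the features are related to one another. We will use the pairplot() function from the Seaborn package to plot the pairwise relationships between the features. The function produces a grid of plots, with a histogram of each feature on the diagonal and a scatter plot for each pair of features everywhere else.

In this tutorial, we will learn how to apply and visualize linear regression in Python step by step, using libraries such as NumPy, Pandas, Scikit-Learn, Seaborn and Matplotlib.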

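Importing Libraries

As a first step, we import the required Python libraries: NumPy, Pandas, sklearn, Matplotlib and Seaborn. We also use Pandas to load the data set from a GitHub repository into a data frame named housing, which is the name the rest of the code refers to.

Code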

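# Importing the required methods and modules

# Basic libraries
import pandas as pd
import numpy as np

# For building the model
from sklearn import linear_model

# For data visualization
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3-D projection on older Matplotlib versions

# Jupyter magic: renders figures inline (remove when running as a plain script)
%matplotlib inline

# NOTE: this listing does not show the call that loads the data set into `housing`.
# `housing` must contain the columns price, area, bedrooms and stories
# before the lines below are run.
print(housing.shape)
print(housing.head())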

Output

(545, 4)
      price  area  bedrooms  stories
0  13300000  7420         4        3
1  12250000  8960         4        4
2  12250000  9960         3        2
3  12215000  7500         4        2
4  11410000  7420         4        2
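The loading step itself is not shown in the listing above. The following is a rough sketch of what it could look like, assuming the data is available as a CSV file with at least the four columns printed above. The helper name load_housing is hypothetical, and an in-memory string built from the five rows shown in the output stands in for the real file, whose location is not given here.

```python
import io
import pandas as pd

def load_housing(source):
    """Read a CSV source and keep the four columns used in this tutorial.

    `source` can be a file path, a URL, or a file-like object.
    This helper is hypothetical and only illustrates the loading step.
    """
    columns = ['price', 'area', 'bedrooms', 'stories']
    return pd.read_csv(source, usecols=columns)[columns]

# Stand-in for the real file: the five rows printed in the output above
sample_csv = io.StringIO(
    "price,area,bedrooms,stories\n"
    "13300000,7420,4,3\n"
    "12250000,8960,4,4\n"
    "12250000,9960,3,2\n"
    "12215000,7500,4,2\n"
    "11410000,7420,4,2\n"
)

housing_sample = load_housing(sample_csv)
print(housing_sample.shape)
print(housing_sample.head())
```

With the real data set, the same call would take the path or URL of the CSV file in place of sample_csv, and the result would be assigned to housing.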

Feature Selection

We will use a pair plot to look at the relationships between the features.

Code

# Visualizing the relationships between features using pair plots
sns.pairplot(data = housing, height = 2)

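Output

[Figure: pair plot of price, area, bedrooms and stories]

The first row of the plot shows a roughly linear relationship between price and area. The scatter plots for the remaining pairs of variables look random and show no clear relationship. When several independent variables are strongly related to one another, only one of them should be kept in the model. Here the only visible relationship involves price, which is the target variable, so no feature needs to be dropped.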

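Multicollinearity

Multiple linear regression assumes that the predictors (independent variables) used in the regression are not strongly correlated with one another. With the corr() method of a Pandas data frame, we can compute the Pearson correlation coefficient between every pair of features and collect the results in a matrix, which shows whether any of the predictors are correlated. We can then display that matrix as a heat map with Seaborn's heatmap() function.

Code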

# Visualizing multicollinearity between independent features using a heatmap

corr = housing[['area', 'bedrooms', 'stories']].corr()
print('Pearson correlation coefficient matrix for each independent variable: \n', corr)

# Masking the diagonal cells 
masking = np.zeros_like(corr, dtype = bool)  # np.bool was removed from recent NumPy releases
np.fill_diagonal(masking, val = True)

# Initializing a matplotlib figure
figure, axis = plt.subplots(figsize = (4, 3))

# Generating a custom colormap
c_map = sns.diverging_palette(223, 14, as_cmap = True, sep = 100)
c_map.set_bad('grey')

# Displaying the heatmap with the diagonal masked and the color scale centered at zero
sns.heatmap(corr, mask = masking, cmap = c_map, vmin = -1, vmax = 1, center = 0, linewidths = 1, ax = axis)
figure.suptitle('Heatmap visualizing Pearson Correlation Coefficient Matrix', fontsize = 14)
axis.tick_params(axis = 'both', which = 'major', labelsize = 10)

Output

Pearson correlation coefficient matrix for each independent variable: 
               area  bedrooms   stories
area      1.000000  0.151858  0.083996
bedrooms  0.151858  1.000000  0.408564
stories   0.083996  0.408564  1.000000
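Pairwise correlations can be complemented with the variance inflation factor (VIF), which measures how well each predictor is explained by all the others combined. The sketch below is an addition to the tutorial, not part of its original workflow. It uses the fact that the VIF of each predictor equals the corresponding diagonal entry of the inverse of the correlation matrix, and it takes the matrix values directly from the output above.

```python
import numpy as np

features = ['area', 'bedrooms', 'stories']

# Pearson correlation matrix copied from the output above
corr_matrix = np.array([
    [1.000000, 0.151858, 0.083996],
    [0.151858, 1.000000, 0.408564],
    [0.083996, 0.408564, 1.000000],
])

# The VIF of predictor j is the j-th diagonal entry of the inverse correlation matrix
vif = np.diag(np.linalg.inv(corr_matrix))

for name, value in zip(features, vif):
    print(f'{name}: VIF = {value:.3f}')
```

A VIF of 1 means a predictor is uncorrelated with the others. A commonly quoted rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity, although the exact threshold is a matter of convention.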

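Building the Multiple Linear Regression Model

Let us move on to building the regression model. Having seen that the predictors show no strong relationship or collinearity with one another (the largest pairwise correlation is about 0.41, between bedrooms and stories), we can use all of them to build the model. We will create the model with the LinearRegression() class from sklearn's linear_model module.

Code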

# Building the Multiple Linear Regression Model

# Setting the independent and dependent features
X = housing.iloc[:, 1:].values
y = housing.iloc[:, 0].values


# Initializing the model class from the sklearn package and fitting our data into it
reg = linear_model.LinearRegression()
reg.fit(X, y)

# Printing the intercept and the coefficients of the regression equation
print('Intercept: ', reg.intercept_)
print('Coefficients array: ', reg.coef_)

Output

Intercept:  157155.2578429943
Coefficients array:  [4.17726303e+02 4.18703502e+05 6.73797188e+05]
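To see how the intercept and coefficients are used, the short sketch below plugs the printed values into the regression equation price = intercept + b1 * area + b2 * bedrooms + b3 * stories for the first row of the data set. The coefficient order follows the column order area, bedrooms, stories used to build X. The variable names are chosen for this illustration only.

```python
import numpy as np

# Values copied from the model output above
intercept = 157155.2578429943
coefficients = np.array([4.17726303e+02, 4.18703502e+05, 6.73797188e+05])

# First row of the data set shown earlier: area, bedrooms, stories
first_house = np.array([7420, 4, 3])
actual_price = 13300000

# Regression equation: intercept plus the weighted sum of the features
predicted_price = intercept + coefficients @ first_house
residual = actual_price - predicted_price

print(f'Predicted price: {predicted_price:,.0f}')
print(f'Residual (actual - predicted): {residual:,.0f}')
```

Each coefficient is the change in predicted price for a one-unit increase in that feature while the other two are held fixed. For example, one extra unit of area adds about 418 to the predicted price. The large residual for this row is a reminder that three features alone leave much of the variation in price unexplained.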

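The code cell below visualizes the model as a three-dimensional plot. A three-dimensional plot can show only two predictors at a time, so area and bedrooms are placed on the horizontal axes and price on the vertical axis. The data points are drawn as grey dots and the linear model as a blue plane.

Code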

# Plotting a 3-D plot for visualizing the Multiple Linear Regression Model

# Preparing the data: two predictors for the horizontal axes and the target
area = housing['area'].values
bedrooms = housing['bedrooms'].values
price = housing['price'].values

# Building a grid over the observed ranges of area and bedrooms.
# The model was fitted on three predictors, so stories is held at its mean
# (a choice made here) to reduce the fitted model to a plane.
area_range = np.linspace(area.min(), area.max(), 35)
bedrooms_range = np.linspace(bedrooms.min(), bedrooms.max(), 35)
area_grid, bedrooms_grid = np.meshgrid(area_range, bedrooms_range)
stories_fixed = np.full(area_grid.size, housing['stories'].mean())
viz = np.array([area_grid.flatten(), bedrooms_grid.flatten(), stories_fixed]).T

# Predicting price values using the linear regression model built above
predictions = reg.predict(viz)

# Evaluating the model using its R2 score on the data it was fitted on
# (X and y are the arrays defined when the model was built)
r2 = reg.score(X, y)

# Plotting the model for visualization
plt.style.use('fivethirtyeight')

# Initializing a matplotlib figure
fig = plt.figure(figsize = (15, 6))

axis1 = fig.add_subplot(131, projection = '3d')
axis2 = fig.add_subplot(132, projection = '3d')
axis3 = fig.add_subplot(133, projection = '3d')

axes = [axis1, axis2, axis3]

for ax in axes:
    # Observed data points
    ax.plot(area, bedrooms, price, color = 'k', zorder = 10, linestyle = 'none', marker = 'o', alpha = 0.1)
    # Predicted prices on the grid, which together form the regression plane
    ax.scatter(area_grid.flatten(), bedrooms_grid.flatten(), predictions, facecolor = (0,0,0,0), s = 20, edgecolor = '#70b3f0')
    ax.set_xlabel('Area', fontsize = 10, labelpad = 10)
    ax.set_ylabel('Bedrooms', fontsize = 10, labelpad = 10)
    ax.set_zlabel('Prices', fontsize = 10, labelpad = 10)
    ax.locator_params(nbins = 3, axis = 'x')
    ax.locator_params(nbins = 3, axis = 'y')

# Showing the same plot from three different viewing angles
axis1.view_init(elev=25, azim=-60)
axis2.view_init(elev=15, azim=15)
axis3.view_init(elev=25, azim=60)

fig.suptitle(f'Multi-Linear Regression Model Visualization (R2 = {r2:.2f})', fontsize = 15, color = 'k')

Output

[Figure: three 3-D views of the data points and the fitted regression plane]
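The R2 value in the figure title comes from reg.score(X, y), which for LinearRegression returns the coefficient of determination. As a minimal sketch of what that number means, the function below computes it by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the small demo arrays are made up for this illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Small demo values (an assumption for illustration)
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])

# Perfect predictions explain all of the variation
perfect = r_squared(y_true_demo, y_true_demo)

# Always predicting the mean explains none of it
mean_only = r_squared(y_true_demo, np.full(4, y_true_demo.mean()))

print(perfect, mean_only)
```

Applied to the tutorial's model, r_squared(y, reg.predict(X)) should agree with reg.score(X, y), provided y still refers to the price array used when the model was fitted.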
