Python Scikit Learn - Ridge Regression

January 5, 2025 | 4 min read

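Ridge regression, a variant of linear regression, is an important tool in the toolbox of data scientists and machine learning practitioners. It addresses some limitations of linear regression, particularly when features are multicollinear or when the number of features exceeds the number of observations. In this article, we explore Ridge regression using Scikit-Learn, one of the most popular machine learning libraries in Python.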

Understanding Ridge Regression

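Ridge regression, also known as Tikhonov regularization, adds a regularization term to the ordinary least squares (OLS) objective function. This term penalizes the magnitude of the coefficients, shrinking them toward zero without setting them exactly to zero. The Ridge regression objective is:

$$\min_{w} \; \sum_{i=1}^{n} \left( y_i - X_i^{\top} w \right)^2 + \lambda \lVert w \rVert_2^2$$

Here, w is the vector of model coefficients, Xi is the feature vector of the i-th observation, yi is the target value, and λ is the regularization parameter (called alpha in Scikit-Learn). The second term is the regularization term, which penalizes large coefficients.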
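The objective described above has a closed-form minimizer, w = (XᵀX + λI)⁻¹Xᵀy, when no intercept is fitted. The following is a minimal sketch that checks this against Scikit-Learn on synthetic data; the data, the variable names, and the choice of `fit_intercept=False` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (hypothetical): 50 observations, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

lam = 1.0  # lambda in the text, alpha in Scikit-Learn

# Closed-form minimizer of sum_i (y_i - X_i^T w)^2 + lam * ||w||^2
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Ridge without an intercept minimizes the same objective
w_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print("closed form :", w_closed)
print("Scikit-Learn:", w_sklearn)
```

The two coefficient vectors should agree up to numerical precision. With the default `fit_intercept=True`, Scikit-Learn additionally fits an unpenalized intercept, so the comparison would need centered data.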

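Why Use Ridge Regression?

Multicollinearity: when features are highly correlated, OLS estimates have large variance. Ridge regression mitigates this by accepting a small amount of bias in exchange for reduced variance.
Overfitting: in high-dimensional spaces, a model can easily overfit the training data. Ridge regression counters this by penalizing the size of the coefficients, which yields simpler models that generalize better.
Numerical stability: the regularization term stabilizes the inversion of XᵀX (the unnormalized covariance matrix of the features) by adding λ to its diagonal. In OLS this inversion becomes a problem when the matrix is close to singular.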
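The numerical stability point can be illustrated with the condition number of the matrix being inverted. This is a small sketch on synthetic, nearly collinear data; the data and the value of `lam` are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=1e-4, size=200)  # nearly a copy of x1
X = np.column_stack([x1, x2])

gram = X.T @ X  # the matrix OLS has to invert
lam = 1.0       # hypothetical regularization strength

cond_ols = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + lam * np.eye(2))

print(f"cond(X^T X)         = {cond_ols:.3e}")
print(f"cond(X^T X + lam*I) = {cond_ridge:.3e}")
```

Adding `lam` to the diagonal lifts the smallest eigenvalue away from zero, so the regularized matrix is far better conditioned than the original.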

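Implementing Ridge Regression with Scikit-Learn

Scikit-Learn provides an easy-to-use implementation of Ridge regression through the Ridge class. Let's walk through a practical example.

Step 1: Import the Libraries

First, we import the necessary libraries.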

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

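Step 2: Load the Data

In this example, we use the Boston housing dataset that shipped with Scikit-Learn. Note that load_boston was deprecated in Scikit-Learn 1.0 and removed in 1.2, so the code below runs only on versions older than 1.2.

from sklearn.datasets import load_boston

# Load the dataset (requires scikit-learn < 1.2, where load_boston still exists)
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target)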
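
On Scikit-Learn 1.2 or later, load_boston is no longer available. One option is to substitute a regression dataset that still ships with the library. The sketch below uses load_diabetes as a stand-in. This swap is an assumption for illustration and not part of the original example, the names `X_alt` and `y_alt` are hypothetical, and the metrics and coefficients printed later in this article will not match.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Stand-in dataset (assumption): bundled with Scikit-Learn, no download needed
diabetes = load_diabetes()
X_alt = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y_alt = pd.Series(diabetes.target)

print(X_alt.shape)
print(list(X_alt.columns))
```

To follow the remaining steps with this dataset, `X_alt` and `y_alt` would take the place of `X` and `y`, and `diabetes.feature_names` would replace `boston.feature_names`.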

Step 3: Preprocessing

We split the data into training and testing sets and standardize the features.

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
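
The scaler above is fit on the training split only, so no test-set statistics leak into the model. One way to keep that guarantee automatically is to bundle the scaler and the model in a Pipeline. The following is a minimal sketch on synthetic data from make_regression; the data and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (hypothetical stand-in for a real dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# fit() scales using training statistics only; predict() reuses them on new data
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_tr, y_tr)

scaler_step = model.named_steps["standardscaler"]
print("Scaler means (from training data):", scaler_step.mean_)
print("Test R^2:", model.score(X_te, y_te))
```

A pipeline is also convenient with cross-validation, because the scaler is then refit inside each fold rather than once on the full training set.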

Step 4: Train the Model

We instantiate the Ridge class and fit it to the training data.

# Instantiate the Ridge regression model
ridge_reg = Ridge(alpha=1.0)  # alpha is the regularization parameter

# Fit the model
ridge_reg.fit(X_train, y_train)

Step 5: Evaluate the Model

We use the model to make predictions on the test data and evaluate its performance.

# Make predictions
y_pred = ridge_reg.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")

Output

Mean Squared Error: 25.41958712682191
R^2 Score: 0.6693702691495616
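
For reference, both metrics can be computed by hand: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses small made-up arrays (hypothetical values, not taken from the dataset) and compares the manual results with Scikit-Learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values (hypothetical) standing in for y_test and y_pred
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_hat = np.array([2.5, 5.5, 7.0, 11.0])

# MSE: mean of squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# RMSE is in the same units as the target, which is often easier to read
rmse_manual = np.sqrt(mse_manual)

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
print(rmse_manual)
```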

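Step 6: Analyze the Coefficients

A key property of Ridge regression is that it shrinks the coefficients. We can inspect them to see the effect of regularization.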

# Inspect the coefficients
coefficients = pd.Series(ridge_reg.coef_, index=boston.feature_names)
print(coefficients)

Output

CRIM       -1.038819
ZN          1.021696
INDUS       0.205204
CHAS        0.780355
NOX        -1.821555
RM          2.918722
AGE        -0.820582
DIS        -3.028661
RAD         2.405121
TAX        -1.499506
PTRATIO    -2.063730
B           0.830963
LSTAT      -3.837109
dtype: float64
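
The shrinkage becomes easier to see when the same model is fit with several values of alpha. This is a minimal sketch on synthetic data; the data, the alpha values, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized features (hypothetical stand-in for the housing data)
X_demo, y_demo = make_regression(n_samples=100, n_features=8, noise=5.0, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for a in alphas:
    coef = Ridge(alpha=a).fit(X_demo, y_demo).coef_
    norms.append(np.linalg.norm(coef))  # Euclidean length of the coefficient vector
    print(f"alpha={a:>8}: ||w|| = {norms[-1]:.3f}")
```

The norm of the coefficient vector decreases as alpha grows, but it stays above zero: Ridge shrinks coefficients without eliminating them.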

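Step 7: Tune the Regularization Parameter

The performance of Ridge regression depends on the regularization parameter alpha. We can use cross-validation to find a good value.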

from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {'alpha': [0.1, 1.0, 10.0, 100.0, 200.0]}

# Instantiate the grid search
grid_search = GridSearchCV(Ridge(), param_grid, cv=5, scoring='neg_mean_squared_error')

# Fit the grid search
grid_search.fit(X_train, y_train)

# Get the best model
best_ridge_reg = grid_search.best_estimator_
print(f"Best alpha: {best_ridge_reg.alpha}")

Output

Best alpha: 1.0
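
Scikit-Learn also offers RidgeCV, which performs the search over alpha inside the estimator itself. The sketch below is one possible alternative to the grid search above, shown on synthetic data; the data and variable names are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical stand-in for the training split)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

alphas = [0.1, 1.0, 10.0, 100.0, 200.0]  # same candidates as the grid above

# RidgeCV selects alpha by 5-fold cross-validation on the scaled features
model = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=alphas, cv=5, scoring="neg_mean_squared_error"),
)
model.fit(X_demo, y_demo)

best_alpha = model[-1].alpha_  # the last pipeline step is the fitted RidgeCV
print(f"Best alpha: {best_alpha}")
```

With `cv=None` (the default), RidgeCV uses an efficient form of leave-one-out cross-validation instead of k-fold, which can be faster for small candidate lists.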

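Visualizing the Results

Visualizing the performance of Ridge regression helps in understanding its behavior. Let's plot the true values against the predicted values.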

# Plot true vs predicted values
plt.scatter(y_test, y_pred, edgecolor='k', alpha=0.7)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('True Values')
plt.ylabel('Predicted Values')
plt.title('True vs Predicted Values')
plt.show()

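Output

(Figure: scatter plot of predicted versus true values, with a dashed red diagonal marking perfect predictions.)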

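Conclusion

Ridge regression is a powerful technique that addresses some limitations of ordinary least squares regression, particularly in the presence of multicollinearity and high-dimensional data. With Scikit-Learn, implementing Ridge regression is straightforward, and it is easy to experiment with different regularization parameters and evaluate the resulting models.

By penalizing large coefficients, Ridge regression can produce more stable models that are often easier to interpret and that generalize better to new data. As with any machine learning technique, careful hyperparameter tuning and model validation are essential for good performance.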
