
Sklearn Logistic Regression

17 Mar 2025 | 6 min read

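In this tutorial, we will learn about the logistic regression model, a linear model used as a classifier to assign a dependent variable to categories. We will implement this model on a dataset using sklearn's LogisticRegression class.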

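What is Logistic Regression?

This machine learning model, also known as the logit model, is widely used in predictive analytics and classification. Given a dataset of independent variables, a logistic regression model estimates the probability that an event occurs, such as whether or not a person votes. Because the outcome is a probability, the dependent variable ranges from 0 to 1.

In a logistic regression model, the logit transformation is applied to the odds, that is, the probability of success divided by the probability of failure. This logistic function, also known as the log odds or the natural logarithm of the odds, is given by the following formula.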

$$\operatorname{Logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

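In the logistic regression equation, Logit(pi) is the dependent or target variable and x is the independent variable. The coefficients of this linear model are most commonly estimated by maximum likelihood estimation (MLE), which iteratively tries different coefficient values to find the best fit for the log odds.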
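The relationship between probability, odds, and log odds described above can be sketched in a few lines of NumPy. The helper names `logit` and `sigmoid` are chosen here for illustration and are not part of sklearn; the sample probabilities are arbitrary.

```python
import numpy as np


def logit(p):
    """Log odds: natural logarithm of p / (1 - p)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit: maps log odds back to a probability."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


# Arbitrary example probabilities (illustrative values only)
probabilities = np.array([0.1, 0.5, 0.8])

# Odds = probability of success divided by probability of failure
odds = probabilities / (1.0 - probabilities)
log_odds = logit(probabilities)

# Applying the sigmoid to the log odds recovers the probabilities
recovered = sigmoid(log_odds)

print("odds:", odds)
print("log odds:", log_odds)
print("recovered probabilities:", recovered)
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0, which is why 0 on the log-odds scale is the natural boundary between the two classes.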

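These iterations evaluate the log-likelihood function, and logistic regression seeks to maximize this function to obtain the best parameter estimates. Once the optimal coefficient (or coefficients, if there is more than one independent variable) has been found, the conditional probability of each observation can be computed from it; the logarithms of these probabilities, summed over all observations, give the log-likelihood that the fit maximizes.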
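As a rough illustration of the log-likelihood, the sketch below fits a binary model on synthetic data and sums the logarithms of the probabilities that the model assigned to the observed classes. The dataset and variable names are assumptions made for this example. Note that sklearn's LogisticRegression applies an L2 penalty by default, so the fitted coefficients maximize a penalized likelihood rather than the plain one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic binary data (illustrative only)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)

# Probability the model assigned to the class that was actually observed
p_observed = proba[np.arange(len(y)), y]

# Log-likelihood: take the log of each conditional probability and sum
log_likelihood = np.sum(np.log(p_observed))

# Cross-check: log_loss with normalize=False is the negative log-likelihood
reference = -log_loss(y, proba, normalize=False)

print("log-likelihood:", log_likelihood)
print("from log_loss: ", reference)
```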

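For binary classification, a probability below 0.5 predicts class 0 and a probability above 0.5 predicts class 1. Once the model has been fitted, it is good practice to evaluate its goodness of fit, that is, how well it predicts the classes of the dependent variable. The Hosmer-Lemeshow test is a common technique for assessing model fit.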
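The 0.5 cut-off can be checked directly against sklearn's own predictions. This is a minimal sketch on synthetic data; the stricter 0.7 threshold is an arbitrary value picked only to show that the cut-off can be moved when one kind of error is more costly than the other.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (illustrative only)
X, y = make_classification(n_samples=300, n_features=6, random_state=1)

model = LogisticRegression().fit(X, y)

# Column 1 holds the predicted probability of class 1
p_class_1 = model.predict_proba(X)[:, 1]

# Apply the 0.5 threshold by hand
manual_labels = (p_class_1 > 0.5).astype(int)
matches_predict = np.array_equal(manual_labels, model.predict(X))
print("manual thresholding matches predict():", matches_predict)

# A stricter, arbitrary threshold labels fewer samples as class 1
strict_labels = (p_class_1 > 0.7).astype(int)
print("class 1 count at 0.5:", manual_labels.sum())
print("class 1 count at 0.7:", strict_labels.sum())
```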

Sklearn Logistic Regression Example

Sklearn Logistic Regression

class sklearn.linear_model.LogisticRegression(penalty = 'l2', *, dual = False, tol = 0.0001, C = 1.0, fit_intercept = True, intercept_scaling = 1, class_weight = None, random_state = None, solver = 'lbfgs', max_iter = 100, multi_class = 'auto', verbose = 0, warm_start = False, n_jobs = None, l1_ratio = None)

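Parameters

penalty {'l1', 'l2', 'elasticnet', 'none'}, default='l2': Specifies the norm of the penalty.
"none": no penalty is applied (newer scikit-learn releases expect None rather than the string 'none').
"l2": adds an L2 penalty term; this is the default.
"l1": adds an L1 penalty term.
"elasticnet": adds both L1 and L2 penalty terms.
dual bool, default=False: Selects the dual or the primal formulation.
tol float, default=1e-4: Tolerance for the stopping criterion.
C float, default=1.0: Inverse of the regularization strength; must be a positive float. Smaller values mean stronger regularization.
fit_intercept bool, default=True: Specifies whether a bias or intercept constant should be added to the decision function.
intercept_scaling float, default=1: Only useful when the solver is 'liblinear' and fit_intercept is set to True.
class_weight dict or 'balanced', default=None: Weights associated with the classes, in the form {class_label: weight}. If not given, all classes have a weight of one.
random_state int, RandomState instance, default=None: Used to shuffle the data when the solver is 'sag', 'saga' or 'liblinear'.
solver {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs': Algorithm used for the optimization problem.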
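To see how a few of these parameters interact, here is a small sketch on a synthetic binary problem. The dataset and the specific values (C=0.05, ten features) are assumptions chosen for illustration. The L1 penalty requires a solver that supports it, such as 'liblinear' or 'saga', and recent scikit-learn releases may emit deprecation warnings for some of these arguments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data with only a few informative features (illustrative)
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Strong L1 regularization (small C) tends to push some coefficients to zero
sparse_model = LogisticRegression(
    penalty="l1", solver="liblinear", C=0.05, random_state=0
)
sparse_model.fit(X, y)

# Default L2 penalty with balanced class weights and a higher iteration cap
balanced_model = LogisticRegression(class_weight="balanced", max_iter=1000)
balanced_model.fit(X, y)

n_zero = int(np.sum(sparse_model.coef_ == 0))
print("L1 coefficients:", sparse_model.coef_)
print("coefficients set to zero by L1:", n_zero)
print("L2 coefficients:", balanced_model.coef_)
```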

Sklearn Logistic Regression Classifier

Code

# Python program to implement sklearn logistic regression model on load_iris dataset of sklearn

# Importing the required libraries
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Loading the dataset
X, Y = load_iris(return_X_y = True)

# Creating an instance of the class Logistic Regression model
logreg = LogisticRegression(random_state = 0)

# Fitting the dataset to the logistic regression model
logreg.fit(X, Y)

# Predicting the values
Y_pred = logreg.predict(X[:2, :])
print(Y_pred)
Y_predict = logreg.predict_proba(X[:2, :])
print(Y_predict)

# Calculating the accuracy score of the model
score = logreg.score(X, Y)
print(score)

Output

[0 0]
[[9.81764058e-01 1.82359281e-02 1.43020498e-08]
 [9.71660947e-01 2.83390229e-02 2.99214023e-08]]
0.9733333333333334

Logistic Regression CV Example

Code

# Python program to implement sklearn logistic regression CV model on load_iris dataset of sklearn

# Importing the required libraries
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV

# Loading the dataset
X, Y = load_iris(return_X_y = True)

# Creating an instance of the class Logistic Regression CV
logreg = LogisticRegressionCV(cv = 4, random_state = 0)

# Fitting the dataset to the logistic regression CV model
logreg.fit(X, Y)

# Predicting the values
Y_pred = logreg.predict(X[:2, :])
print(Y_pred)
Y_predict = logreg.predict_proba(X[:2, :])
print(Y_predict)

# Calculating the accuracy score of the model
score = logreg.score(X, Y)
print(score)

Output

[0 0]
[[9.91624054e-01 8.37594552e-03 2.92559111e-11]
 [9.85295789e-01 1.47042107e-02 1.03510087e-10]]
0.9866666666666667
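LogisticRegressionCV searches a grid of C values with cross-validation and refits with the best one. The sketch below inspects that grid and the selected value through the Cs_ and C_ attributes. The grid size of 5 and the raised max_iter are choices made for this example, and the shape of C_ can differ between scikit-learn versions, hence the np.atleast_1d call.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV

X, y = load_iris(return_X_y=True)

# Cs=5 asks for five candidate C values on a logarithmic grid
cv_model = LogisticRegressionCV(Cs=5, cv=4, max_iter=1000, random_state=0)
cv_model.fit(X, y)

candidate_cs = cv_model.Cs_
selected_cs = np.atleast_1d(cv_model.C_)

print("candidate C values:", candidate_cs)
print("selected C value(s):", selected_cs)
print("training accuracy:", cv_model.score(X, y))
```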

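Scikit-learn Logistic Regression Coefficients

In this section, we will learn how to work with sklearn logistic regression coefficients.

The coefficient of a feature is the number by which the value of that feature is multiplied. In logistic regression, a coefficient expresses the size and direction of the feature's effect on the log odds.

Code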

# Python code to see how to perform Logistic Regression using sklearn.linear_model

# Importing the required modules and classes
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Loading our dataset
data = load_iris()

# Splitting the independent and dependent variables
X = data.data
Y = data.target
print("The size of the complete dataset is: ", len(X))

# Creating an instance of the LogisticRegression class for implementing logistic regression 
log_reg = LogisticRegression()

# Segregating the training and testing dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state = 10)

# Performing the logistic regression on train dataset
log_reg.fit(X_train, Y_train)

# Printing the coefficients
print(log_reg.coef_)

Output

The size of the complete dataset is:  150
[[-0.35041623  0.91723236 -2.23583834 -0.97778255]
 [ 0.56061567 -0.44283218 -0.21739708 -0.64651405]
 [-0.21019944 -0.47440019  2.45323542  1.6242966 ]]
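The coefficient matrix has one row per class and one column per feature. As a sketch of how the coefficients are used, the example below rebuilds the model's decision scores by hand from coef_ and intercept_ and then measures accuracy on the held-out split with accuracy_score. The variable names and the raised max_iter are assumptions for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# One row of coefficients per class, one column per feature
print("coef_ shape:", model.coef_.shape)

# Linear scores: each feature value multiplied by its coefficient, plus intercept
manual_scores = X_test @ model.coef_.T + model.intercept_

# The predicted class is the one with the highest score
manual_pred = model.classes_[np.argmax(manual_scores, axis=1)]

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print("test accuracy:", test_accuracy)
```

Scoring on the test split gives a more honest estimate than scoring on the data the model was trained on.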

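Sklearn Logistic Regression Feature Importance

In this section, we will look at feature importance in sklearn logistic regression.

Feature importance is a technique that assigns a weight to each independent variable and uses that weight to judge how useful the variable is for predicting the target variable.

Code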

# Python program to learn feature importance for logistic regression

# Importing the required libraries
from sklearn.datasets import make_classification 
from sklearn.linear_model import LogisticRegression 
import matplotlib.pyplot as plt

# Creating dependent and independent features using make_classification of sklearn
X, y = make_classification(n_samples = 1500, n_features = 5, n_informative = 3, n_redundant = 2, random_state = 10) 

# Creating an instance of the model
logreg = LogisticRegression()

# Fitting our data to train the model
logreg.fit(X, y)

weights = logreg.coef_[0]
print(weights)

# Plotting feature importance graph for each feature
for ind, coeff in enumerate(weights): 
    print(f"Feature: {ind}, weight: {coeff}") 
    
plt.bar([n for n in range(len(weights))], weights) 
plt.show()

Output

[ 1.96365376 -0.11875128 -0.32930302  1.23664458 -1.40461804]
Feature: 0, weight: 1.9636537611525497
Feature: 1, weight: -0.1187512810730595
Feature: 2, weight: -0.32930302369908127
Feature: 3, weight: 1.236644582783369
Feature: 4, weight: -1.4046180417231233
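Coefficient magnitudes are only comparable across features when the features share a scale. One common approach, sketched here as an assumption rather than as the tutorial's own method, is to standardize the inputs with StandardScaler inside a pipeline before reading the weights, and then rank features by absolute weight.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic setup as the example above
X, y = make_classification(
    n_samples=1500, n_features=5, n_informative=3, n_redundant=2, random_state=10
)

# Standardize features, then fit the logistic regression
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X, y)

# make_pipeline names each step after its lowercased class name
weights = pipeline.named_steps["logisticregression"].coef_[0]

# Rank features from largest to smallest absolute weight
ranking = np.argsort(-np.abs(weights))
for ind in ranking:
    print(f"Feature: {ind}, standardized weight: {weights[ind]:.4f}")
```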


Sklearn Logistic Regression Cross-Validation

Code

# Python program to test the accuracy of the Logistic Regression model using a cross-validation test

# Importing the required libraries
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold

# Creating dependent and independent features using make_classification of sklearn
X, y = make_classification(n_samples = 1500, n_features = 25, n_informative = 20, n_redundant = 5, random_state = 10) 

# Creating an instance of the model
logreg = LogisticRegression()

# Fitting our data to train the model
logreg.fit(X, y)

# Using KFold cross-validation to validate the dataset
cross_validation = KFold(n_splits = 10, random_state = 1, shuffle = True) 

# Calculating the accuracy of each split using cross_val_score
score = cross_val_score(logreg, X, y, scoring = 'accuracy', cv = cross_validation, n_jobs = -1) 
print("Cross-validation accuracy scores of each split is: ", score)
print("mean and standard deviation of the scores is: ", mean(score), std(score))

Output

Cross-validation accuracy scores of each split is: [0.80666667 0.80666667 0.81333333 0.86666667 0.78666667 0.8
 0.78       0.82       0.80666667 0.83333333]
mean and standard deviation of the scores is:  0.812 0.023247461032216934
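As a variation on the cross-validation above, the following sketch uses StratifiedKFold, which keeps the class proportions similar in every fold, together with cross_validate to collect more than one metric in a single pass. The choice of ROC AUC as the second metric and the raised max_iter are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Same synthetic setup as the example above
X, y = make_classification(
    n_samples=1500, n_features=25, n_informative=20, n_redundant=5, random_state=10
)

# Stratified folds preserve the class balance within each split
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# cross_validate clones and fits the model on each fold, so no prior fit is needed
results = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=folds,
    scoring=["accuracy", "roc_auc"],
)

accuracy_scores = results["test_accuracy"]
auc_scores = results["test_roc_auc"]

print("mean accuracy:", accuracy_scores.mean(), "std:", accuracy_scores.std())
print("mean ROC AUC: ", auc_scores.mean(), "std:", auc_scores.std())
```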
