The fit(), transform(), and fit_transform() Methods in Python

August 29, 2024 | 6 min read

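scikit-learn (often abbreviated sklearn) is arguably one of the most influential and popular machine learning packages in Python. It offers a comprehensive set of algorithms and modeling techniques that can be trained, along with utilities for preprocessing data and for training and evaluating models.

One of the most fundamental kinds of object in sklearn is the transformer, which implements three distinct methods: fit(), transform(), and fit_transform(). This tutorial explores the differences between them.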

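Introduction

Before going further, let us briefly review the steps that a typical data science project follows.

Exploratory data analysis (EDA): we examine the dataset to understand its structure and uncover its most important characteristics.
Feature engineering: with the help of some domain expertise, we derive features from the raw data.
Feature selection: we decide which features will significantly affect the model.
Model building: we build a machine learning model using an appropriate technique.
Deployment: we publish the machine learning model so that it can be used online.

The first three steps are concerned mostly with data preprocessing rather than with model development and training. Preprocessing is therefore a critical stage every time we build a machine learning application.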

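Transformers in Sklearn

Transformers are commonly used objects in scikit-learn. A transformer performs feature transformation, which is part of data preprocessing; for model training, by contrast, we need objects called models (estimators), such as linear regression or a classifier. Examples of transformers used for preprocessing are StandardScaler, PCA, SimpleImputer (called Imputer in old scikit-learn releases), and MinMaxScaler. We use these tools to preprocess the raw data, for example by changing the format of the input data or by scaling features. The preprocessed data is then used for model training.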
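As a rough illustration of how these transformers share one interface, the sketch below runs SimpleImputer, StandardScaler, and MinMaxScaler through the same fit() and transform() calls. The toy array X is a made-up example, not data from this tutorial.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: two features, one missing value in the second column.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0]])

# Fill the missing value with the column mean before scaling.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit(X).transform(X)

# Different transformers, same two calls: fit() learns, transform() applies.
for transformer in (StandardScaler(), MinMaxScaler()):
    X_out = transformer.fit(X_filled).transform(X_filled)
    print(type(transformer).__name__)
    print(X_out)
```

Because every transformer follows the same pattern, swapping one scaler for another usually means changing only the class name.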

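Consider standardization, which takes a feature F and changes it into F'. Suppose the dataset has features f_1, f_2, f_3, and f_4, where f_1, f_2, and f_3 are independent features and f_4 is the dependent feature; we apply the standardization formula F' = (F - µ) / σ to the features we want to change. Three different operations let us transform an input feature F into another input feature F'. These operations are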

fit()
transform()
fit_transform()
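Before looking at each method, it may help to see the standardization of F into F' written out by hand. The sketch below uses plain NumPy on a made-up feature F; it assumes the population standard deviation (ddof=0), which is the convention StandardScaler follows.

```python
import numpy as np

# Hypothetical single feature F with five observations.
F = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# The "fit" part: learn the parameters from the data.
mu = F.mean()
sigma = F.std()  # population standard deviation (ddof=0)

# The "transform" part: apply the parameters to every value.
F_prime = (F - mu) / sigma

print("mean:", mu, "std:", sigma)
print("F':", F_prime)
```

After the transformation, F' has a mean of zero and a standard deviation of one, which is what makes features on different scales comparable.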

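The fit() Method

In the fit() method, we apply the required formula to the features of the input data that we want to change, compute the resulting parameters, and store them in the transformer. Note that fit() does not modify the data itself. We call .fit() on the transformer object.

If we create a StandardScaler object sc, calling sc.fit() computes the mean (µ) and standard deviation (σ) of each feature F. These parameters are used later to transform the data.

Let us take the preprocessing transformer StandardScaler as an example and assume we need to scale the features of a dataset we create ourselves. In the code below, the sample dataset is created with NumPy's arange function and then split into training and test sets. We then create a StandardScaler instance and fit it to the features of the training data to determine the mean and standard deviation that will later be used for scaling.

It is important to split the dataset into training and test sets before applying any preprocessing step such as scaling. The test data points stand in for real-world data. We therefore call fit() only on the training features, so that information from future data does not leak into our model.

Code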

# Python program to show how to use the fit() method of the Transformer class of scikit-learn.
# We will use the fit() method with the feature scaling tool known as StandardScaler. This tool is used for scaling the features using standardization.

# Importing the required modules
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Creating a small sample dataset with features X and targets Y
X, Y = np.arange(20).reshape((10, 2)), range(10)

# Segregating data into training and testing datasets
X_train, X_test, Y_train, Y_test = train_test_split( X, Y, test_size = 0.30, random_state = 1 )

# Printing the training dataset
print( "Training dataset: \n", X_train)

# Printing the testing dataset
print( "Testing dataset: \n", X_test)

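# Calculating the standardizing parameters, that is, the mean and standard deviation of each column of X_train.
standard_scaler = StandardScaler()
standard_scaler.fit(X_train)
# Note: get_params() returns the scaler's constructor settings, not the learned statistics;
# the fitted mean and standard deviation are stored in standard_scaler.mean_ and standard_scaler.scale_.
print(" Parameters of the fit method: \n", standard_scaler.get_params())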

Output

Training dataset: 
 [[ 8  9]
 [ 0  1]
 [ 6  7]
 [ 2  3]
 [14 15]
 [16 17]
 [10 11]]
Testing dataset: 
 [[ 4  5]
 [18 19]
 [12 13]]
 Parameters of the fit method: 
 {'copy': True, 'with_mean': True, 'with_std': True}
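The dictionary printed above lists the scaler's constructor settings rather than the statistics that fit() learned. As a minimal sketch, the learned values can be read from the fitted attributes mean_, scale_, and var_, reusing the same dataset and split as the example above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same sample dataset and split as in the example above.
X, Y = np.arange(20).reshape((10, 2)), range(10)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=1)

standard_scaler = StandardScaler()
standard_scaler.fit(X_train)

# Attributes ending in an underscore are set by fit().
print("Learned means:", standard_scaler.mean_)
print("Learned standard deviations:", standard_scaler.scale_)
print("Learned variances:", standard_scaler.var_)
```

For the training set printed above, the first column sums to 56 and the second to 63 over seven rows, so the learned means should come out as 8 and 9.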

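The transform() Method

To actually change the data, we use the transform() method, which applies the parameters computed by fit() to every value of feature F. In other words, transform() carries out the calculation that fit() prepared, so we must call .transform() after the transformer has been fitted.

Continuing the example from the previous section, fit() returns the fitted scaler object, and we call .transform() on that object.

When data is transformed with transform() or fit_transform(), the output is by default a NumPy array or a SciPy sparse matrix, depending on the transformer.

Code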

# Python program to show how to use the transform() method of the Transformer class of scikit-learn.
# We will use the transform() method with the feature scaling tool known as StandardScaler.

# Importing the required modules
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Creating a small sample dataset with features X and targets Y
X, Y = np.arange(20).reshape((10, 2)), range(10)

# Segregating data into training and testing datasets
X_train, X_test, Y_train, Y_test = train_test_split( X, Y, test_size = 0.30, random_state = 1 )

# Printing original X_train
print(X_train)
# Calculating the standardizing parameters and transforming the dataset.
standard_scaler = StandardScaler()
fitted = standard_scaler.fit(X_train)
X_train = fitted.transform(X_train)

# Printing X_train after transforming data
print(X_train)

Output

[[ 8  9]
 [ 0  1]
 [ 6  7]
 [ 2  3]
 [14 15]
 [16 17]
 [10 11]]
[[ 0.          0.        ]
 [-1.46759877 -1.46759877]
 [-0.36689969 -0.36689969]
 [-1.10069908 -1.10069908]
 [ 1.10069908  1.10069908]
 [ 1.46759877  1.46759877]
 [ 0.36689969  0.36689969]]
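The remark about return types can be checked with a short sketch. It compares StandardScaler with OneHotEncoder, a transformer not otherwise used in this tutorial, on made-up inputs, and assumes both are left at their default settings.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical inputs: one numeric column and one categorical column.
X_num = np.array([[1.0], [2.0], [3.0]])
X_cat = np.array([["red"], ["green"], ["red"]])

dense_out = StandardScaler().fit(X_num).transform(X_num)
sparse_out = OneHotEncoder().fit(X_cat).transform(X_cat)

# StandardScaler returns a dense NumPy array for dense input.
print(type(dense_out), sparse.issparse(dense_out))

# OneHotEncoder returns a SciPy sparse matrix by default.
print(type(sparse_out), sparse.issparse(sparse_out))

# A sparse result can be converted to a dense array for inspection.
print(sparse_out.toarray())
```

Sparse output saves memory when most entries are zero, as is typical for one-hot encoded columns.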

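The fit_transform() Method

Calling fit_transform() on the training data determines the scaling parameters and scales the training data in a single step. Here, the scaler we created learns the mean and variance of each feature in the training set.

The fit step computes the mean and variance of every feature present in our data. The transform step then transforms all the features using the corresponding mean and variance.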
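To make the relationship concrete, the following sketch checks that calling fit() and then transform() gives the same result as a single fit_transform() call. The array X_train here is a made-up stand-in for any training feature matrix.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training features: seven rows, two columns.
X_train = np.arange(14, dtype=float).reshape((7, 2))

# Two separate calls on one scaler.
two_step = StandardScaler().fit(X_train).transform(X_train)

# One combined call on another scaler.
one_step = StandardScaler().fit_transform(X_train)

print("Results match:", np.allclose(two_step, one_step))
```

The combined call is a convenience for training data; it does not change what is learned or how the data is scaled.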

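We also want to scale the test data, but we do not want to bias our model. The test dataset should remain completely new and unseen to the model, so its statistics must not influence the scaling parameters. This is where the transform() method is useful: we call transform() alone on the test data, which reuses the parameters learned from the training set.

Code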

# Python program to show how to use the fit_transform() method of the Transformer class of scikit-learn.
# We will use fit_transform() method with the feature scaling tool known as StandardScaler.

# Importing the required modules
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Creating a small sample dataset with features X and targets Y
X, Y = np.arange(20).reshape((10, 2)), range(10)

# Segregating data into training and testing datasets
X_train, X_test, Y_train, Y_test = train_test_split( X, Y, test_size = 0.30, random_state = 1 )

# Printing original X_train
print(X_train)
# Directly transforming the X_train dataset.
standard_scaler = StandardScaler()
X_train = standard_scaler.fit_transform(X_train)

# Printing X_train after transforming data
print(X_train)

Output

[[ 8  9]
 [ 0  1]
 [ 6  7]
 [ 2  3]
 [14 15]
 [16 17]
 [10 11]]
[[ 0.          0.        ]
 [-1.46759877 -1.46759877]
 [-0.36689969 -0.36689969]
 [-1.10069908 -1.10069908]
 [ 1.10069908  1.10069908]
 [ 1.46759877  1.46759877]
 [ 0.36689969  0.36689969]]
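The example above scales only the training set. A minimal sketch of the usual follow-up step, assuming the same dataset and split, is to call transform() alone on the test set so that it is scaled with the training statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same sample dataset and split as in the example above.
X, Y = np.arange(20).reshape((10, 2)), range(10)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=1)

standard_scaler = StandardScaler()

# Learn the mean and standard deviation from the training data and scale it.
X_train_scaled = standard_scaler.fit_transform(X_train)

# Scale the test data with the training statistics; do not call fit() again.
X_test_scaled = standard_scaler.transform(X_test)

print(X_test_scaled)
```

Calling fit_transform() on the test set instead would recompute the mean and standard deviation from the test data, which is the leakage the earlier section warns about.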

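Conclusion

In this tutorial, we explored the three most commonly used transformer methods in sklearn: fit(), transform(), and fit_transform(). We looked at what each method does, how they differ, and when to choose each one. In short, fit() learns the parameters of the scaling function from the data; transform() applies those parameters to a dataset so that it can be passed on to later analysis steps; and fit_transform() learns the parameters and transforms the dataset in one call.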
