Pipelines in Pandas

January 5, 2025 | 6 min read

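In Pandas, pipelines are important when we need to transform a whole DataFrame. They make it easier to process large amounts of data. In general, a pipeline is used when a series of operations must be run in order to reach the final result. We can build our own pipeline by defining several functions and passing the DataFrame through them in sequence. The `.pipe()` method of a Pandas DataFrame simplifies this task.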

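The pipe() method lets us chain several function calls in a single line of code to process our data. To understand what pipe() does, let us first look at what a pipeline of operations means. We will see an example pipeline and then use the .pipe() method to simplify it.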
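
To make the idea concrete, here is a minimal sketch comparing nested function calls with the same steps written using .pipe(). The helper functions add_one and keep_above and the sample data are hypothetical and serve only as an illustration; the sketch relies on the fact that df.pipe(func, ...) calls func(df, ...).

```python
import pandas as pd

# Hypothetical helper functions, used only for this illustration
def add_one(df, col):
    # Return a copy with 1 added to the chosen column
    return df.assign(**{col: df[col] + 1})

def keep_above(df, col, threshold):
    # Keep only the rows whose value in `col` is greater than `threshold`
    return df[df[col] > threshold]

data = pd.DataFrame({"x": [1, 2, 3, 4]})

# Nested calls: the steps have to be read inside-out
nested = keep_above(add_one(data, "x"), "x", threshold=3)

# The same steps with .pipe(): they read left to right, in execution order
chained = data.pipe(add_one, "x").pipe(keep_above, "x", threshold=3)

print(chained)
print(nested.equals(chained))
```

Both forms produce the same DataFrame; the chained form simply lists the steps in the order they run.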

Below is the Python code for a pipeline of DataFrame operations.

Code

# Python program to show how to build a pipeline of operations

# Importing the required modules
import pandas as pd

# Firstly, we need to create a dataframe to apply our operations
df = pd.DataFrame()

# Adding data to the dataframe
df['Artists'] = ['Harry', 'Naill', 'Louis', 'Zayn', 'Liam', 'Peter', 'Andrew']
df['Role'] = ['Singer', 'Musician', 'Lyricist', 'Singer', 'Composer', 'Actor', 'Actor']
df['Age'] = [31, 33, 32, 33, 32, 34, 34]

# Printing the original dataframe
print("Original Dataframe: \n", df)

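# Creating the first operation of the pipeline
# This function replaces every value in a column with that column's mean
def mean_(df, col):

  # df[col] looks the column up by the name passed in; df.col would look for a column literally named "col"
  df[col] = df[col].mean()

  # Returning the dataframe so that the next step of the pipeline can use it
  return df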
  
# Creating the second operation of the pipeline
# This function will convert text into uppercase
def uppercase_(df):
  
  # We will convert the name of the columns in upper case
  df.columns = df.columns.str.upper()
   
  # Returning the transformed dataframe
  return df

Output

Original Dataframe: 
   Artists      Role  Age
0   Harry    Singer   31
1   Naill  Musician   33
2   Louis  Lyricist   32
3    Zayn    Singer   33
4    Liam  Composer   32
5   Peter     Actor   34
6  Andrew     Actor   34

We will now use the .pipe() method to run this pipeline.

Code

# Python program to implement the pipeline created above using the pipe() method of the dataframe
pipeline = df.pipe(mean_, col = 'Age').pipe(uppercase_)
print(pipeline)

Output

  ARTISTS      ROLE        AGE
0   Harry    Singer  32.714286
1   Naill  Musician  32.714286
2   Louis  Lyricist  32.714286
3    Zayn    Singer  32.714286
4    Liam  Composer  32.714286
5   Peter     Actor  32.714286
6  Andrew     Actor  32.714286
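
A function that assigns to the DataFrame it receives, such as uppercase_ above setting df.columns, also changes the original df. When the original data should stay intact, each step can return a new DataFrame instead. The following is a minimal sketch of that variant; the function names fill_with_mean and uppercase_columns and the shortened sample data are assumptions made for illustration.

```python
import pandas as pd

artists = pd.DataFrame({
    "Artists": ["Harry", "Naill", "Louis"],
    "Age": [31, 33, 32],
})

# Hypothetical non-mutating counterparts of mean_ and uppercase_
def fill_with_mean(df, col):
    # assign() returns a new DataFrame; the input is left untouched
    return df.assign(**{col: df[col].mean()})

def uppercase_columns(df):
    # rename() also returns a new DataFrame
    return df.rename(columns=str.upper)

result = artists.pipe(fill_with_mean, col="Age").pipe(uppercase_columns)

print(result)
print(artists.columns.tolist())  # the original column names are unchanged
```

Written this way, the same pipeline can be applied to the original DataFrame again without being affected by earlier runs.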

Next, we will use Python's pdpipe package to build pipelines on a Pandas DataFrame. pdpipe is easy to use and offers a clean interface for building preprocessing pipelines for Pandas DataFrames. With pdpipe, a complex pipeline can be built in a few lines of code.

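Before using the pdpipe package, we need to install it in our Python environment. We will install it with the following pip command:

pip install pdpipe

Once the package is installed, we can use it as shown in the examples below.

Below is the Python code that sets up the DataFrame for the pdpipe examples.

Code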

# Python program to show how to build a pipeline of operations

# Importing the required modules
import pandas as pd

# Firstly, we need to create a dataframe to apply our operations
df = pd.DataFrame()

# Adding data to the dataframe
df['Artists'] = ['Harry', 'Naill', 'Louis', 'Zayn', 'Liam', 'Peter', 'Andrew']
df['Role'] = ['Singer', 'Musician', 'Lyricist', 'Singer', 'Composer', 'Actor', 'Actor']
df['Age'] = [31, 33, 32, 33, 32, 34, 34]
df['State'] = ['NY', 'Cal', 'NL', 'BP', 'CL', 'NY', 'Cal']
df['idx'] = [1, 2, 3, 4, 5, 6, 7]

# Printing the original dataframe
print("Original Dataframe: \n", df)

Output

Original Dataframe: 
   Artists      Role  Age State  idx
0   Harry    Singer   31    NY    1
1   Naill  Musician   33   Cal    2
2   Louis  Lyricist   32    NL    3
3    Zayn    Singer   33    BP    4
4    Liam  Composer   32    CL    5
5   Peter     Actor   34    NY    6
6  Andrew     Actor   34   Cal    7

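Now we will create a pipeline that removes an unwanted column from the DataFrame. We will use the pdpipe package to drop the column.

Below is the Python code that shows how to do this.

Code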

# Python program to show how to create a pipeline to drop a column of the dataframe using the pdpipe package

# Importing the required modules
import pdpipe as pdp

# We will create a pipeline to drop an unwanted column of our dataframe
# We will use the ColDrop stage of the pdpipe package to drop the column
pipe = pdp.ColDrop("idx")

# Implementing the pipeline on the dataframe
drop = pipe.apply(df)

# Printing the new dataframe after implementing the pipeline to the dataframe
print("New dataframe: \n", drop)

Output

New dataframe: 
   Artists      Role  Age State
0   Harry    Singer   31    NY
1   Naill  Musician   33   Cal
2   Louis  Lyricist   32    NL
3    Zayn    Singer   33    BP
4    Liam  Composer   32    CL
5   Peter     Actor   34    NY
6  Andrew     Actor   34   Cal

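The pdpipe package also offers a second way to apply a pipeline to a DataFrame: calling the pipeline object directly. Let us look at this second method.

Code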

# Python program to show how to create a pipeline to drop a column of the dataframe using the pdpipe package

# Importing the required modules
import pdpipe as pdp

# We will create a pipeline to drop an unwanted column of our dataframe
pipe = pdp.ColDrop("idx")

# Implementing our pipeline on the dataframe we created above
drop = pipe(df)

# Printing the new dataframe after implementing the pipeline to the dataframe
print("New dataframe: \n", drop)

Output

New dataframe: 
   Artists      Role  Age State
0   Harry    Singer   31    NY
1   Naill  Musician   33   Cal
2   Louis  Lyricist   32    NL
3    Zayn    Singer   33    BP
4    Liam  Composer   32    CL
5   Peter     Actor   34    NY
6  Andrew     Actor   34   Cal

In both of the methods above, the pipeline is used in two steps. The first step is to create the pipeline. The second step is to apply the pipeline to our DataFrame.
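
The two-step pattern can be sketched in plain pandas. The DropColumns class below is hypothetical and is not part of pdpipe; it only imitates the "create, then apply" interface seen above, including the option of calling the object directly.

```python
import pandas as pd

class DropColumns:
    """Hypothetical stage (not part of pdpipe) that remembers which columns to drop."""

    def __init__(self, *columns):
        # Step 1: creating the stage only stores its configuration
        self.columns = list(columns)

    def apply(self, df):
        # Step 2: applying the stage returns a new DataFrame without those columns
        return df.drop(columns=self.columns)

    def __call__(self, df):
        # Calling the stage directly is shorthand for apply()
        return self.apply(df)

frame = pd.DataFrame({"Artists": ["Harry", "Zayn"], "idx": [1, 4]})

stage = DropColumns("idx")       # create
via_apply = stage.apply(frame)   # apply, first form
via_call = stage(frame)          # apply, second form

print(via_apply.columns.tolist())
print(via_apply.equals(via_call))
```

Separating creation from application means the same stage object can be reused on any DataFrame that has the configured columns.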

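We have seen how to drop a column, but what if we need to drop the rows that contain a particular value? Let us see how to do that with the pdpipe package.

Dropping Rows by Value from a DataFrame Using the pdpipe Package

Below is the Python code that uses the pdpipe package to drop rows from a DataFrame based on a column value.

Code

# Python program to show how to drop rows with a given value from the dataframe using the pdpipe package

# Importing the required modules
import pandas as pd
import pdpipe as pdp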

# Firstly, we need to create a dataframe to apply our operations
df = pd.DataFrame()

# Adding data to the dataframe
df['Artists'] = ['Harry', 'Naill', 'Louis', 'Zayn', 'Liam', 'Peter', 'Andrew']
df['Role'] = ['Singer', 'Musician', 'Lyricist', 'Singer', 'Composer', 'Actor', 'Actor']
df['Age'] = [31, 33, 32, 33, 32, 34, 34]
df['State'] = ['NY', 'Cal', 'NL', 'BP', 'CL', 'NY', 'Cal']
df['idx'] = [1, 2, 3, 4, 5, 6, 7]

# Printing the original dataframe
print("Original Dataframe: \n", df)

# Creating a pipeline using ValDrop from the pdpipe package to drop the rows whose 'Role' value is 'Actor'
pipe = pdp.ValDrop(['Actor'],'Role')

# Implementing the pipeline on the dataframe
drop = pipe.apply(df)

# Printing the new dataframe after implementing the pipeline to the dataframe
print("New dataframe: \n", drop)

Output

Original Dataframe: 
   Artists      Role  Age State  idx
0   Harry    Singer   31    NY    1
1   Naill  Musician   33   Cal    2
2   Louis  Lyricist   32    NL    3
3    Zayn    Singer   33    BP    4
4    Liam  Composer   32    CL    5
5   Peter     Actor   34    NY    6
6  Andrew     Actor   34   Cal    7
New dataframe: 
   Artists      Role  Age State  idx
0   Harry    Singer   31    NY    1
1   Naill  Musician   33   Cal    2
2   Louis  Lyricist   32    NL    3
3    Zayn    Singer   33    BP    4
4    Liam  Composer   32    CL    5
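
For comparison, the effect seen in the output above (the rows whose Role is 'Actor' are removed) can be approximated in plain pandas with a boolean mask and .pipe(). This is a minimal sketch: the drop_values helper and the shortened sample data are hypothetical, and it is not how pdpipe implements ValDrop.

```python
import pandas as pd

people = pd.DataFrame({
    "Artists": ["Harry", "Peter", "Liam", "Andrew"],
    "Role": ["Singer", "Actor", "Composer", "Actor"],
})

# Hypothetical helper: drop the rows whose value in `col` appears in `values`
def drop_values(df, values, col):
    # isin() marks the matching rows; ~ inverts the mask so those rows are excluded
    return df[~df[col].isin(values)]

kept = people.pipe(drop_values, ["Actor"], "Role")

print(kept)
```

As in the pdpipe output, the remaining rows keep their original index labels.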

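We have seen two different ways to build pipelines for a Pandas DataFrame. The first is the pipe() method built into Pandas, which reduces running a user-defined pipeline to one or two lines of code. The second is the pdpipe package, which provides ready-made pipeline stages for Pandas DataFrames, so we do not have to write each step from scratch.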
