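Indexing and Selecting a Pandas DataFrame

March 17, 2025 | 7 min read

Pandas is one of the most important Python libraries for data analysis. A Pandas DataFrame is like a table, or a two-dimensional array with rows and columns. It is a mutable, heterogeneous data structure, and its rows and columns are referred to as axes.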

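Pandas provides many functions for manipulating DataFrames for analysis. A DataFrame can be created in several ways, but the function used to construct one directly is

pd.DataFrame()

To use this function, or any other function in the library, we must first import the library:

import pandas as pd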
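As a minimal sketch of the constructor mentioned above, a DataFrame can be built from a dictionary of equal-length lists. The three rows below are a small stand-in chosen for illustration; they are an assumption and not the contents of any file used in this tutorial.

```python
import pandas as pd

# Stand-in data (assumption): a few rows shaped like a table of painters.
data = {
    "Name": ["Leonardo da Vinci", "Claude Monet", "Vincent van Gogh"],
    "Birth": [1452, 1840, 1853],
    "Nationality": ["Italian", "French", "Dutch"],
}

# Each dictionary key becomes a column; rows get the default labels 0, 1, 2.
sample = pd.DataFrame(data)

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column labels, in insertion order
print(sample.dtypes)         # columns may hold different types (heterogeneous)
```

Because no index is passed, Pandas assigns consecutive integer row labels starting at 0.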

For this tutorial, we created a table in an Excel sheet, "painters.xlsx", containing information about 20 of the world's greatest painters.


The following Python code reads this table into a Pandas DataFrame. Here, index_col = 0 uses the first column of the sheet as the row labels.

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print(df)

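The title of this tutorial is "Indexing and Selecting a DataFrame". Just as we slice a string using indexes from 0 to length - 1, we can access, copy, and create new DataFrames from an existing DataFrame. This tutorial explains the following ways of doing so: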

DataFrame[] and DataFrame.column
DataFrame.loc[]
DataFrame.iloc[]
head() and tail()

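1. [] and .

In Pandas, [] is called the indexing operator and . is called the attribute operator. These operators provide the basic forms of indexing and are used to view different subsets of a DataFrame.

Using the attribute operator (.)

Selecting a column

This operator can select only a single column from a DataFrame, and it works only for columns whose names can be written directly after the dot, that is, names that are valid Python identifiers. If a column name contains a space, Python cannot parse the expression.

In our painters table: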

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print(df.Birth)
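print(df.Greatest Artpiece)  # SyntaxError: an attribute name cannot contain a space

Note that trying to access the "Greatest Artpiece" column this way raises a syntax error because of the space, so the script fails before anything is printed. To access such a column as an attribute, we can use the getattr(DataFrame, column_name) function.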

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print(getattr(df, "Greatest Artpiece"))

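The three spellings discussed above can be compared side by side. This is a small sketch on stand-in data: the column names follow the tutorial, but the two rows are an assumption made for illustration.

```python
import pandas as pd

# Stand-in rows (assumption), using column names from the tutorial.
df = pd.DataFrame({
    "Birth": [1452, 1853],
    "Greatest Artpiece": ["Mona Lisa", "The Starry Night"],
})

# Attribute access and bracket access return the same column.
same_column = df.Birth.equals(df["Birth"])
print(same_column)

# A name with a space cannot follow a dot, but getattr() and [] both accept it.
via_getattr = getattr(df, "Greatest Artpiece")
via_brackets = df["Greatest Artpiece"]
print(via_getattr.equals(via_brackets))
```

In practice the bracket form is the more common choice, since it works for every column name.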

Using the indexing operator

Selecting columns

We pass the column name to the operator, and here there is no restriction on spaces in the name.

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print(df["Greatest Artpiece"])

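Note that the column name must be passed as a string, enclosed in quotes.

Another feature of this operator is that we can select multiple columns by passing a list of the required columns to it: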

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print(df[["Name", "Nationality"]])

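One detail worth seeing in code is the type of the result: a single column name gives a Series, while a list of names gives a DataFrame, even when the list has one element. The sketch below uses two stand-in rows that are assumed for illustration.

```python
import pandas as pd

# Stand-in rows (assumption) with column names from the tutorial.
df = pd.DataFrame({
    "Name": ["Claude Monet", "Pablo Picasso"],
    "Birth": [1840, 1881],
    "Nationality": ["French", "Spanish"],
})

one_column = df["Name"]                    # a single label gives a Series
one_column_frame = df[["Name"]]            # a one-element list gives a DataFrame
two_columns = df[["Nationality", "Name"]]  # columns come back in the order listed

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(list(two_columns.columns))
```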

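Selecting rows

With a slice, the same indexing operator selects rows of the DataFrame instead of columns. The syntax is the same as for slicing any other sequence: DataFrame[start:stop:step]

start: the starting index/row position (inclusive)

stop: the position at which the slice stops (exclusive)

step: the interval between selected rows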

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print(df[1:4])  # rows at positions 1, 2 and 3

If the DataFrame was created with row labels, we can slice with those labels as well. Here is an example:

import pandas as pd
dictionary = {"Name": ["Harry", "Zayn", "Niall"], "Age": [28, 28, 29]}
df1 = pd.DataFrame(dictionary, index = ["Member 1", "Member 2", "Member 3"])
print(df1)
print()
print(df1[0: 2])
print()
print(df1["Member 1": "Member 3"])

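Note that the row for Member 3 is printed as well. When we slice by position, the stop position is excluded, but when we slice by row label, the last row is included.

The key points about the indexing operator are:

We can use [] to select either rows or columns of a DataFrame.
When selecting columns, we can select a single column or multiple columns.
When we pass a slice, it selects rows.
We can slice rows by position or by row label. With positions, the last row is not selected; with row labels, it is.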
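The last point can be checked with the three-member DataFrame from the example above. The sketch below slices the same two rows once by position and once by label; the variable names by_position and by_label are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Harry", "Zayn", "Niall"], "Age": [28, 28, 29]},
    index=["Member 1", "Member 2", "Member 3"],
)

by_position = df1[0:2]                 # positions 0 and 1; stop position 2 is excluded
by_label = df1["Member 1":"Member 2"]  # labels "Member 1" through "Member 2"; stop label included

print(list(by_position.index))
print(list(by_label.index))
print(by_position.equals(by_label))    # both slices select the same two rows
```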

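So far, we have not been able to select rows and columns of a DataFrame at the same time. Pandas has two indexers dedicated to selecting and subsetting DataFrames, iloc and loc, each with a clearly defined role. We will look at them now.

2. DataFrame.iloc

Syntax

DataFrame.iloc[rows, columns]

Both rows and columns must be positions, not labels, and these positions can be given as:

A single position
A list of multiple positions
A slice of positions

The examples below use the painters table loaded earlier.

Note that positions are zero-based: position 0 refers to the 1st row or the 1st column.

Single position

Syntax

DataFrame.iloc[row_position, column_position]

Code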

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("The value in 2nd row and 2nd column:")
print(df.iloc[1, 1])  # position 1 is the 2nd row / 2nd column

List of positions

Syntax

DataFrame.iloc[[r1, r2...], [c1, c2...]]

Code

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("First three rows and columns:")
print(df.iloc[[0, 1, 2], [0, 1, 2]])

Combining a single position with a list of positions

Syntax

DataFrame.iloc[row_position, [c1, c2...]]  # Single row, multiple columns
DataFrame.iloc[[r1, r2...], column_position]  # Multiple rows, single column

Code

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("Values in first two columns in 2nd row:")
print(df.iloc[1, [0, 1]])
print()
print("Values in first two rows in 2nd column:")
print(df.iloc[[0, 1], 1])

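The list forms of iloc[] shown above can be tried without the Excel file. The following sketch uses four stand-in rows (an assumption for illustration) to show that listed positions may be non-adjacent and in any order, and that a scalar position drops an axis while a one-element list keeps it.

```python
import pandas as pd

# Stand-in rows (assumption); default row labels are 0..3.
df = pd.DataFrame({
    "Name": ["Leonardo da Vinci", "Claude Monet", "Vincent van Gogh", "Pablo Picasso"],
    "Birth": [1452, 1840, 1853, 1881],
    "Death": [1519, 1926, 1890, 1973],
})

# Positions may be non-adjacent and in any order; the result follows the list order.
picked = df.iloc[[3, 0], [2, 0]]
print(picked)

# A scalar position drops that axis; a one-element list keeps it.
as_series = df.iloc[1, [0, 1]]   # one row, two columns: a Series
as_frame = df.iloc[[1], [0, 1]]  # the same cells, but still a DataFrame
print(type(as_series).__name__, type(as_frame).__name__)
```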

Slicing

Syntax

DataFrame.iloc[row_start:row_stop:row_step, column_start:column_stop:column_step]

Code

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("Values in first row:")
print(df.iloc[0, :])
print()
print("Values in first column:")
print(df.iloc[:, 0])
print()
print("Values in 2, 3, 4 rows and 3, 4, 5 columns:")
print(df.iloc[1:4, 2:5])  # stop positions are excluded
print()
print("Values in even rows and even columns:")
print(df.iloc[1::2, 1::2])  # positions 1, 3, ... are the 2nd, 4th, ... rows and columns

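Because the stop position of an iloc[] slice is excluded, selecting the 2nd to 4th rows and the 3rd to 5th columns needs the slices 1:4 and 2:5. The sketch below checks this on a hypothetical 6 x 5 frame of integers, built only so that shapes and positions are easy to verify.

```python
import pandas as pd

# Hypothetical frame: the value at row r, column c is r * 10 + c.
df = pd.DataFrame(
    [[r * 10 + c for c in range(5)] for r in range(6)],
    columns=list("ABCDE"),
)

block = df.iloc[1:4, 2:5]          # row positions 1, 2, 3 and column positions 2, 3, 4
every_other = df.iloc[1::2, 1::2]  # row positions 1, 3, 5 and column positions 1, 3

print(block.shape)
print(list(every_other.index), list(every_other.columns))
```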

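3. DataFrame.loc[rows, columns]

As we saw above, iloc[] works on positions, not labels. In contrast, loc[] works on labels, not positions. Apart from the slicing rule in the note below, the two behave the same way.

Both rows and columns must be labels, and these labels can be given as:

A single row or column label
A list of multiple labels
A slice of labels

Note: When slicing on row or column labels, the end label is included along with the start label, just as when we slice by row label with the indexing operator [].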
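The difference between labels and positions is easiest to see when the row labels are integers that do not match the positions. The frame below is hypothetical: its row labels 10, 20 and 30 are chosen only to make that difference visible.

```python
import pandas as pd

# Hypothetical frame whose integer row labels differ from the row positions.
df = pd.DataFrame(
    {"Name": ["Monet", "Picasso", "Pollock"], "Birth": [1840, 1881, 1912]},
    index=[10, 20, 30],
)

by_label = df.loc[10:20, "Name"]  # labels 10 through 20; end label included
by_position = df.iloc[0:2, 0]     # positions 0 and 1; stop position excluded
print(by_label.equals(by_position))

first_by_label = df.loc[10, "Birth"]  # the row labeled 10
first_by_pos = df.iloc[0, 1]          # the row at position 0
print(first_by_label, first_by_pos)
```

Here df.loc[0, "Birth"] would raise a KeyError, because no row carries the label 0.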

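The examples below use the same painters table. Its row labels are the integers 0, 1, 2, ..., which is why the row labels in the code look like positions.

Single label

Syntax

DataFrame.loc[row_label, column_label]

Code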

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("The value in 2nd row and 2nd column:")
print(df.loc[1, "Birth"])

List of labels

Syntax

DataFrame.loc[[r1, r2...], [c1, c2...]]

Code

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("First three rows and columns:")
print(df.loc[[0, 1, 2], ["Name", "Birth", "Death"]])

Combining a single label with a list of labels

Syntax

DataFrame.loc[row_label, [c1, c2...]]  # Single row, multiple columns
DataFrame.loc[[r1, r2...], column_label]  # Multiple rows, single column

Code

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("Values in first two columns in 2nd row:")
print(df.loc[1, ["Name", "Birth"]])
print()
print("Values in first two rows in 2nd column:")
print(df.loc[[0, 1], "Birth"])

Slicing

Syntax

DataFrame.loc[row_start:row_stop:row_step, column_start:column_stop:column_step]

Code

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("Values in first row:")
print(df.loc[0, :])
print()
print("Values in first column:")
print(df.loc[:, "Name"])
print()
print("Values in 2, 3, 4 rows and 3, 4, 5 columns:")
print(df.loc[1:3, "Death":"Nationality"])  # end labels are included
print()
print("Values in even rows and even columns:")
print(df.loc[1::2, "Birth"::2])

Observe that when we passed 1:3 and "Death":"Nationality":

Rows: the rows labeled 1, 2 and 3 were printed.

Columns: "Death", "Greatest Artpiece" and "Nationality" were printed, which means the last row and the last column are included.
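The inclusive end labels can be verified on a small stand-in table. The five rows below are assumed for illustration (the "Greatest Artpiece" values in particular are just examples); the column names and their order follow the tutorial.

```python
import pandas as pd

# Stand-in rows (assumption); default row labels are 0..4.
df = pd.DataFrame({
    "Name": ["Leonardo da Vinci", "Claude Monet", "Vincent van Gogh",
             "Pablo Picasso", "Jackson Pollock"],
    "Birth": [1452, 1840, 1853, 1881, 1912],
    "Death": [1519, 1926, 1890, 1973, 1956],
    "Greatest Artpiece": ["Mona Lisa", "Impression, Sunrise", "The Starry Night",
                          "Guernica", "No. 5, 1948"],
    "Nationality": ["Italian", "French", "Dutch", "Spanish", "American"],
})

subset = df.loc[1:3, "Death":"Nationality"]
print(list(subset.index))    # row labels 1, 2 and 3
print(list(subset.columns))  # "Death" through "Nationality"

# The equivalent positional slice needs stops one past the last wanted position.
print(subset.equals(df.iloc[1:4, 2:5]))
```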

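Selecting with conditions

So far, we have selected data from a DataFrame using positions or labels. We can also select data based on conditions, in two ways: with loc[] and with the indexing operator.

Here are some important points:

1. Conditions can be combined with Boolean logic, but here we must use the element-wise operators instead of the keywords and, or and not:

& for the and operation
| for the or operation
~ for the not operation

2. We can use any number of conditions, but each condition must be enclosed in parentheses, because & and | bind more tightly than comparison operators such as > and ==.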

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("American or French painter born after 1800:")
print(df[(df["Birth"]>1800) & ((df["Nationality"]=="American") | (df["Nationality"]=="French"))])

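Long conditions are often easier to read when each mask gets its own name. The sketch below rebuilds the query above on five stand-in rows; the rows and the names born_after_1800 and american_or_french are assumptions made for illustration.

```python
import pandas as pd

# Stand-in rows (assumption) with column names from the tutorial.
df = pd.DataFrame({
    "Name": ["Leonardo da Vinci", "Claude Monet", "Vincent van Gogh",
             "Pablo Picasso", "Jackson Pollock"],
    "Birth": [1452, 1840, 1853, 1881, 1912],
    "Nationality": ["Italian", "French", "Dutch", "Spanish", "American"],
})

# Each comparison yields one Boolean value per row.
born_after_1800 = df["Birth"] > 1800
american_or_french = (df["Nationality"] == "American") | (df["Nationality"] == "French")

# & keeps the rows where both masks are True.
result = df[born_after_1800 & american_or_french]
print(result["Name"].tolist())

# ~ inverts a mask: painters born in or before 1800.
earlier = df[~born_after_1800]
print(earlier["Name"].tolist())
```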

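3. Suppose we want to print all the painters born after 1800. We need to check the "Birth" column:

df["Birth"] > 1800

This is the condition. If we print it on its own, we get a column of True and False values. To print the matching rows instead, we pass the condition to df[]:

df[df["Birth"] > 1800]

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("Painters born after 1800:")
print(df["Birth"] > 1800)  # Boolean Series, one value per row
print()
print(df[df["Birth"] > 1800])  # only the rows where the condition is True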

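A closer look at the condition itself: it is a Boolean Series aligned with the rows, so it can be stored, counted, and reused. The three rows below are stand-ins assumed for illustration.

```python
import pandas as pd

# Stand-in rows (assumption).
df = pd.DataFrame({
    "Name": ["Leonardo da Vinci", "Claude Monet", "Jackson Pollock"],
    "Birth": [1452, 1840, 1912],
})

mask = df["Birth"] > 1800
print(mask.tolist())    # one True/False per row
print(int(mask.sum()))  # True counts as 1, so the sum is the number of matches

selected = df[mask]
print(list(selected.index))  # filtered rows keep their original row labels
```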

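4. With loc[], we can pass the condition directly to the indexer, just as we pass it to df[]. The additional advantage of loc[] is that we can select columns in the same expression, for example with a slice.

Here is an example: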

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("Using loc[]:")
print("Painters born after 1800:")
print(df.loc[df["Birth"] > 1800])
print("Selecting a few columns:")
print(df.loc[df["Birth"] > 1800, "Name":"Birth"])

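The column part of loc[] accepts a list of labels as well as a slice, so a condition can be paired with any set of columns. The following sketch uses four stand-in rows that are assumed for illustration.

```python
import pandas as pd

# Stand-in rows (assumption) with column names from the tutorial.
df = pd.DataFrame({
    "Name": ["Leonardo da Vinci", "Claude Monet", "Pablo Picasso", "Jackson Pollock"],
    "Birth": [1452, 1840, 1881, 1912],
    "Nationality": ["Italian", "French", "Spanish", "American"],
})

condition = df["Birth"] > 1800

by_list = df.loc[condition, ["Name", "Nationality"]]  # rows by condition, columns by list
by_slice = df.loc[condition, "Name":"Birth"]          # rows by condition, columns by label slice

print(by_list)
print(list(by_slice.columns))
```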

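4. head() and tail()

These two methods are mainly used to view a sample of a large data set. head() takes the sample from the beginning and tail() from the end.

If we pass no argument, head() returns the first five rows of the DataFrame and tail() returns the last five rows. We can pass an argument to specify the number of rows we need.

Syntax

DataFrame.head(n)  # n: number of rows
DataFrame.tail(n)  # n: number of rows

Code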

import pandas as pd
df = pd.read_excel("painters.xlsx", index_col = 0)
print("First five rows: ")
df1 = df.head()
print(df1)
print("\nFirst 3 rows: ")
print(df.head(3))
print("\nLast five rows: ")
df2 = df.tail()
print(df2)
print("\nLast 3 rows: ")
print(df.tail(3))

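The row counts returned by head() and tail() can be checked on any frame. The sketch below uses a hypothetical single-column frame of 20 numbered rows, standing in for the 20 painters.

```python
import pandas as pd

# Hypothetical frame: 20 rows numbered 1..20.
df = pd.DataFrame({"n": range(1, 21)})

default_head = df.head()  # no argument: first five rows
default_tail = df.tail()  # no argument: last five rows
print(len(default_head), len(default_tail))

first_three = df.head(3)["n"].tolist()
last_three = df.tail(3)["n"].tolist()
print(first_three, last_three)

# Asking for more rows than exist simply returns the whole frame.
print(len(df.head(50)))
```

Both methods return a new DataFrame, so the result can be stored in a variable or passed on to further selection.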
