Python Pandas DataFrame

March 17, 2025 | 9 min read

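Pandas DataFrame is a widely used data structure that works with a two-dimensional array with labeled axes (rows and columns). A DataFrame is defined as a standard way to store data that has two different indexes, i.e., a row index and a column index. It has the following properties:

Columns can be of heterogeneous types, such as int, bool, and so on.
It can be seen as a dictionary of Series in which both the rows and the columns are indexed. The column labels are referred to as "columns" and the row labels as "index".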
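As a small illustration of these two properties, the sketch below builds a DataFrame whose columns have different types and then inspects its row and column labels. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type
df = pd.DataFrame({
    'id': [1, 2, 3],                 # integers
    'active': [True, False, True],   # booleans
    'name': ['a', 'b', 'c'],         # strings
})

print(df.dtypes)    # one dtype per column
print(df.index)     # row labels ("index")
print(df.columns)   # column labels ("columns")
```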

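Parameters & Description

data: The input data. It accepts various forms, such as ndarray, Series, map (dict), constants, lists, and arrays.

index: The row labels. If no index is passed, the rows are labeled 0 to n-1, equivalent to np.arange(n).

columns: The column labels. If no column labels are passed and the data itself carries none, the columns are also labeled 0 to n-1, equivalent to np.arange(n).

dtype: The data type of the columns. Only a single dtype can be passed; if it is omitted, pandas infers a type for each column.

copy: Whether to copy the input data.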
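The parameters above can be tried together in one constructor call. This is a minimal sketch; the array contents and the labels r1, r2, a, and b are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4]])

# Explicit row labels, column labels and dtype; copy=True asks for an independent copy of `values`
df = pd.DataFrame(values, index=['r1', 'r2'], columns=['a', 'b'],
                  dtype='float64', copy=True)
print(df)

# Without index/columns, both axes fall back to the labels 0..n-1
default_df = pd.DataFrame(values)
print(default_df.index.tolist(), default_df.columns.tolist())
```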

Create a DataFrame

We can create a DataFrame from the following:

dict
Lists
NumPy ndarrays
Series

Create an empty DataFrame

The code below shows how to create an empty DataFrame in Pandas:

# importing the pandas library
import pandas as pd
df = pd.DataFrame()
print (df)

Output

Empty DataFrame
Columns: []
Index: []

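Explanation: In the code above, we first import the pandas library under the alias pd and then define a variable named df that holds an empty DataFrame. Finally, we print it by passing df to the print function.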

Create a DataFrame using a list

We can easily create a DataFrame in Pandas from a list.

# importing the pandas library
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']

# Calling DataFrame constructor on list
df = pd.DataFrame(x)
print(df)

Output

      0
0   Python
1   Pandas

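Explanation: In the code above, we define a variable named "x" that holds a list of strings. Passing the list to the DataFrame constructor produces a DataFrame with a single column, labeled 0 by default, and one row per list element.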
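A list of lists works the same way, with each inner list becoming one row. The sketch below also passes column names so the columns are not labeled 0 and 1; the names and values are illustrative only.

```python
import pandas as pd

# Each inner list is one row; the values are placeholders
rows = [['Python', 1], ['Pandas', 2]]

df = pd.DataFrame(rows, columns=['name', 'value'])
print(df)
```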

Create a DataFrame from a dict of ndarrays/lists

# importing the pandas library
import pandas as pd
info = {'ID' :[101, 102, 103],'Department' :['B.Sc','B.Tech','M.Tech',]}
df = pd.DataFrame(info)
print (df)

Output

       ID      Department
0      101        B.Sc
1      102        B.Tech
2      103        M.Tech

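Explanation: In the code above, we define a dictionary named "info" that holds the lists ID and Department. We pass info to the DataFrame constructor, store the result in a variable named df, and print it with print(). The dictionary keys become the column labels.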
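Two details are worth trying here: custom row labels can be supplied through index, and all lists in the dictionary must have the same length. The row labels s1 to s3 in this sketch are hypothetical.

```python
import pandas as pd

info = {'ID': [101, 102, 103], 'Department': ['B.Sc', 'B.Tech', 'M.Tech']}

# Hypothetical row labels instead of the default 0, 1, 2
df = pd.DataFrame(info, index=['s1', 's2', 's3'])
print(df)

# Lists of unequal length are rejected with a ValueError
try:
    pd.DataFrame({'ID': [101, 102], 'Department': ['B.Sc']})
except ValueError as err:
    print('ValueError:', err)
```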

Create a DataFrame from a dict of Series

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print (d1)

Output

        one         two
a       1.0          1
b       2.0          2
c       3.0          3
d       4.0          4
e       5.0          5
f       6.0          6
g       NaN          7
h       NaN          8

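Explanation: In the code above, a dictionary named "info" holds two Series, each with its own index. We pass info to the DataFrame constructor, store the result in d1, and print it with print(). The row index is the union of the two Series indexes, so column "one" is NaN for the labels g and h, which it lacks.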
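The alignment described above also affects the column types. A shortened sketch of the same idea, using smaller Series than the example, shows the dtype change and counts the missing values:

```python
import pandas as pd

info = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
d1 = pd.DataFrame(info)

print(d1.dtypes)         # 'one' becomes float64 because it has to hold NaN
print(d1.isna().sum())   # number of missing values per column
```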

Column selection

We can select any column from a DataFrame. The code below demonstrates how to select a column:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print (d1 ['one'])

Output

a      1.0
b      2.0
c      3.0
d      4.0
e      5.0
f      6.0
g      NaN
h      NaN
Name: one, dtype: float64

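Explanation: In the code above, a dictionary named "info" holds two Series, each with its own index. We build the DataFrame d1 from info and select the column "one" by passing d1['one'] to print(). The result is a Series.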
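Several columns can be selected at once by passing a list of labels. The sketch below, with made-up data, contrasts the two forms:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

single = df['one']              # one label: the result is a Series
subset = df[['one', 'three']]   # list of labels: the result is a DataFrame

print(type(single).__name__)
print(subset)
```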

Column addition

We can also add new columns to an existing DataFrame. The code below demonstrates how:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)

# Add a new column to an existing DataFrame object 

print ("Add new column by passing series")
df['three']=pd.Series([20,40,60],index=['a','b','c'])
print (df)

print ("Add new column using existing DataFrame columns")
df['four']=df['one']+df['three']

print (df)

Output

Add new column by passing series
      one     two      three
a     1.0      1        20.0
b     2.0      2        40.0
c     3.0      3        60.0
d     4.0      4        NaN
e     5.0      5        NaN
f     NaN      6        NaN

Add new column using existing DataFrame columns
       one      two       three      four
a      1.0       1         20.0      21.0
b      2.0       2         40.0      42.0
c      3.0       3         60.0      63.0
d      4.0       4         NaN      NaN
e      5.0       5         NaN      NaN
f      NaN       6         NaN      NaN

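Explanation: In the code above, a dictionary named info holds two Series, each with its own index. We build the DataFrame df from the info dictionary.

To add a new column to the existing DataFrame object, we assign a new Series whose values are tied to index labels, and print the result with print(). Row labels missing from the new Series are filled with NaN.

We can also add a new column computed from existing columns. The column "four" stores the sum of the columns one and three.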
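Besides bracket assignment, pandas offers assign() and insert() for adding columns. The following is a minimal sketch with made-up data; the column names total and zero are placeholders.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30]})

# assign() returns a new DataFrame and leaves df unchanged
with_sum = df.assign(total=df['one'] + df['two'])

# insert() modifies df in place and puts the new column at position 0
df.insert(0, 'zero', [0, 0, 0])

print(with_sum)
print(df)
```

assign() is convenient in method chains because it does not mutate its input, while insert() is the option to reach for when the position of the new column matters.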

Column deletion

We can also delete any column from an existing DataFrame. The code below demonstrates how:

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2], index= ['a', 'b']), 
   'two' : pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
   
df = pd.DataFrame(info)
print ("The DataFrame:")
print (df)

# using the del statement
print ("Delete the first column:")
del df['one']
print (df)
# using the pop() method
print ("Delete another column:")
df.pop('two')
print (df)

Output

The DataFrame:
      one    two
a     1.0     1
b     2.0     2
c     NaN     3

Delete the first column:
     two
a     1
b     2
c     3

Delete another column:
Empty DataFrame
Columns: []
Index: [a, b, c]

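Explanation

In the code above, the df variable holds the DataFrame built from the info dictionary, and we print all of its values. We can remove a column from a DataFrame with the del statement or the pop() method.

In the first case, we removed the column "one" with del; in the second, we removed the column "two" with pop().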
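Both approaches shown here change the DataFrame in place. When the original should stay intact, drop(columns=...) is an alternative. The sketch below, with made-up data, also shows that pop() returns the removed column.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# drop() returns a new DataFrame; df keeps all of its columns
smaller = df.drop(columns=['one'])

# pop() removes the column from df in place and returns it as a Series
popped = df.pop('two')

print(smaller.columns.tolist())
print(df.columns.tolist())
print(popped.tolist())
```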

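Row selection, addition, and deletion

Row selection

We can select, add, or delete any row at any time. We start with row selection. A row can be selected in the following ways:

Selection by label

We can select any row by passing its row label to the loc indexer.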

# importing the pandas library
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)
print (df.loc['b'])

Output

one    2.0
two    2.0
Name: b, dtype: float64

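Explanation: In the code above, a dictionary named info holds two Series, each with its own index.

To select a row, we pass its row label to the loc indexer. The row is returned as a Series whose index is the column labels.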
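loc also accepts a list of labels, or a row label together with a column label. A short sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

row = df.loc['b']            # one row, returned as a Series
rows = df.loc[['a', 'c']]    # several rows, returned as a DataFrame
cell = df.loc['b', 'two']    # a single value: row label, then column label

print(rows)
print(cell)
```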

Selection by integer position

A row can also be selected by passing its integer position to the iloc indexer.

# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df.iloc[3])

Output

one    4.0
two    4.0
Name: d, dtype: float64

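Explanation: In the code above, we define a dictionary named info that holds two Series, each with its own index.

To select a row, we pass its integer position to the iloc indexer. Positions start at 0, so iloc[3] returns the fourth row, labeled d.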
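iloc follows the usual Python indexing rules, including negative positions and end-exclusive slices. A short sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

last_row = df.iloc[-1]     # negative positions count from the end
first_two = df.iloc[0:2]   # positions 0 and 1; the end position is excluded
cell = df.iloc[1, 0]       # row position, then column position

print(last_row)
print(first_two)
```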

Slicing rows

This is another way to select multiple rows, using the ':' operator.

# importing the pandas library
import pandas as pd
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']), 
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print (df[2:5])

Output

      one    two
c     3.0     3
d     4.0     4
e     5.0     5

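Explanation: In the code above, we use the range 2:5 for row selection, which selects the rows at positions 2, 3, and 4 (the end position is excluded), and print them to the console.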
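Integer slices such as the one above work by position and exclude the end, whereas label slices through loc include both endpoints. The sketch below, with made-up data, puts the two side by side:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

by_position = df[1:3]         # positions 1 and 2: rows b and c
by_label = df.loc['b':'d']    # labels b through d, both endpoints included

print(by_position)
print(by_label)
```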

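Adding rows

We can add new rows to a DataFrame with pd.concat, which places the new rows at the end. Older tutorials use DataFrame.append for this, but that method was deprecated in pandas 1.4 and removed in pandas 2.0.

# importing the pandas library
import pandas as pd
d = pd.DataFrame([[7, 8], [9, 10]], columns = ['x','y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns = ['x','y'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
d = pd.concat([d, d2])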
print (d)

Output

      x      y
0     7      8
1     9      10
0     11     12
1     13     14

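Explanation: In the code above, we define two separate DataFrames with the same columns. The rows of the second are added to the end of the first, and the result is printed to the console. The original row labels are kept, so the labels 0 and 1 each appear twice.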
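The output above repeats the row labels 0 and 1. If fresh labels are preferred, one option is pd.concat with ignore_index=True, sketched here on the same data:

```python
import pandas as pd

d = pd.DataFrame([[7, 8], [9, 10]], columns=['x', 'y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns=['x', 'y'])

# ignore_index=True discards the old row labels and renumbers from 0
combined = pd.concat([d, d2], ignore_index=True)
print(combined)
```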

Deleting rows

We can delete or drop any row from a DataFrame by its index label. If the label is duplicated, multiple rows are dropped.

# importing the pandas library
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns = ['x','y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns = ['x','y'])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
a_info = pd.concat([a_info, b_info])

# Drop rows with label 0
a_info = a_info.drop(0)
print (a_info)

Output

      x      y
1     6      7
1     10     11

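Explanation: In the code above, we define two separate DataFrames, each with some rows and columns, and combine them.

We then pass drop() the index label of the rows to remove. Because the label 0 occurs twice in the combined DataFrame, both rows labeled 0 are dropped.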
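To see how label uniqueness changes the result of drop(), the sketch below runs the same drop on a combined DataFrame with duplicate labels and on one that was renumbered with ignore_index=True. The variable names are chosen for the example.

```python
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns=['x', 'y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns=['x', 'y'])

# Duplicate labels (0, 1, 0, 1): drop(0) removes both rows labeled 0
stacked = pd.concat([a_info, b_info])
both_removed = stacked.drop(0)

# Unique labels (0, 1, 2, 3): drop(0) removes only the first row
renumbered = pd.concat([a_info, b_info], ignore_index=True)
one_removed = renumbered.drop(0)

print(both_removed)
print(one_removed)
```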

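DataFrame functions

DataFrame has many functions. Some of them are listed below:

Function	Description
Pandas DataFrame.append()	Adds the rows of another DataFrame to the end of the given DataFrame. Removed in pandas 2.0; use pd.concat() instead.
Pandas DataFrame.apply()	Applies a user-supplied function along an axis of the DataFrame, that is, to each column or each row.
Pandas DataFrame.assign()	Adds new columns to a DataFrame and returns a new object.
Pandas DataFrame.astype()	Casts a Pandas object to a specified dtype.
Pandas concat()	Concatenates Pandas objects along an axis. It is the top-level function pd.concat(), not a DataFrame method.
Pandas DataFrame.count()	Counts the non-NA cells for each column or row.
Pandas DataFrame.describe()	Computes summary statistics, such as percentiles, mean, and standard deviation, for the numeric data of a Series or DataFrame.
Pandas DataFrame.drop_duplicates()	Removes duplicate rows from the DataFrame.
Pandas DataFrame.groupby()	Splits the data into groups.
Pandas DataFrame.head()	Returns the first n rows of the object by position.
Pandas DataFrame.hist()	Draws a histogram of each numeric column, dividing the values into "bins".
Pandas DataFrame.iterrows()	Iterates over the rows as (index, Series) pairs.
Pandas DataFrame.mean()	Returns the mean of the values over the requested axis.
Pandas DataFrame.melt()	Unpivots a DataFrame from wide format to long format.
Pandas DataFrame.merge()	Merges two datasets into one.
Pandas DataFrame.pivot_table()	Aggregates data with calculations such as sum, count, average, max, and min.
Pandas DataFrame.query()	Filters the rows of a DataFrame with a boolean expression.
Pandas DataFrame.sample()	Selects a random sample of rows or columns from the DataFrame.
Pandas DataFrame.shift()	Shifts the values by a given number of periods, which is useful for comparing each row with the previous one.
Pandas DataFrame.sort_values()	Sorts the DataFrame by its values. DataFrame.sort() has been removed; use sort_values() or sort_index().
Pandas DataFrame.sum()	Returns the sum of the values over the requested axis.
Pandas DataFrame.to_excel()	Exports the DataFrame to an Excel file.
Pandas DataFrame.transpose()	Transposes the index and columns of the DataFrame.
Pandas DataFrame.where()	Keeps the values where a condition is True and replaces the others.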
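As a quick taste of a few entries from the table, the sketch below applies sort_values(), query(), and groupby() to a small DataFrame. The column names dept and score and their values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'dept': ['A', 'B', 'A', 'B'], 'score': [3, 1, 4, 2]})

sorted_df = df.sort_values('score')                 # ascending sort by one column
high = df.query('score > 2')                        # keep rows matching the expression
mean_by_dept = df.groupby('dept')['score'].mean()   # split by dept, then average

print(sorted_df)
print(high)
print(mean_by_dept)
```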
