How to Import an Excel File into Python Using Pandas

January 5, 2025 | 9 min read

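Overview of Pandas

Pandas is a popular open-source data manipulation and analysis library for Python. It provides data structures for storing and manipulating large datasets efficiently, along with tools for working with structured data. Its two primary data structures are Series and DataFrame.

Pandas: the library this tutorial covers.
Popular open-source data manipulation and analysis library for Python: Pandas is widely used and open source, so anyone can view, modify, and distribute its source code.
Data structures for efficiently storing and processing large datasets: Series and DataFrame are optimized for large datasets, which makes them well suited to data manipulation and analysis tasks.
Tools for working with structured data: Pandas provides functions for data cleaning, transformation, aggregation, and analysis.
Series and DataFrame: these are the primary data structures in Pandas. A Series is a one-dimensional labeled array that can hold any data type; a DataFrame is a two-dimensional labeled structure whose columns can hold different data types. Both are the foundation of data manipulation and analysis in Pandas.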
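
The difference between the two structures is easier to see in code. The following is a minimal sketch, not taken from the original tutorial; the product names, prices, and labels are made-up illustration data.

```python
import pandas as pd

# A Series: one-dimensional, labeled values (the labels are hypothetical product codes)
prices = pd.Series([2.5, 4.0, 1.25], index=["p1", "p2", "p3"], name="Price")

# A DataFrame: two-dimensional, and each column may have its own data type
products = pd.DataFrame({
    "Products": ["pen", "notebook", "eraser"],   # strings
    "Price": [2.5, 4.0, 1.25],                   # floats
    "InStock": [True, False, True],              # booleans
})

print(prices["p2"])       # look up a value by its label
print(products.dtypes)    # one data type per column
print(products.shape)     # (number of rows, number of columns)
```

Selecting a single column of a DataFrame, for example products["Price"], gives back a Series.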

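Why Excel File Handling Matters

The Prevalence of Excel Files for Structured Data

Excel files have long been a de facto standard for storing structured data, from simple lists to complex datasets. They offer a user-friendly interface and are widely used across industries such as finance, business, and research.

Pandas as a Way to Bring Excel Data into Python

Pandas simplifies bringing Excel data into a Python workflow, bridging the spreadsheet world and Python's extensive data analysis capabilities. This matters for data scientists and analysts who need Python's capabilities while working with data stored in Excel format.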

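Installing Pandas

Prerequisites

Python Installation

Python must be installed on your system before you install Pandas. Python is a general-purpose programming language widely used in data science, machine learning, and other fields. If you do not have Python installed, follow the steps below.

Download and Install Python

Visit the official Python website.
Download the latest version of Python for your operating system (Windows, macOS, or Linux).
Run the installer and follow the installation instructions.

Verify the Python Installation

Open a command prompt or terminal.
Type python --version or python -V and press Enter.
Confirm that the installed Python version is displayed without errors.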

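Installation Process

Installing Pandas with pip

pip is Python's package installer; it simplifies installing and managing Python libraries. Once Python is installed, follow these steps to install Pandas.

Open a Command Prompt or Terminal

Open Command Prompt on Windows, or Terminal on macOS/Linux.

Run the Following Command

Type the following command and press Enter to install Pandas.

pip install pandas

This command tells pip to download and install the Pandas library and its dependencies.

Confirm the Pandas Installation

After the installation finishes, you can confirm it by typing the following command:

python -c "import pandas; print(pandas.__version__)"

This should print the installed Pandas version without any errors.

Other Installation Methods

Using Anaconda

If you use the Anaconda distribution, you can install Pandas with the following command:

conda install pandas

The Anaconda distribution provides a comprehensive data science platform and includes Pandas along with other popular libraries.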

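Basics of Reading Excel Files with Pandas

This section walks through reading an Excel file into Python with Pandas. The read_excel() function is the entry point for this task: it loads Excel data directly into a Pandas DataFrame.

Introduction to the read_excel() Function

read_excel() is the Pandas function dedicated to reading data from Excel files. It accepts a range of parameters that let you tailor the read to the layout of your file.

Loading Data into a DataFrame

Specifying the Excel File Path

Before reading an Excel file, you need to know where it is located. The file path is passed to read_excel() as its first argument.

import pandas as pd
# Specify the path to your Excel file
excel_file_path = 'path/to/your/excel/file.xlsx'

Replace 'path/to/your/excel/file.xlsx' with the actual path to your Excel file.

Creating a Pandas DataFrame (df) from Excel Data

With the path set, use read_excel() to create a Pandas DataFrame.

# Read the Excel file into a DataFrame
df = pd.read_excel(excel_file_path)

At this point, the data from the Excel file is stored in the DataFrame df, ready to be explored and manipulated with Pandas.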
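
To try the two steps above without an existing spreadsheet, the sketch below first writes a small file and then reads it back. The sample data, the file name sample.xlsx, and the use of a temporary directory are assumptions made for illustration. It also assumes the openpyxl package is installed, which Pandas uses to read and write .xlsx files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for a real spreadsheet
sample = pd.DataFrame({"Products": ["pen", "notebook"], "Price": [2.5, 4.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    excel_file_path = os.path.join(tmp_dir, "sample.xlsx")
    # Write the sample data without the row index, so the sheet has two columns
    sample.to_excel(excel_file_path, index=False)
    # Read the first worksheet back into a DataFrame
    df = pd.read_excel(excel_file_path)

print(df.columns.tolist())   # column names come from the first row of the sheet
print(len(df))               # number of data rows
```

With no further arguments, read_excel() reads the first worksheet and takes the column names from its first row.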

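To import an Excel file into Python with Pandas, we use the pandas.read_excel() function.

Syntax

In its simplest form, the call is pd.read_excel(path), where path is the location of the Excel file. Optional keyword arguments such as sheet_name, header, index_col, dtype, and na_values are shown in the examples below.

Suppose the Excel file s1.xlsx looks like this (the examples below use its Products and Price columns):

[Figure: sample Excel file s1.xlsx]

Example

# Read an Excel file and print its contents
import pandas as pd    # import the pandas library under the alias pd
df = pd.read_excel("s1.xlsx")    # load the first worksheet into a DataFrame
print(df)       # print the Excel data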

Example 1

# Use a column of the Excel file as the row index
import pandas as pd    # import the pandas library under the alias pd
df = pd.read_excel("s1.xlsx", index_col=0)        # column 0 becomes the index instead of a data column
print(df)       # print the Excel data

Example 2

# Read a file that has no header row: with header=None, Pandas numbers the columns 0, 1, 2, ... instead of taking names from the first row
import pandas as pd    # import the pandas library under the alias pd
df = pd.read_excel("s1.xlsx", header=None)        # treat the first row as data, not as column names
print(df)       # print the Excel data

Example 3

# Set the data type of particular columns with the dtype parameter
import pandas as pd    # import the pandas library under the alias pd
df = pd.read_excel("s1.xlsx", dtype={"Products": str,
                                     "Price": float})        # read Products as strings and Price as floats
print(df)       # print the Excel data

Example 4

# Treat specific placeholder values as missing: every cell equal to an entry in na_values is read as NaN
import pandas as pd    # import the pandas library under the alias pd
df = pd.read_excel("s1.xlsx", na_values=['item1', 'item2'])        # cells containing 'item1' or 'item2' become NaN
print(df)       # print the Excel data
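
The effect of these parameters is easier to check on a file whose contents are known. The sketch below is an illustration under stated assumptions: the file s1_demo.xlsx, its data, and the placeholder string "unknown" are all made up, and the openpyxl package is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sheet in which "unknown" marks a price that was never recorded
raw = pd.DataFrame({
    "Products": ["pen", "notebook", "eraser"],
    "Price": [2.5, "unknown", 1.25],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "s1_demo.xlsx")
    raw.to_excel(path, index=False)

    df = pd.read_excel(
        path,
        index_col=0,             # use the Products column as the row index
        na_values=["unknown"],   # read the placeholder as NaN
    )
    # With header=None the header row is kept as an ordinary data row
    no_header = pd.read_excel(path, header=None)

print(df)
print(no_header.shape)
```

Here df has Products as its index and one NaN in Price, while no_header has one more row than df because the original header row is counted as data.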

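Working with Multiple Worksheets in Pandas

In many Excel files, data is spread across several worksheets, each holding different information. Pandas handles this case by letting you read specific worksheets and extract only the relevant data from a large workbook.

Why Multiple Worksheets Matter

Understanding how a multi-sheet Excel file is organized is key to extracting the information you want. Each worksheet can represent a different part of the overall dataset, and Pandas lets you choose which worksheets to read.

Specifying the Worksheet with the sheet_name Parameter

read_excel() takes a sheet_name parameter that specifies which worksheet to read. It accepts a sheet name, a zero-based sheet index, or a list of either; it defaults to 0, the first worksheet.

Extracting Data from a Specific Worksheet

To read a specific worksheet, pass its name as the sheet_name argument.

# Specify the worksheet name
sheet_name = 'Sheet1'
# Read the worksheet with that name into a DataFrame
df = pd.read_excel(excel_file_path, sheet_name=sheet_name)

Replace 'Sheet1' with the actual name of the worksheet you want to read. This reads only that worksheet, which keeps the analysis focused.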

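Targeting Relevant Worksheets in a Large Workbook

For a workbook with multiple worksheets, Pandas can read several of them in one call. When sheet_name is a list of worksheet names or indices, read_excel() returns a dictionary of DataFrames rather than a single DataFrame.

# Specify several worksheet names
sheet_names = ['Sheet1', 'Sheet2', 'Sheet3']
# Read the listed worksheets into a dictionary of DataFrames
sheets_data = pd.read_excel(excel_file_path, sheet_name=sheet_names)

In this example, sheets_data is a dictionary whose keys are the worksheet names and whose values are the corresponding DataFrames.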
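
The different forms of sheet_name can be compared on a small workbook built for the purpose. This is a sketch, not part of the original tutorial: the sheet names Q1 and Q2 and their contents are hypothetical, and the openpyxl package is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-quarter data, one worksheet each
q1 = pd.DataFrame({"Products": ["pen"], "Price": [2.5]})
q2 = pd.DataFrame({"Products": ["notebook"], "Price": [4.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "workbook.xlsx")
    # ExcelWriter lets several DataFrames be written into one workbook
    with pd.ExcelWriter(path) as writer:
        q1.to_excel(writer, sheet_name="Q1", index=False)
        q2.to_excel(writer, sheet_name="Q2", index=False)

    by_name = pd.read_excel(path, sheet_name=["Q1", "Q2"])   # dict keyed by sheet name
    by_index = pd.read_excel(path, sheet_name=1)             # second worksheet only
    all_sheets = pd.read_excel(path, sheet_name=None)        # every worksheet

print(list(by_name))
print(by_index["Products"].tolist())
print(list(all_sheets))
```

Passing sheet_name=None is convenient when the worksheet names are not known in advance, since the keys of the returned dictionary list them.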

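Exploring a DataFrame with Pandas

Once the Excel data is loaded into a Pandas DataFrame, the next step is to explore and understand the dataset. Pandas provides many functions and methods for doing so.

Data Exploration with Pandas

Displaying the First Rows with head()

head() returns the first rows of a DataFrame (five by default), giving a quick look at the structure of the dataset.

# Show the first few rows of the DataFrame
print(df.head())

This is especially useful for checking the column names and a sample of the values in the dataset.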
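
As a small illustration, the sketch below builds an in-memory DataFrame (made-up data standing in for a loaded Excel sheet) and shows that head() takes an optional row count. It also prints dtypes, since head() by itself does not show the data type of each column.

```python
import pandas as pd

# Hypothetical data standing in for a DataFrame loaded from Excel
df = pd.DataFrame({
    "Products": [f"item{i}" for i in range(1, 9)],
    "Price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

first_five = df.head()    # default: the first 5 rows
first_two = df.head(2)    # an explicit row count

print(first_two)
print(df.dtypes)          # data type of each column
```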

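Getting Summary Statistics with describe()

describe() reports summary statistics for the numeric columns of a DataFrame: count, mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum.

# Get summary statistics for the numeric columns
print(df.describe())

This gives insight into the central tendency and spread of the numeric data and helps identify patterns and potential outliers.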
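
The result of describe() is itself a DataFrame, so individual statistics can be looked up by row label. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
df = pd.DataFrame({
    "Products": ["pen", "notebook", "eraser", "ruler"],
    "Price": [1.0, 2.0, 3.0, 4.0],
})

summary = df.describe()    # by default only the numeric column is summarized

print(summary)
print(summary.loc["mean", "Price"])   # a single statistic, looked up by label
print(summary.loc["50%", "Price"])    # the median is labeled "50%"
```

The text column Products does not appear in the summary because describe() covers only numeric columns by default.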

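Accessing and Manipulating Data

Extracting a Specific Column

Accessing a specific column of a DataFrame is straightforward. For example, to extract the data in a column named 'ColumnName':

# Access a particular column
column_data = df['ColumnName']

Replace 'ColumnName' with the actual name of the column you want. The result is a Series, on which you can perform operations for that one variable.

Filtering Data by Condition

Pandas supports filtering by condition, which extracts the subset of rows that meet specific criteria.

# Filter rows based on a condition
filtered_data = df[df['Column'] > 10]

In this example, replace 'Column' with the actual column name and 10 with the threshold you need. This is how you isolate the subset of rows relevant to your analysis.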
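
Column selection and filtering can be combined, and several conditions can be joined with & (and) or | (or). The following sketch uses made-up data and thresholds chosen only for illustration.

```python
import pandas as pd

# Hypothetical data standing in for a DataFrame loaded from Excel
df = pd.DataFrame({
    "Products": ["pen", "notebook", "bag", "lamp"],
    "Price": [2.5, 12.0, 30.0, 9.0],
})

prices = df["Price"]                       # a single column, returned as a Series
expensive = df[df["Price"] > 10]           # rows whose Price exceeds 10
# Each condition needs its own parentheses when combined with & or |
mid_range = df[(df["Price"] > 10) & (df["Price"] < 20)]

print(expensive["Products"].tolist())
print(mid_range["Products"].tolist())
```

The expression df["Price"] > 10 produces a Series of True and False values, and indexing the DataFrame with it keeps only the rows marked True.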

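Handling Missing Data with Pandas

Real-world datasets often contain missing or incomplete data. Pandas provides several ways to handle missing data so that you can clean and preprocess a dataset before analysis.

Real-World Data Challenges

Understanding the problems missing data causes is essential to an accurate and reliable analysis. Data can be missing for many reasons, including errors during data collection, data entry mistakes, or information that was simply never available.

Pandas Functions for Handling Missing Values

1. dropna(): Remove Rows Containing Missing Values

dropna() removes every row that contains at least one missing value. This shrinks the dataset, but it can be appropriate when losing those rows has little effect on the analysis.

# Drop rows with missing values
df_cleaned = df.dropna()

2. fillna(): Fill Missing Values with a Specific Value

fillna() fills missing values with a predefined constant or a computed value. It is useful when every row must be kept.

# Fill missing values with a particular value (e.g., 0)
df_filled = df.fillna(0)

Replace 0 with the value you want to use for missing entries.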
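
As an example of filling with a computed value rather than a constant, the sketch below replaces a missing price with the mean of the known prices and compares the result with dropna(). The data is made up, and using the column mean is just one possible choice of fill value.

```python
import pandas as pd

# Hypothetical data with one missing price
df = pd.DataFrame({
    "Products": ["pen", "notebook", "eraser"],
    "Price": [2.0, float("nan"), 4.0],
})

df_cleaned = df.dropna()            # drops the notebook row entirely

mean_price = df["Price"].mean()     # NaN is skipped: (2.0 + 4.0) / 2
# A dict maps column names to the fill value for that column
df_filled = df.fillna({"Price": mean_price})

print(len(df_cleaned))
print(df_filled["Price"].tolist())
```

Both methods return a new DataFrame and leave df unchanged, so the original data stays available for comparison.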

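3. isnull(): Identify Missing Values

isnull() returns a DataFrame of the same shape as the data, in which each entry is True if the corresponding element is NaN (missing) and False otherwise. It is the basic tool for finding where values are missing and how many there are.

# Create a DataFrame that marks missing values
missing_values_df = df.isnull()

Understanding these techniques and applying them deliberately gives you a solid foundation for handling missing data in a dataset.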
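
A True/False table is hard to read for anything but a small dataset, so isnull() is often followed by an aggregation. The sketch below, on made-up data, counts missing values per column and selects the rows that contain any.

```python
import pandas as pd

# Hypothetical data with gaps in both columns
df = pd.DataFrame({
    "Products": ["pen", None, "eraser"],
    "Price": [2.0, float("nan"), float("nan")],
})

# True counts as 1 when summed, so this gives the number of missing values per column
missing_per_column = df.isnull().sum()

# any(axis=1) marks each row that has at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(len(rows_with_missing))
```

These counts can help decide between dropna() and fillna(): dropping rows costs little when few rows are affected, while filling may be preferable when many are.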

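Conclusion

This guide covered the basics of importing Excel files into Python with Pandas. Starting from installation, it went through basic file reading, read_excel() options such as index_col, header, dtype, and na_values, and working with multiple worksheets. It also covered exploring and manipulating a DataFrame and handling missing data.

With this knowledge, you can work with a wide range of Excel files in your data analysis workflow. As you keep working with real-world datasets in Pandas, you will find more strategies and best practices that sharpen your data manipulation and analysis skills.

Remember that the key to mastering these skills is practice. Try different datasets, explore additional Pandas features, and keep refining how you work with data in Python.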
