How to Iterate Over Files in a Directory Using Python?

January 5, 2025 | 12 min read

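Introduction

Purpose of File Iteration in Python

File iteration in Python is the process of looping over the files in a directory so that a program can inspect and process them. It is the basis for managing data stored in files and a building block of many applications. Its main uses include:

Data processing: File iteration lets developers read, modify, or extract the information stored in files for further analysis or transformation.
Automation: File iteration automates tedious file-management tasks such as renaming files, organizing directories, or updating file contents.
Scripting and development: File iteration is an integral part of scripting and software development. It lets programs discover files at run time and adapt to changing file structures.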

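Importance of Iteration

Handling Large Datasets Efficiently

When a dataset is spread over many files, handling each file by hand becomes impractical. File iteration provides a systematic, scalable way to walk through a directory, so large collections of files can be processed without manual intervention.

Automating Repetitive Tasks

Automation is a key benefit of file iteration in Python. By looping over files, developers can automate tedious, routine tasks, which reduces the chance of mistakes and saves time. Common examples include batch renaming, file format conversion, and extracting specific information from files.

Streamlining Data Processing Workflows

File iteration keeps data processing workflows running smoothly. Whether it is part of a data analysis pipeline or an automation script, iterating over files in the same way every time ensures that each file is processed consistently. That consistency is essential for the integrity and reliability of data processing tasks.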

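File Iteration Basics

Using the os Module

Introduction to the os Module

Python's os module provides functions for interacting with the operating system, including file and directory operations. Understanding its basics is the first step toward effective file iteration.

Specifying the Directory Path

import os
directory_path = '/path/to/your/directory'

Explanation: Import the os module and set the directory_path variable to the path of the target directory.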

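Listing Files with os.listdir()

The os.listdir() function returns a list of the names of the entries in the specified directory. The list includes both files and subdirectories, and the names are returned in arbitrary order. This function is the basis for iterating over files.

Getting the List of Items in a Directory with os.listdir()

items_in_directory = os.listdir(directory_path)

Explanation: Use os.listdir(directory_path) to get a list of all items (files and directories) in the specified directory.

Building Full File Paths with os.path.join()

os.listdir() returns bare names, so each name must be joined to the directory path before the file can be processed. os.path.join() joins the directory name and the file name with the correct separator for the current platform.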

Iterating Over Each Item and Processing Files

for item in items_in_directory:
    item_path = os.path.join(directory_path, item)
    # Check whether the item is a file using os.path.isfile()
    if os.path.isfile(item_path):
        # Process the file (e.g., print the file path)
        print("File:", item_path)

Explanation: Loop over each item in the directory. For each item, build the full path (item_path) with os.path.join(). Use os.path.isfile(item_path) to check whether the item is a file. If it is, perform the required operation (for example, print the file path).

Filtering Files with os.path.isfile()

Not every item returned by os.listdir() is a file; some may be directories. The os.path.isfile() function makes it possible to keep only the files, so the loop processes exactly what it is meant to.

import os
# Specify the directory path
directory_path = '/path/to/your/directory'
# Use os.listdir() to get a list of all items in the directory
items_in_directory = os.listdir(directory_path)
# Loop over every item and keep only files using os.path.isfile()
for item in items_in_directory:
    item_path = os.path.join(directory_path, item)
    # Check whether the item is a file using os.path.isfile()
    if os.path.isfile(item_path):
        # Process the file (e.g., print the file path)
        print("File:", item_path)

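Explanation

Specify the directory path

Set the directory_path variable to the path of the target directory.

Use os.listdir()

Use os.listdir(directory_path) to get a list of all items (files and directories) in the specified directory.

Iterate over each item and filter files with os.path.isfile()

For each item in the directory, build the full path (item_path) with os.path.join().

Use os.path.isfile(item_path) to check whether the item is a file.

If it is a file, perform the required operation (for example, print the file path).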
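
The steps above can be wrapped in a small reusable function. The following is a minimal sketch, not a definitive implementation: the function name list_files is a hypothetical helper introduced here, and the demo creates a temporary directory so that it runs without relying on any real path.

```python
import os
import tempfile


def list_files(directory_path):
    """Return the full paths of regular files directly inside directory_path."""
    file_paths = []
    for item in os.listdir(directory_path):
        item_path = os.path.join(directory_path, item)
        if os.path.isfile(item_path):
            file_paths.append(item_path)
    # os.listdir() returns names in arbitrary order, so sort for stable output.
    return sorted(file_paths)


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "notes.txt"), "w").close()
    os.mkdir(os.path.join(demo_dir, "subfolder"))
    for path in list_files(demo_dir):
        print("File:", path)
```

Returning a list instead of printing inside the loop makes the result easy to reuse, for example as the input of a later processing step.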

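Advanced File Iteration Techniques

Using os.scandir()

Introduction to os.scandir() for Better Performance

While os.listdir() returns a list of names, os.scandir() yields DirEntry objects that carry information about each entry. Because the entry type is usually obtained while the directory is being listed, checks such as is_file() often need no additional system call, which makes os.scandir() the more efficient choice.

Using DirEntry Objects for Additional File Information

Each DirEntry object returned by os.scandir() exposes the entry's name and path, whether it is a file or a directory (is_file(), is_dir()), and, through stat(), details such as file size and modification time. This information makes the iteration loop more capable without extra path handling.

Improving Memory Efficiency with Iterators

os.scandir() returns an iterator rather than a complete list, so entries are produced one at a time. This keeps memory use low, especially for directories with a very large number of entries.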

import os
directory_path = '/path/to/your/directory'
# Use os.scandir() for better performance
with os.scandir(directory_path) as entries:
    for entry in entries:
        # Use the DirEntry object to check the entry type
        if entry.is_file():
            # Process the file using entry.path
            print(entry.path)

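Explanation

Import the os module: Import the os module, which provides operating-system-specific functionality.
Specify the directory path: Set the directory_path variable to the path of the target directory.
Use os.scandir(): Call os.scandir(directory_path) to get an iterator of DirEntry objects that represent the items in the specified directory.
Iterate over the DirEntry objects: Use a with statement so that the iterator is closed and its resources are released, and loop over the entries with a for loop.
Check whether the entry is a file: Use entry.is_file() to check whether the current entry represents a file.
Process the file: If the entry is a file, perform the required operation (for example, print the file path with entry.path).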
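
To illustrate the additional information a DirEntry carries, the sketch below collects the name, size, and modification time of each file. It is only an illustration under stated assumptions: describe_files is a hypothetical helper name, and the demo uses a temporary directory rather than a real path.

```python
import os
import tempfile
from datetime import datetime


def describe_files(directory_path):
    """Return (name, size_in_bytes, modified_datetime) for each regular file."""
    details = []
    with os.scandir(directory_path) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()  # the result is cached on the entry object
                modified = datetime.fromtimestamp(info.st_mtime)
                details.append((entry.name, info.st_size, modified))
    return sorted(details)


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "hello.txt"), "w") as f:
        f.write("hello")
    for name, size, modified in describe_files(demo_dir):
        print(name, size, modified.isoformat())
```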

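Using the pathlib Module

Introduction to pathlib for Object-Oriented Path Handling

The pathlib module takes an object-oriented approach to file paths: a path is a Path object with methods and properties instead of a plain string. Understanding its basics helps in writing concise, expressive file iteration code.

Iterating Over Files with Path.iterdir()

Path.iterdir() returns an iterator over the items in a directory, each of them a Path object, which simplifies file iteration. The example in this section shows how pathlib keeps the loop short and readable.

Combining pathlib with os Functions

Path objects can be passed to the traditional os and os.path functions, so the two styles can be mixed freely. This combination balances object-oriented design with practicality.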

from pathlib import Path
directory_path = Path('/path/to/your/directory')
# Iterate over the directory using Path.iterdir()
for file_path in directory_path.iterdir():
    if file_path.is_file():
        # Process the file as needed
        print(file_path)

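Explanation

Import Path from pathlib: Import the Path class from the pathlib module.
Specify the directory path as a Path object: Create a Path object that represents the target directory.
Use Path.iterdir(): Call directory_path.iterdir() to get an iterator of Path objects that represent the items in the directory.
Iterate over the Path objects: Loop over the Path objects with a for loop.
Check whether the item is a file: Use file_path.is_file() to check whether the current Path object represents a file.
Process the file: If it is a file, perform the required operation (for example, print the path with file_path).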
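
As a sketch of how pathlib can be combined with os functions, the example below passes Path objects directly to os.path.getsize(). The helper name file_sizes is hypothetical and the demo directory is temporary; both are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def file_sizes(directory):
    """Map each file name in directory to its size in bytes."""
    sizes = {}
    for file_path in Path(directory).iterdir():
        if file_path.is_file():
            # os.path functions accept Path objects (the os.PathLike protocol).
            sizes[file_path.name] = os.path.getsize(file_path)
    return sizes


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "a.txt").write_text("abc")
    print(file_sizes(demo_dir))
```

The same size is also available as file_path.stat().st_size, so the choice between the two styles is mostly a matter of what the surrounding code already uses.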

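Recursive File Iteration

Recursive Directory Traversal with os.walk()

os.walk() becomes essential when the directory structure is nested. It visits a directory and every directory below it, and it supports both top-down and bottom-up traversal.

Handling Subdirectories and Nested Structures

Recursive iteration needs a strategy for subdirectories and nested structures, such as filtering files by the directory they are in and avoiding infinite loops. By default os.walk() does not follow symbolic links to directories (followlinks=False), which prevents loops caused by links that point back to a parent directory.

Balancing Depth-First and Breadth-First Approaches

os.walk() does not provide a breadth-first mode. Its topdown parameter controls the order instead: with topdown=True (the default) a directory is yielded before its subdirectories, and with topdown=False it is yielded after them. Choose the order that fits the task; for example, bottom-up order is the natural choice when directories are deleted or summarized after their contents have been processed.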
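
The difference between the two orders can be seen in a small sketch. The helper name walk_order is hypothetical, and the nested directories a and a/b are created in a temporary directory purely for the demonstration.

```python
import os
import tempfile


def walk_order(root, topdown=True):
    """Return directory paths relative to root in the order os.walk() visits them."""
    return [
        os.path.relpath(folder, root)
        for folder, _, _ in os.walk(root, topdown=topdown)
    ]


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    print(walk_order(root, topdown=True))   # parents are listed before children
    print(walk_order(root, topdown=False))  # children are listed before parents
```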

import os
directory_path = '/path/to/your/directory'
# Recursively iterate over files using os.walk()
for folder, _, files in os.walk(directory_path):
    for file_name in files:
        # Build the full file path
        file_path = os.path.join(folder, file_name)
        # Process the file as needed
        print(file_path)

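Explanation

Import the os module: Import the os module for operating-system-specific functionality.
Specify the directory path: Set the directory_path variable to the path of the target directory.
Use os.walk(): Call os.walk(directory_path) to traverse the directory recursively. It yields a tuple (current_directory, subdirectories, files) for every directory it visits.
Iterate over the files: Use a nested for loop to go through the file names returned by os.walk().
Build the full file path: Use os.path.join() to combine the current directory and the file name into a full path.
Process the file: Perform the required operation (for example, print the file path).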
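
Subdirectories can also be excluded during the walk. The sketch below relies on the documented behavior that, in top-down mode, removing names from the subdirectory list in place stops os.walk() from descending into them. The helper name walk_files and the default names in skip_dirs are assumptions chosen for illustration.

```python
import os
import tempfile


def walk_files(root, skip_dirs=(".git", "__pycache__")):
    """Yield file paths under root, skipping any directory named in skip_dirs."""
    for folder, subfolders, files in os.walk(root):  # followlinks=False by default
        # Editing the list in place tells os.walk() not to visit the removed names.
        subfolders[:] = [name for name in subfolders if name not in skip_dirs]
        for file_name in files:
            yield os.path.join(folder, file_name)


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    os.makedirs(os.path.join(root, ".git"))
    open(os.path.join(root, "src", "main.py"), "w").close()
    open(os.path.join(root, ".git", "config"), "w").close()
    for path in walk_files(root):
        print(path)
```

Pruning only works with the default topdown=True, because in bottom-up mode the subdirectories have already been visited by the time their parent is yielded.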

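File Filtering and Pattern Matching

Using the glob Module

Introduction to the glob Module for Pattern Matching

The glob module finds files whose names match a pattern. Patterns use shell-style wildcards, which makes glob a convenient tool for flexible file selection.

Using Wildcards for Flexible File Selection

Wildcards such as * (which matches any sequence of characters within a file name) let glob select files by pattern. For example, '*.txt' matches every file whose name ends in .txt. This flexibility is useful when file names follow a known convention.

Combining glob with Other Techniques for Complex Scenarios

glob can be combined with the iteration techniques discussed earlier, such as os.listdir() or os.scandir(). For example, glob can select the candidate paths and os.path.isfile() can then exclude directories. Together they cover more complex iteration requirements.

Filtering with List Comprehensions

Writing Concise Code with List Comprehensions

A list comprehension is a concise, readable way to filter files while iterating. It builds a new list that contains only the names that satisfy a condition.

Filtering Files by Specific Criteria (for Example, File Extension)

A common criterion is the file extension, tested with str.endswith(). This approach is especially useful for narrowing down the selection in directories that contain many files.

Combining List Comprehensions with Other Methods

List comprehensions work with any of the iteration techniques above, because os.listdir(), os.scandir(), and Path.iterdir() all return iterables. The combination keeps even complex filtering code short and readable.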

import glob
import os
directory_path = '/path/to/your/directory'
# Use glob to match files with a particular extension
file_pattern = '*.txt'
search_path = os.path.join(directory_path, file_pattern)
# Iterate over each matching file
for file_path in glob.glob(search_path):
    # Process the file as needed
    print(file_path)

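Explanation

Import the glob and os modules: Import glob for pattern matching and os for building the search path.
Specify the directory path: Set the directory_path variable to the path of the target directory.
Define the file pattern: Use a wildcard pattern (for example, '*.txt') to match files with a particular extension.
Build the search path: Use os.path.join() to combine the directory path and the file pattern into a full search path.
Iterate over the matching files: Call glob.glob(search_path) to get the list of paths that match the pattern, then loop over them.
Process the file: Inside the loop, process each file as needed (for example, print the file path).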
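
glob can also search subdirectories. The sketch below uses the '**' pattern together with recursive=True, which glob.glob() supports; the helper name find_text_files and the file names in the demo are assumptions made for illustration.

```python
import glob
import os
import tempfile


def find_text_files(directory_path):
    """Return .txt paths in directory_path and all of its subdirectories."""
    # With recursive=True, '**' matches zero or more nested directories.
    pattern = os.path.join(directory_path, "**", "*.txt")
    return sorted(glob.glob(pattern, recursive=True))


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "reports"))
    open(os.path.join(demo_dir, "readme.txt"), "w").close()
    open(os.path.join(demo_dir, "reports", "q1.txt"), "w").close()
    open(os.path.join(demo_dir, "reports", "q1.csv"), "w").close()
    for path in find_text_files(demo_dir):
        print(path)
```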

import os
directory_path = '/path/to/your/directory'
# List comprehension to keep only files with a particular extension
file_extension = '.txt'
filtered_files = [file for file in os.listdir(directory_path) if file.endswith(file_extension)]
# Loop over the filtered files
for file_name in filtered_files:
    # Build the full file path
    file_path = os.path.join(directory_path, file_name)
    # Process the file as needed
    print(file_path)

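Explanation

Specify the directory path: Set the directory_path variable to the path of the target directory.
Define the file extension: Specify the extension to filter on (for example, '.txt').
List comprehension: Use a list comprehension to keep only the directory entries whose names end with that extension.
Iterate over the filtered files: Loop over the names produced by the list comprehension.
Build the file path: Use os.path.join() to build the full path of each filtered file.
Process the file: Inside the loop, process each filtered file as needed (for example, print the file path).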
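
A list comprehension can be combined with os.scandir() so that directories whose names happen to end with the extension are excluded as well. This is a minimal sketch under stated assumptions: files_with_extension is a hypothetical helper, and matching the extension case-insensitively is a design choice made here, not something the original example does.

```python
import os
import tempfile


def files_with_extension(directory_path, extension):
    """Return sorted names of regular files whose name ends with extension."""
    wanted = extension.lower()
    with os.scandir(directory_path) as entries:
        names = [
            entry.name
            for entry in entries
            if entry.is_file() and entry.name.lower().endswith(wanted)
        ]
    return sorted(names)


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "notes.txt"), "w").close()
    open(os.path.join(demo_dir, "data.csv"), "w").close()
    os.mkdir(os.path.join(demo_dir, "archive.txt"))  # a directory, not a file
    print(files_with_extension(demo_dir, ".txt"))
```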

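File Operations and Handling

In this section, we look at file operations and handling techniques in Python. We cover reading and writing files, working with different file formats, and performing common file operations.

Reading and Writing Text Files

Reading a Text File with open()

file_path = '/path/to/your/text_file.txt'
try:
    with open(file_path, 'r') as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print(f"File not found: {file_path}")
except Exception as e:
    print(f"An error occurred: {e}")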

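Explanation

Use open(file_path, 'r') to open the text file in read mode.

Use the with statement so that the file is closed automatically, even if an error occurs.

Use file.read() to read the entire contents of the file.

Print the contents, or handle the FileNotFoundError exception if the file does not exist.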
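
file.read() loads the whole file into memory. For large files, a file object can instead be iterated line by line. The sketch below shows that pattern; count_nonblank_lines is a hypothetical helper, and the utf-8 default is an assumption about the file's encoding.

```python
import os
import tempfile


def count_nonblank_lines(file_path, encoding="utf-8"):
    """Count the lines that contain something other than whitespace."""
    count = 0
    with open(file_path, "r", encoding=encoding) as file:
        for line in file:  # the file object yields one line at a time
            if line.strip():
                count += 1
    return count


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "sample.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("first line\n\nsecond line\n")
    print(count_nonblank_lines(demo_path))
```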

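Writing a Text File with open()

file_path = '/path/to/your/new_text_file.txt'
try:
    with open(file_path, 'w') as file:
        file.write("Hello, this is another text file!\n")
        file.write("Additional line.")
except Exception as e:
    print(f"An error occurred: {e}")

Explanation

Use open(file_path, 'w') to open the text file in write mode. This creates the file if it does not exist and overwrites it if it does.

Use the with statement so that the file is closed automatically.

Use file.write() to write text to the file.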

Reading and Writing JSON Files

Reading a JSON File

import json
json_file_path = '/path/to/your/data.json'
try:
    with open(json_file_path, 'r') as json_file:
        data = json.load(json_file)
        print(data)
except FileNotFoundError:
    print(f"File not found: {json_file_path}")
except json.JSONDecodeError as e:
    print(f"Error decoding JSON: {e}")
except Exception as e:
    print(f"An error occurred: {e}")

Explanation

Use open(json_file_path, 'r') to open the JSON file in read mode.

Use json.load(json_file) to parse the JSON data in the file into Python objects.

Writing a JSON File

import json
json_file_path = '/path/to/your/new_data.json'
data_to_write = {"key": "value", "number": 42}
try:
    with open(json_file_path, 'w') as json_file:
        json.dump(data_to_write, json_file, indent=2)
except Exception as e:
    print(f"An error occurred: {e}")

Explanation

Use open(json_file_path, 'w') to open the JSON file in write mode.

Use json.dump(data_to_write, json_file, indent=2) to write the data to the file as JSON, indented by two spaces.
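
Reading and writing can be checked together with a round trip: write the data, read it back, and compare. The sketch below is only an illustration; save_json and load_json are hypothetical helper names, and the sample dictionary is made up for the demo.

```python
import json
import os
import tempfile


def save_json(data, json_file_path):
    """Write data to json_file_path as indented JSON."""
    with open(json_file_path, "w", encoding="utf-8") as json_file:
        json.dump(data, json_file, indent=2)


def load_json(json_file_path):
    """Read json_file_path and return the parsed Python object."""
    with open(json_file_path, "r", encoding="utf-8") as json_file:
        return json.load(json_file)


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "data.json")
    original = {"key": "value", "number": 42}
    save_json(original, demo_path)
    print(load_json(demo_path) == original)
```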

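Common File Operations

Checking Whether a File Exists

import os
file_path = '/path/to/your/file.txt'
if os.path.exists(file_path):
    print(f"The file {file_path} exists.")
else:
    print(f"The file {file_path} doesn't exist.")

Explanation

Use os.path.exists(file_path) to check whether the path exists. Note that it also returns True for directories; use os.path.isfile() when only a regular file should count.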
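
Because os.path.exists() does not distinguish files from directories, it can help to classify a path explicitly. The following sketch uses only os.path.isfile(), os.path.isdir(), and os.path.exists(); the helper name classify_path and its return labels are assumptions made for illustration.

```python
import os
import tempfile


def classify_path(path):
    """Return 'file', 'directory', 'other', or 'missing' for path."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    if os.path.exists(path):
        return "other"  # exists, but is neither a regular file nor a directory
    return "missing"


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_file = os.path.join(demo_dir, "file.txt")
    open(demo_file, "w").close()
    print(classify_path(demo_file))
    print(classify_path(demo_dir))
    print(classify_path(os.path.join(demo_dir, "absent.txt")))
```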

Copying a File

import shutil
source_file = '/path/to/your/source_file.txt'
destination_file = '/path/to/your/destination_file.txt'
try:
    shutil.copy(source_file, destination_file)
    print(f"File copied from {source_file} to {destination_file}.")
except FileNotFoundError:
    print(f"Source file not found: {source_file}")
except Exception as e:
    print(f"An error occurred: {e}")

Explanation

Use shutil.copy(source, destination) to copy the file.
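
Copying combines naturally with directory iteration, for example to copy every file with a given extension into another directory. This is a minimal sketch under stated assumptions: copy_files_with_extension is a hypothetical helper, and the directory and file names exist only in the temporary demo.

```python
import os
import shutil
import tempfile


def copy_files_with_extension(source_dir, destination_dir, extension=".txt"):
    """Copy matching files from source_dir to destination_dir; return their names."""
    os.makedirs(destination_dir, exist_ok=True)
    copied = []
    with os.scandir(source_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(extension):
                shutil.copy(entry.path, os.path.join(destination_dir, entry.name))
                copied.append(entry.name)
    return sorted(copied)


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    source = os.path.join(demo_dir, "source")
    backup = os.path.join(demo_dir, "backup")
    os.mkdir(source)
    open(os.path.join(source, "a.txt"), "w").close()
    open(os.path.join(source, "b.csv"), "w").close()
    print(copy_files_with_extension(source, backup))
```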

Advanced File Handling and Operations

In this section, we look at more advanced file handling techniques: working with binary files and reading CSV files.

Working with Binary Files

Reading a Binary File

binary_file_path = '/path/to/your/binary_file.bin'
try:
    with open(binary_file_path, 'rb') as binary_file:
        binary_data = binary_file.read()
        print(binary_data)
except FileNotFoundError:
    print(f"File not found: {binary_file_path}")
except Exception as e:
    print(f"An error occurred: {e}")

Explanation

Use open(binary_file_path, 'rb') to open the file in binary read mode.

Use binary_file.read() to read the entire contents of the file as a bytes object.
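
Reading the entire contents at once may not be practical for large binary files. A common alternative is to read fixed-size chunks, sketched below. The generator name read_in_chunks and the default chunk size of 4096 bytes are assumptions chosen for illustration.

```python
import os
import tempfile


def read_in_chunks(binary_file_path, chunk_size=4096):
    """Yield the file's contents as successive bytes objects of up to chunk_size."""
    with open(binary_file_path, "rb") as binary_file:
        while True:
            chunk = binary_file.read(chunk_size)
            if not chunk:  # an empty bytes object signals the end of the file
                break
            yield chunk


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "sample.bin")
    with open(demo_path, "wb") as f:
        f.write(bytes(10000))
    total = sum(len(chunk) for chunk in read_in_chunks(demo_path))
    print(total)
```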

Writing a Binary File

binary_file_path = '/path/to/your/new_binary_file.bin'
binary_data_to_write = b'\x48\x65\x6C\x6C\x6F'  # Example binary data (the ASCII bytes of "Hello")
try:
    with open(binary_file_path, 'wb') as binary_file:
        binary_file.write(binary_data_to_write)
except Exception as e:
    print(f"An error occurred: {e}")

Explanation

Use open(binary_file_path, 'wb') to open the file in binary write mode.

Use binary_file.write(binary_data_to_write) to write the bytes to the file.

Working with CSV Files

Reading a CSV File with the csv Module

import csv
csv_file_path = '/path/to/your/data.csv'
try:
    with open(csv_file_path, 'r', newline='') as csv_file:
        csv_reader = csv.reader(csv_file)
        for row in csv_reader:
            print(row)
except FileNotFoundError:
    print(f"File not found: {csv_file_path}")
except Exception as e:
    print(f"An error occurred: {e}")

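Explanation

Use open(csv_file_path, 'r', newline='') to open the CSV file in read mode. Passing newline='' lets the csv module handle line endings itself.

Use csv.reader(csv_file) to read the CSV file row by row. Each row is returned as a list of strings.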
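
When the first row of a CSV file holds column names, csv.DictReader returns each row as a dictionary keyed by those names. The sketch below illustrates this; read_csv_as_dicts is a hypothetical helper, and the sample columns name and age are made up for the demo.

```python
import csv
import os
import tempfile


def read_csv_as_dicts(csv_file_path):
    """Return the rows of a CSV file as dictionaries keyed by the header row."""
    with open(csv_file_path, "r", newline="", encoding="utf-8") as csv_file:
        return list(csv.DictReader(csv_file))


# Demo in a throwaway directory (an assumption made so the sketch runs anywhere).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "data.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as f:
        f.write("name,age\nAda,36\n")
    for row in read_csv_as_dicts(demo_path):
        print(row["name"], row["age"])
```

All values are returned as strings, so numeric columns need an explicit conversion such as int(row["age"]).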

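Conclusion

Python's file-handling features give developers a capable toolkit for a wide range of applications. This guide covered the basic techniques for iterating over files with os.listdir(), os.scandir(), pathlib, and os.walk(), filtering with glob and list comprehensions, reading and writing text, JSON, and binary files, and reading CSV files with the csv module. Along the way it emphasized error handling with try-except blocks, the os.path helpers, and the with statement, which ensures that files are closed properly. With these techniques, Python developers can write file-handling code that is reliable, efficient, and scalable across a variety of scenarios.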
