How to Get the First Match From a Python List or Iterable

March 17, 2025 | 11 min read

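At some point in your Python journey, you may need to find the first item that matches a certain criterion in a Python iterable, such as a list or dictionary.

The simplest case is when you only need to confirm that a particular item exists in the iterable. For example, you may want to find a name in a list of names or a substring inside a string. In these cases, you're best off using the in operator. However, there are many cases where you need to look for items with specific properties. For example, you may need to:

Find a non-zero value in a list of numbers.
Find a name of a particular length in a list of strings.
Find and modify a dictionary in a list of dictionaries based on a certain attribute.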

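This tutorial covers the best approach for each of these cases. One option is to transform the whole iterable into a new list and then use .index() to find the first item that matches your criterion:

>>> names = ["Linda", "Tiffany", "Florina", "Jovann"]
>>> length_of_names = [len(name) for name in names]
>>> idx = length_of_names.index(7)
>>> names[idx]

Output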

'Tiffany'

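Here, you've used .index() to find that "Tiffany" is the first name in your list with seven characters. This technique isn't ideal, partly because you compute the criterion for every element even when the first item is already a match.

In situations like this, you're looking for a computed property of the items that you're iterating over. This tutorial shows you how to match on such derived attributes without doing unnecessary computation.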
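To make the cost of the eager approach concrete, the sketch below counts how many times the length is computed with each technique. The helper counted_len() and the seen list are hypothetical names introduced only for this illustration.

```python
names = ["Linda", "Tiffany", "Florina", "Jovann"]

seen = []  # records every name whose length gets computed


def counted_len(name):
    """Return len(name) and remember that it was computed (illustration only)."""
    seen.append(name)
    return len(name)


# Eager approach: compute every length first, then search the new list.
lengths = [counted_len(name) for name in names]
eager_match = names[lengths.index(7)]
eager_calls = len(seen)

# Lazy approach: stop computing as soon as a match turns up.
seen.clear()
lazy_match = next(name for name in names if counted_len(name) == 7)
lazy_calls = len(seen)

print(eager_match, eager_calls)  # Tiffany 4
print(lazy_match, lazy_calls)    # Tiffany 2
```

Both approaches find the same name, but the lazy version stops after the second item instead of measuring all four.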

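How to Get the First Matching Item in a Python List

You may already know about Python's in operator, which tells you whether an item is part of an iterable.

While in is the most efficient tool for that purpose, you may sometimes need to match objects based on a calculated property, such as their length.

For example, this can happen when you work with a list of dictionaries, which is a typical result of processing JSON data. Take a look at the following data from country-json: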

>>> nations = [
...     {"country": "Austria", "population": 8_840_521},
...     {"country": "Canada", "population": 37_057_765},
...     {"country": "Cuba", "population": 11_338_138},
...     {"country": "Dominican Republic", "population": 10_627_165},
...     {"country": "Germany", "population": 82_905_782},
...     {"country": "Norway", "population": 5_311_916},
...     {"country": "Philippines", "population": 106_651_922},
...     {"country": "Poland", "population": 37_974_750},
...     {"country": "Scotland", "population": 5_424_800},
...     {"country": "United States", "population": 326_687_501},
... ]

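You may want to grab the first dictionary with a population of more than 100 million. The in operator is a poor choice for two reasons. To get a match, you need the entire dictionary, and in returns a Boolean rather than the object itself: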

>>> target_country = {"country": "Philippines", "population": 106_651_922}
>>> target_country in nations

Output

True

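If you want to find a dictionary based on just one of its attributes, such as population, then you can't use in.

A simple for loop is probably the most readable way to find and manipulate the first element in a list based on a computed value: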

>>> for nation in nations:
...     if nation["population"] > 100_000_000:
...         print(nation)
...         break

Output

{'country': 'Philippines', 'population': 106651922}

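Instead of printing the target object, you can do anything you like with it in the body of the for loop. Be sure to break out of the loop once you're done so that you don't needlessly search the rest of the list.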
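As one example of doing something other than printing, the sketch below modifies the first matching dictionary in place and uses the loop's else clause to handle the no-match case. The "large" key is a hypothetical field added purely for illustration, and the list is shortened to three entries.

```python
nations = [
    {"country": "Norway", "population": 5_311_916},
    {"country": "Philippines", "population": 106_651_922},
    {"country": "United States", "population": 326_687_501},
]

for nation in nations:
    if nation["population"] > 100_000_000:
        # The loop variable refers to the dictionary stored in the list,
        # so this change is visible through the list as well.
        nation["large"] = True
        break
else:
    # Runs only when the loop finishes without hitting break.
    print("No nation matched")

print(nations[1])
# {'country': 'Philippines', 'population': 106651922, 'large': True}
```

Because of the break, the United States entry is never inspected and stays unchanged.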

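The first package, which you can get from PyPI, is simple and originally used the for loop approach. It provides one general-purpose function, first(). By default, this function returns the first truthy value from an iterable. When you pass a function through the optional key parameter, it returns the first value for which key returns a truthy result.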
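The behavior described above can be approximated in a few lines with a for loop. This is only a sketch, not the package's actual source: the name first_truthy() and its parameter order are assumptions made for this illustration.

```python
def first_truthy(iterable, default=None, key=None):
    """Return the first item that is truthy, or whose key(item) is truthy.

    Hypothetical stand-in used for illustration only.
    """
    for item in iterable:
        result = item if key is None else key(item)
        if result:
            return item
    return default


print(first_truthy([0, "", None, 7, 8]))                   # 7
print(first_truthy([1, 3, 4, 6], key=lambda n: n % 2 == 0))  # 4
print(first_truthy([0, False], default="nothing"))         # nothing
```

Note that the item itself is returned, not the result of calling key on it.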

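Using Python Generators to Get the First Match

Python generator iterators are memory-efficient iterables that you can use to find the first element in a list or any other iterable. They're a core Python feature and are used extensively under the hood. You've likely used generators without even knowing it!

Generators can have some drawbacks. For example, they tend to be more terse and therefore less readable than loops. Generators do offer some performance advantages, but given the importance of readability, those advantages may sometimes be negligible. Still, using them can be fun and level up your Python game!

There are several ways to create generators in Python, but this tutorial uses generator expressions: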

>>> gen = (nation for nation in nations)
>>> next(gen)
{'country': 'Austria', 'population': 8840521}

>>> next(gen)

Output

{'country': 'Canada', 'population': 37057765}

Once you've defined a generator iterator, you can call next() on it to produce one country at a time until the list of countries is exhausted.
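It can help to see what happens once the generator runs dry. The sketch below uses a two-item list so that exhaustion happens quickly; a bare next() call then raises StopIteration, while next() with a second argument returns that argument instead.

```python
nations = [
    {"country": "Austria", "population": 8_840_521},
    {"country": "Canada", "population": 37_057_765},
]

gen = (nation for nation in nations)
first_item = next(gen)
second_item = next(gen)

# The generator is now exhausted, so a bare next() raises StopIteration.
try:
    next(gen)
except StopIteration:
    exhausted = True
else:
    exhausted = False

# Passing a second argument to next() returns it instead of raising.
fallback = next(gen, None)

print(first_item["country"], second_item["country"], exhausted, fallback)
# Austria Canada True None
```

The two-argument form is what makes next() convenient for first-match searches, because a missing match doesn't have to be handled with try and except.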

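To find the first element in the list that matches a particular criterion, you can add a conditional to the generator expression so that the resulting iterator only yields items that satisfy it. In the example below, you use a conditional to yield items based on whether their population attribute is over 100 million: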

>>> gen = (
...     nation for nation in nations
...     if nation["population"] > 100_000_000
... )
>>> next(gen)

Output

{'country': 'Philippines', 'population': 106651922}

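So the generator now only produces dictionaries with a population attribute of over 100 million. This means that, just like with the for loop approach, the first call to next() on the generator iterator returns the first item that you're looking for in the list.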
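A closely related spelling uses the built-in filter(), which also returns a lazy iterator in Python 3. This is a different technique from the generator expression, shown here only for comparison. The predicate name is_large() is a hypothetical helper, and the list is shortened to three entries.

```python
nations = [
    {"country": "Germany", "population": 82_905_782},
    {"country": "Philippines", "population": 106_651_922},
    {"country": "United States", "population": 326_687_501},
]


def is_large(nation):
    """Hypothetical predicate: population above 100 million."""
    return nation["population"] > 100_000_000


# filter() yields matching items lazily; next() pulls only the first one.
via_filter = next(filter(is_large, nations), None)

# The same search written as a generator expression.
via_genexp = next((n for n in nations if is_large(n)), None)

print(via_filter)                # {'country': 'Philippines', 'population': 106651922}
print(via_filter is via_genexp)  # True
```

Both forms return the very same dictionary object from the list, so which one you pick is mostly a matter of readability.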

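In terms of readability, a generator isn't quite as clear as a for loop. So why would you want to use one? The next section runs a quick performance comparison.

Comparing the Performance of Generators and Loops

As always when measuring performance, you shouldn't read too much into any single set of results. Instead, build a test for your own code with your own real-world data before making any important decisions. You should also weigh readability against complexity; sometimes shaving off a few milliseconds just isn't worth it.

For this test, you need a function that can build a list of arbitrary size with a particular value at a particular position: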

>>> from pprint import pp
>>> def build_list(size, fill, value, at_position):
...     return [value if i == at_position else fill for i in range(size)]
...

>>> pp(
...     build_list(
...         size=10,
...         fill={"country": "Nowhere", "population": 10},
...         value={"country": "Atlantis", "population": 100},
...         at_position=5,
...     )
... )

Output

[{'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Atlantis', 'population': 100},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10},
 {'country': 'Nowhere', 'population': 10}]

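The build_list() function creates a list filled with identical items. Every item in the list except one is the fill argument. The single exception is the value argument, which is placed at the index given by the at_position argument.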
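One detail worth knowing is that the filler slots are references to one shared dictionary rather than independent copies. That's fine for a read-only benchmark, but it matters if you ever mutate an item. The sketch below restates build_list() with the parameter names size and value and checks object identity; the variable names fill, target, and items are chosen for this illustration.

```python
def build_list(size, fill, value, at_position):
    return [value if i == at_position else fill for i in range(size)]


fill = {"country": "Nowhere", "population": 10}
target = {"country": "Atlantis", "population": 100}
items = build_list(size=4, fill=fill, value=target, at_position=2)

# Every filler slot points at the very same dictionary object.
print(items[0] is items[1])  # True
print(items[2] is target)    # True

# Mutating the filler therefore shows up in every filler slot.
fill["population"] = 11
print([item["population"] for item in items])  # [11, 11, 100, 11]
```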

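To make the resulting list easier to read, you imported pp() from pprint and used it to print the output. Otherwise, the list would be displayed on a single line. With this function, you can generate a large set of lists with the target value at different positions. You can use these lists to measure how long it takes to find an element near the start and near the end of a list.

To compare for loops and generators, you need two more basic functions, which are hard-coded to find a dictionary with a population attribute over fifty: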

def find_match_loop(iterable):
    for value in iterable:
        if value["population"] > 50:
            return value
    return None

def find_match_gen(iterable):
    return next(
      (value for value in iterable if value["population"] > 50),
      None
    )

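These functions are hard-coded to keep the test simple. In the next section, you'll create a reusable function.

With these basic components in place, you can set up a script with timeit to test both matching functions against a series of lists that have the target at different positions: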

from timeit import timeit

TIMEIT_TIMES = 100
LIST_SIZE = 500
POSITION_INCREMENT = 10

def build_list(size, fill, value, at_position): ...

def find_match_loop(iterable): ...

def find_match_gen(iterable): ...

looping_times = []
generator_times = []
positions = []

for position in range(0, LIST_SIZE, POSITION_INCREMENT):
    print(
        f"Progress {position / LIST_SIZE:.0%}",
        end=f"{3 * ' '}\r",  # Clear previous characters and reset cursor
    )

    positions.append(position)

    list_to_search = build_list(
        LIST_SIZE,
        {"country": "Nowhere", "population": 10},
        {"country": "Atlantis", "population": 100},
        position,
    )

    looping_times.append(
        timeit(
            "find_match_loop(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )
    generator_times.append(
        timeit(
            "find_match_gen(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )

print("Progress 100%")

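This script produces two parallel lists, each containing the time it took to find the element with either the loop or the generator. The script also produces a third list that holds the corresponding position of the target element in the list.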
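If you'd like to inspect the three parallel lists without plotting them, one option is to walk them in step with zip(). The sketch below is a scaled-down, self-contained variant of the script: the list size of 50 and the repeat count of 100 are arbitrary choices to keep it fast, and it passes callables to timeit() instead of strings, which is a deliberate substitution so that it doesn't depend on globals().

```python
from timeit import timeit


def build_list(size, fill, value, at_position):
    return [value if i == at_position else fill for i in range(size)]


def find_match_loop(iterable):
    for value in iterable:
        if value["population"] > 50:
            return value
    return None


def find_match_gen(iterable):
    return next((value for value in iterable if value["population"] > 50), None)


positions, looping_times, generator_times = [], [], []

for position in range(0, 50, 10):
    data = build_list(
        50,
        {"country": "Nowhere", "population": 10},
        {"country": "Atlantis", "population": 100},
        position,
    )
    positions.append(position)
    # timeit() accepts a zero-argument callable as the statement to time.
    looping_times.append(timeit(lambda: find_match_loop(data), number=100))
    generator_times.append(timeit(lambda: find_match_gen(data), number=100))

# zip() pairs up the parallel lists, one row per target position.
for position, loop_time, gen_time in zip(positions, looping_times, generator_times):
    print(f"{position:>3}  loop={loop_time:.6f}s  gen={gen_time:.6f}s")
```

The printed timings will differ from machine to machine and from run to run, so treat them as a rough indication only.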

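You haven't done anything with the results yet, though ideally you'd plot them. Check out the following completed script, which uses matplotlib to produce a couple of charts from the output: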

# chart.py
from timeit import timeit

import matplotlib.pyplot as plt

TIMEIT_TIMES = 1000  # Increase number for smoother lines
LIST_SIZE = 500
POSITION_INCREMENT = 10

def build_list(size, fill, value, at_position):
    return [value if i == at_position else fill for i in range(size)]

def find_match_loop(iterable):
    for value in iterable:
        if value["population"] > 50:
            return value

def find_match_gen(iterable):
    return next(value for value in iterable if value["population"] > 50)

looping_times = []
generator_times = []
positions = []

for position in range(0, LIST_SIZE, POSITION_INCREMENT):
    print(
        f"Progress {position / LIST_SIZE:.0%}",
        end=f"{3 * ' '}\r",  # Clear previous characters and reset cursor
    )

    positions.append(position)

    list_to_search = build_list(
        size=LIST_SIZE,
        fill={"country": "Nowhere", "population": 10},
        value={"country": "Atlantis", "population": 100},
        at_position=position,
    )

    looping_times.append(
        timeit(
            "find_match_loop(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )
    generator_times.append(
        timeit(
            "find_match_gen(list_to_search)",
            globals=globals(),
            number=TIMEIT_TIMES,
        )
    )

print("Progress 100%")

# Raw times
fig, ax = plt.subplots()

plot = ax.plot(positions, looping_times, label="loop")
plot = ax.plot(positions, generator_times, label="generator")

plt.xlim([0, LIST_SIZE])
plt.ylim([0, max(max(looping_times), max(generator_times))])

plt.xlabel("Index of element to be found")
plt.ylabel(f"Time in seconds to find element {TIMEIT_TIMES:,} times")
plt.title("Raw Time to Find First Match")
plt.legend()
plt.show()

# Ratio
looping_ratio = [loop / loop for loop in looping_times]
generator_ratio = [
    gen / loop for gen, loop in zip(generator_times, looping_times)
]

fig, ax = plt.subplots()

plot = ax.plot(positions, looping_ratio, label="loop")
plot = ax.plot(positions, generator_ratio, label="generator")

plt.xlim([0, LIST_SIZE])
plt.ylim([0, max(max(looping_ratio), max(generator_ratio))])

plt.xlabel("Index of element to be found")
plt.ylabel("Speed to find element, relative to loop")
plt.title("Relative Speed to Find First Match")
plt.legend()
plt.show()

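Depending on the system that you're running and the values that you use for TIMEIT_TIMES, LIST_SIZE, and POSITION_INCREMENT, the script may take a while to run, but it should produce a chart that shows the two sets of times plotted against each other.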


Additionally, after you close the first chart, you'll see another chart that shows the relative performance of the two approaches.

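This last chart clearly shows that in this test, when the target item is near the beginning of the iterable, generators are far slower than for loops. However, once the element to find is at position 100 or greater, generators consistently beat for loops by a clear margin.

The magnifying glass icon on the previous chart lets you zoom in interactively. The zoomed-in chart shows a performance gain of around 5 or 6 percent. Five percent may not be anything to write home about, but it's not negligible either. Whether it's worth it for you depends on the specific data that you'll be using and how often you need to use it. Based on these results, you can tentatively say that generators are faster than for loops, even though generators can be significantly slower when the item to find is within the first hundred elements. With small lists, the overall difference in raw milliseconds lost is negligible. For large iterables, though, where a 5 percent gain can mean minutes, it's worth keeping in mind.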
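The arithmetic behind that last point is simple to check. The sketch below uses made-up run times, a one-hour job and a two-millisecond job, which are assumptions for illustration and not measurements from the test above. The function name seconds_saved() is likewise hypothetical.

```python
def seconds_saved(baseline_seconds, percent_gain):
    """Seconds saved when a run of baseline_seconds gets percent_gain percent faster."""
    return baseline_seconds * percent_gain / 100


# Hypothetical one-hour job: a 5 percent gain is three minutes.
hour_job = seconds_saved(3600, 5)
print(hour_job / 60)  # 3.0

# Hypothetical two-millisecond search: the same gain is a tenth of a millisecond.
short_job = seconds_saved(0.002, 5)
print(f"{short_job * 1000:.1f} ms")  # 0.1 ms
```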

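For very large iterables, a final chart shows the performance gain steadying at around 6 percent. Don't worry about the spikes, either; TIMEIT_TIMES was lowered substantially to test an iterable this large.

Making a Reusable Python Function to Find the First Match

Say that you expect the iterables you'll use to be on the large side, and you're interested in squeezing every bit of performance out of your code. For that reason, you'll use generators instead of a for loop. You'll also be working with a variety of different iterables that hold a variety of items, so you want flexibility in how you match. You'll therefore design your function to accomplish several goals: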

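Return the first truthy value.
Return the first match.
Return the first truthy result of values passed through a key function.
Return the first match of values passed through a key function.
Return a default value if there's no match.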

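While there are many ways to implement this, here's one way that uses structural pattern matching, which requires Python 3.10 or later: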

def get_first(iterable, value=None, key=None, default=None):
    match value is None, callable(key):
        case (True, True):
            gen = (elem for elem in iterable if key(elem))
        case (False, True):
            gen = (elem for elem in iterable if key(elem) == value)
        case (True, False):
            gen = (elem for elem in iterable if elem)
        case (False, False):
            gen = (elem for elem in iterable if elem == value)
    return next(gen, default)

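The function can take up to four arguments, and it behaves differently depending on which combination of arguments you pass in.

The function's behavior depends mainly on the value and key arguments. That's why the match statement checks whether value is None and uses the callable() function to check whether key is a function.

For example, if both match conditions are True, then you've passed in a key but no value. This means that each item in the iterable should be passed through the key function, and the return value should be the first item for which key gives a truthy result.

As another example, if both match conditions are False, then you've passed in a value but no key. Passing a value without a key means that you just want the first element in the iterable that's equal to the value you provided.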
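The four cases described above can also be spelled out with a plain if/elif chain, which may be easier to follow and runs on Python versions that predate the match statement. This is a sketch of an equivalent, not a replacement for the version above; the name get_first_compat() is hypothetical.

```python
def get_first_compat(iterable, value=None, key=None, default=None):
    """if/elif restatement of the four cases (hypothetical name)."""
    has_key = callable(key)
    if value is None and has_key:
        # Key only: first item whose key result is truthy.
        gen = (elem for elem in iterable if key(elem))
    elif has_key:
        # Value and key: first item whose key result equals value.
        gen = (elem for elem in iterable if key(elem) == value)
    elif value is None:
        # Neither: first truthy item.
        gen = (elem for elem in iterable if elem)
    else:
        # Value only: first item equal to value.
        gen = (elem for elem in iterable if elem == value)
    return next(gen, default)


numbers = [0, 3, 8, 12]
print(get_first_compat(numbers))                                 # 3
print(get_first_compat(numbers, value=8))                        # 8
print(get_first_compat(numbers, key=lambda n: n > 5))            # 8
print(get_first_compat(numbers, value=3, key=lambda n: n // 4))  # 12
print(get_first_compat(numbers, value=99, default="no match"))   # no match
```

The last call shows the default argument in action: when nothing matches, next() returns the default instead of raising StopIteration.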

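Once the match statement has run, you have your generator. All that's left to do is call next() with the generator and the default argument to get the first match. With this function, you can search for matches in four different ways: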

>>> nations = [
...     {"country": "Austria", "population": 8_840_521},
...     {"country": "Canada", "population": 37_057_765},
...     {"country": "Cuba", "population": 11_338_138},
...     {"country": "Dominican Republic", "population": 10_627_165},
...     {"country": "Germany", "population": 82_905_782},
...     {"country": "Norway", "population": 5_311_916},
...     {"country": "Philippines", "population": 106_651_922},
...     {"country": "Poland", "population": 37_974_750},
...     {"country": "Scotland", "population": 5_424_800},
...     {"country": "United States", "population": 326_687_501},
... ]
>>> get_first(nations)

Output

{'country': 'Austria', 'population': 8840521}

>>> get_first(
...     nations,
...     value={"country": "Germany", "population": 82_905_782}
... )

Output

{'country': 'Germany', 'population': 82905782}

>>> get_first(
...     nations, value=5_311_916, key=lambda nation: nation["population"]
... )

Output

{'country': 'Norway', 'population': 5311916}

>>> get_first(
...     nations, key=lambda nation: nation["population"] > 100_000_000
... )

Output

{'country': 'Philippines', 'population': 106651922}

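With this function, you have plenty of flexibility in how you match. For example, you can work with a value only, a key function only, or both!

The function signature in the first package mentioned earlier is slightly different. It doesn't have a value parameter. You can still get the same results as above by relying on the key parameter: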

>>> from first import first
>>> first(
...     nations,
...     key=lambda item: item == {"country": "Cuba", "population": 11_338_138}
... )

Output

{'country': 'Cuba', 'population': 11338138}

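In the downloadable materials, you can also find a second implementation of get_first() that matches the signature of the first package.

Whichever solution you choose, you now have a robust, reusable function that gets you the first item you need.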
