Python 程序查找集合列表中重复的集合

2025年03月17日 | 阅读 9 分钟

在本文中，我们将研究各种 Python 程序，使我们能够快速找出集合列表中重复的集合。为了完成这项任务，我们将利用 Python 强大的集合操作和函数式编程特性。我们还将根据具体需求介绍处理重复项的几种技术和方法。

在本文中，我们将探索比较集合相等性、根据元素识别重复集合以及从集合列表中删除或修改重复集合的有效方法。

问题介绍:

给定的集合列表可能包含具有重复元素的集合，任务是根据其内容有效地识别这些重复集合并采取适当的操作，例如删除或修改它们。

# If the input is:
Set_List = [
    {1, 2, 3},
    {4, 7, 2, 9},
    {5, 7, 6},
    {3, 1, 2},
    {6, 7, 5, 4},
    {6, 7, 5}
]

# The output should be:
Set_List = [
    {1, 2, 3},
    {4, 7, 2, 9},
    {5, 7, 6},
    {6, 7, 5, 4},
]

"""Explanation - In the above-given list of set, ('{1, 2, 3}' and '{3, 1, 2}') and ('{5, 7, 6}' and '{6, 7, 5}') are equal sets so we must remove them from the list."""

该问题可分为以下几个主要步骤

接收一个集合列表作为输入，其中每个集合可能包含唯一或冗余的条目。
考虑到集合中元素的排列，将列表中的集合进行比较，以根据其元素找出重复的集合。
制定一种处理重复集合的策略或算法，例如根据具体需求删除或修改它们。
给出完整的集合列表，其中重复的集合已根据指定的处理方法得到妥善处理。

Python 程序：使用 frozenset() 和集合推导式从集合列表中删除重复的集合

我们可以使用 set() 函数，它会从列表中删除重复项，来删除 Set_List 中的重复集合。但是，由于集合本身不可哈希，我们需要将 Set_List 中的每个集合转换为 frozenset 对象，因为 frozenset 是可哈希的。

# Define a list of sets to check for duplicates
Set_List = [
    {1, 2, 3, 1},
    {4, 7, 2, 9},
    {5, 7, 6},
    {3, 1, 2},
    {6, 7, 5, 4},
    {6, 7, 5, 5}
]

# Create a list of unique sets
# First, convert each set in Set_List to a frozenset to make it hashable and immutable.
# Then, pass this iterable of frozensets to the set() function to remove any duplicate.
# Finally, convert the resulting set back to a list to make it easier to work with.
unique_sets = list(set(frozenset(s) for s in Set_List))

# Print out the list of unique sets
print(f"Unique Sets: {unique_sets}")

输出

Unique Sets: [frozenset({1, 2, 3}), frozenset({5, 6, 7}), frozenset({9, 2, 4, 7}), frozenset({4, 5, 6, 7})]

现在让我们开始，探索用于在集合列表中查找重复集合的 Python 程序。

方法 1 - 使用 for 循环和 frozenset()

在此方法中，我们将遍历 Set_List 中的每个集合，并使用集合查找操作来查找重复的集合。如果我们发现任何重复的集合，我们将其存储在一个新列表中。

# Define a list of sets to check for duplicates
Set_List = [
    {1, 2, 3},
    {4, 7, 2, 9},
    {5, 7, 6, 6},
    {3, 1, 2},
    {6, 7, 5, 4},
    {6, 7, 5}
]

# Define an empty list to store any duplicate sets
duplicate_sets = []

# Define an empty set to keep track of sets that have been seen
seen_sets = set()

# Iterate over each set in Ser_List
for s in Set_List:
    # Convert the set to a frozenset to make it hashable and immutable
    frozenset_s = frozenset(s)
    
    # Check if the frozenset has already been seen
    if frozenset_s in seen_sets:
        # If it has, then this set is a duplicate
        duplicate_sets.append(s)
    else:
        # If it hasn't, then add the frozenset to seen_sets
        seen_sets.add(frozenset_s)

# Check if any duplicate sets were found
if len(duplicate_sets) != 0:
    print(f"Duplicate sets found: {duplicate_sets}")
else:
    # If there were no duplicates
    print("No duplicate sets found.")

输出

Duplicate sets found: [{1, 2, 3}, {5, 6, 7}]

在此程序中，我们创建了一个名为 seen_sets 的空集合来跟踪之前见过的集合，以及一个名为 duplicate_sets 的集合来存储重复的集合。然后，我们遍历 Set_List 中的每个集合 s。

对于每个集合，我们创建一个 frozenset 对象 'frozenset_s'，它是可哈希的。然后我们检查 frozenset_s 是否已存在于 seen_sets 中。如果是，则 s 是一个重复的集合，我们将其添加到 duplicate_sets 中。如果不是，我们将 frozenset_s 添加到 seen_sets 中，以便在将来的迭代中进行重复检查。

最后，如果 duplicate_sets 不为空，我们将打印出每个重复的集合。否则，我们将打印一条消息，表明未找到重复的集合。

时间复杂度：O(n) - 该程序使用一个 for 循环来遍历 Set_List 中的每个集合。因此，循环的时间复杂度为 O(n)，其中 n 是 Set_List 的长度。在循环内部，存在一个集合查找操作，用于检查 frozenset 是否已被看到，这需要 O(1) 的时间。其中 n 是 set_list 中集合的数量。

空间复杂度：O(m) - 其中 m 是唯一集合的数量。由于每个集合都作为 frozenset 存储在 seen_sets 中，seen_sets 的最大大小等于输入列表中唯一集合的数量。在最坏的情况下，当 m 等于 n 时，它可以是 O(n)，在最好的情况下，当 m 等于 1 时，它可以是 O(1)。

方法 2 - 使用 collections 中的 Counter()

在此方法中，我们将使用 collections 中的 Counter 来计算每个集合的出现次数。然后，我们将筛选出出现次数大于 1 的集合，并将它们存储到 duplicate_sets 中，与往常一样。

# Importing Counter from collections
from collections import Counter

# Define a list of sets to check for duplicates
set_list = [
    {1, 2, 2, 3},
    {4, 7, 2, 9},
    {5, 7, 6},
    {3, 1, 2},
    {6, 7, 5, 4},
    {6, 7, 5},
    {1, 2, 3, 1},
    {4, 7, 2, 9}
]

# Count the occurrence of each set
set_count = Counter([frozenset(s) for s in set_list])

# Find the sets that occur more than once
duplicate_sets = [set(s) for s in set_count if set_count[s] > 1]

# Print the duplicate sets, if any
if len(duplicate_sets) != 0:
    print(f"Duplicate sets found: {duplicate_sets}")
else:
    # If there were no duplicates
    print("No duplicate sets found.")

输出

Duplicate sets found: [{1, 2, 3}, {9, 2, 4, 7}, {5, 6, 7}]

在此方法中，我们使用了列表推导式将每个集合转换为 frozen sets。这是必需的，因为集合是不可哈希的，不能用作字典的键。然后我们将这个 frozen sets 列表传递给 counter 类，它会返回一个 Counter 对象，其中每个 frozen set 作为键，它们的计数作为键的值。

然后，我们筛选出计数大于 1 的键，并将它们转换回集合后存储在 duplicate_sets 中。在程序结束时，我们打印了结果。

时间复杂度：O(n) - for 循环遍历 set_list 一次，并执行常数时间内的查找和比较 frozen sets 的操作。 其中 n 是 set_list 中集合的数量。

空间复杂度：O(m) - 其中 m 是唯一集合的数量。Counter 对象存储每个集合的计数，所需的空间取决于唯一集合的数量。重复的集合也需要一些空间，而这相对来说非常小。因此，程序的整体空间复杂度变为 O(m)。

方法 3 - 使用 collections 中的 defaultdict()

在此方法中，我们将使用 collections 中的 defaultdict 来跟踪 set_list 中每个集合的出现次数。在计算完频率后，我们将使用列表推导式来筛选出重复的集合。

# Importing defaultdict from collections
from collections import defaultdict

# Define a list of sets to check for duplicates
Set_List = [
    {1, 1, 2, 3},
    {2, 3, 6, 4},
    {5, 1, 12},
    {0, 11, 2, 16},
    {6, 7, 5, 4},
    {4, 6, 7, 5},
    {12, 5, 1, 5}
]

# Create a defaultdict to keep track of the frequency of each set
set_counts = defaultdict(int)

# Count the frequency of each set
for s in Set_List:
    set_counts[frozenset(s)] += 1
print(set_counts)

# Check for sets that appear more than once
duplicate_sets = [set(s) for s in set_counts if set_counts[s] > 1]

# Print the duplicate sets, if any
if len(duplicate_sets) != 0:
    print(f"Duplicate sets found: {duplicate_sets}")
else:
    # If there were no duplicates
    print("No duplicate sets found.")

输出

Duplicate sets found: [{1, 12, 5}, {4, 5, 6, 7}]

在上面的程序中，我们创建了一个 defaultdict 对象 'set_counts'，它是 Python 中 dict 的一个子类，用于跟踪 set_list 中每个集合的频率。我们遍历 set_list 中的每个集合，并使用 set_counts[frozenset(s)] += 1 来将频率计数加 1。

之后，我们使用列表推导式筛选出频率计数大于 1 的集合，这表明是重复的集合。最后，我们打印了结果。

时间复杂度：O(n) - 这里，我们也遍历了 set_list，这需要 O(n) 的时间，并且在列表推导式方法中，我们再次遍历了 set_list，这也需要 O(n) 的时间。总的来说，时间复杂度变为 O(n) + O(n) = O(n)，其中 n 是 set_list 中集合的数量。

空间复杂度：O(m) - 其中 m 是 set_list 中唯一集合的数量。set_counts 字典存储 set_list 中唯一集合的频率，duplicate_sets 列表存储重复的集合。在最坏的情况下，如果 m @ n，即所有集合都是唯一的，则空间复杂度将上升到 O(n)，而在最好的情况下，如果所有集合都是重复的（m = 1），则空间复杂度将变为常数 O(1)。

方法 4 - 使用字典进行哈希

哈希是指将不可哈希对象转换为可哈希且不可变形式的方法。这样做是为了我们可以将 set_list 中的集合用作字典的键，该字典使用哈希函数将每个键映射到不同的哈希值。

# Define a list of sets to check for duplicates
set_list = [
    {11, 22, 23},
    {4, 7, 9, 4},
    {15, 11, 22, 24},
    {1, 2, 3},
    {6, 7, 5, 4},
    {6, 4, 6, 7, 5},
    {23, 11, 22}
]

# Create an empty dictionary to store sets and their frequencies
freq_dict = {}

# Iterate over each set in set_list
for s in set_list:
    # Convert the set to a tuple and hash it
    set_hash = hash(tuple(s))
    
    # Check if the hash value is already in the dictionary
    if set_hash in freq_dict:
        # If it is, increment the frequency of this set
        freq_dict[set_hash] += 1
    else:
        # If it isn't, add the hash value to the dictionary with a frequency of 1
        freq_dict[set_hash] = 1

# Create an empty list to store any duplicate sets
duplicate_sets = []

# Iterate over each set in set_list again
for s in set_list:
    # Convert the set to a forzenset or a tuple and hash it
    set_hash = hash(tuple(s))
    
    # Check if the frequency of the hash value is greater than 1
    if freq_dict[set_hash] > 1:
        # If it is, check if the set has already been added to duplicate_sets
        if s not in duplicate_sets:
            # If it hasn't, add it to duplicate_sets
            duplicate_sets.append(s)

# Check if any duplicate sets were found
if len(duplicate_sets) != 0:
    # If there were duplicates
    print(f"Duplicate sets found: {duplicate_sets}")
else:
    # If there were no duplicates
    print("No duplicate sets found.")

输出

Duplicate sets found: [{11, 22, 23}, {4, 5, 6, 7}]

在此方法中，我们使用一个字典来存储每个集合的哈希值及其频率。我们首先遍历 set_list 中的每个集合，并使用 hash() 函数获取哈希值。然后，我们检查哈希值是否已存在于字典中。如果存在，我们增加该集合的频率。否则，我们将该集合的频率设置为 1。

我们再次遍历 set_list 中的集合以查找重复的集合并检查它们的频率。频率大于一的集合意味着它是重复的集合，需要添加到 duplicate_sets 列表中。在程序结束时，我们打印了结果。

时间复杂度：O(n) - 在上述程序中，我们遍历了 set_list 两次 O(n) + O(n) = O(n)，其中 n 是 set_list 中集合的数量。其他操作以常数时间执行。

空间复杂度：O(n) - freq_dict 和 duplicate_sets 列表的大小与 set_list 的大小成正比，O(n) + O(n) = O(n)，其中 n 是 set_list 中集合的数量。

下一主题Python 中的增强现实 (AR)

Python 程序查找集合列表中重复的集合

Python 程序：使用 frozenset() 和集合推导式从集合列表中删除重复的集合

方法 1 - 使用 for 循环和 frozenset()

方法 2 - 使用 collections 中的 Counter()

方法 3 - 使用 collections 中的 defaultdict()

方法 4 - 使用字典进行哈希

联系信息

关注我们

教程

面试题

在线编译器

Python

Java

.Net Framework

AI, ML and Data Science

Cloud Technology

B.Tech and MCA

Web Technology

PHP

Software Testing

Technical Interview

Java Interview

Python

Web Interview

Database Interview

B.Tech / MCA

Important Interview

Software Testing Interview

Company Interviews

Online Compilers

Multiple Choice Questions

Python 问题

Python 程序查找集合列表中重复的集合

Python 程序：使用 frozenset() 和集合推导式从集合列表中删除重复的集合

方法 1 - 使用 for 循环和 frozenset()

方法 2 - 使用 collections 中的 Counter()

方法 3 - 使用 collections 中的 defaultdict()

方法 4 - 使用字典进行哈希

相关帖子

使用 openweather Map API 在 Python 中创建天气预报 GUI

如何创建 Python 字典

如何在 Matplotlib 中更改绘图大小

Python 中的分类变量

使用 Python 和 Rich 创建 Wordle 克隆

如何在 Python 中将十六进制转换为二进制

使用 Wolfram Alpha API 的 Python GUI 助手

使用正则表达式验证银行账号

使用 Scikit-Learn 在 Python 中查找欧几里得距离

Python for Finance

订阅 Tpoint Tech

联系信息

关注我们

教程

面试题

在线编译器