Candidate Elimination Algorithm in Python

January 5, 2025 | 4 min read

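Given a hypothesis space H and a set of examples E, the candidate elimination algorithm builds the version space incrementally. Examples are added one at a time, and each one can shrink the version space by eliminating the hypotheses that contradict it. The algorithm does this by updating the general boundary and the specific boundary every time a new example arrives.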

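It can be regarded as an extended form of the Find-S algorithm.
It considers both positive and negative examples.
Positive examples are used just as in Find-S: they generalize the specific hypothesis.
Negative examples, in contrast, specialize the general hypothesis.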
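
For contrast, here is a minimal sketch of Find-S run on the four weather instances used in the example later in this article. The function name `find_s`, the `(attributes, label)` pair layout and the `'yes'`/`'no'` labels are assumptions made for illustration. Note that the negative instance is simply skipped.

```python
def find_s(examples):
    """Return the most specific hypothesis covering every positive example.

    `examples` is a list of (attributes, label) pairs (an assumed layout).
    """
    hypothesis = None
    for attributes, label in examples:
        if label != 'yes':
            continue  # Find-S never looks at negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive: copy it as-is
        else:
            # Keep matching values, generalize mismatches to '?'
            hypothesis = [h if h == a else '?'
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis


training = [
    (['sunny', 'warm', 'normal', 'strong', 'warm', 'same'], 'yes'),
    (['sunny', 'warm', 'high', 'strong', 'warm', 'same'], 'yes'),
    (['rainy', 'cold', 'high', 'strong', 'warm', 'change'], 'no'),
    (['sunny', 'warm', 'high', 'strong', 'cool', 'change'], 'yes'),
]
print(find_s(training))
```

Find-S returns only this single specific hypothesis. Candidate elimination additionally tracks a general boundary, and that boundary is where the negative example gets used.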

Basic Terminology

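Concept learning: essentially the learning task of the machine (learning from training data).
General hypothesis: places no constraint on any feature, so it accepts every instance.
G = {'?', '?', '?', '?'…}: one '?' per attribute
Specific hypothesis: pins down the particular feature values the machine has learned.
S = {'ϕ', 'ϕ', 'ϕ'…}: one 'ϕ' per attribute (the empty value, written 'Null' in the code below)
Version space: it lies between the general hypothesis and the specific hypothesis. Rather than a single hypothesis, it is the list of all hypotheses consistent with the training data set.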
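
The terms above can be made concrete with a short sketch. The helper names `covers` and `more_general_or_equal` are assumptions for illustration, and the empty value is written `'Null'` to match the code later in this article.

```python
NUM_ATTRIBUTES = 6  # assumption: six attributes, as in the weather example

general = ['?'] * NUM_ATTRIBUTES      # accepts every instance
specific = ['Null'] * NUM_ATTRIBUTES  # accepts no instance yet


def covers(hypothesis, instance):
    """True if every attribute constraint in the hypothesis accepts the instance."""
    return all(h == '?' or h == x for h, x in zip(hypothesis, instance))


def more_general_or_equal(h1, h2):
    """True if h1 accepts every instance that h2 accepts."""
    if 'Null' in h2:
        return True  # h2 accepts nothing, so any hypothesis is at least as general
    return all(a == '?' or a == b for a, b in zip(h1, h2))


instance = ['sunny', 'warm', 'high', 'strong', 'cool', 'change']
print(covers(general, instance))    # True
print(covers(specific, instance))   # False
print(more_general_or_equal(general, ['sunny', 'warm', '?', 'strong', '?', '?']))  # True
```

A hypothesis belongs to the version space when it sits between the two boundaries under this ordering: at least as general as the specific boundary and no more general than the general boundary.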

Algorithm

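Step 1: Load the data set.

Step 2: Initialize the general hypothesis and the specific hypothesis.

Step 3: For each training example:

Step 4: If the example is positive, update the specific hypothesis attribute by attribute:

          if attribute_value == hypothesis_value:
             do nothing
          else:
             replace the attribute value with '?' (that is, generalize it)

Step 5: If the example is negative:

Make the general hypothesis more specific.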
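
Step 5 is the less obvious of the two updates, so here is a minimal sketch of it for a single general hypothesis. The function name `specialize` is an assumption, and the sketch assumes that `g` currently accepts the negative example and that `s` has already seen at least one positive example.

```python
def specialize(g, s, negative):
    """Return the minimal specializations of g that reject `negative`
    while staying at least as general as the specific hypothesis s."""
    results = []
    for i, (g_val, s_val, n_val) in enumerate(zip(g, s, negative)):
        # Only an unconstrained attribute can be tightened, and only to a value
        # that s requires and that differs from the negative example.
        if g_val == '?' and s_val != '?' and s_val != n_val:
            candidate = list(g)
            candidate[i] = s_val
            results.append(candidate)
    return results


g = ['?', '?', '?', '?', '?', '?']
s = ['sunny', 'warm', '?', 'strong', 'warm', 'same']
negative = ['rainy', 'cold', 'high', 'strong', 'warm', 'change']
for h in specialize(g, s, negative):
    print(h)
```

Each printed hypothesis fixes exactly one attribute, which is the smallest change that excludes the negative example without excluding anything the specific hypothesis still accepts.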

Example

Algorithm steps

Initially: G = [[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
                 [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]
            S = [Null, Null, Null, Null, Null, Null]

For instance 1: <'sunny', 'warm', 'normal', 'strong', 'warm', 'same'> with positive output.
            G1 = G
            S1 = ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

For instance 2: <'sunny', 'warm', 'high', 'strong', 'warm', 'same'> with positive output.
            G2 = G
            S2 = ['sunny', 'warm', ?, 'strong', 'warm', 'same']

For instance 3: <'rainy', 'cold', 'high', 'strong', 'warm', 'change'> with negative output.
            G3 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
                  [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, 'same']]
            S3 = S2

For instance 4: <'sunny', 'warm', 'high', 'strong', 'cool', 'change'> with positive output.
            G4 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
                  [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]
                 (the 'same' row rejects this positive example, so it is reset)
            S4 = ['sunny', 'warm', ?, 'strong', ?, ?]
Finally, dropping the rows of G4 that are still all '?' leaves G4 and S4 as the output of the algorithm.

Code

import copy

NUM_ATTRIBUTES = 6

# General boundary, kept as one row per attribute; every row starts fully general.
G = [['?'] * NUM_ATTRIBUTES for _ in range(NUM_ATTRIBUTES)]
# Specific boundary; 'Null' means no positive example has been seen yet.
S = ['Null'] * NUM_ATTRIBUTES


def is_consistent(hypothesis, example):
    """Return True if the hypothesis accepts the example."""
    for i in range(len(hypothesis)):
        if hypothesis[i] != '?' and hypothesis[i] != example[i]:
            return False
    return True


def update_hypotheses_positive(G, S, example):
    """Generalize S to cover a positive example and reset rows of G that reject it."""
    for j in range(len(S)):
        if S[j] == 'Null':
            S[j] = example[j]
        elif S[j] != example[j]:
            S[j] = '?'
    for i in range(len(G)):
        # A general hypothesis must accept every positive example.
        if not is_consistent(G[i], example):
            G[i] = ['?'] * len(example)


def update_hypotheses_negative(G, S, example):
    """Return a copy of G specialized so that it rejects a negative example."""
    new_G = copy.deepcopy(G)
    for j in range(len(S)):
        # Row j is tightened on attribute j wherever S disagrees with the example.
        if S[j] not in ('?', 'Null') and S[j] != example[j]:
            new_G[j][j] = S[j]
    return new_G


instance1 = ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
instance2 = ['sunny', 'warm', 'high', 'strong', 'warm', 'same']
instance3 = ['rainy', 'cold', 'high', 'strong', 'warm', 'change']
instance4 = ['sunny', 'warm', 'high', 'strong', 'cool', 'change']

update_hypotheses_positive(G, S, instance1)
print("After instance 1:")
print("G =", G)
print("S =", S)

update_hypotheses_positive(G, S, instance2)
print("\nAfter instance 2:")
print("G =", G)
print("S =", S)

G = update_hypotheses_negative(G, S, instance3)
print("\nAfter instance 3:")
print("G =", G)
print("S =", S)

update_hypotheses_positive(G, S, instance4)
print("\nAfter instance 4:")
print("G =", G)
print("S =", S)

# Rows that are still all '?' carry no constraint; drop them to get the final G.
G = [h for h in G if any(value != '?' for value in h)]
print("\nFinal hypotheses:")
print("G =", G)
print("S =", S)

Output

G = [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
S = ['sunny', 'warm', '?', 'strong', '?', '?']
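
The two boundaries describe a whole set of hypotheses, not just the ones printed. As a rough sketch, the members of the version space can be enumerated from the final G and S shown above. The function names here are assumptions for illustration, and the sketch assumes S contains no 'Null' entries.

```python
from itertools import product


def more_general_or_equal(h1, h2):
    """True if h1 accepts every instance that h2 accepts (no 'Null' entries assumed)."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))


def version_space(G, S):
    """List every hypothesis h with g >= h >= S for at least one g in G."""
    # Each attribute of h is either S's value or the wildcard '?',
    # which guarantees that h is at least as general as S.
    choices = [sorted({s_val, '?'}) for s_val in S]
    members = []
    for h in product(*choices):
        if any(more_general_or_equal(g, h) for g in G):
            members.append(list(h))
    return members


G = [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
S = ['sunny', 'warm', '?', 'strong', '?', '?']
for h in version_space(G, S):
    print(h)
```

For these boundaries the sketch lists six hypotheses: S itself, the two members of G, and three hypotheses that lie strictly between them.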

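For classification tasks, the candidate elimination algorithm (CEA) is generally preferable to Find-S. The two have much in common, but they also differ in important ways, and those differences bring both benefits and costs. Some advantages and disadvantages of CEA compared with Find-S follow: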

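Advantages of CEA over Find-S

Improved accuracy: CEA builds its hypotheses from both positive and negative examples, so it can narrow down the target concept more tightly than Find-S, which ignores negatives. When the data are incomplete, the version space also shows which hypotheses remain open. Like Find-S, though, the basic algorithm assumes the training labels are correct.
Flexibility: with a suitably rich hypothesis space, CEA can be applied to more complex classification tasks, such as multi-class problems and nonlinear decision boundaries.
More efficient: by starting from a set of general hypotheses and rejecting them one by one, CEA reduces the total number of hypotheses under consideration, which can improve efficiency and speed up processing.
Better handling of continuous attributes: CEA suits more varied data sets because it can handle continuous attributes by setting bounds for each attribute.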

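Disadvantages of CEA compared with Find-S

More complex: CEA is a more complicated algorithm than Find-S and can be harder to use and understand for beginners or anyone without a solid machine learning background.
Higher memory requirements: CEA needs more memory to store the boundaries and hypothesis sets, so it may not suit memory-constrained environments.
Slower on large data sets: because of the larger number of hypotheses generated, CEA may take longer to process large data sets.
Greater risk of overfitting: because of its added complexity, CEA may overfit the training data more easily, especially when the data set is noisy or small.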
