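Callbacks in Keras

June 17, 2025 | 11 min read

When training any machine learning model, efficiency, adaptability, and monitoring are all essential. Keras, a high-level neural network API, lets you trigger callbacks at specified points during training, evaluation, or inference, which gives you dynamic control over the model's behavior.

Keras callbacks are custom utilities that interact with the training process. They can save model checkpoints, adjust the learning rate, log metrics, or stop training early once certain conditions are met. They are central to streamlining the training workflow and to limiting overfitting.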
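To make "triggered at specified points" concrete, here is a minimal plain-Python sketch of how a training loop can dispatch to callback hooks. The `LossRecorder` class and the `run_training` function are hypothetical stand-ins written for illustration, not Keras's implementation; only the hook names `on_train_begin`, `on_epoch_end`, and `on_train_end` mirror the `tf.keras.callbacks.Callback` interface.

```python
class LossRecorder:
    """Hypothetical callback that records the loss reported after each epoch."""

    def __init__(self):
        self.seen = []

    def on_train_begin(self, logs=None):
        # Reset state at the start of every training run.
        self.seen = []

    def on_epoch_end(self, epoch, logs=None):
        # `logs` carries the metrics computed for the epoch that just finished.
        self.seen.append((epoch, logs["loss"]))

    def on_train_end(self, logs=None):
        print("epochs observed:", len(self.seen))


def run_training(losses, callbacks):
    """Toy loop: one 'epoch' per precomputed loss value (no real training)."""
    for cb in callbacks:
        cb.on_train_begin()
    for epoch, loss in enumerate(losses):
        logs = {"loss": loss}
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs)
    for cb in callbacks:
        cb.on_train_end()


tracker = LossRecorder()
run_training([0.9, 0.7, 0.6], [tracker])
```

The real framework calls the same kind of hooks, with `self.model` attached to each callback so that a hook can read or change the model and its optimizer.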

To understand them better, we will write several custom Keras callbacks and test them on a few model configurations.

Importing Libraries

import numpy as np
from tqdm import tqdm
import os
import datetime
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,  Normalizer
import seaborn as sb
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.metrics import confusion_matrix 
from tensorflow.keras.initializers import RandomNormal, RandomUniform, HeNormal, HeUniform, GlorotNormal, GlorotUniform
from sklearn.metrics import roc_auc_score, f1_score

my_data = np.genfromtxt('../input/callbackdata/callback_data.csv', delimiter=',',skip_header=1)
X = my_data[:,:2]
Y = my_data[:,2]
# train test split

train_X, test_X, y_train, y_test = train_test_split(X,Y, test_size = 0.33, random_state = 101)

# # normalizing data
# scalar_std = StandardScaler()
# scalar_std.fit(train_X)
# train_X = scalar_std.transform(train_X)
# test_X = scalar_std.transform(test_X)

# normalize data
scalar_std =  StandardScaler()
scalar_std.fit(train_X)

train_X = scalar_std.transform(train_X)
test_X = scalar_std.transform(test_X)

print(train_X.shape)

Output

(13400, 2)

Model

Now we will create a basic neural network model.

# create Basic Model

def model_create(inp_dim, out_dim, inpt_activation='sigmoid', out_activation='sigmoid', kernel_init=RandomUniform):
    # Five hidden Dense layers (with dropout after the first) followed by the output layer.
    model = Sequential(
        [
            Dense(256, input_shape=(inp_dim,), activation=inpt_activation, kernel_initializer=kernel_init),
            Dropout(0.3),
            Dense(128, activation=inpt_activation, kernel_initializer=kernel_init),
            Dense(64, activation=inpt_activation, kernel_initializer=kernel_init),
            Dense(32, activation=inpt_activation, kernel_initializer=kernel_init),
            Dense(16, activation=inpt_activation, kernel_initializer=kernel_init),
            # The output layer uses the out_activation argument and the same kernel initializer.
            Dense(out_dim, activation=out_activation, kernel_initializer=kernel_init)
        ]
    )

    return model

# create dataset
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 100
dataset_train = tf.data.Dataset.from_tensor_slices((train_X, y_train))
dataset_test = tf.data.Dataset.from_tensor_slices((test_X, y_test))
dataset_train = dataset_train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
dataset_test = dataset_test.batch(BATCH_SIZE)

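Writing the Callbacks

The task is to implement the following behavior with custom callbacks:

- At the end of each epoch, compute and print the micro F1 score and the AUC score, without using tf.keras.metrics for these calculations.
- Save the model at every epoch whose validation accuracy is better than that of the previous epoch.
- Decay the learning rate under two conditions: if validation accuracy is lower than in the previous epoch, reduce the learning rate by 10%; on every third epoch, reduce the learning rate by 5%.
- Stop training immediately if any NaN value (in the weights or the loss) is encountered.
- Stop training when validation accuracy has not improved over the last two epochs.
- Use TensorBoard to track and analyze the scalar plots and histograms, and include screenshots and observations for evaluation.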

# Helper functions


class MetricHelperClass(object):
    def __init__(self, pred_y_prob, true, threshold=0.5):
        self.y_prob = pred_y_prob
        self.true = true
        self.threshold = threshold
        self.pred_y = self.get_prediction(self.y_prob, self.threshold)
        # labels=[0, 1] keeps the matrix 2x2 even if one class is missing from the predictions.
        self.tn, self.fp, self.fn, self.tp = confusion_matrix(self.true, self.pred_y, labels=[0, 1]).ravel()

    def get_prediction(self, y_prob, threshold):
        # Probabilities at or above the threshold are labelled 1, the rest 0.
        pred_y = np.where(y_prob < threshold, 0, 1)
        return pred_y

    def getTruePositiveRate(self):
        return self.tp / (self.tp + self.fn)

    def getFalsePositiveRate(self):
        return self.fp / (self.tn + self.fp)

    def precision(self):
        # Defined as 0 when nothing is predicted positive.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self):
        return self.tp / (self.tp + self.fn)

    def F1score(self):
        p = self.precision()
        r = self.recall()
        # Defined as 0 when precision and recall are both 0.
        return 2 * p * r / (p + r) if (p + r) else 0.0



def AUC(true, pred_y_prob):
    # Every distinct predicted probability is used as a decision threshold.
    thresholds = np.unique(pred_y_prob).tolist()

    # Start from the (0, 0) corner of the ROC curve, which no threshold in the list produces.
    fprList = [0.0]
    tprList = [0.0]

    for threshold in thresholds:
        confusion_metric = MetricHelperClass(pred_y_prob, true, threshold)
        fprList.append(confusion_metric.getFalsePositiveRate())
        tprList.append(confusion_metric.getTruePositiveRate())

    # FPR and TPR both fall as the threshold rises, so sorting each list
    # independently keeps the (fpr, tpr) pairs aligned.
    auc = np.trapz(sorted(tprList), sorted(fprList))
    return auc
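As a sanity check on hand-written metric helpers like the ones above, the same two quantities can be computed in a few lines of plain Python. This is a sketch under stated assumptions: `binary_f1` and `rank_auc` are hypothetical names, labels are 0/1, and the AUC uses the rank formulation (the fraction of positive/negative pairs in which the positive example gets the higher score), which should match a threshold sweep on data without tied scores.

```python
def binary_f1(y_true, y_prob, threshold=0.5):
    """F1 for the positive class; a probability >= threshold is predicted as 1."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
f1 = binary_f1(y_true, y_prob)
auc = rank_auc(y_true, y_prob)
print(f1, auc)
```

The pairwise loop is quadratic in the number of samples, so it is only meant for checking small cases by hand.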

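CustomModelCheckPoint is a callback that saves the model during training whenever the monitored metric (val_accuracy) improves. It is initialized with a base path and the name of the metric to monitor. When training begins, it sets up a history dictionary to track metrics and initializes the best score, self.best, to negative infinity.

At the end of each epoch it checks whether val_accuracy is present in the logs; if the current val_accuracy is better than the best so far, it saves the model under a file name that contains the epoch number and the val_accuracy value. This keeps the best model seen during training.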

class CustomModelCheckPoint(tf.keras.callbacks.Callback):
    def __init__(self, basepath, monitor):
        super().__init__()
        self.path = basepath
        # Stored for reference only; the comparison below always uses val_accuracy.
        self.monitor = monitor

    def on_train_begin(self, logs=None):
        # History of the tracked metrics and the best validation accuracy seen so far.
        self.history = {'loss': [], 'accuracy': [], 'val_loss': [], 'val_accuracy': []}
        self.best = -np.inf
        # Make sure the directory part of the base path exists before saving.
        os.makedirs(os.path.dirname(self.path) or '.', exist_ok=True)

    def on_epoch_end(self, epoch, logs=None):
        # Keras reports validation accuracy under the key 'val_accuracy'.
        current = (logs or {}).get('val_accuracy')
        if current is None:
            return
        self.history['val_accuracy'].append(current)
        if np.greater(current, self.best):
            self.best = current
            filepath = self.path + "-epoch:{}-val_accuracy:{}.hdf5".format(epoch, current)
            tf.keras.models.save_model(self.model, filepath, overwrite=True, include_optimizer=True, save_format='h5')
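The save-on-improvement rule is easy to check in isolation. The sketch below is plain Python with a hypothetical helper name, `epochs_that_trigger_save`; it takes a list of per-epoch validation accuracies and returns the epochs at which a strict "better than the best so far" comparison would write a checkpoint.

```python
def epochs_that_trigger_save(val_accuracies):
    """Return the epochs at which a 'save on strict improvement' rule would fire."""
    best = float("-inf")
    saved = []
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:  # strict comparison: a tie with the best score does not save
            best = acc
            saved.append(epoch)
    return saved


saved_epochs = epochs_that_trigger_save([0.60, 0.62, 0.61, 0.65, 0.65])
print(saved_epochs)  # [0, 1, 3]
```

Epoch 2 is skipped because accuracy dropped, and epoch 4 is skipped because it only ties the best score.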

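The MetricTracker callback computes and tracks additional metrics on the validation data during training, namely the F1 score and the AUC. It initializes a history dictionary to store the loss, the accuracy, and the custom scores. At the end of each epoch it predicts on the validation inputs, computes the F1 score and the AUC against the validation labels, and records the results. This gives a more informative picture of model performance than the standard metrics alone.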

class MetricTracker(tf.keras.callbacks.Callback):
    def __init__(self, trainX, trainY, testX, testY):
        super().__init__()
        self.test_X = testX
        self.y_test = testY
        self.train_X = trainX
        self.y_train = trainY

    def on_train_begin(self, logs=None):
        # Per-epoch history of the standard and custom validation metrics.
        self.history = {'loss': [], 'accuracy': [], 'val_loss': [], 'val_accuracy': [], 'val_recall': [], 'val_f1': [], 'val_auc': []}

    def on_epoch_end(self, epoch, logs=None):
        # Predicted probabilities for the validation set.
        y_test_pred_prob = self.model.predict(self.test_X).squeeze()

        # F1 score at a 0.5 decision threshold
        test_cm = MetricHelperClass(y_test_pred_prob, self.y_test, threshold=0.5)
        test_f1 = test_cm.F1score()
        self.history['val_f1'].append(test_f1)

        # AUC score (the hand-written AUC helper can be swapped in here)
        # test_auc = AUC(self.y_test, y_test_pred_prob)
        test_auc = roc_auc_score(self.y_test, y_test_pred_prob)
        self.history['val_auc'].append(test_auc)
        print(" validation f1 score : {}, validation auc score : {} ".format(test_f1, test_auc))

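During training, `CustomLRScheduler` adjusts the learning rate dynamically according to two rules: on every third epoch the learning rate decays by 5%, and if the current epoch's validation accuracy is worse than the previous epoch's, the learning rate is reduced by 10%. The callback writes the new value back to the model's optimizer at the end of each epoch, so the change takes effect from the next epoch on.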

class CustomLRScheduler(tf.keras.callbacks.Callback):

    def __init__(self, scheduler):
        super(CustomLRScheduler, self).__init__()
        self.scheduler = scheduler

    def on_train_begin(self, logs=None):
        # Reset state here so the same callback instance can be reused for several models.
        self.history = {'loss': [], 'accuracy': [], 'val_loss': [], 'val_accuracy': [], 'val_recall': [], 'val_f1': [], 'val_auc': []}
        self.accuracy_prev = -np.inf

    def on_epoch_end(self, epoch, logs=None):
        if not hasattr(self.model.optimizer, "learning_rate"):
            raise ValueError('Optimizer must have a "learning_rate" attribute.')
        # Keras reports validation accuracy under the key 'val_accuracy'.
        current = (logs or {}).get('val_accuracy')
        if current is None:
            return
        self.history['val_accuracy'].append(current)

        # Get the current learning rate from the model's optimizer.
        curr_lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))
        # Ask the scheduler for the learning rate to use from the next epoch on.
        new_lr = self.scheduler(epoch, current, self.accuracy_prev, curr_lr)
        tf.keras.backend.set_value(self.model.optimizer.learning_rate, new_lr)
        print(" Epoch %03d: Learning rate is %6.7f." % (epoch, new_lr))
        # Remember this epoch's accuracy for the next comparison.
        self.accuracy_prev = current


def custom_learning_rate_scheduler(epoch, current_acc, prev_acc, lr):
    # Cond1. If validation accuracy is lower than in the previous epoch, decrease the learning rate by 10%.
    if np.less(current_acc, prev_acc):
        lr = 0.9 * lr
    # Cond2. On every 3rd epoch, decay the learning rate by 5%.
    if (epoch + 1) % 3 == 0:
        lr = lr * 0.95
    return lr
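A short worked example helps confirm how the two decay rules combine. The sketch below re-implements the rule in plain Python under the hypothetical names `lr_rule` and `trace_learning_rate`, then traces the learning rate over four epochs with made-up validation accuracies.

```python
def lr_rule(epoch, current_acc, prev_acc, lr):
    """Same two rules as the scheduler: -10% on an accuracy drop, -5% on every third epoch."""
    if current_acc < prev_acc:
        lr = 0.9 * lr
    if (epoch + 1) % 3 == 0:
        lr = 0.95 * lr
    return lr


def trace_learning_rate(val_accuracies, initial_lr):
    """Apply the rule epoch by epoch and return the learning rate after each epoch."""
    lr = initial_lr
    prev = float("-inf")
    trace = []
    for epoch, acc in enumerate(val_accuracies):
        lr = lr_rule(epoch, acc, prev, lr)
        trace.append(lr)
        prev = acc
    return trace


trace = trace_learning_rate([0.60, 0.62, 0.61, 0.65], initial_lr=0.001)
print(trace)
```

With these inputs the rate stays at 0.001 for the first two epochs. At the third epoch (index 2) accuracy drops and the every-third-epoch rule also fires, so both factors apply: 0.001 × 0.9 × 0.95 = 0.000855. The fourth epoch leaves it unchanged.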

The `EarlyStoppingAtDecreasingValidationScore` callback stops training once validation accuracy has failed to improve for a specified number of consecutive epochs (`patience`).

class EarlyStoppingAtDecreasingValidationScore(tf.keras.callbacks.Callback):

    def __init__(self, patience=0):
        super(EarlyStoppingAtDecreasingValidationScore, self).__init__()
        self.patience = patience

    def on_train_begin(self, logs=None):
        # Number of consecutive epochs without an improvement in validation accuracy.
        self.wait = 0
        # Epoch at which training was stopped (0 means it was not stopped early).
        self.epoch_stopped = 0
        # Validation accuracy of the preceding epoch; -inf so the first epoch counts as an improvement.
        self.accuracy_prev = -np.inf
        # Weights from the most recent epoch that improved on its predecessor.
        self.weight_best = None

    def on_epoch_end(self, epoch, logs=None):
        # Keras reports validation accuracy under the key 'val_accuracy'.
        accuracy_current = (logs or {}).get("val_accuracy")
        if accuracy_current is None:
            return

        if np.greater(accuracy_current, self.accuracy_prev):
            # Improvement over the previous epoch: reset the counter and keep these weights.
            self.wait = 0
            self.weight_best = self.model.get_weights()
        else:
            # No improvement: count this epoch and stop once patience is used up.
            self.wait += 1
            if self.wait >= self.patience:
                self.epoch_stopped = epoch
                # Keras checks this flag after each epoch and ends training when it is True.
                self.model.stop_training = True
                if self.weight_best is not None:
                    print("Restoring model weights from the last epoch that improved.")
                    self.model.set_weights(self.weight_best)
        # The current accuracy becomes the reference for the next epoch.
        self.accuracy_prev = accuracy_current

    def on_train_end(self, logs=None):
        if self.epoch_stopped > 0:
            print("Epoch %03d: early stopping" % (self.epoch_stopped + 1))
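The stopping rule can be simulated without a model. The following plain-Python sketch uses a hypothetical helper, `first_stop_epoch`, and assumes that training stops as soon as the count of consecutive non-improving epochs reaches `patience`, where each epoch is compared with the one before it.

```python
def first_stop_epoch(val_accuracies, patience):
    """Return the epoch at which early stopping would trigger, or None if it never does."""
    wait = 0
    prev = float("-inf")
    for epoch, acc in enumerate(val_accuracies):
        if acc > prev:
            wait = 0  # improvement over the previous epoch resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
        prev = acc
    return None


stopped_at = first_stop_epoch([0.60, 0.62, 0.61, 0.61, 0.70], patience=2)
never = first_stop_epoch([0.50, 0.60, 0.70], patience=2)
print(stopped_at, never)  # 3 None
```

In the first run, epochs 2 and 3 both fail to beat their predecessor, so training stops at epoch 3 and the 0.70 at epoch 4 is never reached. That is the trade-off of a small `patience`: a later recovery can be missed.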

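The `TerminateAtNaNLossOrWeights` callback is a custom TensorFlow callback that stops training if the loss or any of the model's weights becomes invalid, that is, `NaN` (not a number) or `infinity`. Its `on_epoch_end` method is called at the end of every epoch and checks the loss value and every weight in the model. If any of them is `NaN` or `infinity`, it stops training by setting `self.model.stop_training = True`. This keeps the model from continuing to train on corrupted values that would lead to poor or unstable performance.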

class TerminateAtNaNLossOrWeights(tf.keras.callbacks.Callback):
    def __init__(self):
        super(TerminateAtNaNLossOrWeights, self).__init__()

    def on_epoch_end(self, epoch, logs=None):
        loss = (logs or {}).get('loss')

        if loss is not None:
            # Stop training if the loss is NaN or infinite
            if np.isnan(loss) or np.isinf(loss):
                print("Invalid loss and terminated at epoch {}".format(epoch))
                self.model.stop_training = True
        # Check every weight tensor of the model
        for weight in self.model.weights:
            values = weight.numpy()
            # Stop training if any weight is NaN or infinite
            if np.isnan(values).any() or np.isinf(values).any():
                print("Invalid weights and terminated at epoch {}".format(epoch))
                self.model.stop_training = True
                break

termination = TerminateAtNaNLossOrWeights()
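The weight check itself can be tried on plain NumPy arrays. This is a small sketch with a hypothetical helper, `first_invalid_tensor`, and made-up tensor names; it uses `np.isfinite`, which is False for both NaN and infinite entries, so a single call covers both conditions.

```python
import numpy as np


def first_invalid_tensor(named_arrays):
    """Return the name of the first array containing NaN or +/-inf, or None if all are finite."""
    for name, values in named_arrays.items():
        if not np.isfinite(values).all():
            return name
    return None


# Made-up weights for illustration: the bias contains a NaN.
weights = {
    "dense/kernel": np.array([[0.1, -0.2], [0.3, 0.4]]),
    "dense/bias": np.array([0.0, np.nan]),
}
bad = first_invalid_tensor(weights)
print(bad)  # dense/bias
```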

Training the Models with the Custom Callbacks

We will now train our models together with the custom callbacks.

# Callbacks


# Custom callback to track the AUC and F1 score
customMetricCallback = MetricTracker(train_X, y_train, test_X, y_test)

# Custom callback for the learning rate scheduler
lrscheduler = CustomLRScheduler(custom_learning_rate_scheduler)
# Custom callback that stops training early when validation accuracy has not improved for 2 epochs
earlystopping = EarlyStoppingAtDecreasingValidationScore(patience=2)
# Custom callback that ends training if the loss or any weight becomes invalid
terminationAtNan = TerminateAtNaNLossOrWeights()

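Model-1 uses tanh activation in every layer except the output layer, SGD with momentum as the optimizer, and RandomUniform(0, 1) as the stated weight initializer. Note that the code passes the RandomUniform class itself, so Keras applies its default range of (-0.05, 0.05). After training, analyze the output and the training curves to judge whether the model converges properly, whether there are other problems such as plateaus in validation loss and accuracy, and whether the setup as a whole gives good results.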

# model 1
model1 = model_create(inp_dim=train_X.shape[-1], out_dim=1, inpt_activation='tanh', out_activation='sigmoid', kernel_init=RandomUniform)
model1.summary()
# SGD optimizer with momentum
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
# TensorBoard log directory
dir_log = os.path.join("logs_model_1", 'fits', datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# Base path to store weights
pathBase = 'model_1_save/weights'

# Custom callback to store the model whenever validation accuracy improves
model_checkpoint_callback = CustomModelCheckPoint(pathBase, 'val_accuracy')

# TensorBoard callback to visualize the network parameters
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=dir_log, histogram_freq=1, write_graph=True)

model1.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=optimizer,
              metrics=['accuracy'],
              )
# dataset_train is already batched, so batch_size is not passed to fit()
model1.fit(dataset_train, epochs=10, validation_data=dataset_test,
          callbacks=[
                      tensorboard_callback,
                      customMetricCallback,
                      model_checkpoint_callback,
                      lrscheduler,
                      earlystopping,
                      terminationAtNan
                     ])

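Model-2 uses ReLU activation in every layer except the final output layer, SGD with momentum as the optimizer, and RandomUniform(0, 1) as the stated weight initializer. Note that the code passes the RandomUniform class itself, so Keras applies its default range of (-0.05, 0.05). Examine the output and the training process to see at which stage the model converges, whether validation loss and accuracy plateau, and how far this setup helps in reaching good results.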

# model 2
model2 = model_create(inp_dim=train_X.shape[-1], out_dim=1, inpt_activation='relu', out_activation='sigmoid', kernel_init=RandomUniform)
# SGD optimizer with momentum
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
# TensorBoard log directory
dir_log = os.path.join("logs_model_2", 'fits', datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# Base path to store weights
pathBase = 'model_2_save/weights'

# Custom callback to store the model whenever validation accuracy improves
model_checkpoint_callback = CustomModelCheckPoint(pathBase, 'val_accuracy')

# TensorBoard callback to visualize the network parameters
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=dir_log, histogram_freq=1, write_graph=True)

model2.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=optimizer,
              metrics=['accuracy'],
              )
# dataset_train is already batched, so batch_size is not passed to fit()
model2.fit(dataset_train, epochs=10, validation_data=dataset_test,
          callbacks=[
                      tensorboard_callback,
                      customMetricCallback,
                      model_checkpoint_callback,
                      lrscheduler,
                      earlystopping,
                      terminationAtNan
                     ])

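Model-3 uses ReLU activation in every layer except the output layer, SGD with momentum as the optimizer, and HeUniform() as the weight initializer. After training, examine the output and the training process to assess the model's performance, including any improvement in validation loss and accuracy, and how well this configuration converges compared with the previous setups.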

# model 3
model3 = model_create(inp_dim=train_X.shape[-1], out_dim=1, inpt_activation='relu', out_activation='sigmoid', kernel_init=HeUniform)
# SGD optimizer with momentum
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
# TensorBoard log directory
dir_log = os.path.join("logs_model_3", 'fits', datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# Base path to store weights
pathBase = 'model_3_save/weights'

# Custom callback to store the model whenever validation accuracy improves
model_checkpoint_callback = CustomModelCheckPoint(pathBase, 'val_accuracy')

# TensorBoard callback to visualize the network parameters
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=dir_log, histogram_freq=1, write_graph=True)

model3.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=optimizer,
              metrics=['accuracy'],
              )
# dataset_train is already batched, so batch_size is not passed to fit()
model3.fit(dataset_train, epochs=10, validation_data=dataset_test,
          callbacks=[
                      tensorboard_callback,
                      customMetricCallback,
                      model_checkpoint_callback,
                      lrscheduler,
                      earlystopping,
                      terminationAtNan
                     ])

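Model-4 uses ReLU activation in every layer except the output layer, with Adam as the optimizer and HeNormal() as the weight initializer. After training, examine the output and the training process to assess the model's performance. Look for improvement in validation loss and accuracy, and for how well this configuration converges compared with the previous models.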

# model 4
model4 = model_create(inp_dim=train_X.shape[-1], out_dim=1, inpt_activation='relu', out_activation='sigmoid', kernel_init=HeNormal)
# Adam optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# TensorBoard log directory
dir_log = os.path.join("logs_model_4", 'fits', datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# Base path to store weights
pathBase = 'model_4_save/weights'

# Custom callback to store the model whenever validation accuracy improves
model_checkpoint_callback = CustomModelCheckPoint(pathBase, 'val_accuracy')
# TensorBoard callback to visualize the network parameters
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=dir_log, histogram_freq=1, write_graph=True)

model4.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=optimizer,
              metrics=['accuracy'],
              )
# dataset_train is already batched, so batch_size is not passed to fit()
model4.fit(dataset_train, epochs=10, validation_data=dataset_test,
          callbacks=[
                      tensorboard_callback,
                      customMetricCallback,
                      model_checkpoint_callback,
                      lrscheduler,
                      earlystopping,
                      terminationAtNan
                     ])

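Results

Model-1: tanh activation with random uniform initialization

The combination of tanh activation and random uniform initialization leads to a plateau in validation loss and accuracy, which indicates that this is not a good setup for convergence.
With random uniform initialization, the weights and biases change uniformly, which may be the reason for the poor performance.
Random uniform initialization puts the network in a less favorable starting state, so it does not reach a good solution and the results suffer.

Model-2: ReLU activation with random uniform initialization

As with Model-1, ReLU activation combined with random uniform initialization leaves the validation loss and accuracy curves flat.
Again, the weights and biases change uniformly, which does not help performance.
Proper initialization is critical for the network to find a good local minimum of the loss function. A poor starting point can keep the model from getting close to the global minimum and from achieving its best results.

Model-3: ReLU activation with He uniform initialization

Compared with the earlier models, ReLU activation combined with He uniform initialization performs better. Validation loss and accuracy both improve, which means this combination works better.
The changes in the weights and biases look more favorable: the mean is close to 0 and the variance is very small, which supports better learning and convergence.
As with the previous models, initialization remains critical. He uniform initialization starts the network from a more favorable position, which in turn raises the chance of converging to a good solution.
The combination of ReLU activation with He initialization (both uniform and normal) is widely recommended in the deep learning community, and this model's results are consistent with that recommendation.

Model-4: ReLU activation with He normal initialization

With ReLU and He normal initialization, performance is very close to that of ReLU with He uniform initialization, with similar improvements in validation loss and accuracy.
The changes in the weights and biases have a mean of 0 and a small variance, which makes for stable learning and better convergence.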
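One way to see why the initializer matters for ReLU networks is to push random inputs through a stack of untrained layers and look at how large the activations still are at the end. The NumPy sketch below is an illustrative simulation, not a measurement from the models in this tutorial. Its assumptions: five bias-free ReLU layers of width 256, standard-normal inputs, He uniform weights drawn from U(-limit, limit) with limit = sqrt(6 / fan_in), and a narrow uniform range of U(-0.05, 0.05), which is the default range of the Keras RandomUniform initializer. The function names are hypothetical.

```python
import numpy as np


def mean_square_after_relu_layers(weight_sampler, width=256, depth=5, n_samples=1000, seed=0):
    """Push standard-normal inputs through `depth` bias-free ReLU layers and
    return the mean squared activation of the last layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, width))
    for _ in range(depth):
        w = weight_sampler(rng, width, width)
        x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
    return float(np.mean(x ** 2))


def small_uniform(rng, fan_in, fan_out):
    # Narrow fixed range, independent of the layer width: U(-0.05, 0.05)
    return rng.uniform(-0.05, 0.05, size=(fan_in, fan_out))


def he_uniform(rng, fan_in, fan_out):
    # He uniform: U(-limit, limit) with limit = sqrt(6 / fan_in)
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


ms_small = mean_square_after_relu_layers(small_uniform)
ms_he = mean_square_after_relu_layers(he_uniform)
print("narrow uniform:", ms_small)
print("He uniform:   ", ms_he)
```

Under these assumptions the He-initialized stack keeps the activation scale roughly constant from layer to layer, while the narrow uniform range shrinks it at every layer. A signal that has nearly vanished by the output gives small gradients, which is one plausible reading of the plateaus reported for Model-1 and Model-2.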
