YOLOv3 Object Detection with Keras

June 20, 2025 | 9 min read

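Object detection is a core computer vision task: it identifies the objects in an image and locates them. Among the many available techniques, YOLO (You Only Look Once) is widely used for its combination of speed and accuracy. YOLOv3, the third iteration of the YOLO family, improves noticeably on its predecessors in both respects, which has made it a common choice for real-time object detection.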

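Understanding YOLOv3

YOLOv3 divides the image into a grid and predicts bounding boxes, class probabilities and objectness scores for all grid cells in a single pass. Unlike traditional approaches such as sliding windows, which run a classifier over many image regions, YOLO treats detection as a regression problem, which makes it considerably faster. Compared with earlier versions, YOLOv3 introduces two main improvements: multi-scale detection in the style of a feature pyramid network, and the Darknet-53 backbone architecture. Both improve accuracy and help the network detect smaller objects.

The model outputs predictions at three different scales, so it can handle objects of different sizes. It uses anchor boxes to predict bounding boxes and applies non-maximum suppression to remove redundant predictions.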
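To make the three scales concrete, the sketch below computes the output grid sizes for a square input. It assumes the commonly used 416x416 input, strides of 32, 16 and 8, three anchors per scale and the 80 COCO classes; the function name yolo_output_shapes is hypothetical and only serves this illustration.

```python
def yolo_output_shapes(input_size=416, num_classes=80, anchors_per_scale=3):
    """Return the (grid_h, grid_w, channels) shape of each YOLOv3 output."""
    strides = (32, 16, 8)  # coarse, medium and fine detection outputs
    # Each anchor predicts 4 box values, 1 objectness score and the class scores
    channels = anchors_per_scale * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]

for shape in yolo_output_shapes():
    print(shape)
```

The coarsest grid is responsible for large objects and the finest grid for small ones, which is why the largest anchors are paired with the first output.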

Setting Up YOLOv3 with Keras

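Implementing YOLOv3 in Keras involves several steps, including loading a pretrained model and preprocessing the input before running inference. First, obtain the YOLOv3 weights and configuration files, which are usually distributed with the original Darknet implementation.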
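A quick way to sanity-check a downloaded weights file is to look at its header. The sketch below assumes the layout that the weight-reading code in this tutorial relies on: three 32-bit integers (major, minor, revision) followed by a counter that is 64 bits wide for format version 0.2 and later, and 32 bits otherwise. The helper name read_darknet_header is hypothetical, and the demonstration runs on a synthetic header, not on a real weights file.

```python
import os
import struct
import tempfile

def read_darknet_header(path):
    """Read the version fields and the counter that follow them in a .weights file."""
    with open(path, 'rb') as f:
        major, minor, revision = struct.unpack('<3i', f.read(12))
        # Version 0.2 and later store the counter as a 64-bit integer
        if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000:
            seen, = struct.unpack('<q', f.read(8))
        else:
            seen, = struct.unpack('<i', f.read(4))
    return major, minor, revision, seen

# Synthetic header for demonstration purposes only
with tempfile.NamedTemporaryFile(suffix='.weights', delete=False) as tmp:
    tmp.write(struct.pack('<3iq', 0, 2, 0, 32013312))
    tmp_path = tmp.name

header = read_darknet_header(tmp_path)
os.remove(tmp_path)
print(header)
```

Everything after the header is a flat sequence of 32-bit floats, which is what the weight reader consumes layer by layer.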

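Next, set the path to the dataset. The detection loop later in this tutorial reads it from a variable named PATH_DATA.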

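We now implement the YOLOv3 model in Keras and load the pretrained weights from the Darknet .weights file. The architecture is defined as modular convolutional blocks with batch normalization, Leaky ReLU activations and residual connections. A class named WeightsReader then reads the Darknet weights and reshapes them into the layout Keras expects. Once the weights are loaded, the model can be saved in H5 format and used for object detection.

Code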

 
# Importing necessary libraries and modules for building a model
import numpy as np
import struct

# Importing Keras layers for model construction
from keras.layers import (
    UpSampling2D,       # For upsampling the input
    Conv2D,             # For 2D convolution
    Input,              # To define the input layer
    BatchNormalization, # To normalize activations in the network
    LeakyReLU,         # For Leaky ReLU activation function
    ZeroPadding2D,     # For adding padding around the input
)

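# Importing model-related utilities
# (add and concatenate live in keras.layers; the keras.layers.merge module is not available in recent Keras releases)
from keras.layers import add, concatenate  # For merging layers
from keras.models import Model  # To build the Keras model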

def _convolutional_block(input_layer, convolutions, add_skip_connection=True):
    x = input_layer
    count = 0
    for conv in convolutions:
        if count == (len(convolutions) - 2) and add_skip_connection:
            skip_connection = x
        count += 1
        if conv['stride'] > 1:
            x = ZeroPadding2D(((1,0),(1,0)))(x)  # padding according to Darknet preferences (left and top)
        
        # Apply convolution, batch normalization, and LeakyReLU
        x = Conv2D(conv['filters'],
                   conv['kernel_size'],
                   strides=conv['stride'],
                   padding='valid' if conv['stride'] > 1 else 'same',
                   name='conv_' + str(conv['layer_index']),
                   use_bias=False if conv['use_batch_norm'] else True)(x)
        
        if conv['use_batch_norm']:
            x = BatchNormalization(epsilon=0.001, name='batch_norm_' + str(conv['layer_index']))(x)
        
        if conv['apply_leaky_relu']:
            x = LeakyReLU(alpha=0.1, name='leaky_relu_' + str(conv['layer_index']))(x)

    return add([skip_connection, x]) if add_skip_connection else x


def create_yolov3_model():
    input_image = Input(shape=(None, None, 3))
    
    # Initial convolutional block (Layer 0 => 4)
    x = _convolutional_block(input_image, [{'filters': 32, 'kernel_size': 3, 'stride': 1, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 0},
                                           {'filters': 64, 'kernel_size': 3, 'stride': 2, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 1},
                                           {'filters': 32, 'kernel_size': 1, 'stride': 1, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 2},
                                           {'filters': 64, 'kernel_size': 3, 'stride': 1, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 3}])
    
    # Subsequent convolutional blocks (Layer 5 => 8, Layer 9 => 11, etc.)
    x = _convolutional_block(x, [{'filters': 128, 'kernel_size': 3, 'stride': 2, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 5},
                                 {'filters': 64, 'kernel_size': 1, 'stride': 1, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 6},
                                 {'filters': 128, 'kernel_size': 3, 'stride': 1, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 7}])
    # (Further layers would follow a similar pattern)
    skip_36 = x
    
    # Other layers (e.g., Layer 37 => 40, 41 => 61, etc.)
    x = _convolutional_block(x, [{'filters': 512, 'kernel_size': 3, 'stride': 2, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 37},
                                 {'filters': 256, 'kernel_size': 1, 'stride': 1, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 38},
                                 {'filters': 512, 'kernel_size': 3, 'stride': 1, 'use_batch_norm': True, 'apply_leaky_relu': True, 'layer_index': 39}])
    
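    # Additional layers would continue similarly. They are omitted in this listing,
    # and they must define output_layer_82, output_layer_94 and output_layer_106
    # (the three detection outputs) before the model below can be built.

    model = Model(input_image, [output_layer_82, output_layer_94, output_layer_106])  # Final model with 3 output layers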
    return model


class WeightsReader:
    def __init__(self, weight_file):
        with open(weight_file, 'rb') as file:
            major, = struct.unpack('i', file.read(4))
            minor, = struct.unpack('i', file.read(4))
            revision, = struct.unpack('i', file.read(4))
            if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000:
                file.read(8)
            else:
                file.read(4)
            transpose = (major > 1000) or (minor > 1000)
            binary = file.read()
        
        self.offset = 0
        self.all_weights = np.frombuffer(binary, dtype='float32')

    def read_bytes(self, size):
        self.offset += size
        return self.all_weights[self.offset - size:self.offset]

    def load_model_weights(self, model):
        for i in range(106):
            try:
                conv_layer = model.get_layer('conv_' + str(i))
                print(f"Loading weights for convolution layer #{i}")
                
                if i not in [81, 93, 105]:  # The three detection layers have no batch normalization
                    batch_norm_layer = model.get_layer('batch_norm_' + str(i))
                    size = np.prod(batch_norm_layer.get_weights()[0].shape)
                    beta = self.read_bytes(size)  # Shift (beta), stored first in the file
                    gamma = self.read_bytes(size)  # Scale (gamma)
                    mean = self.read_bytes(size)  # Moving mean
                    variance = self.read_bytes(size)  # Moving variance
                    # Keras expects the order gamma, beta, mean, variance
                    batch_norm_layer.set_weights([gamma, beta, mean, variance])
                
                if len(conv_layer.get_weights()) > 1:
                    bias = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
                    kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
                    kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
                    kernel = kernel.transpose([2, 3, 1, 0])
                    conv_layer.set_weights([kernel, bias])
                else:
                    kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
                    kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
                    kernel = kernel.transpose([2, 3, 1, 0])
                    conv_layer.set_weights([kernel])
            except ValueError:
                print(f"No convolution layer #{i}")

    def reset(self):
        self.offset = 0   
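The reshape-and-transpose step in load_model_weights is easy to get wrong, so it helps to check it on a small array. The sketch below assumes that the flat buffer holds a kernel in (out_channels, in_channels, height, width) order and that Keras wants (height, width, in_channels, out_channels); the sizes are arbitrary values chosen for illustration.

```python
import numpy as np

out_ch, in_ch, k = 2, 3, 3  # small, arbitrary sizes for illustration

# Flat buffer as it would be read from the file, in (out, in, h, w) order
flat = np.arange(out_ch * in_ch * k * k, dtype='float32')
darknet_kernel = flat.reshape((out_ch, in_ch, k, k))

# Shape that a Keras Conv2D layer reports for its kernel
keras_shape = (k, k, in_ch, out_ch)

# Same two steps as in load_model_weights: reshape to the reversed shape, then transpose
kernel = flat.reshape(list(reversed(keras_shape))).transpose([2, 3, 1, 0])
print(kernel.shape)

# Every element ends up at the position Keras expects
matches = all(
    kernel[i, j, c, o] == darknet_kernel[o, c, i, j]
    for i in range(k) for j in range(k)
    for c in range(in_ch) for o in range(out_ch)
)
print(matches)
```

Reversing the shape only works this cleanly because the kernels are square; for a non-square kernel the height and width axes would need separate handling.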

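Visualization Functions

A Box class represents each bounding box. The model output is processed by decode_predictions and adjust_boxes, redundant boxes are removed by non_maximum_suppression, load_image reads the image pixels, and filter_boxes keeps the boxes whose class score exceeds a confidence threshold. Finally, draw_boxes displays the result by drawing the bounding boxes on the image with a label above each one.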
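The decoding arithmetic can be tried on a single grid cell before reading the full listing. This is a minimal sketch: decode_cell is a hypothetical helper, and the 13x13 grid, the 416-pixel input and the (116, 90) anchor are sample values taken from the settings used later in this tutorial.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(raw, row, col, grid_size, anchor_wh, net_size):
    """Turn raw (tx, ty, tw, th) for one cell into a normalized (x_min, y_min, x_max, y_max)."""
    tx, ty, tw, th = raw
    # Centre: sigmoid offset inside the cell, then normalize by the grid size
    x = (col + sigmoid(tx)) / grid_size
    y = (row + sigmoid(ty)) / grid_size
    # Size: anchor in pixels scaled by exp, then normalize by the network input size
    w = anchor_wh[0] * np.exp(tw) / net_size
    h = anchor_wh[1] * np.exp(th) / net_size
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

# Zero raw values place the centre in the middle of cell (6, 6) and keep the anchor's size
box = decode_cell((0.0, 0.0, 0.0, 0.0), row=6, col=6, grid_size=13,
                  anchor_wh=(116, 90), net_size=416)
print([round(float(v), 4) for v in box])
```

Note that the centre is normalized by the grid size while the width and height are normalized by the network input size, because the anchors are expressed in input pixels.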

Code

import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle

class Box:
    def __init__(self, x_min, y_min, x_max, y_max, score=None, class_probs=None):
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max
        self.score = score
        self.class_probs = class_probs
        self.label = -1
        self.confidence = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.class_probs)
        return self.label

    def get_confidence(self):
        if self.confidence == -1:
            self.confidence = self.class_probs[self.get_label()]
        return self.confidence

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

def decode_predictions(output, anchor_boxes, threshold, net_height, net_width):
    # output: one scale of the network output, shape (grid_h, grid_w, 3 * (5 + num_classes))
    # net_height, net_width: network input size in pixels (416 x 416 in this tutorial)
    grid_h, grid_w = output.shape[:2]
    num_boxes = 3
    output = output.reshape((grid_h, grid_w, num_boxes, -1))
    boxes = []
    # Sigmoid on the centre offsets, the objectness score and the class scores
    output[..., :2] = sigmoid(output[..., :2])
    output[..., 4:] = sigmoid(output[..., 4:])
    # Class confidence = objectness * class probability; values below the threshold become 0
    output[..., 5:] = output[..., 4][..., np.newaxis] * output[..., 5:]
    output[..., 5:] *= output[..., 5:] > threshold

    for row in range(grid_h):
        for col in range(grid_w):
            for b in range(num_boxes):
                object_score = output[row][col][b][4]
                if object_score <= threshold:
                    continue
                x, y, w, h = output[row][col][b][:4]
                # Box centre, normalized by the grid size
                x = (col + x) / grid_w
                y = (row + y) / grid_h
                # Box size, normalized by the network input size
                w = anchor_boxes[2 * b + 0] * np.exp(w) / net_width
                h = anchor_boxes[2 * b + 1] * np.exp(h) / net_height
                class_probs = output[row][col][b][5:]
                box = Box(x - w / 2, y - h / 2, x + w / 2, y + h / 2, object_score, class_probs)
                boxes.append(box)
    return boxes

def adjust_boxes(boxes, image_height, image_width, net_height, net_width):
    new_w, new_h = net_width, net_height
    for i in range(len(boxes)):
        offset_x, scale_x = (net_width - new_w) / 2. / net_width, float(new_w) / net_width
        offset_y, scale_y = (net_height - new_h) / 2. / net_height, float(new_h) / net_height
        boxes[i].x_min = int((boxes[i].x_min - offset_x) / scale_x * image_width)
        boxes[i].x_max = int((boxes[i].x_max - offset_x) / scale_x * image_width)
        boxes[i].y_min = int((boxes[i].y_min - offset_y) / scale_y * image_height)
        boxes[i].y_max = int((boxes[i].y_max - offset_y) / scale_y * image_height)

def calculate_overlap(a_interval, b_interval):
    x1, x2 = a_interval
    x3, x4 = b_interval
    if x3 < x1:
        if x4 < x1:
            return 0 
        else:
            return min(x2, x4) - x1
    else:
        if x2 < x3:
            return 0
        else:
            return min(x2, x4) - x3

def compute_iou(box1, box2):
    width_intersection = calculate_overlap([box1.x_min, box1.x_max], [box2.x_min, box2.x_max])
    height_intersection = calculate_overlap([box1.y_min, box1.y_max], [box2.y_min, box2.y_max])
    intersection = width_intersection * height_intersection
    width_1, height_1 = box1.x_max - box1.x_min, box1.y_max - box1.y_min
    width_2, height_2 = box2.x_max - box2.x_min, box2.y_max - box2.y_min
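    union = width_1 * height_1 + width_2 * height_2 - intersection
    # Boxes rounded to integer pixels can have zero area, so guard against division by zero
    return float(intersection) / union if union > 0 else 0.0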

def non_maximum_suppression(boxes, nms_threshold):
    if len(boxes) > 0:
        num_classes = len(boxes[0].class_probs)
    else:
        return
    for c in range(num_classes):
        sorted_indices = np.argsort([-box.class_probs[c] for box in boxes])
        for i in range(len(sorted_indices)):
            i_index = sorted_indices[i]
            if boxes[i_index].class_probs[c] == 0:
                continue
            for j in range(i + 1, len(sorted_indices)):
                j_index = sorted_indices[j]
                if compute_iou(boxes[i_index], boxes[j_index]) >= nms_threshold:
                    boxes[j_index].class_probs[c] = 0

def load_image(filename, target_size):
    # Load image to get its original size
    image = load_img(filename)
    original_width, original_height = image.size
    
    # Resize image to match the target size
    image = load_img(filename, target_size=target_size)
    
    # Convert the image into a numpy array
    image = img_to_array(image)
    
    # Normalize pixel values to the range [0, 1]
    image = image.astype('float32')
    image /= 255.0
    
    # Add a batch dimension so the array matches the model's input shape
    image = np.expand_dims(image, 0)
    
    return image, original_width, original_height

def filter_boxes(boxes, labels, threshold):
    valid_boxes, valid_labels, valid_scores = list(), list(), list()
    
    # Iterate through each box
    for box in boxes:
        # Iterate through each label
        for i in range(len(labels)):
            # If the score for this label is above the threshold, keep the box
            if box.class_probs[i] > threshold:
                valid_boxes.append(box)
                valid_labels.append(labels[i])
                valid_scores.append(box.class_probs[i] * 100)
    
    return valid_boxes, valid_labels, valid_scores

def draw_boxes(filename, valid_boxes, valid_labels, valid_scores):
    # Load the image
    image_data = plt.imread(filename)
    
    # Display the image
    plt.imshow(image_data)
    
    # Get the axes for plotting boxes
    ax = plt.gca()
    
    # Plot each valid box
    for i in range(len(valid_boxes)):
        box = valid_boxes[i]
        
        # Get coordinates for the bounding box
        y1, x1, y2, x2 = box.y_min, box.x_min, box.y_max, box.x_max
        
        # Calculate the box's width and height
        width, height = x2 - x1, y2 - y1
        
        # Create a rectangle patch
        rect = Rectangle((x1, y1), width, height, fill=False, color='white')
        
        # Add the rectangle to the axes
        ax.add_patch(rect)
        
        # Add the label and score at the top left corner of the box
        label = "%s (%.3f)" % (valid_labels[i], valid_scores[i])
        plt.text(x1, y1, label, color='white')
    
    # Show the final image with boxes
    plt.show()   
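For a feel of how intersection over union drives the suppression step, here is a small worked example on plain tuples. It is a simplified single-class variant, not the per-class version in the listing above, and the names iou and greedy_nms are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better box
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in kept):
            kept.append(i)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))
print(greedy_nms(boxes, scores))
```

The first two boxes share 81 of their combined 119 square units, so the lower-scoring one is dropped, while the third box overlaps nothing and survives.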

Model

Next, we build the model and load the pretrained weights into it.

Code

 
# defining the model
model = create_yolov3_model()

# Loading the model weights
# The pretrained weights are stored in a separate dataset
weights_reader = WeightsReader('..lyft-3d-recognition/yolov3.weights')

# setting the model weights into the model
weights_reader.load_model_weights(model)

# saving the model to file
model.save('model.h5')

Loading the Model

The model and its weights are now stored in an H5 file (in the output directory), so we load the model from that file.

Code

 
# loading the yolov3 model
from keras.models import load_model
model = load_model('model.h5')

model.summary()   

Detection

We now try to detect the objects in the first few photos of the training set to check that the solution works.

Code

 
# Anchor boxes (width, height pairs in pixels) for the three output scales, largest first
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]

# Define the expected input shape for the model.
WIDTH, HEIGHT = 416, 416

# Probability threshold for detected objects.
threshold_class = 0.3

import os
import numpy as np
from matplotlib import pyplot as plt
from keras.preprocessing.image import load_img, img_to_array

# Root folder of the dataset; the first ten training images are used
PATH_DATA = '../input/3d-object-detection-for-autonomous-vehicles/'
images = os.listdir(PATH_DATA + 'train_images')[:10]

# Loading and preparing the image
def load_image_pixels(filename, shape):
    '''
    Records the original image size, which is later used to scale the boxes, and
    resizes the photo to 416x416, the typical input shape for YOLOv3.

    parameters:
    filename {String}: path to the picture
    shape {tuple}: the input dimensions of the network

    returns:
    image {numpy array}: array of shape (1, shape[0], shape[1], 3) with values in [0, 1]
    width {int}: original width of the image
    height {int}: original height of the image
    '''
    # loading the image to get its shape
    image = load_img(filename)
    width, height = image.size

    # Loading the image with the specified dimensions
    image = load_img(filename, target_size=shape)

    # Converting the image to a numpy array
    image = img_to_array(image)

    # Normalizing the pixel values to the range [0, 1]
    image = image.astype('float32')
    image /= 255.0

    # Adding a dimension so that we have one sample
    image = np.expand_dims(image, 0)
    return image, width, height


for file in images:
    photo_filename = PATH_DATA + 'train_images/' + file

    # loading the picture and keeping its original dimensions
    image, w_image, h_image = load_image_pixels(photo_filename, (WIDTH, HEIGHT))

    # Predicting on the image
    yhat = model.predict(image)

    # Creating the boxes
    boxes = list()
    for i in range(len(yhat)):
        # decode the network's output
        boxes += decode_predictions(yhat[i][0], anchors[i], threshold_class, HEIGHT, WIDTH)

    # Scale the bounding boxes to the size of the original image.
    adjust_boxes(boxes, h_image, w_image, HEIGHT, WIDTH)

    # suppress non-maximal boxes
    non_maximum_suppression(boxes, 0.5)

    # Define the labels. (Of the classes the YOLOv3 model was pre-trained on, only those relevant to this task are kept.)
    labels = ["person", "bicycle", "car", "motorbike", "airplane", "bus", "train", "truck", "boat"]

    # get the details of the detected objects
    boxes_v, labels_v, v_scores = filter_boxes(boxes, labels, threshold_class)

    # summarize what we found
    for i in range(len(boxes_v)):
        print(labels_v[i], v_scores[i])

    # Draw what we found.
    draw_boxes(photo_filename, boxes_v, labels_v, v_scores)
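Because the image is stretched to the network size without letterboxing, mapping a decoded box back to the original photo reduces to a multiplication by the image width and height. The sketch below shows that step in isolation; scale_box is a hypothetical helper and the 1920x1080 image size is just an example value.

```python
def scale_box(box, image_width, image_height):
    """Map a normalized (x_min, y_min, x_max, y_max) box to integer pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (int(x_min * image_width), int(y_min * image_height),
            int(x_max * image_width), int(y_max * image_height))

# A box covering the central half of the image in both directions
pixel_box = scale_box((0.25, 0.25, 0.75, 0.75), image_width=1920, image_height=1080)
print(pixel_box)  # (480, 270, 1440, 810)
```

If the preprocessing were changed to preserve the aspect ratio with padding, the offsets and scale factors in the box adjustment step would no longer be 0 and 1 and would have to be computed from the padded size.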
