Dog Breed Classification using Transfer Learning

July 22, 2025 | 17 min read

In this tutorial, we will learn how to classify dog breeds using transfer learning in Python.

What is a CNN?

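A convolutional neural network (CNN) is a deep learning algorithm that is particularly well suited to image recognition and image processing tasks. It is made up of several kinds of layers, including convolutional layers, pooling layers, and fully connected layers. The convolutional layers are the key part of a CNN: they apply filters to the input image to extract features such as edges, textures, and shapes. The output of a convolutional layer is then passed through a pooling layer, which downsamples the feature maps, reducing their spatial dimensions while retaining the most important information. Finally, the output of the pooling layers is passed through one or more fully connected layers, which make the prediction or classify the image.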
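To make the convolution and pooling steps concrete, here is a minimal NumPy sketch (not how Keras implements them internally): a single 3×3 filter is slid over a small grayscale image with no padding and stride 1, followed by a ReLU and 2×2 max pooling. The helpers conv2d_valid and max_pool_2x2 are hypothetical names used only for illustration. Like Keras' Conv2D, the sketch computes a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the window and the filter, summed to one number
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 block (an odd last row/column is dropped)."""
    h, w = fmap.shape[0] // 2, fmap.shape[1] // 2
    trimmed = fmap[:h * 2, :w * 2]
    return trimmed.reshape(h, 2, w, 2).max(axis=(1, 3))

# a 6x6 image whose right half is bright, so it contains one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0
# a vertical-edge filter: responds where intensity increases from left to right
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

fmap = np.maximum(conv2d_valid(image, kernel), 0.0)  # convolution followed by ReLU
pooled = max_pool_2x2(fmap)
print(fmap.shape, pooled.shape)  # (4, 4) (2, 2)
print(fmap[0])    # each row responds only around the edge: 0, 3, 3, 0
print(pooled)     # the edge response survives pooling: every entry is 3
```

The feature map is non-zero only where the filter straddles the dark-to-bright boundary, and pooling halves each spatial dimension while keeping that strongest response.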

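CNNs are trained on large datasets of labeled images, from which the network learns to recognize the patterns and features associated with particular objects or classes. Once trained, a CNN can classify new images, or extract features for other applications such as object detection and image segmentation. CNNs have achieved state-of-the-art performance on a wide range of these image recognition tasks. They are widely used in computer vision, image processing, and related fields, with applications in self-driving cars, medical imaging, and security systems.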

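A convolutional neural network (CNN) is a type of deep neural network designed to process grid-structured data, such as images.
CNNs are very good at capturing patterns in an input image, such as lines, gradients, circles, and even eyes and faces.
This property is what makes convolutional neural networks so powerful in computer vision.
CNNs can operate directly on raw pixel values with little preprocessing (typically just resizing and scaling).
A convolutional neural network is a feed-forward neural network, sometimes with 20 or more layers.
The power of a convolutional neural network comes from a special kind of layer called the convolutional layer.
A CNN contains many convolutional layers stacked on top of one another, each capable of recognizing more complex shapes than the one before it.
Three or four convolutional layers can be enough to recognize handwritten digits, whereas recognizing human faces may take around 25 layers.
The goal of this field is to enable machines to see the world as humans do, to perceive it in a similar way, and to use that knowledge for a wide range of tasks such as image and video recognition, image analysis and classification, media recreation, recommendation systems, and natural language processing.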

Convolutional Neural Network Design

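A convolutional neural network is a type of feed-forward neural network built by stacking multiple hidden layers on top of one another in a particular order.
This sequential design allows a CNN to learn hierarchical features.
In a CNN, the hidden layers are usually convolutional layers followed by activation layers, and some of them are followed by pooling layers.
The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex.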


How CNNs Work

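A CNN can have many layers, and each layer learns to detect different features of an input image. A filter, or kernel, is applied to each image to produce an output that becomes progressively more refined and detailed after each layer. In the lower layers, the filters detect simple features. In each successive layer, the filters become more complex, so that they can detect features that uniquely represent the input object. The output of each convolved image (the partially recognized image after each layer) therefore becomes the input to the next layer. In the last layer, the fully connected (FC) layer, the CNN recognizes the image or the object it represents.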

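Convolution passes the input image through a set of these filters. As each filter activates on a particular feature in the image, it passes its output on to the filters in the next layer. Each layer learns to detect different features, and these operations are repeated across dozens or even hundreds of layers. Finally, all of the image data processed through the CNN's many layers allows the CNN to recognize the entire object. CNNs are particularly useful for image recognition, image classification, and computer vision (CV) applications because they deliver highly accurate results, especially when large amounts of data are available.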

What is Transfer Learning?

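Transfer learning in artificial intelligence means reusing components of a pre-trained model in a new AI model. If two models are designed to perform similar tasks, generalized knowledge can be shared between them. This approach reduces the resources and the amount of labeled data needed to train a new model, and it is becoming an increasingly common part of the AI development process.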

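Artificial intelligence is becoming an integral part of the modern world. AI algorithms are used to perform complex tasks across a range of industries, for example improving marketing campaigns for a better return on investment, increasing company efficiency, and advancing speech recognition software. Transfer learning will play an important role in the continued development of these models. There are many different types of AI, but one of the most common is supervised machine learning, in which a model is trained on labeled training data. Labeling a dataset correctly requires expertise, and training a model is often resource-intensive and time-consuming.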

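Transfer learning is typically used when training a system to solve a new task would require a large amount of resources. The process takes the relevant parts of an existing AI model and applies them to a new but similar problem. Generalization is a key part of transfer learning: only knowledge that another model can use in different scenarios or conditions is transferred. Instead of being tightly bound to its training dataset, a model used in transfer learning is more general, so models developed this way can be used in changing environments and with different datasets.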

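One example is using transfer learning for image classification. An AI model can be trained on labeled data to recognize and classify the subjects of images. The model can then be adapted and reused through transfer learning to recognize another specific subject in a set of images. The general components of the model stay the same, which saves resources; for instance, this could be the part of the model that detects the edges of objects in an image. Transferring this knowledge saves the time that would otherwise be spent training a new model to achieve the same result.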

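Transfer learning is commonly used to:

Save the time and resources needed to train multiple AI models from scratch for similar tasks.
Improve efficiency in resource-intensive areas of AI, such as image classification and natural language processing.
Compensate for an organization's lack of labeled training data by using pre-trained models.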


How to Predict Dog Breeds

In this tutorial, we will learn how to predict a dog's breed using a CNN and transfer learning. We will follow these steps:

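Stage 0: Import the datasets

Stage 1: Detect humans

Stage 2: Detect dogs

Stage 3: Create a CNN to classify dog breeds (from scratch)

Stage 4: Use a CNN to classify dog breeds (using transfer learning)

Stage 5: Create a CNN to classify dog breeds (using transfer learning)

Stage 6: Write your algorithm

Stage 7: Test your algorithm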

Stage 0: Import the Datasets

We import the dataset of dog images for further modeling, and populate a few variables using the load_files function from the scikit-learn library.

Import the dog dataset

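t_files, v_files, test_files - NumPy arrays containing the file paths of the training, validation, and test images
t_targets, v_targets, test_targets - NumPy arrays containing the one-hot encoded classification labels
dog_names - a list of string-valued dog breed names used to interpret the labels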

from sklearn.datasets import load_files
# here, we are importing load_files from scikit-learn to read a folder tree of images
from keras.utils import to_categorical
# here, we are importing to_categorical from keras to one-hot encode the labels
import numpy as np
# here, we are importing the numpy library as np into our program
import os
from glob import glob
# here, we are importing os and glob to list the breed folders
# here, we are defining the function to load the train, test, and validation datasets
def l_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    # there are 133 breed folders, so each label becomes a 133-element one-hot vector
    dog_targets = to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets
# here, we are loading the training, validation, and test datasets
t_files, t_targets = l_dataset('../../../data/dog_images/train')
v_files, v_targets = l_dataset('../../../data/dog_images/valid')
test_files, test_targets = l_dataset('../../../data/dog_images/test')
# here, we are loading the list of dog names from the folder names (e.g. "001.Affenpinscher")
dog_names = [os.path.basename(os.path.normpath(item))
             for item in sorted(glob("../../../data/dog_images/train/*/"))]
# here, we are printing the total number of dog categories in the dataset
print('There are %d total dog categories.' % len(dog_names))
# here, we are printing the total number of dog images in the dataset
print('There are %s total dog images.\n' % len(np.hstack([t_files, v_files, test_files])))
# here, we are printing the number of training, validation, and test dog images
print('There are %d training dog images.' % len(t_files))
print('There are %d validation dog images.' % len(v_files))
print('There are %d test dog images.' % len(test_files))
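load_files derives an integer label from each file's parent folder (the sorted folder names map to 0, 1, 2, ...), and to_categorical turns those integers into one-hot rows. The sketch below imitates both steps with the standard library and NumPy on a temporary folder tree. labels_from_folders and one_hot are hypothetical helpers written for illustration, not the scikit-learn or Keras implementations (for example, load_files also shuffles the files by default), and the three breed folders are example names in the same "NNN.Name" format the dog dataset uses.

```python
import os
import tempfile
import numpy as np

def labels_from_folders(root):
    """Return (file paths, integer labels, class names) for a root/<class>/<file> tree."""
    class_names = sorted(d for d in os.listdir(root)
                         if os.path.isdir(os.path.join(root, d)))
    paths, labels = [], []
    for label, name in enumerate(class_names):
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            paths.append(os.path.join(folder, fname))
            labels.append(label)
    return np.array(paths), np.array(labels), class_names

def one_hot(labels, num_classes):
    """Row i has a single 1 in column labels[i], similar to keras' to_categorical."""
    out = np.zeros((len(labels), num_classes), dtype='float32')
    out[np.arange(len(labels)), labels] = 1.0
    return out

with tempfile.TemporaryDirectory() as root:
    # a tiny stand-in for dog_images/train with three breed folders of empty files
    for breed, n_images in [("001.Affenpinscher", 2), ("002.Afghan_hound", 1),
                            ("003.Airedale_terrier", 2)]:
        os.makedirs(os.path.join(root, breed))
        for k in range(n_images):
            open(os.path.join(root, breed, f"img_{k}.jpg"), "w").close()
    files, labels, names = labels_from_folders(root)

targets = one_hot(labels, num_classes=len(names))
print(labels)                  # [0 0 1 2 2]
print(targets.shape)           # (5, 3)
print(names[1].split(".")[1])  # Afghan_hound
```

Because the class index comes from the sorted folder order, building dog_names from the sorted folder list keeps names and label indices aligned.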

Import the human dataset

import random
# Here, we are importing the random library into our program
random.seed(8645310)
# here, we are loading filenames in the shuffled human dataset
h_files = np.array(glob("../../../data/lfw/*/*"))
random.shuffle(h_files)
# Here, we are printing the statistics about the dataset
print('There are %d total number of human images.' % len(h_files))

阶段1：识别人类

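We use OpenCV's implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. We have downloaded one of these detectors and stored it in the haarcascades directory.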

import cv2
# here, we are importing the OpenCV library (cv2) into our program
import matplotlib.pyplot as plt
# here, we are importing the pyplot module from the matplotlib package as plt into our program
%matplotlib inline
# here, we are loading the pre-trained frontal face detector
f_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')
# here, we are loading a color (BGR) image
img = cv2.imread(h_files[3])
# here, we are converting the BGR image to grayscale
g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# here, we are finding the faces in the image
f = f_cascade.detectMultiScale(g)
# here, we are printing the number of faces detected in the image
print('Number of faces detected:', len(f))
# here, we are getting the bounding box for each detected face
for (x, y, w, h) in f:
    # here, we are drawing the bounding box on the color image
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
# here, we are converting the BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# here, we are displaying the image along with the bounding boxes
plt.imshow(cv_rgb)
plt.show()

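Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale method executes the classifier stored in f_cascade and takes the grayscale image as a parameter.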

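In the code above, f is a NumPy array of detected faces, where each row corresponds to one detected face. Each detected face is a 1D array with four entries that specify the bounding box of the face. The first two entries (extracted as x and y in the code above) specify the horizontal and vertical positions of the top-left corner of the bounding box. The last two entries (extracted as w and h) specify the width and height of the box.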
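Because each detection is stored as (x, y, w, h), and image coordinates grow downward from the top-left corner, the opposite corner of the box is (x + w, y + h), which is exactly what the cv2.rectangle call above draws. The small plain-Python sketch below performs that conversion and also computes each box's area; box_corners is a hypothetical helper, and the two detections are made-up values rather than real detectMultiScale output.

```python
def box_corners(x, y, w, h):
    """Convert an (x, y, w, h) detection to its top-left and bottom-right corners."""
    return (x, y), (x + w, y + h)

# hypothetical detections in the same (x, y, w, h) layout as the rows of `f`
detections = [(70, 40, 120, 120), (10, 200, 30, 32)]

for (x, y, w, h) in detections:
    top_left, bottom_right = box_corners(x, y, w, h)
    print(top_left, bottom_right, "area:", w * h)
# (70, 40) (190, 160) area: 14400
# (10, 200) (40, 232) area: 960
```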

Write a Human Face Detector

# here, we are going to return True if a face is detected in the image stored at img_path
def f_detector(img_path):
    img = cv2.imread(img_path)
    g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    f = f_cascade.detectMultiScale(g)
    return len(f) > 0

Stage 2: Detect Dogs

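In this section, we use a pre-trained ResNet-50 model to detect dogs in images. Our first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large and very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (drawn from the categories available in ImageNet) for the object contained in the image.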

from keras.applications.resnet50 import ResNet50
# here, we are importing the ResNet50 model from the keras package into our program
# here, we are defining the ResNet-50 model with weights pre-trained on ImageNet
ResNet50_m = ResNet50(weights='imagenet')

Pre-process the Data

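When using TensorFlow as the backend, Keras CNNs require a 4D array (which we will also call a 4D tensor) as input, with shape (nb_samples, rows, columns, channels), where nb_samples is the total number of images (or samples), and rows, columns, and channels are the number of rows, columns, and channels of each image, respectively.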

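The path_to_tensor function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image of 224×224 pixels. Next, the image is converted to an array, which is then reshaped into a 4D tensor. Since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor always has shape (1, 224, 224, 3).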

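The paths_to_tensor function takes a NumPy array of string-valued image paths as input and returns a 4D tensor with shape (nb_samples, 224, 224, 3). Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image) in the dataset!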

from keras.preprocessing import image
# here, we are importing keras' image utilities for loading and converting images
from tqdm import tqdm
# here, we are importing tqdm to display a progress bar
import numpy as np
def path_to_tensor(img_path):
    # here, we are loading the RGB image as PIL.Image.Image type, resized to 224x224
    img = image.load_img(img_path, target_size=(224, 224))
    # here, we are converting the PIL.Image.Image type to a 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # here, we are converting the 3D tensor to a 4D tensor with shape (1, 224, 224, 3)
    # and returning the 4D tensor
    return np.expand_dims(x, axis=0)
def paths_to_tensor(img_paths):
    # here, we are stacking the per-image 4D tensors into one (nb_samples, 224, 224, 3) tensor
    l_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(l_of_tensors)
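The shape bookkeeping in path_to_tensor and paths_to_tensor can be checked without any image files. In the NumPy sketch below, random arrays stand in for decoded 224×224 RGB images (an assumption made only so the example runs on its own); np.expand_dims adds the leading sample axis, and np.vstack joins the per-image 4D tensors along that axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# three fake "decoded images", each a 3D tensor (rows, columns, channels)
fake_images = [rng.random((224, 224, 3), dtype=np.float32) for _ in range(3)]

# what path_to_tensor does after img_to_array: (224, 224, 3) -> (1, 224, 224, 3)
single = np.expand_dims(fake_images[0], axis=0)
print(single.shape)  # (1, 224, 224, 3)

# what paths_to_tensor does: stack the 4D tensors into (nb_samples, 224, 224, 3)
batch = np.vstack([np.expand_dims(img, axis=0) for img in fake_images])
print(batch.shape)   # (3, 224, 224, 3)
```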

Making Predictions with ResNet-50

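Getting the 4D tensor ready for ResNet-50, and for any other pre-trained model in Keras, requires some additional processing. First, the RGB image is converted to BGR by reordering the channels. ResNet-50 (like VGG-16 and VGG-19) also requires an additional normalization step: the mean pixel ([103.939, 116.779, 123.68] in BGR order, calculated from all pixels in all images in ImageNet) must be subtracted from every pixel in each image. This is implemented in the imported preprocess_input function. If you're curious, you can check the code for preprocess_input.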
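The two operations described above (reversing the channel order and subtracting the ImageNet mean pixel) can be sketched in NumPy as follows. This is a simplified re-implementation for illustration, assuming the mean values quoted above in BGR order; caffe_style_preprocess is a hypothetical helper, and in practice you should call keras' preprocess_input instead.

```python
import numpy as np

# ImageNet mean pixel in BGR order, as quoted above
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(batch_rgb):
    """batch_rgb: float array of shape (nb_samples, rows, columns, 3) in RGB order."""
    batch_bgr = batch_rgb[..., ::-1]         # RGB -> BGR by reversing the channel axis
    return batch_bgr - IMAGENET_MEAN_BGR     # broadcast subtraction over every pixel

# a 1x1 "image" whose single pixel is pure red (R=255, G=0, B=0)
pixel = np.array([[[[255.0, 0.0, 0.0]]]], dtype=np.float32)
out = caffe_style_preprocess(pixel)
# blue, green, red channels become about -103.939, -116.779, 131.32
print(out[0, 0, 0])
```

Broadcasting lets a single 3-element mean vector be subtracted from every pixel of every image in the batch without an explicit loop.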

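Now that we have a way to format our images for ResNet-50, we are ready to use the model to extract predictions. This is done with the predict method, which returns an array whose i-th entry is the model's predicted probability that the image belongs to the i-th ImageNet category. This is implemented in the ResNet50_predict_labels function below.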

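By taking the argmax of the predicted probability vector, we obtain an integer corresponding to the model's predicted object class, which we can map to an object category using the ImageNet class-index dictionary.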
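Taking the argmax of the probability vector is a one-liner. In the sketch below, the five-element probability vector and the small index-to-label dictionary are made up for illustration (a real ResNet-50 output has 1000 entries, and the ImageNet dictionary is not reproduced here).

```python
import numpy as np

# hypothetical model output: predicted probabilities for 5 classes (they sum to 1)
probs = np.array([0.05, 0.10, 0.60, 0.20, 0.05])
# hypothetical index -> category dictionary, standing in for the ImageNet one
idx_to_label = {0: "airliner", 1: "tabby cat", 2: "golden retriever", 3: "toaster", 4: "zebra"}

predicted_index = int(np.argmax(probs))
print(predicted_index, idx_to_label[predicted_index])  # 2 golden retriever
```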

from keras.applications.resnet50 import preprocess_input, decode_predictions
# here, we are importing preprocess_input (and decode_predictions) from the keras package into our program
def ResNet50_predict_labels(img_path):
    # here, we are returning the index of the most probable ImageNet class for the image at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_m.predict(img))

Write a Dog Detector

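Looking at the dictionary, you will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151-268, inclusive, covering all categories from 'Chihuahua' to 'Mexican hairless'. Thus, to check whether the pre-trained ResNet-50 model predicts that an image contains a dog, we only need to check whether the ResNet50_predict_labels function returns a value between 151 and 268 (inclusive).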

We use these ideas to complete the dog_detector function below, which returns True if a dog is detected in an image (and False otherwise).

# here, we are going to return True if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return bool(151 <= prediction <= 268)
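The inclusive range test can be checked in isolation. The hypothetical is_dog_index helper below takes an ImageNet class index directly (instead of an image path), so the boundary behavior can be exercised without loading ResNet-50. It also shows that the range 151-268 covers 118 dog categories.

```python
DOG_FIRST, DOG_LAST = 151, 268  # dictionary keys of the first and last dog categories

def is_dog_index(class_index):
    """True if an ImageNet class index falls in the dog range 151-268, inclusive."""
    return DOG_FIRST <= class_index <= DOG_LAST

for idx in (150, 151, 200, 268, 269):
    print(idx, is_dog_index(idx))
# 150 False / 151 True / 200 True / 268 True / 269 False

print(DOG_LAST - DOG_FIRST + 1)  # 118 dog categories
```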

Stage 3: Create a CNN to Classify Dog Breeds (from Scratch)

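Now that we have functions for detecting humans and dogs in images, we need a way to predict the breed from an image. In this step, you will create a CNN that classifies dog breeds. You must create your CNN from scratch (so you can't use transfer learning yet!), and you must attain a test accuracy of at least 1%. In Stage 5 of this notebook, you will have the opportunity to use transfer learning to create a CNN that attains significantly higher accuracy. Be careful not to add too many trainable layers! More parameters mean longer training, which means you are more likely to need a GPU to accelerate the training process. Thankfully, Keras provides a handy estimate of the time each epoch is likely to take; you can extrapolate this estimate to figure out how long your algorithm will take to train.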

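We note that the task of assigning a breed to a dog from an image is considered exceptionally challenging. To see why, consider that even a human would have trouble distinguishing between a Brittany and a Welsh Springer Spaniel.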

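It is not difficult to find other pairs of dog breeds with minimal inter-class variation (for instance, Curly-Coated Retrievers and American Water Spaniels). Likewise, recall that Labradors come in yellow, chocolate, and black. Your vision-based algorithm will have to overcome this high intra-class variation to learn to classify all of these different shades as the same breed.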

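We also note that random chance presents an exceptionally low bar: setting aside the fact that the classes are slightly imbalanced, a random guess will be correct roughly 1 in 133 times, which corresponds to an accuracy of less than 1%.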
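The arithmetic behind that bar: with 133 breeds, a uniform random guess is right with probability 1/133. The short simulation below uses synthetic labels and a fixed seed (nothing from the real dataset) to show that the empirical hit rate lands close to that figure.

```python
import random

num_classes = 133
print(f"expected accuracy: {100 / num_classes:.2f}%")  # expected accuracy: 0.75%

# simulate uniform random guessing against synthetic uniformly distributed true labels
rng = random.Random(0)
trials = 200_000
hits = sum(rng.randrange(num_classes) == rng.randrange(num_classes) for _ in range(trials))
print(f"simulated accuracy: {100 * hits / trials:.2f}%")
```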

Pre-process the Data

from PIL import ImageFile
# here, we allow PIL to load truncated image files instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True
# here, we are pre-processing the data for Keras: dividing by 255 rescales pixels to [0, 1]
t_tensors = paths_to_tensor(t_files).astype('float32')/255
v_tensors = paths_to_tensor(v_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255

Model Architecture

from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
# here, we are importing the Conv2D, MaxPooling2D, and GlobalAveragePooling2D layers from keras
from keras.layers import Dropout, Flatten, Dense
# here, we are importing the Dropout, Flatten, and Dense layers from keras
from keras.models import Sequential
# here, we are importing the Sequential model class from keras
# here, we are creating the architecture using the Sequential() model
m = Sequential()
# here, we are declaring the first convolution layer to extract features from the input
# image. It has 16 filters of size 3x3; because it is the first layer, we must give the
# input shape, which is a 224 x 224 pixel image with depth = 3 (RGB).
m.add(Conv2D(16, (3, 3), activation='relu', input_shape=(224, 224, 3)))
# here, the next layer is a pooling layer with a 2 x 2 pixel window that keeps the maximum
# value of each region of the feature maps. This downsampling (also called subsampling)
# progressively reduces the spatial size (width and height) of the representation.
m.add(MaxPooling2D(pool_size=(2, 2)))
# here, we add another convolution and pooling layer as before (no input_shape needed),
# increasing the number of filters learned to 32
m.add(Conv2D(32, (3, 3), activation='relu'))
m.add(MaxPooling2D(pool_size=(2, 2)))
# here, we add a third convolution and pooling layer, increasing the filters to 64
m.add(Conv2D(64, (3, 3), activation='relu'))
m.add(MaxPooling2D(pool_size=(2, 2)))
# here, we average each feature map down to a single value, giving a 64-element vector
m.add(GlobalAveragePooling2D())
# here, the output layer has one unit per breed; softmax turns the scores into probabilities
m.add(Dense(133, activation='softmax'))
# here, we are printing a summary of our architecture
m.summary()
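It can help to trace what m.summary() should report. The plain-Python sketch below assumes Keras' defaults for these layers ('valid' padding and stride 1 for Conv2D, non-overlapping 2×2 windows for MaxPooling2D, which floor odd sizes) and recomputes each layer's output shape and parameter count. It is arithmetic only, not a call into Keras; conv_out and pool_out are hypothetical helpers.

```python
def conv_out(size, kernel=3):
    return size - kernel + 1          # 'valid' padding, stride 1

def pool_out(size, pool=2):
    return size // pool               # floor division drops an odd last row/column

side, channels = 224, 3
total_params = 0
for filters in (16, 32, 64):
    params = 3 * 3 * channels * filters + filters   # 3x3 weights per input channel + one bias per filter
    side = conv_out(side)
    print(f"Conv2D({filters}): {side}x{side}x{filters}, params={params}")
    side = pool_out(side)
    print(f"MaxPooling2D: {side}x{side}x{filters}")
    channels = filters
    total_params += params

# GlobalAveragePooling2D averages each of the 64 maps to one number -> a 64-vector
dense_params = channels * 133 + 133
total_params += dense_params
print(f"Dense(133): params={dense_params}")
print(f"total trainable params: {total_params}")  # total trainable params: 32229
```

The spatial size goes 224 → 222 → 111 → 109 → 54 → 52 → 26, and most of the roughly 32 thousand parameters sit in the third convolution and the output layer, which keeps the from-scratch model small enough to train without a huge budget.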

How to Compile the Model

Syntax

We can compile the model architecture using the following syntax:

m.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
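categorical_crossentropy compares the model's softmax output with the one-hot target: for one sample, the loss is the negative log of the probability assigned to the true class. The NumPy sketch below (hypothetical helper names, logits chosen for illustration, and a clipping epsilon assumed for numerical safety) computes it, and shows that a model predicting all 133 breeds as equally likely starts at a loss of ln(133) ≈ 4.89.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(one_hot_targets, probs, eps=1e-7):
    """Mean over samples of -sum(target * log(prob))."""
    probs = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(one_hot_targets * np.log(probs), axis=-1)))

num_classes = 133
target = np.zeros((1, num_classes))
target[0, 5] = 1.0                               # the true breed is class 5

uniform = softmax(np.zeros((1, num_classes)))    # every breed gets probability 1/133
print(round(categorical_crossentropy(target, uniform), 4))  # 4.8903

confident = np.zeros((1, num_classes))
confident[0, 5] = 10.0                           # a large score for the correct class
print(categorical_crossentropy(target, softmax(confident)) < 0.01)  # True
```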

How to Train the Model

from keras.callbacks import ModelCheckpoint
# here, we are importing the ModelCheckpoint callback from the keras package into our program
# here, we are specifying the number of epochs to use to train the model
epochs = 50
# here, we save the model to disk whenever the validation loss improves
cpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5',
                           verbose=1, save_best_only=True)
m.fit(t_tensors, t_targets,
      validation_data=(v_tensors, v_targets),
      epochs=epochs, batch_size=20, callbacks=[cpointer], verbose=1)
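With save_best_only=True, the checkpoint is written only when the monitored quantity (val_loss by default) improves on the best value seen so far. The sketch below replays that rule on a made-up list of validation losses; it imitates the callback's decision logic and does not call Keras. epochs_that_save is a hypothetical helper.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a save_best_only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # improvement on the best validation loss so far
            best = loss
            saved.append(epoch)
    return saved

# illustrative validation losses, not results from this tutorial
val_losses = [4.70, 4.41, 4.45, 4.12, 4.20, 4.05, 4.09]
print(epochs_that_save(val_losses))  # [1, 2, 4, 6]
```

The file on disk therefore always holds the weights from the epoch with the lowest validation loss, which is why it is worth loading them back before evaluating on the test set.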

How to Test the Model

We test the model on the test set to measure how well it performs on images it did not see during training.

# here, we are loading the weights that achieved the best validation loss
m.load_weights('saved_models/weights.best.from_scratch.hdf5')
# here, we are getting the index of the predicted dog breed for each image in the test set
d_br_predictions = [np.argmax(m.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]
# here, we are reporting the test accuracy
t_accuracy = 100*np.sum(np.array(d_br_predictions)==np.argmax(test_targets, axis=1))/len(d_br_predictions)
print('Test accuracy: %.4f%%' % t_accuracy)

Predict the Dog Breed with the Model

from extract_bottleneck_features import *
# Here, we are importing all the helper functions (such as extract_VGG16) from the
# extract_bottleneck_features module
def VGG16_predict_breed(img_path):
    # Here, we are extracting the VGG-16 bottleneck features for the image
    bottleneck_feature = extract_VGG16(path_to_tensor(img_path))
    # Here, we are getting the predicted probability vector from the VGG-16 transfer-learning model (VGG16_m)
    pr_vector = VGG16_m.predict(bottleneck_feature)
    # Here, we are returning the dog breed predicted by the model
    return dog_names[np.argmax(pr_vector)]

Stage 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

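In Stage 4, we used transfer learning to create a CNN using VGG-16 bottleneck features. In this section, you should use bottleneck features from a different pre-trained model. To make things easier for you, we have pre-computed the features for all of the networks that are currently available in Keras:

VGG-19 bottleneck features
ResNet-50 bottleneck features
Inception bottleneck features
Xception bottleneck features

Following steps similar to those in Stage 4, our final test accuracy was 80.5024%.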
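Transfer learning with bottleneck features means the pre-trained network is run once to turn each image into a feature tensor, and only a small new classifier is trained on top. The NumPy sketch below shows the forward pass of such a head, global average pooling followed by a 133-way softmax layer, assuming VGG-16-style bottleneck features of shape (7, 7, 512) for 224×224 inputs. Random features and untrained random weights stand in for real data, so the predicted indices are meaningless; only the shapes and the data flow are the point.

```python
import numpy as np

rng = np.random.default_rng(42)
num_images, num_breeds = 4, 133

# stand-in for bottleneck features: one (7, 7, 512) tensor per image
bottleneck = rng.standard_normal((num_images, 7, 7, 512)).astype(np.float32)

# global average pooling: average over the 7x7 spatial grid -> one 512-vector per image
pooled = bottleneck.mean(axis=(1, 2))

# the new dense layer; these are the only weights that would be trained
W = (rng.standard_normal((512, num_breeds)) * 0.01).astype(np.float32)
b = np.zeros(num_breeds, dtype=np.float32)
logits = pooled @ W + b

# softmax turns the scores into per-breed probabilities
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(pooled.shape, probs.shape)  # (4, 512) (4, 133)
print(W.size + b.size)            # 68229 trainable parameters in the head
print(np.argmax(probs, axis=1))   # index of the "predicted" breed for each image
```

Because the frozen base has already done the expensive feature extraction, training touches only these 68,229 parameters, which is why this approach can reach high accuracy quickly even with a modest dataset.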

Stage 6: Write Your Algorithm

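Write an algorithm that accepts a file path to an image and first determines whether the image contains a human, a dog, or neither. Then, if a dog is detected in the image, return the predicted breed. If a human is detected in the image, return the dog breed the person most resembles.

If neither is detected in the image, provide output that indicates an error.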

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Here, we are writing our algorithm; feel free to split it across as many code cells as needed.
# Resnet50_predict_breed is assumed to be the breed predictor built in Stage 5
# (defined like VGG16_predict_breed, but not shown in this tutorial).
def im_detecter(img_path):
    # Here, we display the input image
    img = mpimg.imread(img_path)
    plt.imshow(img)
    plt.show()
    if dog_detector(img_path):
        d_name = Resnet50_predict_breed(img_path)
        # breed names look like "001.Affenpinscher", so we keep the part after the dot
        print("A dog is detected in the image and the predicted breed is {}".format(d_name.split(".")[1]))
    elif f_detector(img_path):
        res_breed = Resnet50_predict_breed(img_path)
        print("A human face is detected in the image and the resembling dog breed is {}".format(res_breed.split(".")[1]))
    else:
        print("Error: neither a dog nor a human face was detected in the image")
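The decision order used in im_detecter (dog first, then human, otherwise an error) can be checked without loading any models. In the sketch below, classify_image takes the detectors and the breed predictor as arguments; the lambdas are hypothetical stubs standing in for dog_detector, f_detector, and the breed predictor, and they return a fixed example label.

```python
def classify_image(img_path, is_dog, is_human, predict_breed):
    """Return a message following the order: dog first, then human, else error."""
    if is_dog(img_path):
        breed = predict_breed(img_path).split(".")[1]  # "001.Affenpinscher" -> "Affenpinscher"
        return f"dog detected; predicted breed: {breed}"
    if is_human(img_path):
        breed = predict_breed(img_path).split(".")[1]
        return f"human detected; resembling breed: {breed}"
    return "error: neither a dog nor a human face was detected"

# hypothetical stubs standing in for the real detectors and breed predictor
fake_is_dog = lambda path: path.startswith("dog")
fake_is_human = lambda path: path.startswith("person")
fake_breed = lambda path: "001.Affenpinscher"

for path in ["dog_1.jpg", "person_1.jpg", "car_1.jpg"]:
    print(path, "->", classify_image(path, fake_is_dog, fake_is_human, fake_breed))
```

Passing the detectors in as arguments keeps the branching logic testable on its own, and the real functions can be dropped in unchanged once the models are trained.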

Stage 7: Test Your Algorithm

import glob
# Here, we are importing the glob module into our program
for fpath in glob.iglob('images_for_step7/*.jpg'):
    im_detecter(fpath)

Results


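The final results look quite accurate: the dog breed predictions are correct, and the resembling breeds for humans are reasonable. Transfer learning worked much better than the CNN model I built from scratch. This is mainly because the transfer learning model was trained on a large amount of data, so the architecture has already learned which features are most representative of images. This makes classification much easier, and we do not have to sacrifice accuracy even when we do not have much data.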
