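How Style Transfer Works

March 17, 2025 | 7 min read

Neural style transfer is a technique that takes two images, a content image and a style reference image, and blends them so that the output looks like the content image "painted" in the style of the style reference image.

Import and configure modules

Open Google Colab.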

from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

Output

TensorFlow 2.x selected.

import IPython.display as display
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12,12)
mpl.rcParams['axes.grid'] = False
import numpy as np
import time
import functools

content_path = tf.keras.utils.get_file('nature.jpg','https://www.eadegallery.co.nz/wp-content/uploads/2019/03/626a6823-af82-432a-8d3d-d8295b1a9aed-l.jpg')
style_path = tf.keras.utils.get_file('cloud.jpg','https://i.pinimg.com/originals/11/91/4f/11914f29c6d3e9828cc5f5c2fd64cfdc.jpg')

Output

Downloading data from https://www.eadegallery.co.nz/wp-content/uploads/2019/03/626a6823-af82-432a-8d3d-d8295b1a9aed-l.jpg
1122304/1117520 [==============================] - 1s 1us/step
Downloading data from https://i.pinimg.com/originals/11/91/4f/11914f29c6d3e9828cc5f5c2fd64cfdc.jpg
49152/43511 [=================================] - 0s 0us/step

Define a function that loads an image and limits its longest side to 512 pixels.

def load_img(path_to_img):
  max_dim = 512
  img = tf.io.read_file(path_to_img)
  img = tf.image.decode_image(img, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)  # floats in [0, 1]

  # Scale so the longest side becomes max_dim, preserving the aspect ratio.
  shape = tf.cast(tf.shape(img)[:-1], tf.float32)
  long_dim = max(shape)
  scale = max_dim / long_dim
  new_shape = tf.cast(shape * scale, tf.int32)

  img = tf.image.resize(img, new_shape)
  img = img[tf.newaxis, :]  # add a batch dimension
  return img
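
The scaling arithmetic in load_img can be checked by hand. The sketch below is a plain-Python illustration, not part of the tutorial code: the helper name scaled_shape is hypothetical, and int() stands in for the truncation that tf.cast(..., tf.int32) applies to positive values.

```python
def scaled_shape(height, width, max_dim=512):
    """Return (new_height, new_width) so that the longest side equals max_dim."""
    long_dim = max(height, width)
    scale = max_dim / long_dim
    # int() truncates toward zero, like casting a positive float to an integer type.
    return int(height * scale), int(width * scale)


# A landscape image is scaled down; a small portrait image is scaled up.
print(scaled_shape(768, 1024))  # (384, 512)
print(scaled_shape(256, 128))   # (512, 256)
```

Note that images smaller than max_dim are enlarged as well, because the scale factor is always computed from the longest side.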

Create a function to display an image:

def imshow(image, title=None):
  # Drop the batch dimension if there is one.
  if len(image.shape) > 3:
    image = tf.squeeze(image, axis=0)

  plt.imshow(image)
  if title:
    plt.title(title)

content_image = load_img(content_path)
style_image = load_img(style_path)
plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')
plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')

Output

x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape

Output

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5

574717952/574710816 [==============================] - 8s 0us/step
TensorShape([1, 1000])

predicted_top_5 = tf.keras.applications.vgg19.decode_predictions(prediction_probabilities.numpy())[0]
[(class_name, prob) for (number, class_name, prob) in predicted_top_5]

Output

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
40960/35363 [==================================] - 0s 0us/step
[('mobile_home', 0.7314594),
 ('picket_fence', 0.119986326),
 ('greenhouse', 0.026051044),
 ('thatch', 0.023595566),
 ('boathouse', 0.014751049)]

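Define style and content representations

Use the intermediate layers of the model to represent the content and style of the image. Starting from the input layer, the activations of the first few layers represent low-level features such as edges and textures; deeper layers represent progressively higher-level features.

For an input image, try to match the corresponding style and content target representations at these intermediate layers.

The classification above confirms that VGG19 is being used correctly. Now load VGG19 without the classification head and list its layer names.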

vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
print()
for layer in vgg.layers:
  print(layer.name)

Output

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80142336/80134624 [==============================] - 2s 0us/step

input_2
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool

# Content layer
content_layers = ['block5_conv2'] 

# Style layer of interest
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1', 
                'block4_conv1', 
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

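Intermediate layers for style and content

At a high level, a network that performs image classification has to understand the image. It takes the raw pixels as input and builds an internal representation that turns those pixels into the complex features present in the image.

This is also one reason convolutional neural networks generalize well: they capture the invariances and defining features within classes (for example, cats vs. dogs) while ignoring irrelevant variation. Somewhere between the point where the raw image is fed in and the point where the classification label comes out, the model acts as a complex feature extractor. By accessing the model's intermediate layers, we can describe the style and content of an input image.

Build the model

The networks in tf.keras.applications are defined so that the values of intermediate layers can be extracted easily with the Keras functional API.

To define a model with the functional API, specify its inputs and outputs:

model = Model(inputs, outputs)

The following function builds a VGG19 model that returns a list of intermediate layer outputs.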

def vgg_layers(layer_names):
  """ Creating a vgg model that returns a list of intermediate output values."""
  # Load our model. Load pretrained VGG, trained on imagenet data
  vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False

  outputs = [vgg.get_layer(name).output for name in layer_names]
  model = tf.keras.Model([vgg.input], outputs)
  return model

style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)
#Look at the statistics of each layer's output
for name, output in zip(style_layers, style_outputs):
  print(name)
  print("  shape: ", output.numpy().shape)
  print("  min: ", output.numpy().min())
  print("  max: ", output.numpy().max())
  print("  mean: ", output.numpy().mean())
  print()

Output

block1_conv1
  shape:  (1, 427, 512, 64)
  min:  0.0
  max:  763.51953
  mean:  25.987665

block2_conv1
  shape:  (1, 213, 256, 128)
  min:  0.0
  max:  3484.3037
  mean:  134.27835

block3_conv1
  shape:  (1, 106, 128, 256)
  min:  0.0
  max:  7291.078
  mean:  143.77878

block4_conv1
  shape:  (1, 53, 64, 512)
  min:  0.0
  max:  13492.799
  mean:  530.00244

block5_conv1
  shape:  (1, 26, 32, 512)
  min:  0.0
  max:  2881.529
  mean:  40.596397
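
The spatial sizes printed above can be reproduced with a little arithmetic. This sketch assumes, as in the standard Keras VGG19, that the convolutions use 'same' padding (the size is unchanged) and that each block ends with a 2x2 max pool of stride 2, which rounds odd sizes down. The function name is hypothetical.

```python
def vgg_conv1_sizes(height, width, num_blocks=5):
    """Spatial size seen by blockN_conv1 for N = 1..num_blocks."""
    sizes = []
    for _ in range(num_blocks):
        sizes.append((height, width))
        # 2x2 max pooling with stride 2 and no padding halves each side, rounding down.
        height, width = height // 2, width // 2
    return sizes


for block, size in enumerate(vgg_conv1_sizes(427, 512), start=1):
    print("block{}_conv1".format(block), size)
```

Starting from the 427 x 512 style image, this yields 427 x 512, 213 x 256, 106 x 128, 53 x 64, and 26 x 32, matching the shapes in the output above.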

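Gram matrix

Calculate style

The content of an image is represented by the values of the intermediate feature maps.

The style of an image can be described by the correlations between the different feature maps. Calculate a Gram matrix that captures this information by taking the outer product of the feature vector with itself at each location and averaging that outer product over all locations.

The Gram matrix for a particular layer can be calculated as

$$G^l_{cd} = \frac{\sum_{ij} F^l_{ijc}(x)\, F^l_{ijd}(x)}{IJ}$$

where F^l_{ijc}(x) is the activation of channel c at location (i, j) in layer l for input x, and IJ is the number of locations.

This can be implemented concisely with the tf.linalg.einsum function: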

def gram_matrix(input_tensor):
  # Sum the channel-by-channel outer products over all spatial locations.
  result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
  input_shape = tf.shape(input_tensor)
  num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
  return result/(num_locations)
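
To see what the einsum computes, here is a small plain-Python sketch of the same calculation on nested lists, without a batch dimension. The name gram_matrix_py and the toy feature map are illustrative assumptions, not part of the tutorial code.

```python
def gram_matrix_py(features):
    """features[i][j][c] is the activation of channel c at location (i, j)."""
    rows, cols = len(features), len(features[0])
    channels = len(features[0][0])
    gram = [[0.0] * channels for _ in range(channels)]
    for i in range(rows):
        for j in range(cols):
            vec = features[i][j]
            # Accumulate the outer product of the feature vector with itself.
            for c in range(channels):
                for d in range(channels):
                    gram[c][d] += vec[c] * vec[d]
    num_locations = rows * cols
    return [[value / num_locations for value in row] for row in gram]


# A 1 x 2 feature map with 2 channels.
features = [[[1.0, 2.0], [3.0, 4.0]]]
print(gram_matrix_py(features))  # [[5.0, 7.0], [7.0, 10.0]]
```

The result is a channels x channels matrix regardless of the spatial size, which is why the style statistics of images with different resolutions can be compared directly.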

Extract style and content

Build a model that returns the style and content tensors.

class StyleContentModel(tf.keras.models.Model):
  def __init__(self, style_layers, content_layers):
    super(StyleContentModel, self).__init__()
    self.vgg = vgg_layers(style_layers + content_layers)
    self.style_layers = style_layers
    self.content_layers = content_layers
    self.num_style_layers = len(style_layers)
    self.vgg.trainable = False

  def call(self, inputs):
    "Expects float input in [0,1]"
    inputs = inputs*255.0
    preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
    outputs = self.vgg(preprocessed_input)
    # The style layers come first in the output list, then the content layers.
    style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                      outputs[self.num_style_layers:])

    style_outputs = [gram_matrix(style_output)
                     for style_output in style_outputs]

    content_dict = {content_name: value
                    for content_name, value
                    in zip(self.content_layers, content_outputs)}

    style_dict = {style_name: value
                  for style_name, value
                  in zip(self.style_layers, style_outputs)}

    return {'content': content_dict, 'style': style_dict}

When called on an image, this model returns the Gram matrix (style) of the style_layers and the content of the content_layers:

extractor = StyleContentModel(style_layers, content_layers)
results = extractor(tf.constant(content_image))
style_results = results['style']
print('Styles:')
for name, output in sorted(results['style'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())
  print()
print("Contents:")
for name, output in sorted(results['content'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())

Output

Styles:
   block1_conv1
    shape:  (1, 64, 64)
    min:  0.0055228453
    max:  28014.557
    mean:  263.79025

   block2_conv1
    shape:  (1, 128, 128)
    min:  0.0
    max:  61479.496
    mean:  9100.949

   block3_conv1
    shape:  (1, 256, 256)
    min:  0.0
    max:  545623.44
    mean:  7660.976

   block4_conv1
    shape:  (1, 512, 512)
    min:  0.0
    max:  4320502.0
    mean:  134288.84

   block5_conv1
    shape:  (1, 512, 512)
    min:  0.0
    max:  110005.37
    mean:  1487.0381

Contents:
   block5_conv2
    shape:  (1, 26, 32, 512)
    min:  0.0
    max:  2410.8796
    mean:  13.764149

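Run gradient descent

With this style and content extractor, we can implement the style transfer algorithm. Compute the mean squared error of the image's output relative to each target, then take the weighted sum of these losses.

Set the style and content target values: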

style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']

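Define a tf.Variable to contain the image to optimize. Initialize it with the content image (the tf.Variable must have the same shape as the content image):

image = tf.Variable(content_image)

Since this is a float image, define a function that keeps the pixel values between 0 and 1:

def clip_0_1(image):
  return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)

Create an optimizer. The paper recommends LBFGS. The train_step code further down only needs an object named opt with an apply_gradients method; Adam is one option (the hyperparameters here are an assumption and may need tuning):

opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

To optimize the image, use a weighted combination of the two losses as the total loss: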

style_weight=1e-2
content_weight=1e4

def style_content_loss(outputs):
    style_outputs = outputs['style']
    content_outputs = outputs['content']
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2) 
                           for name in style_outputs.keys()])
    style_loss *= style_weight / num_style_layers

    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2) 
                             for name in content_outputs.keys()])
    content_loss *= content_weight / num_content_layers
    loss = style_loss + content_loss
    return loss
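
The weighting in style_content_loss is easier to follow on toy numbers. The sketch below mirrors its structure in plain Python; the function names, layer keys, and values are made up for illustration, and only the default weights come from the listing above.

```python
def mean_squared_error(output, target):
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


def weighted_total_loss(style_out, style_tgt, content_out, content_tgt,
                        style_weight=1e-2, content_weight=1e4):
    # Sum the per-layer errors, then scale by weight / number of layers.
    style_loss = sum(mean_squared_error(style_out[k], style_tgt[k]) for k in style_out)
    style_loss *= style_weight / len(style_out)
    content_loss = sum(mean_squared_error(content_out[k], content_tgt[k]) for k in content_out)
    content_loss *= content_weight / len(content_out)
    return style_loss + content_loss


# Hypothetical two-layer style outputs and one-layer content output.
style_out = {'a': [2.0, 4.0], 'b': [0.0, 0.0]}
style_tgt = {'a': [0.0, 0.0], 'b': [0.0, 0.0]}
content_out = {'c': [1.0, 1.0]}
content_tgt = {'c': [1.0, 1.5]}
print(weighted_total_loss(style_out, style_tgt, content_out, content_tgt))
```

Here the style term is 10 * (0.01 / 2) = 0.05 and the content term is 0.125 * 10000 = 1250, so the total is about 1250.05. Dividing each weight by the number of layers keeps the balance between style and content independent of how many layers are used.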

Use tf.GradientTape to update the image.

@tf.function()
def train_step(image):
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    loss = style_content_loss(outputs)

  # Differentiate the loss with respect to the image pixels and update them.
  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))
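
The update in train_step has three parts: compute the gradient of the loss with respect to the pixels, apply it, and clip the result back into the valid range. The toy sketch below shows that pattern in plain Python. It uses plain gradient descent on a simple squared-error loss rather than the optimizer and loss used in this tutorial, and all names and values are illustrative assumptions.

```python
def clip_0_1_py(values):
    return [min(1.0, max(0.0, v)) for v in values]


def descent_step(pixels, targets, learning_rate=0.1):
    """One gradient-descent step on sum((p - t)^2), followed by clipping to [0, 1]."""
    grads = [2.0 * (p - t) for p, t in zip(pixels, targets)]
    updated = [p - learning_rate * g for p, g in zip(pixels, grads)]
    return clip_0_1_py(updated)


pixels = [0.5, 0.5]
targets = [0.2, 1.4]  # the second target lies outside the valid pixel range
for _ in range(50):
    pixels = descent_step(pixels, targets)
print(pixels)
```

The first pixel converges toward 0.2, while the second is pushed toward 1.4 but held at 1.0 by the clipping step, which is the role clip_0_1 plays after each optimizer update.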

Run a few steps to test:

train_step(image)
train_step(image)
train_step(image)
plt.imshow(image.read_value()[0])

Output

Transform the image

Perform a longer optimization in this step:

import time
start = time.time()

epochs = 10
steps_per_epoch = 100
step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    train_step(image)
    print(".", end='')
  display.clear_output(wait=True)
  imshow(image.read_value())
  plt.title("Train step: {}".format(step))
  plt.show()
end = time.time()
print("Total time: {:.1f}".format(end-start))

Output

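Total variation loss

The basic implementation above tends to produce high-frequency artifacts. An explicit regularization term on the high-frequency components of the image, often called the total variation loss, reduces them.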

def high_pass_x_y(image):
  # Differences between neighboring pixels along the width and the height.
  x_var = image[:,:,1:,:] - image[:,:,:-1,:]
  y_var = image[:,1:,:,:] - image[:,:-1,:,:]
  return x_var, y_var

x_deltas, y_deltas = high_pass_x_y(content_image)
plt.figure(figsize=(14,10))
plt.subplot(2,2,1)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Original")
plt.subplot(2,2,2)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Original")
x_deltas, y_deltas = high_pass_x_y(image)
plt.subplot(2,2,3)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Styled")
plt.subplot(2,2,4)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Styled")

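Output

This shows how the high-frequency components have increased.

This high-frequency component is essentially an edge detector. A Sobel edge detector gives similar output, for example: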

plt.figure(figsize=(14,10))
sobel = tf.image.sobel_edges(content_image)
plt.subplot(1,2,1)
imshow(clip_0_1(sobel[...,0]/4+0.5), "Horizontal Sobel-edges")
plt.subplot(1,2,2)
imshow(clip_0_1(sobel[...,1]/4+0.5), "Vertical Sobel-edges")

Output

The regularization loss associated with this is the sum of the absolute values of these differences:

def total_variation_loss(image):
  x_deltas, y_deltas = high_pass_x_y(image)
  return tf.reduce_sum(tf.abs(x_deltas)) + tf.reduce_sum(tf.abs(y_deltas))

total_variation_loss(image).numpy()

Output

99172.59

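This demonstrates what it does, but there is no need to implement it yourself: TensorFlow includes a standard implementation, tf.image.total_variation:

tf.image.total_variation(image).numpy()

Output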

array([99172.59], dtype=float32)
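
As a small worked example of the same quantity, the sketch below computes the total variation of a single-channel image stored as nested lists. It is a plain-Python illustration with a hypothetical function name, not the TensorFlow implementation.

```python
def total_variation_py(image):
    """image[y][x] is a single-channel image stored as nested lists."""
    height, width = len(image), len(image[0])
    # Absolute differences between horizontally adjacent pixels.
    x_var = sum(abs(image[y][x + 1] - image[y][x])
                for y in range(height) for x in range(width - 1))
    # Absolute differences between vertically adjacent pixels.
    y_var = sum(abs(image[y + 1][x] - image[y][x])
                for y in range(height - 1) for x in range(width))
    return x_var + y_var


flat = [[0.5, 0.5], [0.5, 0.5]]
checker = [[0.0, 1.0], [1.0, 0.0]]
print(total_variation_py(flat), total_variation_py(checker))  # 0.0 4.0
```

A uniform image has zero total variation, while the checkerboard, where every neighboring pair differs, has the maximum for its size. Penalizing this value therefore discourages pixel-level noise.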

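Re-run the optimization

Choose a weight for the total variation loss. The value below is only an illustrative starting point, not one given in this listing:

total_variation_weight = 30

Now include it in the train_step function: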

@tf.function()
def train_step(image):
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    loss = style_content_loss(outputs)
    # Add the total variation penalty to suppress high-frequency artifacts.
    loss += total_variation_weight*tf.image.total_variation(image)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))

Reinitialize the optimization variable:

image = tf.Variable(content_image)

And run the optimization:

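import time
import PIL.Image

# tensor_to_image is not defined earlier in this listing. This is a minimal
# helper (an assumption, requires Pillow) that converts a float tensor in
# [0, 1] to a PIL image.
def tensor_to_image(tensor):
  tensor = np.array(tensor*255, dtype=np.uint8)
  if np.ndim(tensor) > 3:
    assert tensor.shape[0] == 1
    tensor = tensor[0]
  return PIL.Image.fromarray(tensor)

start = time.time()

epochs = 10
steps_per_epoch = 100

step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    train_step(image)
    print(".", end='')
  display.clear_output(wait=True)
  display.display(tensor_to_image(image))
  print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))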

Output

Finally, save the result:

file_name = 'styletransfer.png'
tensor_to_image(image).save(file_name)

# Download the file when running in Colab; skip silently elsewhere.
try:
  from google.colab import files
except ImportError:
  pass
else:
  files.download(file_name)
