Gram矩阵

2025年3月17日 | 阅读 3 分钟

格莱姆矩阵源于有限维空间中的一个函数；格莱姆矩阵的条目是有限维子空间基本服务的内积。我们必须计算风格损失。但我们还没有被展示"为什么风格损失使用格莱姆矩阵计算。" 格莱姆矩阵 捕捉给定层中一组特征图的"特征分布" 。

注意：我们认为上述问题没有得到令人满意的解答。例如，让我们尝试更直观地解释它。假设我们有以下特征图。为了简单起见，我们只考虑三个特征图，其中两个是完全被动的。我们有一个特征图集，其中第一个特征图看起来像一幅自然图片，在第二个特征图中，第一个特征图看起来像一片乌云。然后，如果我们尝试手动计算内容和风格损失，我们将得到这些值。

这意味着我们没有在两个特征图集之间丢失风格信息。然而，内容是不同的。

理解风格损失

最终损失

它定义为，

其中 α 和 β 是用户定义的超参数。这里 β 吸收了先前定义的 M^l 归一化因子。通过控制 α 和 β，我们可以控制注入到生成的图像中的内容和风格的量。我们还可以在论文中看到不同 α 和 β 值的不同效果的漂亮可视化。

定义优化器

接下来，我们使用 Adam 优化器来优化网络的损失。

定义输入管道

这里我们描述完整的输入管道。 tf.data 提供了一个非常易于使用和直观的界面来实现输入管道。对于大多数图像处理任务，我们使用 tf. Image api，尽管 tf.image 动态调整图像大小的能力很小。

例如，如果我们想动态地裁剪和调整图像大小，最好以生成器的形式进行，如下所示。

我们定义了两个输入管道；一个用于风格，一个用于内容。内容输入管道只查找以单词 content_ 开头的 jpg 图像，而风格管道查找以 style_ 开头的模型。

	def image_gen_function(data_dir, file_match_str, do_shuffle=True):    
"""
"	The function returns a produced image, and the color channel is like values. 
	This is a generator function that is used by the service of tf.data api.
	
	
""""	# Load the filenames
	files = [f for f in os.listdir(data_dir) if f.startswith(file_match_str)]
	if do_shuffle:
	shuffle(files)
	    
	    mean = np.array([[vgg_mean]])
	
	 # For each file preprocess the image 
	for f in file: 
	img = Image.open(os.path.join(data_dir, f))
	
	width, height = img.size
	
	#Here, we crop the image to a square by cropping on the longer axis
	if width < height:
	left,right = 0, width
	top, bottom = (height-width)/2, ((height-width)/2) + width
	elif width > height:
	top, bottom = 0, height
	left, right = (width - height)/2, ((width-height)/2) + height
	else:
	arr = np.array(img.resize((image_size,image_size))).astype(np.float32)
	yield (arr, mean)
	
	arr = np.array(img.crop((left, top, right, bottom)).resize((image_size,image_size))).astype(np.float32)
	yield (arr, mean)
	
	
	def load_images_iterator(gen_func, zero_mean=False):
	
	"""This function returns a dataset iterator of tf.data API.
	    """
	    image_dataset = tf.data.Dataset.from_generator(
	        gen_func, 
	        output_types=(tf.float32, tf.float32), 
	        output_shapes=(tf.TensorShape(input_shape[1:]), tf.TensorShape([1, 1, 3]))
	    )
	    
	# If true, the mean will be subtracted

定义计算图

我们将表示完整的计算图。

定义提供输入的迭代器
定义输入和 CNN 变量
定义内容、风格和总损失
定义优化操作

config = tf.ConfigProto(allow_soft_placement=True)



# 1. Define the input pipeline in this step
part_style_gen_func = partial(image_gen_func, 'data', "style_")
part_content_gen_func = partial(image_gen_func, 'data', "content_")

style_iter = load_images_iterator(part_style_gen_func, zero_mean=True)
content_iter = load_images_iterator(part_content_gen_func, zero_mean=True)

# 2. Defining the inputs and weights
inputs = define_inputs(input_shape)
define_tf_weights()


layer_ids = list(vgg_layers.keys())

## gen_ph is used for initializing the generated image with the pixel value 
 
##trying initializing with white noise
.
## The init_generate gives the initialization operation. 
gen_ph = tf.placeholder(shape=input_shape, dtype=tf.float32)
init_generated = tf.assign(inputs["generated"], gen_ph)

# 3. Loss
# 3.1 Content loss in tf
c_loss = define_content_loss(
inputs=inputs, 
layer_ids=layer_ids, pool_inds=pool_inds, c_weight=alpha
)

# 3.2 Style loss
layer_weights_ph = tf.placeholder(shape=[len(layer_ids)], dtype=tf.float32, name='layer_weights')
s_loss = define_style_loss(
inputs=inputs, 
layer_ids=layer_ids, pool_inds=pool_inds, s_weight=beta, layer_weights=None
) 

下一主题风格迁移的过程

← 上一步下一步 →