Image-to-Image Translation Using Pix2Pix

July 3, 2025 | 10 min read

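Pix2Pix is a type of GAN (generative adversarial network) designed to translate one image into another.

In their 2016 paper "Image-to-Image Translation with Conditional Adversarial Networks" (published at CVPR 2017), Phillip Isola et al. describe Pix2Pix as a conditional GAN framework in which a generator and a discriminator are trained against each other.

During training, the two networks compete. The generator tries to fool the discriminator, while the discriminator tries to detect the generator's forgeries.

In Pix2Pix, generation is always conditioned on an input image. The discriminator judges whether a target image is a plausible translation of the given source image. The adversarial loss pushes the outputs toward realism, while an L1 loss keeps them close to the ground-truth target. With this setup, Pix2Pix can translate maps into satellite images, colorize old photos, and turn sketches into realistic photos.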

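Pix2Pix GAN for Image-to-Image Translation

Pix2Pix was created as a general-purpose GAN for tasks that convert an image from one form to another. Phillip Isola and his colleagues introduced it in the 2016 paper "Image-to-Image Translation with Conditional Adversarial Networks", which was presented at CVPR 2017.

In a GAN, the generator produces images and the discriminator tries to tell real images from generated ones. As training progresses, the generator gets better at imitating real images, because it is updated to make the discriminator classify its outputs as real.

In a conditional GAN (cGAN), generation is controlled by an additional input. Pix2Pix is a cGAN: the generator receives an input image and produces a translated version of it. The discriminator receives the source image together with either the real target or the generated image, and decides whether the pair is real or fake.

The generator is trained both to fool the discriminator and to bring its output closer to the target image. Training therefore requires paired data: each input image must come with its corresponding target image. Given such pairs, the same setup works well across many different image translation tasks.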

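Architecture

Pix2Pix is built on the conditional GAN architecture. This design makes it easier to capture fine detail and complex structure in image-to-image translation tasks. Because the model learns both the overall layout and the small details, its translations look more realistic.

Generator

The Pix2Pix generator uses a U-Net architecture, a modification of the generic encoder-decoder design. The main difference is the addition of skip connections that link each encoder layer directly to the corresponding decoder layer, bypassing the layers in between. Thanks to these connections, low-level features (such as the placement of objects) are not lost during downsampling in the encoder, so the output is sharper and more accurate.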

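Encoder architecture

The encoder consists of seven convolutional blocks. Each block contains a stride-2 convolution followed by a LeakyReLU activation with a slope of 0.2, as in the original paper. Batch normalization, which helps stabilize training, is used in every block except the first. The encoder is followed by a bottleneck convolution with a ReLU activation.

Decoder architecture

The decoder consists of seven transposed-convolution (upsampling) blocks. Each block upsamples with a stride-2 transposed convolution, applies batch normalization (plus dropout in the first three blocks), concatenates the matching encoder feature map, and ends with a ReLU activation.

Skip connections

A skip connection links layer i of the encoder to layer n - i of the decoder, where n is the total number of layers. Each connection concatenates the encoder feature map with the decoder feature map along the channel axis, so the model does not lose important spatial information.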
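To make the pairing concrete, the sketch below traces feature-map sizes through the generator defined later in this article. The filter counts are copied from that listing; the helper name `unet_plan` and the dictionary layout are illustrative assumptions, and the code only does shape arithmetic, it does not build a network.

```python
# Filter counts taken from the generator listing in this article.
ENCODER_FILTERS = [64, 128, 256, 512, 512, 512, 512]
DECODER_FILTERS = [512, 512, 512, 512, 256, 128, 64]


def unet_plan(input_size=256):
    """Return (encoder, decoder) lists describing each block's output.

    Hypothetical helper: it tracks spatial size and channel count only.
    """
    encoder = []
    size = input_size
    for idx, filters in enumerate(ENCODER_FILTERS, start=1):
        size //= 2  # each stride-2 convolution halves height and width
        encoder.append({"name": f"e{idx}", "size": size, "channels": filters})

    size //= 2  # bottleneck: one more stride-2 convolution

    decoder = []
    for idx, filters in enumerate(DECODER_FILTERS, start=1):
        size *= 2  # each stride-2 transposed convolution doubles the size
        skip = encoder[len(encoder) - idx]  # d1 pairs with e7, d2 with e6, ...
        # Concatenation only works if both tensors have the same height/width.
        assert skip["size"] == size
        decoder.append({
            "name": f"d{idx}",
            "skip": skip["name"],
            "size": size,
            # Concatenation stacks decoder and encoder channels.
            "channels": filters + skip["channels"],
        })
    return encoder, decoder


if __name__ == "__main__":
    enc, dec = unet_plan()
    for block in enc + dec:
        print(block)
```

For a 256×256 input the encoder shrinks the feature map down to 2×2, the bottleneck reaches 1×1, and each decoder block is paired with the encoder block of the same spatial size. The final transposed convolution in the listing then brings the 128×128 output of the last decoder block back to 256×256.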

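Discriminator

The Pix2Pix discriminator is a PatchGAN classifier. Instead of scoring the whole image at once, PatchGAN classifies each N×N patch of the image as real or fake. Evaluating patches pushes the model to get fine, high-frequency detail right.

The PatchGAN is applied convolutionally across the whole image, and the per-patch responses are averaged to give the final output of the discriminator D. As a result, edges and textures in the generated images look sharp and realistic, without heavy blurring.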
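The averaging step can be illustrated with a toy example. The function `patch_score` and the 2×2 grid below are made up for illustration; a real PatchGAN would produce a much larger grid of probabilities.

```python
def patch_score(patch_probs):
    """Average a 2-D grid of per-patch 'real' probabilities into one score."""
    values = [p for row in patch_probs for p in row]
    return sum(values) / len(values)


# Hypothetical discriminator output: three patches look real, one looks fake.
grid = [
    [0.9, 0.8],
    [0.7, 0.2],
]

score = patch_score(grid)
print("image-level score: %.2f" % score)
```

A single unconvincing patch pulls the overall score down, which is why the generator cannot get away with locally blurry or implausible regions.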

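Combining Adversarial Loss and L1 Loss

The Pix2Pix discriminator is trained in the usual GAN way, to distinguish real images from fake ones. The generator, however, is trained with a loss that combines an adversarial term and an L1 term. The adversarial loss pushes the generator to produce realistic images that belong to the target domain, while the L1 loss keeps the generated image close to the expected output, so that important content from the source image is preserved.

The authors of the original paper also considered L2 loss, but preferred L1 because it produces less blurring. Together, the two terms teach the generator to produce outputs that are both visually realistic and faithful to their source images.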
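As a rough sketch of how the two terms combine, the plain-Python code below computes an adversarial term (binary cross-entropy against a "real" label) plus a weighted L1 term. The weight of 100 mirrors the `loss_weights=[1,100]` used in the training listing in this article; the function names and the toy numbers are assumptions for illustration, not the library's implementation.

```python
import math


def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean binary cross-entropy of predicted probabilities against one label."""
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(probs)


def mean_absolute_error(generated, target):
    """Mean absolute (L1) difference between two equal-length pixel lists."""
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(generated)


def generator_loss(patch_probs, generated, target, l1_weight=100.0):
    """Adversarial term (generator wants 'real' = 1) plus weighted L1 term."""
    adv = binary_cross_entropy(patch_probs, target=1.0)
    l1 = mean_absolute_error(generated, target)
    return adv + l1_weight * l1, adv, l1


# Toy values: the discriminator is undecided, and pixels are slightly off.
total, adv, l1 = generator_loss(
    patch_probs=[0.5, 0.5],
    generated=[0.0, 0.5],
    target=[0.1, 0.3],
)
print("adversarial=%.4f  l1=%.4f  total=%.4f" % (adv, l1, total))
```

With a weight of 100, the L1 term dominates the total early in training, which is consistent with the large generator loss values in the training log shown later in this article.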

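How to Develop and Train a Pix2Pix Model?

To build a Pix2Pix model that converts satellite images into Google Maps-style images, we follow the setup outlined in the original paper. The model is a conditional GAN with two main parts, a generator and a discriminator. The generator produces a map-like image, and the discriminator judges whether that image is a plausible translation of the given satellite photo.

Our model is implemented in Keras and works with 256×256 color images. The discriminator follows the paper's 70×70 PatchGAN design, in which each output value scores one 70×70 patch of the input image. Because of this design, the model can focus on local detail and can be applied to images of different sizes.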
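The patch size seen by each discriminator output can be checked with the standard receptive-field recurrence, working backwards from the output: rf = (rf - 1) * stride + kernel. The sketch below is an assumption-laden illustration: `PAPER_70_LAYERS` is a layer configuration commonly given for the 70×70 PatchGAN (three stride-2 layers followed by two stride-1 layers) and is not stated in this article, while `ARTICLE_LISTING_LAYERS` is read off the `define_discriminator` listing that follows.

```python
def receptive_field(layers):
    """Receptive field of one output unit.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_grid(input_size, layers):
    """Output height/width for 'same' padding: ceil(size / stride) per layer."""
    size = input_size
    for _, stride in layers:
        size = -(-size // stride)  # ceiling division
    return size


# Assumed configuration for the 70x70 PatchGAN (not taken from this article).
PAPER_70_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

# Read from define_discriminator in this article: four stride-2 convolutions,
# then two stride-1 convolutions, all with 4x4 kernels.
ARTICLE_LISTING_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(PAPER_70_LAYERS))
print(receptive_field(ARTICLE_LISTING_LAYERS))
print(output_grid(256, ARTICLE_LISTING_LAYERS))
```

Under these assumptions the first configuration gives 70, whereas the listing in this article, with its extra stride-2 layer, gives 142 and a 16×16 output grid for a 256×256 input. The 70×70 figure is therefore best read as describing the paper's design; the listing scores somewhat larger patches.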

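The discriminator looks at both images at once and outputs, for each patch of the input, the probability that it is real rather than fake. These per-patch probabilities can be averaged to obtain a single realism score for the whole image, which helps stabilize training and improves output quality.

Example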

from numpy import load, zeros, ones
from numpy.random import randint
from keras.optimizers import Adam
from keras.initializers import RandomNormal
from keras.models import Model
from keras.layers import Input, Conv2D, Conv2DTranspose, LeakyReLU, Activation, Concatenate
from keras.layers import Dropout, BatchNormalization
from matplotlib import pyplot

# Define the discriminator model
def define_discriminator(image_shape):
	init = RandomNormal(stddev=0.02)
	src_image = Input(shape=image_shape)
	tgt_image = Input(shape=image_shape)
	merged = Concatenate()([src_image, tgt_image])
	
	d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(merged)
	d = LeakyReLU(alpha=0.2)(d)

	d = Conv2D(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)

	d = Conv2D(256, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)

	d = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)

	d = Conv2D(512, (4,4), padding='same', kernel_initializer=init)(d)
	d = BatchNormalization()(d)
	d = LeakyReLU(alpha=0.2)(d)

	d = Conv2D(1, (4,4), padding='same', kernel_initializer=init)(d)
	output = Activation('sigmoid')(d)

	model = Model([src_image, tgt_image], output)
	opt = Adam(learning_rate=0.0002, beta_1=0.5)
	model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])
	return model

# Encoder block for the generator
def define_encoder_block(layer_in, n_filters, batchnorm=True):
	init = RandomNormal(stddev=0.02)
	g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
	if batchnorm:
		g = BatchNormalization()(g, training=True)
	g = LeakyReLU(alpha=0.2)(g)
	return g

# Decoder block with optional dropout and skip connection
def decoder_block(layer_in, skip_in, n_filters, dropout=True):
	init = RandomNormal(stddev=0.02)
	g = Conv2DTranspose(n_filters, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(layer_in)
	g = BatchNormalization()(g, training=True)
	if dropout:
		g = Dropout(0.5)(g, training=True)
	g = Concatenate()([g, skip_in])
	g = Activation('relu')(g)
	return g

# Generator model with U-Net architecture
def define_generator(image_shape=(256,256,3)):
	init = RandomNormal(stddev=0.02)
	in_image = Input(shape=image_shape)

	e1 = define_encoder_block(in_image, 64, batchnorm=False)
	e2 = define_encoder_block(e1, 128)
	e3 = define_encoder_block(e2, 256)
	e4 = define_encoder_block(e3, 512)
	e5 = define_encoder_block(e4, 512)
	e6 = define_encoder_block(e5, 512)
	e7 = define_encoder_block(e6, 512)

	b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7)
	b = Activation('relu')(b)

	d1 = decoder_block(b, e7, 512)
	d2 = decoder_block(d1, e6, 512)
	d3 = decoder_block(d2, e5, 512)
	d4 = decoder_block(d3, e4, 512, dropout=False)
	d5 = decoder_block(d4, e3, 256, dropout=False)
	d6 = decoder_block(d5, e2, 128, dropout=False)
	d7 = decoder_block(d6, e1, 64, dropout=False)

	g = Conv2DTranspose(3, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d7)
	out_image = Activation('tanh')(g)

	model = Model(in_image, out_image)
	return model

# Composite GAN model that combines generator and discriminator
def define_gan(g_model, d_model, image_shape):
	for layer in d_model.layers:
		if not isinstance(layer, BatchNormalization):
			layer.trainable = False

	src = Input(shape=image_shape)
	gen_out = g_model(src)
	dis_out = d_model([src, gen_out])
	model = Model(src, [dis_out, gen_out])

	opt = Adam(learning_rate=0.0002, beta_1=0.5)
	model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])
	return model

# Load dataset from compressed npz file and normalize pixel values
def load_real_samples(filename):
	data = load(filename)
	X1, X2 = data['arr_0'], data['arr_1']
	X1 = (X1 - 127.5) / 127.5
	X2 = (X2 - 127.5) / 127.5
	return [X1, X2]

# Randomly sample real images and assign real labels
def generate_real_samples(dataset, n_samples, patch_shape):
	trainA, trainB = dataset
	ix = randint(0, trainA.shape[0], n_samples)
	X1, X2 = trainA[ix], trainB[ix]
	y = ones((n_samples, patch_shape, patch_shape, 1))
	return [X1, X2], y

# Generate fake images from the generator and assign fake labels
def generate_fake_samples(g_model, samples, patch_shape):
	X = g_model.predict(samples)
	y = zeros((len(X), patch_shape, patch_shape, 1))
	return X, y

# Visualize and save model output every few training steps
def summarize_performance(step, g_model, dataset, n_samples=3):
	[X_realA, X_realB], _ = generate_real_samples(dataset, n_samples, 1)
	X_fakeB, _ = generate_fake_samples(g_model, X_realA, 1)

	X_realA = (X_realA + 1) / 2.0
	X_realB = (X_realB + 1) / 2.0
	X_fakeB = (X_fakeB + 1) / 2.0

	for i in range(n_samples):
		pyplot.subplot(3, n_samples, 1 + i)
		pyplot.axis('off')
		pyplot.imshow(X_realA[i])
	for i in range(n_samples):
		pyplot.subplot(3, n_samples, 1 + n_samples + i)
		pyplot.axis('off')
		pyplot.imshow(X_fakeB[i])
	for i in range(n_samples):
		pyplot.subplot(3, n_samples, 1 + n_samples*2 + i)
		pyplot.axis('off')
		pyplot.imshow(X_realB[i])

	filename1 = 'plot_%06d.png' % (step+1)
	pyplot.savefig(filename1)
	pyplot.close()

	filename2 = 'model_%06d.h5' % (step+1)
	g_model.save(filename2)
	print('>Saved: %s and %s' % (filename1, filename2))

# Train the Pix2Pix model
def train(d_model, g_model, gan_model, dataset, n_epochs=100, n_batch=1):
	n_patch = d_model.output_shape[1]
	trainA, trainB = dataset
	bat_per_epo = int(len(trainA) / n_batch)
	n_steps = bat_per_epo * n_epochs

	for i in range(n_steps):
		[X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)
		X_fakeB, y_fake = generate_fake_samples(g_model, X_realA, n_patch)

		d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
		d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)

		g_loss, _, _ = gan_model.train_on_batch(X_realA, [y_real, X_realB])

		print('>%d, d1[%.3f] d2[%.3f] g[%.3f]' % (i+1, d_loss1, d_loss2, g_loss))

		if (i+1) % (bat_per_epo * 10) == 0:
			summarize_performance(i, g_model, dataset)

# Load dataset and begin training
dataset = load_real_samples('maps_256.npz')
print('Loaded', dataset[0].shape, dataset[1].shape)
image_shape = dataset[0].shape[1:]

d_model = define_discriminator(image_shape)
g_model = define_generator(image_shape)
gan_model = define_gan(g_model, d_model, image_shape)

train(d_model, g_model, gan_model, dataset) 

Output

>1, d1[0.566] d2[0.520] g[82.266]
>2, d1[0.469] d2[0.484] g[66.813]
>3, d1[0.428] d2[0.477] g[79.520]
>4, d1[0.362] d2[0.405] g[78.143]
>5, d1[0.416] d2[0.406] g[72.452]

>109596, d1[0.303] d2[0.006] g[5.792]
>109597, d1[0.001] d2[1.127] g[14.343]
>109598, d1[0.000] d2[0.381] g[11.851]
>109599, d1[1.289] d2[0.547] g[6.901]
>109600, d1[0.437] d2[0.005] g[10.460]
>Saved: plot_109600.png and model_109600.h5

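How to Translate Images with a Pix2Pix Model?

While a Pix2Pix model trains, generator checkpoints and sample plots are saved at regular intervals. Training for longer does not always give better images, so you should pick the checkpoint whose generated samples look best. The chosen model can then be loaded at any time to translate new images. Here we use the model saved after 100 epochs, or 109,600 training iterations.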
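The iteration count and checkpoint file names follow from the training loop's arithmetic. The sketch below reproduces that arithmetic under one assumption: a training set of 1,096 image pairs, which is inferred from 109,600 steps over 100 epochs at batch size 1 rather than stated in the article. The helper `training_schedule` is hypothetical.

```python
def training_schedule(n_images, n_epochs=100, n_batch=1, save_every_epochs=10):
    """Return (total steps, list of steps at which a checkpoint is saved)."""
    bat_per_epo = n_images // n_batch
    n_steps = bat_per_epo * n_epochs
    save_interval = bat_per_epo * save_every_epochs
    checkpoints = list(range(save_interval, n_steps + 1, save_interval))
    return n_steps, checkpoints


# Assumption: 1,096 training pairs (inferred from the log, not given directly).
n_steps, checkpoints = training_schedule(n_images=1096)
names = ['model_%06d.h5' % step for step in checkpoints]

print(n_steps)
print(names[0], names[-1])
```

Under that assumption the run saves ten generator checkpoints, one every ten epochs, and the last one is the `model_109600.h5` file loaded in the example that follows.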

Example

from keras.models import load_model
from keras.preprocessing.image import img_to_array, load_img
from numpy import expand_dims
from matplotlib import pyplot

# Load and preprocess the input image
def load_image(filename, size=(256, 256)):
	# Load the image and resize it to the target dimensions
	pixels = load_img(filename, target_size=size)
	# Convert the image to a numpy array
	pixels = img_to_array(pixels)
	# Normalize pixel values from [0, 255] to [-1, 1] for the model
	pixels = (pixels - 127.5) / 127.5
	# Add batch dimension so the shape becomes (1, 256, 256, 3)
	pixels = expand_dims(pixels, 0)
	return pixels

# Load the source input image (e.g., satellite photo)
src_image = load_image('satellite.jpg')
print('Loaded image shape:', src_image.shape)

# Load the trained Pix2Pix generator model
model = load_model('model_109600.h5')

# Generate the translated (target) image using the model
gen_image = model.predict(src_image)

# Rescale pixel values back from [-1, 1] to [0, 1] for visualization
gen_image = (gen_image + 1) / 2.0

# Display the generated image
pyplot.imshow(gen_image[0])
pyplot.axis('off')
pyplot.show() 

Output

Loaded image shape: (1, 256, 256, 3)
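The two scaling steps in the listing above, from [0, 255] to [-1, 1] before prediction and from [-1, 1] to [0, 1] before plotting, can be sanity-checked on single values. The function names below are illustrative and do not appear in the original code.

```python
def to_model_range(pixel):
    """Map a pixel value from [0, 255] to [-1, 1], the range of a tanh output."""
    return (pixel - 127.5) / 127.5


def to_display_range(value):
    """Map a model value from [-1, 1] to [0, 1] for display as a float image."""
    return (value + 1) / 2.0


for pixel in (0, 64, 255):
    scaled = to_model_range(pixel)
    shown = to_display_range(scaled)
    print(pixel, scaled, shown)
```

Applied one after the other, the two steps are equivalent to dividing the original pixel value by 255, so a correctly scaled image comes back with its original brightness.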

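Real-World Applications of Pix2Pix GAN

Pix2Pix supports a wide range of applications that convert one kind of image into another. The original paper demonstrates nine of them.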

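Semantic labels to photo

Pix2Pix can turn semantic segmentation maps into realistic street-scene images. Work on self-driving cars and urban planning often relies on this kind of data, from datasets such as Cityscapes.

Architectural labels to photo

Given an architectural label map, it can generate an image of a building facade. The researchers used the Facades dataset for this task.

Map to aerial photo

Pix2Pix can translate between street maps and satellite images in both directions, which helps people interpret geographic information. The training data came from Google Maps.

Black-and-white to color

Using patterns learned from color photos, it colorizes old or black-and-white photos.

Edges to photo

With Pix2Pix, artists and designers can turn simple outline drawings into realistic images.

Sketch to photo

It produces detailed renderings from sketches of products such as bags or shoes, which helps with product design and planning.

Day to night

The model can alter a daytime photo so that it looks like a night scene, which is useful for simulation or film work.

Thermal to color

It converts thermal camera images into color images, which is useful in areas such as surveillance, search and rescue, and checking machinery for overheating.

Image inpainting (photo restoration)

Pix2Pix can fill in missing regions of an image. It is used to reconstruct missing or occluded parts of a scene, for example in Paris street-view photos, which helps with photo restoration.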
