C GAN

17 Mar 2025 | 6 min read

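A conditional generative adversarial network (cGAN) is an extension of the regular GAN that feeds conditioning information into training. The generator of a traditional GAN has no control over the exact attributes of the samples it produces. A cGAN, by contrast, generates samples based on extra information such as class labels or other auxiliary data.

To build intuition, think of these models as artists who can not only produce outstanding work but also take commissions. In a traditional GAN, the artist paints at random, with no idea which kind of picture is wanted. A cGAN is like a painter who can produce an image to match a specific request or set of facts.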

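Applications of C-GAN

cGANs can be used for tasks such as converting satellite images to maps, colorizing black-and-white photos, or turning sketches into realistic images. By conditioning the generator on the desired output characteristics, a cGAN makes controllable image translation possible.
Generating face images at different ages is a common application. By conditioning the generator on age-related attributes, a cGAN can produce realistic faces at different stages of life.
cGANs can increase image resolution. Given a low-resolution input, the generator can be conditioned to produce a high-resolution output. This is useful in tasks such as image upscaling, where it helps avoid a significant loss of quality.
In creative applications, cGANs can generate customized content based on user preferences, for example personalized fashion items, interior designs, or artwork.
In chemistry, cGANs can help generate molecular structures with desired properties. Researchers can condition the generator on specific chemical attributes to obtain structures that meet given criteria.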

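Below is an overview of a simple cGAN architecture

The architecture is fairly straightforward. To create an image, we feed the generator the concatenation of the noise z and the condition c. For the discriminator, we concatenate the generated image with the same c that was used to generate it. When we show the discriminator a real example, we include that example's condition c as well.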
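The concatenation described above can be sketched with plain Python lists so that the sizes are easy to follow. This is a minimal illustration, not the network built later in this tutorial: the helper names `one_hot`, `generator_input`, and `discriminator_input` are hypothetical, and the image is assumed to be flattened to 784 values.

```python
import random

NOISE_SIZE = 100   # length of the noise vector z
NUM_CLASSES = 10   # one condition slot per MNIST digit

def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(label, rng):
    """Concatenate a noise vector z with the condition c."""
    z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_SIZE)]
    return z + one_hot(label)

def discriminator_input(flat_image, label):
    """Concatenate a flattened image with the condition c it belongs to."""
    return list(flat_image) + one_hot(label)

rng = random.Random(0)
g_in = generator_input(label=7, rng=rng)
d_in = discriminator_input([0.0] * (28 * 28), label=7)
print(len(g_in), len(d_in))  # 110 794
```

The same condition vector appears in both inputs, which is what lets the discriminator penalize an image that looks real but does not match its label.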

Now we will build a simple cGAN for MNIST that generates digit images conditioned on the class label.

Importing libraries

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
import matplotlib.pyplot as plt
from skimage.io import imshow
import time
import random

Loading the dataset

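#Read CSV (column 0 holds the label, columns 1..784 hold the 28x28 pixel values)
csv = pd.read_csv('../input/train.csv')
#Separate into matrices; as_matrix() was removed in pandas 1.0, so use to_numpy()
X_train = csv.iloc[:,1:785].to_numpy()
Y_train = csv.iloc[:,0].to_numpy()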

X_train_imgs = np.zeros([42000,1,28,28])
for i in range(X_train.shape[0]):
    img = X_train[i,:].reshape([1,28,28])/255.
    X_train_imgs[i] = img

Y_train_oh = np.zeros([42000,10])
for i in range(Y_train.shape[0]):
    oh = np.zeros([10])
    oh[int(Y_train[i])] = 1.
    Y_train_oh[i] = oh

# showing one of the images from the dataset as an example.
ix = 599 #0-42000
imshow(np.squeeze(X_train_imgs[ix]))
plt.show()
print ('This is:',Y_train[ix])

Output

This is: 2

C GAN

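GANs are notoriously hard to train. Because two neural networks are trained against each other, we need to keep them balanced; that is one problem. Another is mode collapse, where the generator fails to produce sufficiently diverse images. Both can become troublesome, so we apply a few tricks to keep the GAN balanced.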

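Sample the noise from a normal distribution rather than a uniform one
Normalize images to between -1 and 1 rather than between 0 and 1
Use max(log D) rather than min(log(1 - D)) as the loss for training the generator
Build mini-batches that are entirely real or entirely generated images rather than mixing the two
Use LeakyReLU rather than ReLU
Use ConvTranspose2d rather than upsampling
Use label smoothing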
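The reason for preferring max(log D) over min(log(1 - D)) shows up in the gradients. The sketch below assumes the discriminator output is D = sigmoid(a) for some logit a and compares the derivative of each generator loss with respect to a. The function names are hypothetical and the code is plain Python rather than PyTorch.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)), the loss G would minimize."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)), i.e. minimizing -log D (maximizing log D)."""
    return -(1.0 - sigmoid(a))

# Negative logits mean the discriminator confidently rejects the fake sample.
for logit in (-6.0, -2.0, 0.0, 2.0):
    d = sigmoid(logit)
    print(f"D={d:.3f}  saturating={saturating_grad(logit):+.3f}  "
          f"non-saturating={non_saturating_grad(logit):+.3f}")
```

When the discriminator easily rejects generated samples (D close to 0), the first gradient shrinks toward zero while the second stays close to -1, so the generator keeps receiving a useful training signal early on.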

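In addition, we will track a slightly richer set of values during training: not only the G and D losses, but also the variance of the D loss. We want the discriminator loss to fluctuate as little as possible, so we monitor it.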
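One way to watch that fluctuation is to compute the variance over a sliding window of recent discriminator losses. The sketch below is an assumption-laden variant: the `LossMonitor` class and its `window` parameter are hypothetical, whereas the training loop in this tutorial computes its statistic over the full loss history.

```python
from collections import deque
from statistics import pvariance

class LossMonitor:
    """Track the population variance of the most recent `window` losses."""

    def __init__(self, window=100):
        # deque drops the oldest value automatically once it is full
        self.recent = deque(maxlen=window)

    def update(self, loss_value):
        self.recent.append(float(loss_value))
        # A single value has zero variance, so this is safe from step one
        return pvariance(list(self.recent))

monitor = LossMonitor(window=4)
for step, d_loss in enumerate([1.4, 1.3, 1.5, 0.2, 2.9, 0.1]):
    print(step, round(monitor.update(d_loss), 4))
```

A windowed value reacts to recent instability, while a statistic over the whole history is dominated by the noisy first batches.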

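Generator network

The generator is designed to produce realistic images from random noise (z), conditioned on specific information (c). The Tanh activation in the final layer keeps the output pixel values in the range [-1, 1].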
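Because Tanh outputs values in [-1, 1], real images should be scaled to the same range before they reach the discriminator, and generated images need to be mapped back to [0, 1] for display. A minimal sketch of both mappings follows; the function names are hypothetical.

```python
def to_tanh_range(pixel):
    """Map a raw 0..255 pixel value to [-1, 1]."""
    return pixel / 255.0 * 2.0 - 1.0

def to_display_range(value):
    """Map a generator output in [-1, 1] back to [0, 1] for plotting."""
    return value / 2.0 + 0.5

print(to_tanh_range(0), to_tanh_range(255))            # -1.0 1.0
print(to_display_range(-1.0), to_display_range(1.0))   # 0.0 1.0
```

The second mapping is the `gens/2+0.5` step that the training loop applies before showing generated digits.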

class _G(nn.Module):
    def __init__(self, z_size, c_size):
        super(_G, self).__init__()
        
        self.conv2dtranspose_z = nn.ConvTranspose2d(in_channels=z_size, out_channels=256, kernel_size=4, stride=1)
        self.bn2d_z = nn.BatchNorm2d(256, momentum=0.9)
        self.conv2dtranspose_c = nn.ConvTranspose2d(in_channels=c_size, out_channels=256, kernel_size=4, stride=1)
        self.bn2d_c = nn.BatchNorm2d(256, momentum=0.9)
        self.backbone = nn.Sequential(
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(negative_slope=0.2),
            nn.BatchNorm2d(256, momentum=0.9),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(negative_slope=0.2),
            nn.BatchNorm2d(128, momentum=0.9),
            nn.ConvTranspose2d(in_channels=128, out_channels=1, kernel_size=2, stride=2, padding=2),
            nn.Tanh()
        )
    
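    # weight_init: self.modules() also walks nested layers, so the convolutions
    # inside self.backbone are initialized too (self._modules is top-level only)
    def weight_init(self, mean, std):
        for m in self.modules():
            normal_init(m, mean, std)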
    
    def forward(self, z, c):
        z = F.leaky_relu(self.bn2d_z(self.conv2dtranspose_z(z.view(-1,100,1,1))))
        c = F.leaky_relu(self.bn2d_c(self.conv2dtranspose_c(c.view(-1,10,1,1))))
        zc = torch.cat([z,c],dim=1)
        output = self.backbone(zc)
        return output
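It helps to check that the generator's layers really end at 28×28. The sketch below applies the ConvTranspose2d output-size formula, assuming dilation=1 and output_padding=0 (the defaults), to the kernel, stride, and padding values used in `_G`. The helper name `conv_transpose_out` is hypothetical.

```python
def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output size of ConvTranspose2d with dilation=1, output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) for the z/c input layer and the three backbone layers
generator_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (2, 2, 2)]

sizes = [1]  # z and c both start as 1x1 feature maps
for kernel, stride, padding in generator_layers:
    sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))

print(sizes)  # [1, 4, 8, 16, 28]
```

The last layer's unusual kernel_size=2, padding=2 combination is what turns 16×16 into 28×28 instead of 32×32.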

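Discriminator network

The discriminator is designed to decide whether an input image, together with its accompanying condition, is real or was created by the cGAN generator. The final Sigmoid activation produces a probability score for the input being real.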

class _D(nn.Module):
    def __init__(self,c_size):
        super(_D, self).__init__()
        
        self.conv2d_x = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=4, stride=2, padding=1)
        self.conv2d_c = nn.Conv2d(in_channels=10, out_channels=64, kernel_size=4, stride=2, padding=1)
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(negative_slope=0.2),
            nn.BatchNorm2d(256, momentum=0.9),
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(negative_slope=0.2),
            nn.BatchNorm2d(512, momentum=0.9),
            nn.Conv2d(in_channels=512, out_channels=1, kernel_size=3, stride=2),
            nn.Sigmoid()
        )
    
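    # weight_init: self.modules() also walks nested layers, so the convolutions
    # inside self.backbone are initialized too (self._modules is top-level only)
    def weight_init(self, mean, std):
        for m in self.modules():
            normal_init(m, mean, std)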
            
    
    def forward(self, x, c):
        x = self.conv2d_x(x)
        c = c.view(-1,10,1,1)
        c = c.expand(-1,10,28,28)
        c = self.conv2d_c(c)
        xc = torch.cat([x,c],dim=1)
        output = self.backbone(xc)
        output = output.view(-1,1)
        return output
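The discriminator can be checked the same way in the opposite direction. The sketch below applies the Conv2d output-size formula, assuming dilation=1, to the layers in `_D` and confirms that a 28×28 input is reduced to a single score. The helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d with dilation=1 (floor division as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for the x/c input layer and the three backbone layers
discriminator_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (3, 2, 0)]

sizes = [28]  # the image, and the condition expanded to 28x28
for kernel, stride, padding in discriminator_layers:
    sizes.append(conv_out(sizes[-1], kernel, stride, padding))

print(sizes)  # [28, 14, 7, 3, 1]
```

Since the final feature map is 1×1 with one channel, `output.view(-1,1)` yields exactly one probability per image.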

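Network initialization

This initialization scheme is commonly used in GANs to help stabilize training. Both the generator and the discriminator draw their convolution weights from a zero-mean normal distribution and start with zero biases.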

G = _G(100, 10) #Noise vector will have size 100, and we will have a condition vector of 10(1 for each type of item)
D = _D(10) #The Discriminator will also use the condition, so we say it has size 10
def normal_init(m, mean, std):
    if isinstance(m, nn.ConvTranspose2d) or isinstance(m, nn.Conv2d):
        m.weight.data.normal_(mean, std)
        m.bias.data.zero_()
G.weight_init(mean=0, std=0.02) #DCGAN-style init uses N(0, 0.02); the original 0.2 looks like a typo and is large for conv weights
D.weight_init(mean=0, std=0.02)

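Unlike Keras, PyTorch does not move the network to the GPU automatically; we have to do it by hand. In exchange, this explicit control lets you build things such as a complex multi-threaded data feeder. The key point is that PyTorch offers more flexibility than Keras. It is also reported here to be about twice as fast and to use far less memory, though such figures depend on the model, hardware, and library versions.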

G.cuda()
D.cuda()
#That's how you move it to the GPU

Here we create the loss function and the optimizers.

criterion = nn.BCELoss()#Binary Cross-Entropy Loss
optim_G = optim.Adam(G.parameters(), lr=0.0002)
optim_D = optim.Adam(D.parameters(), lr=0.0002)
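To make the criterion concrete, binary cross-entropy for one prediction p and target y is -(y·log p + (1 - y)·log(1 - p)), averaged over the batch. The plain-Python sketch below is only an illustration: `bce` and `mean_bce` are hypothetical names, and the `eps` clipping is an assumption used to avoid log(0), not a description of how `nn.BCELoss` guards against it internally.

```python
import math

def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for one probability and one (possibly soft) target."""
    p = min(max(prediction, eps), 1.0 - eps)  # keep log() finite
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def mean_bce(predictions, targets):
    """Average the per-sample losses over a mini-batch."""
    losses = [bce(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

# A confident correct prediction is cheap, a confident wrong one is expensive.
print(round(bce(0.9, 1.0), 4))  # 0.1054
print(round(bce(0.1, 1.0), 4))  # 2.3026
```

Because the targets may be any value in [0, 1], the same loss works unchanged with the smoothed labels used in the optimization functions.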

Next, we create two optimization functions, one for each network.

def optimize_G(G, D, z, c, optimizer, criterion):
    
    """
    When we train the generator we want it to trick the discriminator. This means that we want the output of D to be close to 1,
    meaning it thinks it is real. Keep that in mind. When we train G, we make the fake labels equal 1 so the optimizer tried to
    make the generator make an image that tricks D.
    """
    
    #Even though the images are fake, we want the discriminator to think they are real
    trick_labels = Variable(torch.ones([z.shape[0],1])-torch.rand([z.shape[0],1])/3).cuda()
    #Zero gradient buffers
    G.zero_grad()
    #Generate Images
    fake_x = G.forward(z, c)
    D_preds = D(fake_x, c)
    loss = criterion(D_preds, trick_labels)
    loss.backward()
    optimizer.step()
    
    return fake_x, loss


def optimize_D(net, fake_x, fake_c, real_x, real_c, optimizer, criterion):
    #Smoothed targets: fake labels fall in [0, 1/3) and real labels in (2/3, 1].
    #Each label tensor is sized from its own batch rather than from the global z.
    fake_labels = Variable(torch.zeros([fake_x.shape[0],1])+torch.rand([fake_x.shape[0],1])/3).cuda()
    real_labels = Variable(torch.ones([real_x.shape[0],1])-torch.rand([real_x.shape[0],1])/3).cuda()
    
    #We need to empty the gradient buffers
    net.zero_grad()
    
    #Let's get the discriminator predictions for the fake images
    #(detach so this step does not backpropagate into the generator)
    fake_preds = net.forward(fake_x.detach(), fake_c)
    #Loss on the all-fake mini-batch
    fake_loss = criterion(fake_preds, fake_labels)
    #Let's get the discriminator predictions for the real images
    real_preds = net.forward(real_x, real_c)
    #Loss on the all-real mini-batch
    real_loss = criterion(real_preds, real_labels)
    
    loss = fake_loss + real_loss
    loss.backward()
    optimizer.step()
    
    return loss
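The label construction in these two functions is a randomized form of label smoothing: real targets are drawn from roughly (2/3, 1] and fake targets from [0, 1/3) instead of being exactly 1 and 0. A plain-Python sketch of the same idea follows; the function `smoothed_labels` and its `width` parameter are hypothetical names.

```python
import random

def smoothed_labels(batch_size, real, rng, width=1.0 / 3.0):
    """Return soft targets: near 1 for real samples, near 0 for fake ones."""
    noise = [rng.random() * width for _ in range(batch_size)]
    if real:
        return [1.0 - n for n in noise]  # ones minus uniform noise
    return noise                         # zeros plus uniform noise

rng = random.Random(0)
print([round(v, 2) for v in smoothed_labels(4, real=True, rng=rng)])
print([round(v, 2) for v in smoothed_labels(4, real=False, rng=rng)])
```

With a soft target t, binary cross-entropy is minimized at a prediction of t rather than at 0 or 1, which discourages the discriminator from becoming overconfident.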

Training

Now we will see how the cGAN model behaves during training.

The model is trained in a loop over mini-batches of images.

D_history = []
G_history = []
EPOCHS = 10
BATCH_SIZE = 128

for epoch in range(EPOCHS):
    train_loss = 0
    speed = 0
    for batch_number in range(int(Y_train.shape[0]/BATCH_SIZE)):
        G.train()
        time_start = time.time()
        real_x = Variable(torch.FloatTensor(X_train_imgs[batch_number*BATCH_SIZE:(1+batch_number)*BATCH_SIZE])).cuda()
        real_x = real_x*2 - 1 #Scale from [0, 1] to [-1, 1] to match the generator's Tanh output
        real_c = Variable(torch.FloatTensor(Y_train_oh[batch_number*BATCH_SIZE:(1+batch_number)*BATCH_SIZE])).cuda()
        
        z = Variable(torch.FloatTensor(np.random.randn(BATCH_SIZE, 100))).cuda()
        fake_x, loss = optimize_G(G, D, z, real_c, optim_G, criterion)
        G_history.append(loss.item()) #.item() reads the scalar out of a 0-dim tensor
        
        loss = optimize_D(D, fake_x, real_c, real_x, real_c, optim_D, criterion)
        D_history.append(loss.item())
        
        
        if batch_number % 25 == 0:
            bigfig = []
            for i in range(0,10):
                z = np.random.randn(1,100)
                z = torch.FloatTensor(z)
                z = Variable(z).cuda()
                G.eval()
                fig = []
                for digit in range(0, 10): #separate name so the outer loop's i is not shadowed
                    c = np.zeros([1,10])
                    c[0,digit] = 1.
                    c = torch.FloatTensor(c)
                    c = Variable(c).cuda()
                    gens = G.forward(z, c)
                    gens = gens.data.cpu().numpy()
                    gens = gens.reshape([28,28])
                    fig.append(gens/2+0.5)
                fig = np.hstack(fig)
                bigfig.append(fig)
            bigfig = np.vstack(bigfig)
            print (bigfig.shape)
            imshow(bigfig)
            plt.show()
            
            print ('G loss: ',G_history[-1])
            print ('D loss: ',D_history[-1])
            print ('D loss variance: ',np.var(D_history)) #np.var matches the label; the original printed the standard deviation
    print ('Finished Epoch',epoch+1)

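Output

Each snapshot prints the grid shape, (280, 280), shows a 10×10 grid of generated digits in which the columns are the classes 0 to 9 and each row shares one noise vector, and then prints the latest G and D losses.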
