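BigGAN

March 17, 2025 | 12 min read

BigGAN is a generative adversarial network (GAN) that can produce high-resolution, high-fidelity images. It extends the original GAN framework, in which two neural networks compete: the generator creates artificial images, while the discriminator judges whether they look real.

Its design is driven largely by empirical results, and it performs class-conditional generation: each image is generated for a specified class. It works particularly well for animal images, while results for some other classes are less consistent.

BigGAN combines a set of recent best practices for training class-conditional image generators. It scales up both the batch size and the number of model parameters, and it reported state-of-the-art results in class-conditional image synthesis.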

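Applications of BigGAN

BigGAN is a powerful generative model that has found applications in many fields. Some common uses are listed below:

Image synthesis: BigGAN is widely used to generate high-quality synthetic images. Given a set of latent vectors as input, it can produce diverse, realistic images across many classes. This is useful in creative applications, art, and data augmentation for computer vision tasks.
Data augmentation: In deep learning, BigGAN can be used for data augmentation. Adding realistic generated samples to the training set can help improve a model's robustness and generalization, especially when training data is limited.
Style transfer: BigGAN can be applied to style-related manipulation. By adjusting the latent vector input, users can influence the visual style of the generated images, which is useful for creative projects and visually appealing content.
Conditional image generation: Because BigGAN is conditional, users can specify attributes or conditions for the generated images, for example a particular class or particular visual characteristics.
Domain adaptation: BigGAN can be used in domain adaptation tasks. Training on synthetic data that closely resembles the target domain can improve a model's performance when it is applied to real-world data.
Anomaly detection: In anomaly detection, BigGAN can be used to generate normal, expected examples of the data. Samples that deviate from these generated examples can then be flagged as anomalies.
Visual concept exploration: Researchers and artists can use BigGAN to explore and visualize the latent space. By manipulating latent vectors, they can study how changes to a vector affect the generated image.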
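
The latent-space exploration mentioned in the last item can be sketched without a trained model. The example below only builds the latent vectors: it walks in a straight line between two random latent vectors, and each intermediate vector would then be passed to a generator. The helper name lerp_latents is hypothetical, and the latent size of 120 is the value used later in this tutorial.

```python
import random

def lerp_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at the first vector, 1.0 at the last
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

rng = random.Random(2019)   # fixed seed so the sketch is repeatable
nz = 120                    # latent size used later in this tutorial
z_a = [rng.gauss(0, 1) for _ in range(nz)]
z_b = [rng.gauss(0, 1) for _ in range(nz)]

path = lerp_latents(z_a, z_b, steps=5)
print(len(path), len(path[0]))  # 5 vectors of length 120
```

Generating one image per vector in path, with the class label held fixed, shows how the image changes as the latent vector moves.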

To understand BigGAN better, we will now build a model that generates images of dogs.

Code

Importing Libraries

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
import numpy as np
import torch
from torch import nn, optim
from torch import autograd
import torch.nn.functional as F
from torch.nn import Parameter
from torchvision import datasets, transforms
from torchvision.utils import save_image
from torch.utils.data import Dataset,DataLoader,Subset
from PIL import Image,ImageOps,ImageEnhance

import cv2
import albumentations as A
from albumentations.pytorch import ToTensor

import glob
import xml.etree.ElementTree as ET #for parsing XML
import shutil
from tqdm import tqdm
import time
import random

import os
print(os.listdir("../input"))


TIME_LIMIT = 32400 - 60*10
start_time = time.time()
def elapsed_time(start_time):
    return time.time() - start_time

#random seeds
seed = 2019
random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)

BATCH_SIZE  = 32
NUM_WORKERS = 4
EMA = False
LABEL_NOISE = False
LABEL_NOISE_PROB = 0.1

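Reading the Dataset

Next, we load the dog images stored in the directory given by PATH. The variable img_filenames holds the list of image filenames, and the length of that list tells us how many images the directory contains.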

PATH = '../input/all-dogs/all-dogs/'
img_filenames = os.listdir(PATH)
len(img_filenames)


PATH_ANNOTATION = '../input/annotation/Annotation/'
breeds = glob.glob(PATH_ANNOTATION+'*')
annotations = []
for breed in breeds:
    annotations += glob.glob(breed+'/*')
len(annotations)


breed_map = {}
for annotation in annotations:
    breed = annotation.split('/')[-2]
    index = breed.split('-')[0]
    breed_map.setdefault(index,breed)
n_classes = len(breed_map)
n_classes


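Next, we define bounding_box and bounding_box_ratio, which extract bounding-box information from the XML annotation file that belongs to an image. Both functions take an image filename as input. bounding_box returns a list of box coordinates (xmin, ymin, xmax, ymax), and bounding_box_ratio returns a list of (width, height, height/width ratio) tuples.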

def bounding_box(img):
    bpath = PATH_ANNOTATION + str(breed_map[img.split('_')[0]])+'/'+str(img.split('.')[0])
    tree  = ET.parse(bpath)
    root  = tree.getroot()
    objects = root.findall('object')
    bbxs = []
    for o in objects:
        bndbox = o.find('bndbox') #reading bound box
        xmin = int(bndbox.find('xmin').text)
        ymin = int(bndbox.find('ymin').text)
        xmax = int(bndbox.find('xmax').text)
        ymax = int(bndbox.find('ymax').text)
        bbxs.append((xmin,ymin,xmax,ymax))
    return bbxs

def bounding_box_ratio(img):
    bpath = PATH_ANNOTATION + str(breed_map[img.split('_')[0]])+'/'+str(img.split('.')[0])
    tree  = ET.parse(bpath)
    root  = tree.getroot()
    objects = root.findall('object')
    bbx_ratios = []
    for o in objects:
        bndbox = o.find('bndbox') #reading bound box
        xmin = int(bndbox.find('xmin').text)
        ymin = int(bndbox.find('ymin').text)
        xmax = int(bndbox.find('xmax').text)
        ymax = int(bndbox.find('ymax').text)
        xlen = xmax - xmin
        ylen = ymax - ymin
        ratio = ylen / xlen
        bbx_ratios.append((xlen,ylen,ratio))
    return bbx_ratios
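
To see what these two functions read, the sketch below parses a small hand-written annotation from a string instead of a file. The XML content is invented for illustration and only contains the fields the functions above access; real annotation files may carry additional fields. The helper names boxes_from_xml and ratios_from_boxes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with two objects; the numbers are made up.
SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>55</xmax><ymax>30</ymax></bndbox>
  </object>
</annotation>
"""

def boxes_from_xml(xml_text):
    """Return one (xmin, ymin, xmax, ymax) tuple per <object> element."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        boxes.append(tuple(int(bndbox.find(tag).text)
                           for tag in ('xmin', 'ymin', 'xmax', 'ymax')))
    return boxes

def ratios_from_boxes(boxes):
    """Return (width, height, height/width) for each box."""
    result = []
    for xmin, ymin, xmax, ymax in boxes:
        xlen, ylen = xmax - xmin, ymax - ymin
        result.append((xlen, ylen, ylen / xlen))
    return result

boxes = boxes_from_xml(SAMPLE_XML)
print(boxes)                     # [(10, 20, 110, 220), (5, 5, 55, 30)]
print(ratios_from_boxes(boxes))  # [(100, 200, 2.0), (50, 25, 0.5)]
```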

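We then filter the bounding boxes by aspect ratio, keeping those whose height/width ratio lies between 0.2 and 4.0. Each kept box becomes its own entry, named after the image with the box index appended. Finally, we print the number of original images and the number of entries that remain after filtering.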

%%time
#threshold for aspect ratio, at the same time idx for each bbx
img_filenames_th = []
ratios_th = []
for img in tqdm(img_filenames):
    bbx_ratios = bounding_box_ratio(img)
    for i,(xlen,ylen,ratio) in enumerate(bbx_ratios):
        if ((ratio>0.2)&(ratio<4.0)):
            img_filenames_th.append(img[:-4]+'_'+str(i)+'.jpg')
            ratios_th.append(ratio)
ratios_th = np.array(ratios_th)

print('original : ', len(img_filenames))
print('after th : ', len(img_filenames_th))
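
The filtering step encodes the box index in the filename ('name.jpg' becomes 'name_0.jpg', 'name_1.jpg', and so on), and the dataset class later recovers the index from the last character before the extension. A minimal sketch of that naming scheme follows. The helpers tag_with_box, split_tag, and keep_ratio are hypothetical, and split_tag uses rsplit as an alternative that also handles indices with more than one digit, which is not how the tutorial code does it.

```python
def tag_with_box(img_name, box_idx):
    """'n02088238_10870.jpg', 0 -> 'n02088238_10870_0.jpg'"""
    stem, ext = img_name.rsplit('.', 1)
    return f"{stem}_{box_idx}.{ext}"

def split_tag(tagged_name):
    """Inverse of tag_with_box: return (original filename, box index)."""
    stem, ext = tagged_name.rsplit('.', 1)
    base, idx = stem.rsplit('_', 1)
    return f"{base}.{ext}", int(idx)

def keep_ratio(ratio, low=0.2, high=4.0):
    """Same thresholds as the filtering loop above (both bounds exclusive)."""
    return low < ratio < high

print(tag_with_box('n02088238_10870.jpg', 0))  # n02088238_10870_0.jpg
print(split_tag('n02088238_10870_12.jpg'))     # ('n02088238_10870.jpg', 12)
print(keep_ratio(2.0), keep_ratio(5.0))        # True False
```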


intruders = [
    #n02088238-basset
    'n02088238_10870_0.jpg',
    
    #n02088466-bloodhound
    'n02088466_6901_1.jpg',
    'n02088466_6963_0.jpg',
    'n02088466_9167_0.jpg',
    'n02088466_9167_1.jpg',
    'n02088466_9167_2.jpg',
    
    #n02089867-Walker_hound
    'n02089867_2221_0.jpg',
    'n02089867_2227_1.jpg',
    
    #n02089973-English_foxhound # No details
    'n02089973_1132_3.jpg',
    'n02089973_1352_3.jpg',
    'n02089973_1458_1.jpg',
    'n02089973_1799_2.jpg',
    'n02089973_2791_3.jpg',
    'n02089973_4055_0.jpg',
    'n02089973_4185_1.jpg',
    'n02089973_4185_2.jpg',
    
    #n02090379-redbone
    'n02090379_4673_1.jpg',
    'n02090379_4875_1.jpg',
    
    #n02090622-borzoi # Confusing
    'n02090622_7705_1.jpg',
    'n02090622_9358_1.jpg',
    'n02090622_9883_1.jpg',
    
    #n02090721-Irish_wolfhound # very small
    'n02090721_209_1.jpg',
    'n02090721_1222_1.jpg',
    'n02090721_1534_1.jpg',
    'n02090721_1835_1.jpg',
    'n02090721_3999_1.jpg',
    'n02090721_4089_1.jpg',
    'n02090721_4276_2.jpg',
    
    #n02091032-Italian_greyhound
    'n02091032_722_1.jpg',
    'n02091032_745_1.jpg',
    'n02091032_1773_0.jpg',
    'n02091032_9592_0.jpg',
    
    #n02091134-whippet
    'n02091134_2349_1.jpg',
    'n02091134_14246_2.jpg',
    
    #n02091244-Ibizan_hound
    'n02091244_583_1.jpg',
    'n02091244_2407_0.jpg',
    'n02091244_3438_1.jpg',
    'n02091244_5639_1.jpg',
    'n02091244_5639_2.jpg',
    
    #n02091467-Norwegian_elkhound
    'n02091467_473_0.jpg',
    'n02091467_4386_1.jpg',
    'n02091467_4427_1.jpg',
    'n02091467_4558_1.jpg',
    'n02091467_4560_1.jpg',
    
    #n02091635-otterhound
    'n02091635_1192_1.jpg',
    'n02091635_4422_0.jpg',
    
    #n02091831-Saluki
    'n02091831_1594_1.jpg',
    'n02091831_2880_0.jpg',
    'n02091831_7237_1.jpg',
    
    #n02092002-Scottish_deerhound
    'n02092002_1551_1.jpg',
    'n02092002_1937_1.jpg',
    'n02092002_4218_0.jpg',
    'n02092002_4596_0.jpg',
    'n02092002_5246_1.jpg',
    'n02092002_6518_0.jpg',
    
    #n02093256-Staffordshire_bullterrier
    'n02093256_1826_1.jpg',
    'n02093256_4997_0.jpg',
    'n02093256_14914_0.jpg',
    
    #n02093428-American_Staffordshire_terrier
    'n02093428_5662_0.jpg',
    'n02093428_6949_1.jpg'
            ]

len(intruders)


def data_preprocessing(img_path,bbx_idx):
    bbx = bounding_box(img_path)[bbx_idx]
    img  = Image.open(os.path.join(PATH,img_path))#PILImage format
    img_cropped  = img.crop(bbx)
    return img_cropped

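The data_preprocessing function takes an image filename and a bounding-box index, opens the image with PIL (Python Imaging Library), crops it to the selected bounding box, and returns the cropped image.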

%%time
breed_map_2 = {}
for i,b in enumerate(breed_map.keys()):
    breed_map_2[b] = i


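We now create DogDataset, a custom dataset class for the dog images. It takes a list of image filenames, skips the entries listed in intruders, crops and transforms each image, and labels it according to the breed map.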

class DogDataset(Dataset):
    def __init__(self, path, img_list, transform1=None, transform2=None):
        self.path      = path
        self.img_list  = img_list
        self.transform1 = transform1
        self.transform2 = transform2
        
        self.imgs   = []
        self.labels = []
        for i,full_img_path in enumerate(self.img_list):
            if full_img_path in intruders:
                continue
            #img
            img_path = full_img_path[:-6]+'.jpg'
            bbx_idx  = int(full_img_path[-5])
            img = data_preprocessing(img_path,bbx_idx)
            if self.transform1:
                img = self.transform1(img) #output shape=(ch,h,w)
            self.imgs.append(img)
            #label
            label = breed_map_2[img_path.split('_')[0]]
            self.labels.append(label)
            
    def __len__(self):
        return len(self.imgs)
    
    def __getitem__(self,idx):
        img = self.imgs[idx]
        if self.transform2:
            img = self.transform2(img)
        label = self.labels[idx]
        return {'img':img, 'label':label}
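
The labeling relies on the filename prefix: the part before the first underscore identifies the breed directory, and each breed directory is assigned an integer class index. The sketch below replays that mapping on three directory names that appear in the intruders comments above. The list breed_dirs and the helper label_for_image are hypothetical stand-ins for the directory scan and the lookup inside DogDataset.

```python
# Three breed directory names taken from the comments in the intruders list.
breed_dirs = ['n02088238-basset', 'n02088466-bloodhound', 'n02089867-Walker_hound']

# Prefix -> directory name, mirroring how breed_map is built.
breed_map = {}
for directory in breed_dirs:
    index = directory.split('-')[0]
    breed_map.setdefault(index, directory)

# Prefix -> integer class label, mirroring breed_map_2.
label_of = {prefix: i for i, prefix in enumerate(breed_map)}

def label_for_image(img_name):
    """Look up the class label from the filename prefix."""
    return label_of[img_name.split('_')[0]]

print(label_of)
print(label_for_image('n02088466_6901.jpg'))  # 1
```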

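Using PyTorch's transforms package, we define two sets of image transformations:

transform1: Resizes the image so that its shorter side equals img_size (64 pixels).
transform2: Randomly crops the image to img_size x img_size (64x64 pixels), applies a random horizontal flip with probability 0.5, converts the image to a PyTorch tensor, and normalizes the pixel values with the given mean and standard deviation.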

%%time
# generate 64x64 images!
#resize_size = 68
img_size    = 64
batch_size  = BATCH_SIZE
MEAN1,MEAN2,MEAN3 = 0.5, 0.5, 0.5
STD1,STD2,STD3    = 0.5, 0.5, 0.5

transform1 = transforms.Compose([transforms.Resize(img_size)])

transform2 = transforms.Compose([transforms.RandomCrop(img_size),
                                 #transforms.RandomAffine(degrees=5),
                                 transforms.RandomHorizontalFlip(p=0.5),
                                 #transforms.RandomApply(random_transforms, p=0.3),
                                 transforms.ToTensor(),
                                 transforms.Normalize(mean=[MEAN1, MEAN2, MEAN3],
                                                      std=[STD1, STD2, STD3]),
                                ])

train_set = DogDataset(path=PATH,
                       img_list=img_filenames_th,
                       transform1=transform1,
                       transform2=transform2,
                      )

train_loader = DataLoader(train_set,
                          shuffle=True, batch_size=batch_size,
                          num_workers=NUM_WORKERS, pin_memory=True)
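
With a mean and standard deviation of 0.5 per channel, the normalization maps pixel values from [0, 1] to [-1, 1], and the inverse mapping is x*0.5 + 0.5. The sketch below reproduces that arithmetic in plain Python for a single value; normalize and denormalize are hypothetical helper names, not torchvision functions.

```python
MEAN, STD = 0.5, 0.5  # same values as MEAN1..3 and STD1..3 above

def normalize(pixel):
    """Map a pixel value in [0, 1] to [-1, 1]."""
    return (pixel - MEAN) / STD

def denormalize(value):
    """Map a value in [-1, 1] back to [0, 1] for display."""
    return value * STD + MEAN

for pixel in (0.0, 0.5, 1.0):
    print(pixel, normalize(pixel), denormalize(normalize(pixel)))
```

The [-1, 1] range matches the tanh at the end of the generator defined later, so real and generated images share the same value range.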


img = data_preprocessing(img_filenames_th[1500][:-6]+'.jpg',0)
img = transform1(img)
img


Check whether a GPU is available.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device


# Let's calculate the total number of trainable parameters in a model
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

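Model

We now build the model, starting with the utility functions and modules for a conditional generative adversarial network (cGAN). The utility functions provide convolution layers and weight initialization. The Attention module implements self-attention, which helps the model capture spatial relationships. ConditionalNorm is a conditional normalization module: it applies batch normalization without learned affine parameters and then scales and shifts the result with values computed by a linear layer from the conditioning information (such as the class label).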

def conv3x3(in_channel, out_channel): #not change resolution
    return nn.Conv2d(in_channel,out_channel,
                      kernel_size=3,stride=1,padding=1,dilation=1,bias=False)

def conv1x1(in_channel, out_channel): #not change resolution
    return nn.Conv2d(in_channel,out_channel,
                      kernel_size=1,stride=1,padding=0,dilation=1,bias=False)

def init_weight(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.orthogonal_(m.weight, gain=1)
        if m.bias is not None:
            m.bias.data.zero_()
            
    elif classname.find('Batch') != -1:
        m.weight.data.normal_(1,0.02)
        m.bias.data.zero_()
    
    elif classname.find('Linear') != -1:
        nn.init.orthogonal_(m.weight, gain=1)
        if m.bias is not None:
            m.bias.data.zero_()
    
    elif classname.find('Embedding') != -1:
        nn.init.orthogonal_(m.weight, gain=1)
        

# class Attention(nn.Module):
#     def __init__(self, c, h, w):
#         super().__init__()
#         self.attention_fc = nn.Linear(c,1, bias=False).apply(init_weight)
#         self.bias         = nn.Parameter(torch.zeros((1,h,w,1), requires_grad=True))
#         self.sigmoid      = nn.Sigmoid()
        
#     def forward(self,inputs):
#         batch,c,h,w = inputs.size()
#         x = torch.transpose(inputs, 1,2) #(*,c,h,w)->(*,h,c,w)
#         x = torch.transpose(x, 2,3) #(*,h,c,w)->(*,h,w,c)
#         x = self.attention_fc(x) + self.bias
#         x = torch.transpose(x, 2,3) #(*,h,w,1)->(*,h,1,w)
#         x = torch.transpose(x, 1,2) #(*,h,1,w)->(*,1,h,w)
#         x = self.sigmoid(x)
#         return inputs * x

class Attention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channels = channels
        self.theta    = nn.utils.spectral_norm(conv1x1(channels, channels//8)).apply(init_weight)
        self.phi      = nn.utils.spectral_norm(conv1x1(channels, channels//8)).apply(init_weight)
        self.g        = nn.utils.spectral_norm(conv1x1(channels, channels//2)).apply(init_weight)
        self.o        = nn.utils.spectral_norm(conv1x1(channels//2, channels)).apply(init_weight)
        self.gamma    = nn.Parameter(torch.tensor(0.), requires_grad=True)
        
    def forward(self, inputs):
        batch,c,h,w = inputs.size()
        theta = self.theta(inputs) #->(*,c/8,h,w)
        phi   = F.max_pool2d(self.phi(inputs), [2,2]) #->(*,c/8,h/2,w/2)
        g     = F.max_pool2d(self.g(inputs), [2,2]) #->(*,c/2,h/2,w/2)
        
        theta = theta.view(batch, self.channels//8, -1) #->(*,c/8,h*w)
        phi   = phi.view(batch, self.channels//8, -1) #->(*,c/8,h*w/4)
        g     = g.view(batch, self.channels//2, -1) #->(*,c/2,h*w/4)
        
        beta = F.softmax(torch.bmm(theta.transpose(1,2), phi), -1) #->(*,h*w,h*w/4)
        o    = self.o(torch.bmm(g, beta.transpose(1,2)).view(batch,self.channels//2,h,w)) #->(*,c,h,w)
        return self.gamma*o + inputs
        
    
class ConditionalNorm(nn.Module):
    def __init__(self, in_channel, n_condition):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channel, affine=False) #no learning parameters
        self.embed = nn.Linear(n_condition, in_channel* 2)
        
        nn.init.orthogonal_(self.embed.weight.data[:, :in_channel], gain=1)
        self.embed.weight.data[:, in_channel:].zero_()

    def forward(self, inputs, label):
        out = self.bn(inputs)
        embed = self.embed(label.float())
        gamma, beta = embed.chunk(2, dim=1)
        gamma = gamma.unsqueeze(2).unsqueeze(3)
        beta = beta.unsqueeze(2).unsqueeze(3)
        out = gamma * out + beta
        return out
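
The computation inside ConditionalNorm can be followed with small arrays. The NumPy sketch below is an illustration under assumptions, not the module itself: the tensor sizes are arbitrary, the weight matrix W is random rather than initialized as above, and a fixed epsilon of 1e-5 is assumed for the normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, channels, height, width = 4, 3, 2, 2
n_condition = 5  # hypothetical size of the condition vector

x = rng.normal(size=(batch, channels, height, width))
cond = rng.normal(size=(batch, n_condition))

# Step 1: batch normalization per channel, with no learned scale or shift.
mean = x.mean(axis=(0, 2, 3), keepdims=True)
var = x.var(axis=(0, 2, 3), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-5)

# Step 2: a single linear map produces gamma and beta from the condition.
W = rng.normal(size=(n_condition, 2 * channels)) * 0.1
b = np.zeros(2 * channels)
gamma, beta = np.split(cond @ W + b, 2, axis=1)  # each has shape (batch, channels)

# Step 3: scale and shift every channel with the per-sample gamma and beta.
out = gamma[:, :, None, None] * x_hat + beta[:, :, None, None]
print(out.shape)  # (4, 3, 2, 2)
```

Because gamma and beta depend on the condition, two samples with different class labels are scaled and shifted differently even though they share the same batch statistics.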

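Next comes the conditional GAN itself, a modified architecture that combines BigGAN with Leaky ReLU activations. The generator is built from residual blocks (ResBlock_G) that use conditional normalization, plus an attention module; its goal is to produce high-quality images from random noise and a class label. The discriminator uses modified residual blocks (ResBlock_D) with spectral normalization and Leaky ReLU activations, and takes the class information into account when distinguishing real images from generated ones. The Attention module is used in both the generator and the discriminator to improve feature capture. The code relies on spectral normalization, Leaky ReLU, and attention for stable cGAN training and generation.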

#BigGAN + leaky_relu           
class ResBlock_G(nn.Module):
    def __init__(self, in_channel, out_channel, condition_dim, upsample=True):
        super().__init__()
        self.cbn1 = ConditionalNorm(in_channel, condition_dim)
        self.upsample = nn.Sequential()
        if upsample:
            self.upsample.add_module('upsample',nn.Upsample(scale_factor=2, mode='nearest'))
        self.conv3x3_1 = nn.utils.spectral_norm(conv3x3(in_channel, out_channel)).apply(init_weight)
        self.cbn2 = ConditionalNorm(out_channel, condition_dim)
        self.conv3x3_2 = nn.utils.spectral_norm(conv3x3(out_channel, out_channel)).apply(init_weight) 
        self.conv1x1   = nn.utils.spectral_norm(conv1x1(in_channel, out_channel)).apply(init_weight)
        
    def forward(self, inputs, condition):
        x  = F.leaky_relu(self.cbn1(inputs, condition))
        x  = self.upsample(x)
        x  = self.conv3x3_1(x)
        x  = self.conv3x3_2(F.leaky_relu(self.cbn2(x, condition)))
        x += self.conv1x1(self.upsample(inputs)) #shortcut
        return x

class Generator(nn.Module):
    def __init__(self, n_feat, codes_dim=24, n_classes=n_classes):
        super().__init__()
        self.fc   = nn.Sequential(
            nn.utils.spectral_norm(nn.Linear(codes_dim, 16*n_feat*4*4)).apply(init_weight)
        )
        self.res1 = ResBlock_G(16*n_feat, 16*n_feat, codes_dim+n_classes, upsample=True)
        self.res2 = ResBlock_G(16*n_feat,  8*n_feat, codes_dim+n_classes, upsample=True)
        #self.attn2 = Attention(8*n_feat)
        self.res3 = ResBlock_G( 8*n_feat,  4*n_feat, codes_dim+n_classes, upsample=True)
        self.attn = Attention(4*n_feat)
        self.res4 = ResBlock_G( 4*n_feat,  2*n_feat, codes_dim+n_classes, upsample=True)
        self.conv = nn.Sequential(
            #nn.BatchNorm2d(2*n_feat).apply(init_weight),
            nn.LeakyReLU(),
            nn.utils.spectral_norm(conv3x3(2*n_feat, 3)).apply(init_weight),
        )
        
    def forward(self, z, label_ohe, codes_dim=24):
        '''
        z.shape = (*,120)
        label_ohe.shape = (*,n_classes)
        '''
        batch = z.size(0)
        z = z.squeeze()
        label_ohe = label_ohe.squeeze()
        codes = torch.split(z, codes_dim, dim=1)
        
        x = self.fc(codes[0]) #->(*,16ch*4*4)
        x = x.view(batch,-1,4,4) #->(*,16ch,4,4)
        
        condition = torch.cat([codes[1], label_ohe], dim=1) #(*,codes_dim+n_classes)
        x = self.res1(x, condition)#->(*,16ch,8,8)
        
        condition = torch.cat([codes[2], label_ohe], dim=1)
        x = self.res2(x, condition) #->(*,8ch,16,16)
        #x = self.attn2(x) #not change shape
        
        condition = torch.cat([codes[3], label_ohe], dim=1)
        x = self.res3(x, condition) #->(*,4ch,32,32)
        x = self.attn(x) #not change shape
        
        condition = torch.cat([codes[4], label_ohe], dim=1)
        x = self.res4(x, condition) #->(*,2ch,64,64)
        
        x = self.conv(x) #->(*,3,64,64)
        x = torch.tanh(x)
        return x
    

class ResBlock_D(nn.Module):
    def __init__(self, in_channel, out_channel, downsample=True):
        super().__init__()
        self.layer = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.utils.spectral_norm(conv3x3(in_channel, out_channel)).apply(init_weight),
            nn.LeakyReLU(0.2),
            nn.utils.spectral_norm(conv3x3(out_channel, out_channel)).apply(init_weight),
        )
        self.shortcut = nn.Sequential(
            nn.utils.spectral_norm(conv1x1(in_channel,out_channel)).apply(init_weight),
        )
        if downsample:
            self.layer.add_module('avgpool',nn.AvgPool2d(kernel_size=2,stride=2))
            self.shortcut.add_module('avgpool',nn.AvgPool2d(kernel_size=2,stride=2))
        
    def forward(self, inputs):
        x  = self.layer(inputs)
        x += self.shortcut(inputs)
        return x
    

class Discriminator(nn.Module):
    def __init__(self, n_feat, n_classes=n_classes):
        super().__init__()
        self.res1 = ResBlock_D(3, n_feat, downsample=True)
        self.attn = Attention(n_feat)
        self.res2 = ResBlock_D(  n_feat, 2*n_feat, downsample=True)
        #self.attn2 = Attention(2*n_feat)
        self.res3 = ResBlock_D(2*n_feat, 4*n_feat, downsample=True)
        self.res4 = ResBlock_D(4*n_feat, 8*n_feat, downsample=True)
        self.res5 = ResBlock_D(8*n_feat,16*n_feat, downsample=False)
        self.fc   = nn.utils.spectral_norm(nn.Linear(16*n_feat,1)).apply(init_weight)
        self.embedding = nn.Embedding(num_embeddings=n_classes, embedding_dim=16*n_feat).apply(init_weight)
        
    def forward(self, inputs, label):
        batch = inputs.size(0) #(*,3,64,64)
        h = self.res1(inputs) #->(*,ch,32,32)
        h = self.attn(h) #not change shape
        h = self.res2(h) #->(*,2ch,16,16)
        #h = self.attn2(h) #not change shape
        h = self.res3(h) #->(*,4ch,8,8)
        h = self.res4(h) #->(*,8ch,4,4)
        h = self.res5(h) #->(*,16ch,4,4)
        h = torch.sum((F.leaky_relu(h,0.2)).view(batch,-1,4*4), dim=2) #GlobalSumPool ->(*,16ch)
        outputs = self.fc(h) #->(*,1)
        
        if label is not None:
            embed = self.embedding(label) #->(*,16ch)
            outputs += torch.sum(embed*h,dim=1,keepdim=True) #->(*,1)
        
        outputs = torch.sigmoid(outputs)
        return outputs
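
The generator does not feed the whole noise vector into its first layer. It splits z into chunks of codes_dim values: the first chunk goes to the linear layer, and each remaining chunk is concatenated with the one-hot label to form the condition for one residual block. The plain-Python sketch below traces only the sizes. The value n_classes = 10 and the label 3 are hypothetical; in the tutorial n_classes is derived from the annotation folders.

```python
nz, codes_dim = 120, 24   # values used in this tutorial
n_classes = 10            # hypothetical; the tutorial computes it from the data

z = [float(i) for i in range(nz)]  # stand-in for one noise vector
codes = [z[i:i + codes_dim] for i in range(0, nz, codes_dim)]

label = 3
label_ohe = [1.0 if k == label else 0.0 for k in range(n_classes)]

first_input = codes[0]                                   # input of the first linear layer
conditions = [chunk + label_ohe for chunk in codes[1:]]  # one condition per residual block

print(len(codes), len(first_input), [len(c) for c in conditions])
# 5 24 [34, 34, 34, 34]
```

Four conditions match the four ResBlock_G blocks, and each has codes_dim + n_classes entries, which is the condition_dim passed to the blocks.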

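Training

In this section we train the conditional GAN with the BigGAN-style architecture and Leaky ReLU activations. The generator (Generator) and the discriminator (Discriminator) use conditional normalization, attention modules, and spectral normalization. The training loop updates the discriminator and the generator in turn, with an optional exponential moving average (EMA) of the generator parameters.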

def generate_img(netG,fixed_noise,fixed_aux_labels=None):
    if fixed_aux_labels is not None:
        gen_image = netG(fixed_noise,fixed_aux_labels).to('cpu').clone().detach().squeeze(0)
    else:
        gen_image = netG(fixed_noise).to('cpu').clone().detach().squeeze(0)
    #denormalize
    gen_image = gen_image*0.5 + 0.5
    gen_image_numpy = gen_image.numpy().transpose(0,2,3,1)
    return gen_image_numpy

def show_generate_imgs(netG,fixed_noise,fixed_aux_labels=None):
    gen_images_numpy = generate_img(netG,fixed_noise,fixed_aux_labels)

    fig = plt.figure(figsize=(25, 16))
    # Display up to 32 generated images in a 4x8 grid
    for i, img in enumerate(gen_images_numpy):
        ax = fig.add_subplot(4, 8, i + 1, xticks=[], yticks=[])
        plt.imshow(img)
    plt.show()
    plt.close()

def cycle(iterable):
    """
    Turn an iterable (for example a dataloader) into an endless iterator.
    :param iterable:
    :return:
    """
    while True:
        for x in iterable:
            yield x

#BigGAN
def run(lr_G=3e-4,lr_D=3e-4, beta1=0.0, beta2=0.999, nz=120, epochs=2, 
        n_ite_D=1, ema_decay_rate=0.999, show_epoch_list=None, output_freq=10):

    netG = Generator(n_feat=36, codes_dim=24, n_classes=n_classes).to(device) #z.shape=(*,120)
    netD = Discriminator(n_feat=42, n_classes=n_classes).to(device)


    if EMA:
        #EMA of G for sampling
        netG_EMA = Generator(n_feat=36, codes_dim=24, n_classes=n_classes).to(device) #n_feat must match netG to load its state_dict
        netG_EMA.load_state_dict(netG.state_dict())
        for p in netG_EMA.parameters():
            p.requires_grad = False

        
    print(count_parameters(netG))
    print(count_parameters(netD))
    
    real_label = 0.9
    fake_label = 0
    
    D_loss_list = []
    G_loss_list = []
    
    dis_criterion = nn.BCELoss().to(device)

    optimizerD = optim.Adam(netD.parameters(), lr=lr_D, betas=(beta1, beta2))
    optimizerG = optim.Adam(netG.parameters(), lr=lr_G, betas=(beta1, beta2))
    
    fixed_noise = torch.randn(32, nz, 1, 1, device=device)
    #fixed_noise = fixed_noise / fixed_noise.norm(dim=1, keepdim=True)
    fixed_aux_labels     = np.random.randint(0,n_classes, 32)
    fixed_aux_labels_ohe = np.eye(n_classes)[fixed_aux_labels]
    fixed_aux_labels_ohe = torch.from_numpy(fixed_aux_labels_ohe[:,:,np.newaxis,np.newaxis])
    fixed_aux_labels_ohe = fixed_aux_labels_ohe.float().to(device, non_blocking=True)

    netG.train()
    netD.train()

    ### training here
    for epoch in range(1,epochs+1):
        if elapsed_time(start_time) > TIME_LIMIT:
            print(f'elapsed_time go beyond {TIME_LIMIT} sec')
            break
        D_running_loss = 0
        G_running_loss = 0
        for ii, data in enumerate(train_loader):
            ############################
            # (1) Update D network
            ###########################
            for _ in range(n_ite_D):
                
                if LABEL_NOISE:
                    real_label = 0.9
                    fake_label = 0
                    if np.random.random() < LABEL_NOISE_PROB:
                        real_label = 0
                        fake_label = 0.9
                    
                # train with real
                netD.zero_grad()
                real_images = data['img'].to(device, non_blocking=True) 
                batch_size  = real_images.size(0)
                dis_labels  = torch.full((batch_size, 1), real_label, device=device) #shape=(*,)
                aux_labels  = data['label'].long().to(device, non_blocking=True) #shape=(*,)
                dis_output = netD(real_images, aux_labels) #dis shape=(*,1)
                errD_real  = dis_criterion(dis_output, dis_labels)
                errD_real.backward(retain_graph=True)

                # train with fake
                noise  = torch.randn(batch_size, nz, 1, 1, device=device)
                #noise = noise / noise.norm(dim=1, keepdim=True)
                aux_labels     = np.random.randint(0,n_classes, batch_size)
                aux_labels_ohe = np.eye(n_classes)[aux_labels]
                aux_labels_ohe = torch.from_numpy(aux_labels_ohe[:,:,np.newaxis,np.newaxis])
                aux_labels_ohe = aux_labels_ohe.float().to(device, non_blocking=True)
                aux_labels = torch.from_numpy(aux_labels).long().to(device, non_blocking=True)
                
                fake = netG(noise, aux_labels_ohe) #output shape=(*,3,64,64)
                dis_labels.fill_(fake_label)
                dis_output = netD(fake.detach(),aux_labels)
                errD_fake  = dis_criterion(dis_output, dis_labels)
                errD_fake.backward(retain_graph=True)
                D_running_loss += (errD_real.item() + errD_fake.item())/len(train_loader)
                optimizerD.step()

            ############################
            # (2) Update G network
            ###########################
            netG.zero_grad()
            dis_labels.fill_(real_label)  # fake labels are real for generator cost
            noise = torch.randn(batch_size, nz, 1, 1, device=device)
            aux_labels     = np.random.randint(0,n_classes, batch_size)
            aux_labels_ohe = np.eye(n_classes)[aux_labels]
            aux_labels_ohe = torch.from_numpy(aux_labels_ohe[:,:,np.newaxis,np.newaxis])
            aux_labels_ohe = aux_labels_ohe.float().to(device, non_blocking=True)
            aux_labels = torch.from_numpy(aux_labels).long().to(device, non_blocking=True)
            fake  = netG(noise, aux_labels_ohe)
            
            dis_output = netD(fake, aux_labels)
            errG   = dis_criterion(dis_output, dis_labels)
            errG.backward(retain_graph=True)
            G_running_loss += errG.item()/len(train_loader)
            optimizerG.step()
        
        if EMA:
            #update netG_EMA
            param_itr = cycle(netG.parameters())
            for i,p_EMA in enumerate(netG_EMA.parameters()):
                p = next(param_itr)
                p_EMA.data = ema_decay_rate*p_EMA.data + (1-ema_decay_rate)*p.data
                p_EMA.requires_grad = False
        
        #log
        D_loss_list.append(D_running_loss)
        G_loss_list.append(G_running_loss)
        
        #output
        if epoch % output_freq == 0:
            print('[{:d}/{:d}] D_loss = {:.3f}, G_loss = {:.3f}, elapsed_time = {:.1f} min'.format(epoch,epochs,D_running_loss,G_running_loss,elapsed_time(start_time)/60))
            
        if show_epoch_list is not None and epoch in show_epoch_list:
            print('epoch = {}'.format(epoch))
            if not EMA:
                show_generate_imgs(netG,fixed_noise,fixed_aux_labels_ohe)
            elif EMA:
                show_generate_imgs(netG_EMA,fixed_noise,fixed_aux_labels_ohe)
            
        if epoch % 100 == 0:
            if not EMA:
                torch.save(netG.state_dict(), f'generator_epoch{epoch}.pth')
            elif EMA:
                torch.save(netG_EMA.state_dict(), f'generator_epoch{epoch}.pth')
    
    if not EMA:
        torch.save(netG.state_dict(), 'generator.pth')
    elif EMA:
        torch.save(netG_EMA.state_dict(), 'generator.pth')
    torch.save(netD.state_dict(), 'discriminator.pth')
    
    res = {'netG':netG,
           'netD':netD,
           'nz':nz,
           'fixed_noise':fixed_noise,
           'fixed_aux_labels_ohe':fixed_aux_labels_ohe,
           'D_loss_list':D_loss_list,
           'G_loss_list':G_loss_list,
          }
    if EMA:
        res['netG_EMA'] = netG_EMA
        
    return res
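
The optional EMA keeps a slowly moving copy of the generator weights for sampling. A minimal sketch of the usual update rule, shadow = decay * shadow + (1 - decay) * live, on plain Python lists is shown below. The function name ema_update and the toy weights are hypothetical; the decay of 0.999 is the default ema_decay_rate of run().

```python
def ema_update(shadow, live, decay=0.999):
    """Move each shadow weight a small step toward the matching live weight."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, live)]

shadow = [0.0, 0.0]   # averaged copy, starts away from the live weights
live = [1.0, -2.0]    # stand-in for the current generator weights

for _ in range(1000):
    shadow = ema_update(shadow, live)

print(shadow)  # after 1000 steps, roughly 63% of the way to the live weights
```

With a decay close to 1 the averaged copy changes slowly, which smooths out the step-to-step fluctuations of the live generator.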

%%time
#show_epoch_list = np.arange(0,100,1)
show_epoch_list = np.arange(0,500+10,10)

res = run(lr_G=3e-4,lr_D=3e-4, beta1=0.0, beta2=0.999, nz=120, epochs=500, 
          n_ite_D=1, ema_decay_rate=None, show_epoch_list=show_epoch_list, output_freq=10)
# res = run(lr_G=3e-4,lr_D=3e-4, beta1=0.5, beta2=0.999, nz=120, epochs=500, 
#           n_ite_D=1, ema_decay_rate=None, show_epoch_list=show_epoch_list, output_freq=10)


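The discriminator loss (D_loss) and the generator loss (G_loss) are the main quantities we track during GAN training. D_loss reflects how well the discriminator separates real samples from generated ones, and G_loss reflects how well the generator produces samples that fool the discriminator.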
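
Both losses in run() are binary cross-entropy values with a smoothed target of 0.9 for "real" and 0 for "fake". The sketch below computes them for a single sample in plain Python. The discriminator outputs 0.8 and 0.3 are made-up numbers, and bce is a hypothetical helper that mirrors what nn.BCELoss computes for one element.

```python
import math

def bce(pred, target):
    """Binary cross-entropy for one prediction strictly between 0 and 1."""
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

real_label, fake_label = 0.9, 0.0  # label values used in run()

d_on_real = 0.8  # hypothetical discriminator output for a real image
d_on_fake = 0.3  # hypothetical discriminator output for a generated image

# Discriminator: push real images toward 0.9 and generated images toward 0.
d_loss = bce(d_on_real, real_label) + bce(d_on_fake, fake_label)

# Generator: wants the discriminator to answer "real" for generated images.
g_loss = bce(d_on_fake, real_label)

print(round(d_loss, 3), round(g_loss, 3))
```

If the discriminator output on generated images rises, for example from 0.3 to 0.6, g_loss falls, so a lower G_loss means the generator is fooling the discriminator more often.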

plt.plot(res['D_loss_list'], label='D_loss')
plt.plot(res['G_loss_list'], label='G_loss')
plt.grid()
plt.legend()
plt.title('loss history');

[Figure: loss history, showing the D_loss and G_loss curves]

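In this run the discriminator loss decreases while the generator loss increases. A rising generator loss means the discriminator is getting better at telling real images from generated ones, so the generator finds it harder to fool it. This pattern is common in GAN training, and the loss values alone say little about image quality, so the generated images should be inspected as well.

Now we loop over the generated images and display them with matplotlib. The images come either from the original generator (netG) or, if EMA is enabled, from the generator with exponential-moving-average weights.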

if not EMA:
    gen_image_numpy = generate_img(res['netG'],res['fixed_noise'],res['fixed_aux_labels_ohe'])
elif EMA:
    gen_image_numpy = generate_img(res['netG_EMA'],res['fixed_noise'],res['fixed_aux_labels_ohe'])
for img in gen_image_numpy:
    plt.imshow(img)
    plt.show()


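These are the images generated by our model, and they look quite good. To improve the quality of the results further, the model would need to be trained on a larger amount of data.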
