Generating new images with PyTorch
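I am studying GANs. I have completed a course that gave me an example program that generates images based on the example inputs.
The example can be found here:
https://github.com/davidsonmizael/gan
So I decided to use it to generate new images from a dataset of frontal face photos, but I haven't had any success. Unlike the example above, my code only produces noise, even though the input contains actual images.
I really don't know what I should change to point the code in the right direction and make it learn from the images. I haven't changed a single value in the code provided in the example, yet it doesn't work.
If anyone can help me understand this and point me in the right direction, it would be very helpful. Thanks in advance.
My discriminator: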
class D(nn.Module):
    def __init__(self):
        super(D, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias = False),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(64, 128, 4, 2, 1, bias = False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(128, 256, 4, 2, 1, bias = False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(256, 512, 4, 2, 1, bias = False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace = True),
            nn.Conv2d(512, 1, 4, 1, 0, bias = False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input).view(-1)
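A side note on input size (not part of the original post): the five convolutions above only reduce an image to a single value per sample when the input is 64x64. The sketch below applies the standard output-size formula for a convolution with dilation 1, out = floor((n + 2p - k) / s) + 1, to the (kernel, stride, padding) triples copied from the discriminator. The names `D_LAYERS`, `conv_out` and `discriminator_sizes` are made up for this illustration.

```python
# (kernel, stride, padding) of each nn.Conv2d in D, copied from the class above.
D_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]


def conv_out(n, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1) on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def discriminator_sizes(n):
    """Spatial size after each conv layer of D for an n x n input image."""
    sizes = []
    for kernel, stride, padding in D_LAYERS:
        n = conv_out(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


print(discriminator_sizes(64))   # [32, 16, 8, 4, 1]
print(discriminator_sizes(128))  # [64, 32, 16, 8, 5]
```

With a 128x128 input the last layer yields a 5x5 map, so `.view(-1)` would return 25 values per image instead of one, and the output would no longer match the length of the target tensor. Resizing the dataset images to 64x64 in the data loader is one way to stay compatible with this architecture.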
My generator:
class G(nn.Module):
    def __init__(self):
        super(G, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(100, 512, 4, 1, 0, bias = False),
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias = False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias = False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias = False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias = False),
            nn.Tanh()
        )

    def forward(self, input):
        return self.main(input)
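For reference (again, not part of the original post), the generator always emits 3x64x64 images from a 100x1x1 noise vector. The sketch below checks this with the output-size formula for a transposed convolution with dilation 1 and no output padding, out = (n - 1) * s - 2p + k. The names `G_LAYERS`, `conv_transpose_out` and `generator_sizes` are hypothetical helpers for this illustration.

```python
# (kernel, stride, padding) of each nn.ConvTranspose2d in G, copied from the class above.
G_LAYERS = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]


def conv_transpose_out(n, kernel, stride, padding):
    """Spatial output size of a transposed convolution (dilation 1, output_padding 0)."""
    return (n - 1) * stride - 2 * padding + kernel


def generator_sizes(n=1):
    """Spatial size after each transposed conv of G, starting from an n x n noise map."""
    sizes = []
    for kernel, stride, padding in G_LAYERS:
        n = conv_transpose_out(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


print(generator_sizes())  # [4, 8, 16, 32, 64]
```

Because the final `nn.Tanh()` bounds the output to [-1, 1], the real images fed to the discriminator are usually normalized to the same range so that both inputs are on a comparable scale.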
My function to initialize the weights:
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
The complete code can be seen here:
https://github.com/davidsonmizael/criminal-gan
Noise generated at epoch 25:
Real input images:
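GAN training is not very fast. I assume you are not using a pre-trained model but training from scratch. At epoch 25 it is quite normal not to see any meaningful patterns in the samples.
I realize the github project shows you something cool after 25 epochs, but that also depends on the size of the dataset.
CIFAR-10 (the one used on the github page) has 60000 images.
25 epochs means the network has seen all of them 25 times.
I don't know which dataset you are using, but if it is smaller, it may take more epochs before you see results, since the network sees fewer images in total. If the images in your dataset have a higher resolution, it may also take longer.
You should check again after at least a few hundred epochs, if not a few thousand.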
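The dataset-size argument above can be turned into a rough calculation. The sketch below counts how many images the network sees in total and how many epochs a smaller dataset would need to reach the same count. The 3000-image dataset is a made-up figure for illustration, and matching the image count is only a heuristic, not a guarantee of comparable sample quality.

```python
import math


def images_seen(dataset_size, epochs):
    """Total number of images shown to the network over all epochs."""
    return dataset_size * epochs


def epochs_to_match(reference_size, reference_epochs, dataset_size):
    """Epochs a dataset of `dataset_size` images needs to match the reference image count."""
    total = images_seen(reference_size, reference_epochs)
    return math.ceil(total / dataset_size)


# CIFAR-10 as described above: 60000 images, 25 epochs.
print(images_seen(60000, 25))            # 1500000
# Hypothetical face dataset with 3000 images.
print(epochs_to_match(60000, 25, 3000))  # 500
```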
For example, on the frontal face photo dataset after 25 epochs:
After 50 epochs:
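The code from your example (https://github.com/davidsonmizael/gan) gave me the same noise you show. The generator's loss dropped far too quickly.
There were a few problems, and I am no longer sure which one was decisive, but I think it is easy to work out the differences yourself. For comparison, also have a look at this tutorial: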
GANs in 50 lines of PyTorch
# .... same as your code

print("# Starting generator and discriminator...")
# Tensors replace Variable from PyTorch 0.4 on; pick the device once.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
netG = G().to(device)
netG.apply(weights_init)
netD = D().to(device)
netD.apply(weights_init)

# training the DCGANs
criterion = nn.BCELoss()
optimizerD = optim.Adam(netD.parameters(), lr = 0.0002, betas = (0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr = 0.0002, betas = (0.5, 0.999))

epochs = 25
timeElapsed = []
for epoch in range(epochs):
    print("# Starting epoch [%d/%d]..." % (epoch, epochs))
    for i, data in enumerate(dataloader, 0):
        start = time.time()

        # updates the weights of the discriminator nn
        netD.zero_grad()

        # trains the discriminator with a real image
        real, _ = data
        inputs = real.to(device)
        batch_size = inputs.size(0)
        target = torch.ones(batch_size, device=device)
        output = netD(inputs)
        errD_real = criterion(output, target)
        errD_real.backward()  # retain_graph=True

        # trains the discriminator with a fake image
        D_noise = torch.randn(batch_size, 100, 1, 1, device=device)
        target = torch.zeros(batch_size, device=device)
        # detach() keeps the discriminator update from reaching the generator
        D_fake = netG(D_noise).detach()
        D_fake_output = netD(D_fake)
        errD_fake = criterion(D_fake_output, target)
        errD_fake.backward()
        # NOT: backpropagating the total error
        # errD = errD_real + errD_fake
        optimizerD.step()

        # for i, data in enumerate(dataloader, 0):
        # updates the weights of the generator nn, using fresh noise
        netG.zero_grad()
        G_noise = torch.randn(batch_size, 100, 1, 1, device=device)
        target = torch.ones(batch_size, device=device)
        fake = netG(G_noise)
        G_output = netD(fake)
        errG = criterion(G_output, target)
        # backpropagating the error
        errG.backward()
        optimizerG.step()

        if i % 50 == 0:
            # prints the losses (Loss_D is the loss on the real batch only)
            print("# Progress: ")
            print("[%d/%d][%d/%d] Loss_D: %.4f Loss_G: %.4f" % (epoch, epochs, i, len(dataloader), errD_real.item(), errG.item()))
            # estimates the remaining time: average seconds per iteration
            # multiplied by the iterations that still need to run
            timeElapsed.append(time.time() - start)
            avg_time = sum(timeElapsed) / float(len(timeElapsed))
            all_dtl = (epoch * len(dataloader)) + i
            rem_dtl = (epochs * len(dataloader)) - all_dtl
            remaining = rem_dtl * avg_time
            print("# Estimated remaining time: %s" % (time.strftime("%H:%M:%S", time.gmtime(remaining))))

        if i % 100 == 0:
            # saves the real images and the generated images
            vutils.save_image(real, "%s/real_samples.png" % "./results", normalize = True)
            vutils.save_image(fake.detach(), "%s/fake_samples_epoch_%03d.png" % ("./results", epoch), normalize = True)

print("# Finished.")
Results on CIFAR-10 after 25 epochs (batch size 256):
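To make the loss values printed by the loop easier to read, here is a small sketch of binary cross-entropy for a single prediction. This is the per-element formula that `nn.BCELoss` averages over a batch. The loop uses target 1 for real images, target 0 for fakes when updating the discriminator, and target 1 for fakes when updating the generator. The `bce` helper and its `eps` clamp are assumptions made for this illustration, not PyTorch's implementation.

```python
import math


def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for one probability; eps guards against log(0)."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Generator step (target 1): the discriminator is undecided about the fake.
print(round(bce(0.5, 1), 4))   # 0.6931
# Generator step (target 1): the discriminator is almost sure the fake is real.
print(round(bce(0.99, 1), 4))  # 0.0101
# Discriminator step on the same fake (target 0): a large loss.
print(round(bce(0.99, 0), 4))  # 4.6052
```

Read this way, a generator loss that falls towards zero early in training suggests the discriminator is rating the generated samples as real, which can happen while those samples still look like noise.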