A Variational Autoencoder (VAE) can do some amazing things once we increase the latent space from 2 dimensions to a higher-dimensional space: in particular, it can generate faces.
In the previous tutorial, we built a VAE, trained it on the MNIST handwritten digits dataset, and analysed it with the test data. Please go through that tutorial first.
Welcome to aiRobott, I am Kishor Kumar Vajja. In this tutorial we will learn how to generate celebrity faces using a VAE with Keras.
We will use the CelebFaces Attributes dataset (CelebA) to train our variational autoencoder. It is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversity, large quantity, and rich annotations, including:
- 10,177 identities,
- 202,599 face images, and
- 5 landmark locations and 40 binary attribute annotations per image.
The dataset can be employed as training and test sets for computer vision tasks such as:
- face attribute recognition,
- face detection,
- landmark (or facial part) localization, and
- face editing & synthesis.
For training the VAE we don't need the labels, but they will be useful later when we start exploring how these features are captured in the multidimensional latent space. Once our VAE is trained, we can sample from the latent space to generate new examples of celebrity faces.
For training the VAE, some improvements are needed in two modules: VAE.py and loaders.py. These changes are shown in the coding part.
The network architecture for the faces model is similar to the handwritten digits example, but with a few improvements:
1. Here we use coloured images, so our data now has three input channels (RGB) instead of one (grayscale). This means we need to change the number of channels in the final convolutional transpose layer of the decoder to 3.
2. Faces are much more complex than digits, so we increase the dimensionality of the latent space so that the network can encode a satisfactory amount of detail from the images. Therefore we use a latent space with 200 dimensions instead of 2.
3. To speed up training, batch normalization layers are used after each convolution layer (see the sketch after this list). Even though each batch then takes longer to run, the number of batches required to reach the same loss is greatly reduced. Dropout layers are also added.
4. We increase the reconstruction loss factor to 10,000. This is a parameter that requires tuning; for this dataset and architecture, this value was found to generate good results.
5. We use a generator to feed images to the VAE from a folder, rather than loading all the images into memory up front. Since the VAE trains in batches, there is no need to load all the images into memory first; instead we use the fit_generator method that Keras provides to read in images only when they are required for training.
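As a quick illustration of point 3, here is a minimal sketch (my own summary; the full VAE.py module below is the authoritative code) of the block pattern used for every convolutional layer of the encoder once batch normalization and dropout are switched on:
from keras.layers import Input, Conv2D, BatchNormalization, LeakyReLU, Dropout

# One encoder block: convolution, then batch normalization, activation and dropout.
# VAE.py below repeats this pattern for all four convolutional layers.
inp = Input(shape=(128, 128, 3))
x = Conv2D(filters=32, kernel_size=3, strides=2, padding='same')(inp)
x = BatchNormalization()(x)   # speeds up convergence (improvement 3)
x = LeakyReLU()(x)
x = Dropout(rate=0.25)(x)     # regularization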
This is the full architecture of the encoder we are going to build. Here I have used:
- Input layer – 1
- Convolutional layers – 4
- Batch normalization layers – 4
- LeakyReLU layers – 4
- Dropout layers – 4
- Flatten layer – 1
- mu (Dense) layer – 1
- log_var (Dense) layer – 1
- Lambda (sampling) layer – 1
The input image shape is 128x128 with 3 channels; the output (latent vector) shape is 200.
This is the full architecture of the decoder:
- Input layer (takes the output of the Lambda layer) – 1
- Dense layer – 1
- Reshape layer – 1
- Convolutional transpose layers – 4
- Batch normalization layers – 3
- LeakyReLU layers – 3
- Dropout layers – 3
- Activation layer (sigmoid) – 1
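As a quick sanity check of these shapes (my own back-of-the-envelope calculation, not part of the tutorial code): each of the four stride-2 convolutions halves the spatial size, so a 128x128 input is reduced to 8x8 before flattening, and the decoder mirrors this path back up to 128x128x3.
# Shape check for the encoder/decoder above (4 stride-2 convolutions, 64 filters in the last one).
size = 128
for _ in range(4):
    size //= 2            # 128 -> 64 -> 32 -> 16 -> 8
print(size)               # 8: spatial size before flattening
print(size * size * 64)   # 4096 units feed the Dense mu and log_var layers (200 units each)
# Decoder: Dense(4096) -> Reshape((8, 8, 64)) -> four stride-2 transposed convolutions -> (128, 128, 3) with sigmoid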
In the previous tutorial we created three modules: VAE.py, loaders.py, and callbacks.py. We will add some code to the first two modules only.
For the VAE.py module we need to add batch normalization and dropout layers. Open the Jupyter notebook and open the VAE.py module, which was previously saved in the models folder. Now we will add the required code to this module. (Video link)
# importing required packages
import numpy as np
import os
import pickle
import json

# importing required packages from keras
from keras.layers import Input, Conv2D, Flatten, Dense, Conv2DTranspose, Reshape, Lambda, Activation, LeakyReLU, BatchNormalization, Dropout
from keras.models import Model
from keras import backend as K
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint
from keras.utils import plot_model

# the callbacks module created in the previous tutorial
from models.callbacks import CustomCallback, step_decay_schedule
class VariationalAutoencoder():
    def __init__(self
        , input_dim
        , encoder_conv_filters
        , encoder_conv_kernel_size
        , encoder_conv_strides
        , decoder_conv_t_filters
        , decoder_conv_t_kernel_size
        , decoder_conv_t_strides
        , z_dim
        , use_batch_norm = False
        , use_dropout = False
        ):

        self.name = 'variational_autoencoder'

        self.input_dim = input_dim
        self.encoder_conv_filters = encoder_conv_filters
        self.encoder_conv_kernel_size = encoder_conv_kernel_size
        self.encoder_conv_strides = encoder_conv_strides
        self.decoder_conv_t_filters = decoder_conv_t_filters
        self.decoder_conv_t_kernel_size = decoder_conv_t_kernel_size
        self.decoder_conv_t_strides = decoder_conv_t_strides
        self.z_dim = z_dim
        self.use_batch_norm = use_batch_norm
        self.use_dropout = use_dropout

        self.n_layers_encoder = len(encoder_conv_filters)
        self.n_layers_decoder = len(decoder_conv_t_filters)

        self._build()
    def _build(self):

        ### THE ENCODER
        encoder_input = Input(shape=self.input_dim, name='encoder_input')
        x = encoder_input

        for i in range(self.n_layers_encoder):
            conv_layer = Conv2D(
                filters = self.encoder_conv_filters[i]
                , kernel_size = self.encoder_conv_kernel_size[i]
                , strides = self.encoder_conv_strides[i]
                , padding = 'same'
                , name = 'encoder_conv_' + str(i)
                )
            x = conv_layer(x)

            if self.use_batch_norm:
                x = BatchNormalization()(x)

            x = LeakyReLU()(x)

            if self.use_dropout:
                x = Dropout(rate = 0.25)(x)

        shape_before_flattening = K.int_shape(x)[1:]

        x = Flatten()(x)
        self.mu = Dense(self.z_dim, name='mu')(x)
        self.log_var = Dense(self.z_dim, name='log_var')(x)

        self.encoder_mu_log_var = Model(encoder_input, (self.mu, self.log_var))

        def sampling(args):
            mu, log_var = args
            epsilon = K.random_normal(shape=K.shape(mu), mean=0., stddev=1.)
            return mu + K.exp(log_var / 2) * epsilon

        encoder_output = Lambda(sampling, name='encoder_output')([self.mu, self.log_var])

        self.encoder = Model(encoder_input, encoder_output)

        ### THE DECODER
        decoder_input = Input(shape=(self.z_dim,), name='decoder_input')

        x = Dense(np.prod(shape_before_flattening))(decoder_input)
        x = Reshape(shape_before_flattening)(x)

        for i in range(self.n_layers_decoder):
            conv_t_layer = Conv2DTranspose(
                filters = self.decoder_conv_t_filters[i]
                , kernel_size = self.decoder_conv_t_kernel_size[i]
                , strides = self.decoder_conv_t_strides[i]
                , padding = 'same'
                , name = 'decoder_conv_t_' + str(i)
                )
            x = conv_t_layer(x)

            if i < self.n_layers_decoder - 1:
                if self.use_batch_norm:
                    x = BatchNormalization()(x)
                x = LeakyReLU()(x)
                if self.use_dropout:
                    x = Dropout(rate = 0.25)(x)
            else:
                x = Activation('sigmoid')(x)

        decoder_output = x

        self.decoder = Model(decoder_input, decoder_output)

        ### THE FULL VAE
        model_input = encoder_input
        model_output = self.decoder(encoder_output)

        self.model = Model(model_input, model_output)
    def compile(self, learning_rate, r_loss_factor):
        self.learning_rate = learning_rate

        ### COMPILATION
        def vae_r_loss(y_true, y_pred):
            r_loss = K.mean(K.square(y_true - y_pred), axis = [1,2,3])
            return r_loss_factor * r_loss

        def vae_kl_loss(y_true, y_pred):
            kl_loss = -0.5 * K.sum(1 + self.log_var - K.square(self.mu) - K.exp(self.log_var), axis = 1)
            return kl_loss

        def vae_loss(y_true, y_pred):
            r_loss = vae_r_loss(y_true, y_pred)
            kl_loss = vae_kl_loss(y_true, y_pred)
            return r_loss + kl_loss

        optimizer = Adam(lr=learning_rate)
        self.model.compile(optimizer=optimizer, loss = vae_loss, metrics = [vae_r_loss, vae_kl_loss])
    def save(self, folder):
        if not os.path.exists(folder):
            os.makedirs(folder)
            os.makedirs(os.path.join(folder, 'viz'))
            os.makedirs(os.path.join(folder, 'weights'))
            os.makedirs(os.path.join(folder, 'images'))

        with open(os.path.join(folder, 'params.pkl'), 'wb') as f:
            pickle.dump([
                self.input_dim
                , self.encoder_conv_filters
                , self.encoder_conv_kernel_size
                , self.encoder_conv_strides
                , self.decoder_conv_t_filters
                , self.decoder_conv_t_kernel_size
                , self.decoder_conv_t_strides
                , self.z_dim
                , self.use_batch_norm
                , self.use_dropout
                ], f)

        self.plot_model(folder)

    def load_weights(self, filepath):
        self.model.load_weights(filepath)
    def train(self, x_train, batch_size, epochs, run_folder, print_every_n_batches = 100, initial_epoch = 0, lr_decay = 1):

        custom_callback = CustomCallback(run_folder, print_every_n_batches, initial_epoch, self)
        lr_sched = step_decay_schedule(initial_lr=self.learning_rate, decay_factor=lr_decay, step_size=1)

        checkpoint_filepath = os.path.join(run_folder, "weights/weights-{epoch:03d}-{loss:.2f}.h5")
        checkpoint1 = ModelCheckpoint(checkpoint_filepath, save_weights_only = True, verbose=1)
        checkpoint2 = ModelCheckpoint(os.path.join(run_folder, 'weights/weights.h5'), save_weights_only = True, verbose=1)

        callbacks_list = [checkpoint1, checkpoint2, custom_callback, lr_sched]

        self.model.fit(
            x_train
            , x_train
            , batch_size = batch_size
            , shuffle = True
            , epochs = epochs
            , initial_epoch = initial_epoch
            , callbacks = callbacks_list
        )
    def train_with_generator(self, data_flow, epochs, steps_per_epoch, run_folder, print_every_n_batches = 100, initial_epoch = 0, lr_decay = 1):

        custom_callback = CustomCallback(run_folder, print_every_n_batches, initial_epoch, self)
        lr_sched = step_decay_schedule(initial_lr=self.learning_rate, decay_factor=lr_decay, step_size=1)

        checkpoint_filepath = os.path.join(run_folder, "weights/weights-{epoch:03d}-{loss:.2f}.h5")
        checkpoint1 = ModelCheckpoint(checkpoint_filepath, save_weights_only = True, verbose=1)
        checkpoint2 = ModelCheckpoint(os.path.join(run_folder, 'weights/weights.h5'), save_weights_only = True, verbose=1)

        callbacks_list = [checkpoint1, checkpoint2, custom_callback, lr_sched]

        self.model.save_weights(os.path.join(run_folder, 'weights/weights.h5'))

        self.model.fit_generator(
            data_flow
            , shuffle = True
            , epochs = epochs
            , initial_epoch = initial_epoch
            , callbacks = callbacks_list
            , steps_per_epoch = steps_per_epoch
        )

    def plot_model(self, run_folder):
        plot_model(self.model, to_file=os.path.join(run_folder, 'viz/model.png'), show_shapes = True, show_layer_names = True)
        plot_model(self.encoder, to_file=os.path.join(run_folder, 'viz/encoder.png'), show_shapes = True, show_layer_names = True)
        plot_model(self.decoder, to_file=os.path.join(run_folder, 'viz/decoder.png'), show_shapes = True, show_layer_names = True)
There is no need to edit the callbacks.py module; let's keep it as it is. (A rough sketch of its contents is included below just for reference.)
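For reference only, here is a rough sketch of what the two objects imported from callbacks.py (CustomCallback and step_decay_schedule) typically do. Your saved module from the previous tutorial is the authoritative version, so treat the exact file paths and arguments here as assumptions rather than the actual code.
# Rough sketch of models/callbacks.py (assumed from the previous tutorial; do not overwrite your copy).
import os
import numpy as np
import matplotlib.pyplot as plt
from keras.callbacks import Callback, LearningRateScheduler

class CustomCallback(Callback):
    # Saves one image decoded from a random latent point every print_every_n_batches batches.
    def __init__(self, run_folder, print_every_n_batches, initial_epoch, vae):
        self.epoch = initial_epoch
        self.run_folder = run_folder
        self.print_every_n_batches = print_every_n_batches
        self.vae = vae

    def on_batch_end(self, batch, logs={}):
        if batch % self.print_every_n_batches == 0:
            z_new = np.random.normal(size=(1, self.vae.z_dim))
            reconst = self.vae.decoder.predict(np.array(z_new))[0].squeeze()
            filepath = os.path.join(self.run_folder, 'images',
                                    'img_' + str(self.epoch).zfill(3) + '_' + str(batch) + '.jpg')
            plt.imsave(filepath, reconst)

    def on_epoch_begin(self, epoch, logs={}):
        self.epoch += 1

def step_decay_schedule(initial_lr, decay_factor=0.5, step_size=1):
    # Wraps a simple stepwise learning-rate decay in a Keras LearningRateScheduler.
    def schedule(epoch):
        return initial_lr * (decay_factor ** np.floor(epoch / step_size))
    return LearningRateScheduler(schedule)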
Now we will edit the loaders.py module. The required packages to be added at the beginning of this module are shown next. (Video link)
import numpy as np
import pickle
import os
import h5py

# pandas is a fast, powerful, flexible and easy-to-use open source data analysis
# and manipulation tool, built on top of the Python programming language.
import pandas as pd

# SciPy is a Python-based ecosystem of open-source software for mathematics,
# science, and engineering.
import scipy

# glob uses Unix shell rules to find filenames matching a pattern.
from glob import glob

# os.walk() generates the file names in a directory tree by walking the tree
# either top-down or bottom-up.
# os.getcwd() returns the current working directory of a process.
from os import walk, getcwd

# pdb is an interactive source code debugger for Python programs. It supports
# setting (conditional) breakpoints and single stepping at the source line level,
# inspection of stack frames, source code listing, and evaluation of arbitrary
# Python code in the context of any stack frame. It also supports post-mortem
# debugging and can be called under program control.
import pdb

from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator, load_img, save_img
# or, equivalently:
# from keras_preprocessing.image import ImageDataGenerator, load_img, img_to_array

# If you want the Keras modules you write to be compatible with both Theano (th)
# and TensorFlow (tf), you have to write them via the abstract Keras backend API.
# You can import the backend module via:
from keras import backend as K

from keras.utils import to_categorical  # converts a class vector (integers) to a binary class matrix
class ImageLabelLoader():
    def __init__(self, image_folder, target_size):
        self.image_folder = image_folder
        self.target_size = target_size

    def build(self, att, batch_size, label = None):

        data_gen = ImageDataGenerator(rescale=1./255)
        if label:
            data_flow = data_gen.flow_from_dataframe(
                att
                , self.image_folder
                , x_col='image_id'
                , y_col=label
                , target_size=self.target_size
                , class_mode='other'
                , batch_size=batch_size
                , shuffle=True
            )
        else:
            data_flow = data_gen.flow_from_dataframe(
                att
                , self.image_folder
                , x_col='image_id'
                , target_size=self.target_size
                , class_mode='input'
                , batch_size=batch_size
                , shuffle=True
            )

        return data_flow
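As a hedged usage sketch (the folder path is an assumption; the analysis notebook later in this tutorial uses it the same way), ImageLabelLoader is built once from the image folder and then produces a labelled or unlabelled generator on demand:
# Usage sketch (assumes att is the DataFrame read from list_attr_celeba.csv).
imageLoader = ImageLabelLoader('./data/celeb/img_align_celeba/', (128, 128))
data_flow_generic = imageLoader.build(att, batch_size=10)                    # class_mode='input': yields (images, images)
data_flow_smiling = imageLoader.build(att, batch_size=500, label='Smiling')  # class_mode='other': yields (images, labels)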
class DataLoader():
    def __init__(self, dataset_name, img_res=(256, 256)):
        self.dataset_name = dataset_name
        self.img_res = img_res

    def load_data(self, domain, batch_size=1, is_testing=False):
        data_type = "train%s" % domain if not is_testing else "test%s" % domain
        path = glob('./data/%s/%s/*' % (self.dataset_name, data_type))

        batch_images = np.random.choice(path, size=batch_size)

        imgs = []
        for img_path in batch_images:
            img = self.imread(img_path)
            if not is_testing:
                img = scipy.misc.imresize(img, self.img_res)

                if np.random.random() > 0.5:
                    img = np.fliplr(img)
            else:
                img = scipy.misc.imresize(img, self.img_res)
            imgs.append(img)

        imgs = np.array(imgs)/127.5 - 1.

        return imgs

    def load_batch(self, batch_size=1, is_testing=False):
        data_type = "train" if not is_testing else "val"
        path_A = glob('./data/%s/%sA/*' % (self.dataset_name, data_type))
        path_B = glob('./data/%s/%sB/*' % (self.dataset_name, data_type))

        self.n_batches = int(min(len(path_A), len(path_B)) / batch_size)
        total_samples = self.n_batches * batch_size

        # Sample n_batches * batch_size from each path list so that the model
        # sees all samples from both domains
        path_A = np.random.choice(path_A, total_samples, replace=False)
        path_B = np.random.choice(path_B, total_samples, replace=False)

        for i in range(self.n_batches-1):
            batch_A = path_A[i*batch_size:(i+1)*batch_size]
            batch_B = path_B[i*batch_size:(i+1)*batch_size]
            imgs_A, imgs_B = [], []
            for img_A, img_B in zip(batch_A, batch_B):
                img_A = self.imread(img_A)
                img_B = self.imread(img_B)

                img_A = scipy.misc.imresize(img_A, self.img_res)
                img_B = scipy.misc.imresize(img_B, self.img_res)

                if not is_testing and np.random.random() > 0.5:
                    img_A = np.fliplr(img_A)
                    img_B = np.fliplr(img_B)

                imgs_A.append(img_A)
                imgs_B.append(img_B)

            imgs_A = np.array(imgs_A)/127.5 - 1.
            imgs_B = np.array(imgs_B)/127.5 - 1.

            yield imgs_A, imgs_B

    def load_img(self, path):
        img = self.imread(path)
        img = scipy.misc.imresize(img, self.img_res)
        img = img/127.5 - 1.
        return img[np.newaxis, :, :, :]

    def imread(self, path):
        return scipy.misc.imread(path, mode='RGB').astype(np.float)
def load_model(model_class, folder):
    with open(os.path.join(folder, 'params.pkl'), 'rb') as f:
        params = pickle.load(f)

    model = model_class(*params)
    model.load_weights(os.path.join(folder, 'weights/weights.h5'))

    return model

# this loads the MNIST handwritten digits dataset, but it is not required here
def load_mnist():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.astype('float32') / 255.
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.astype('float32') / 255.
    x_test = x_test.reshape(x_test.shape + (1,))

    return (x_train, y_train), (x_test, y_test)
# loading images from the data folder as training images
def load_celeb(data_name, image_size, batch_size):
    data_folder = os.path.join("./data", data_name)

    data_gen = ImageDataGenerator(preprocessing_function=lambda x: (x.astype('float32') - 127.5) / 127.5)

    x_train = data_gen.flow_from_directory(data_folder
                                           , target_size = (image_size, image_size)
                                           , batch_size = batch_size
                                           , shuffle = True
                                           , class_mode = 'input'
                                           , subset = "training"
                                           )

    return x_train
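A brief hedged usage sketch of load_celeb (the folder name 'celeb' is taken from the training notebook below; note that the training notebook builds its own generator with rescale=1./255 instead, so this helper is optional): it simply wraps flow_from_directory and yields (images, images) batches scaled to the [-1, 1] range.
# Usage sketch: expects images under ./data/celeb/<some_subfolder>/*.jpg
x_train = load_celeb('celeb', 128, 32)
imgs, targets = next(x_train)
print(imgs.shape)   # (32, 128, 128, 3), values in [-1, 1]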
Now we will start training our VAE. Let's open a Jupyter notebook; I have named it VAE_FACES-TRAINING.ipynb. Let us first build the VAE model and train it with the CelebA dataset.
VAE Training - Faces dataset
Imports
# importing the required packages and our VAE.py module
import os
from glob import glob
import numpy as np

from models.VAE import VariationalAutoencoder
from keras.preprocessing.image import ImageDataGenerator
# (or)
# from keras_preprocessing.image import ImageDataGenerator
# either import gives the same result here, so you can use any one
# run params; this produces the folders run/vae/001_faces/images, viz and weights
section = 'vae'
run_id = '001'
data_name = 'faces'
RUN_FOLDER = 'run/{}/'.format(section)
RUN_FOLDER += '_'.join([run_id, data_name])

if not os.path.exists(RUN_FOLDER):
    os.makedirs(RUN_FOLDER)
    os.mkdir(os.path.join(RUN_FOLDER, 'viz'))
    os.mkdir(os.path.join(RUN_FOLDER, 'images'))
    os.mkdir(os.path.join(RUN_FOLDER, 'weights'))

mode = 'build'

# Download the CelebA dataset and the list_attr_celeba.csv.zip file from
# https://www.kaggle.com/jessicali9530/celeba-dataset into the data folder.
# It takes time to download, please have patience. After downloading, unzip
# the files in the data folder.
DATA_FOLDER = './data/celeb/'
Data
INPUT_DIM = (128,128,3)
BATCH_SIZE = 32  # images per batch

# collecting the image paths
filenames = np.array(glob(os.path.join(DATA_FOLDER, '*/*.jpg')))
NUM_IMAGES = len(filenames)
print("Total Images: ", NUM_IMAGES)
print("Shape of Images: ", INPUT_DIM[:2])
print("Data Folder Path: ", DATA_FOLDER)
Data Flow
# scaling the images and saving them into data_flow for training the network
data_gen = ImageDataGenerator(rescale=1./255)

data_flow = data_gen.flow_from_directory(DATA_FOLDER
                                         , target_size = INPUT_DIM[:2]
                                         , batch_size = BATCH_SIZE
                                         , shuffle = True
                                         , class_mode = 'input'
                                         , subset = "training"
                                         )
Architecture
# building the model and saving it in the run folder
vae = VariationalAutoencoder(
    input_dim = INPUT_DIM
    , encoder_conv_filters=[32,64,64,64]
    , encoder_conv_kernel_size=[3,3,3,3]
    , encoder_conv_strides=[2,2,2,2]
    , decoder_conv_t_filters=[64,64,32,3]
    , decoder_conv_t_kernel_size=[3,3,3,3]
    , decoder_conv_t_strides=[2,2,2,2]
    , z_dim=200
    , use_batch_norm=True
    , use_dropout=True)

if mode == 'build':
    vae.save(RUN_FOLDER)
else:
    vae.load_weights(os.path.join(RUN_FOLDER, 'weights/weights.h5'))
The plots of the encoder, the decoder and the full model are saved in the viz folder.
vae.encoder.summary()  # you can also see the summary of the encoder here using this code
vae.decoder.summary()  # and the decoder summary in the same way
Compilation & Training
# parameters
LEARNING_RATE = 0.0005
R_LOSS_FACTOR = 10000        # reconstruction loss factor; this parameter requires tuning
EPOCHS = 5                   # I am running only a few epochs here, but the best results take around 200 epochs
PRINT_EVERY_N_BATCHES = 100  # a new image is produced for every 100 batches of input images
INITIAL_EPOCH = 0
# now compile the model
vae.compile(LEARNING_RATE, R_LOSS_FACTOR)
# let's start training our VAE
vae.train_with_generator(
    data_flow
    , epochs = EPOCHS
    , steps_per_epoch = NUM_IMAGES / BATCH_SIZE
    , run_folder = RUN_FOLDER
    , print_every_n_batches = PRINT_EVERY_N_BATCHES
    , initial_epoch = INITIAL_EPOCH
)
Now the first epoch is running with 6331 batches of images, and output images are being saved in the images folder: one new image is produced for every 100 batches. The first image, img_001_0.jpg, is saved at the start; after 100 batches img_001_100.jpg is saved, and so on, one new image for every 100 batches, continuing to the end of the 2nd epoch.
Observe the output: I have run only 2 epochs here, but it is best to run up to 200 epochs to train our VAE well. Our dataset of 202,599 images is turned into batches of 32 images each, so there are about 6331 batches per epoch. For every 100 batches our VAE produces one new image, so about 64 new images are generated per epoch, or about 128 images over 2 epochs. All of these images are saved in the images folder.
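As a quick sanity check of these numbers (my own arithmetic, not part of the notebook):
print(202599 // 32)           # 6331 full batches per epoch
print(6331 // 100 + 1)        # about 64 images saved per epoch (batches 0, 100, 200, ..., 6300)
print(2 * (6331 // 100 + 1))  # about 128 images after 2 epochs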
Now our VAE is trained. Let us analyse it. Let's open a notebook; I have named it VAE_FACES-ANALYSIS.
VAE Analysis – Faces dataset
Imports
# importing the required packages, loading the CelebA dataset with labels, and loading the trained model
import numpy as np
import matplotlib.pyplot as plt
import os
from scipy.stats import norm
import pandas as pd

from models.VAE import VariationalAutoencoder
from models.loaders import load_model, ImageLabelLoader
# data folder and image folder paths
# run params
section = 'vae'
run_id = '001'
data_name = 'faces'
RUN_FOLDER = 'run/{}/'.format(section)
RUN_FOLDER += '_'.join([run_id, data_name])

DATA_FOLDER = './data/celeb/'
IMAGE_FOLDER = './data/celeb/img_align_celeba/'
Data
INPUT_DIM = (128, 128, 3)

# reading the list of celebrity attributes from the data folder
att = pd.read_csv(os.path.join(DATA_FOLDER, 'list_attr_celeba.csv'))

# loader for images with labels
imageLoader = ImageLabelLoader(IMAGE_FOLDER, INPUT_DIM[:2])
# let's see the attributes of the images; there are 40 attributes for each image
print(att)

# if you want to see only the first 5 rows, use the Pandas head() method
att.head()
Architecture
# now we rebuild the model from the run folder, using the saved parameters and weights
vae = load_model(VariationalAutoencoder, RUN_FOLDER)
Reconstructing Faces
Now we will reconstruct the original faces with the VAE. Let's pass 10 sample images (with their labels) through the VAE and look at the reconstructed images.
n_to_show = 10

data_flow_generic = imageLoader.build(att, n_to_show)

example_batch = next(data_flow_generic)
example_images = example_batch[0]

z_points = vae.encoder.predict(example_images)
reconst_images = vae.decoder.predict(z_points)

fig = plt.figure(figsize=(15, 3))
fig.subplots_adjust(hspace=0.4, wspace=0.4)

for i in range(n_to_show):
    img = example_images[i].squeeze()
    sub = fig.add_subplot(2, n_to_show, i+1)
    sub.axis('off')
    sub.imshow(img)

for i in range(n_to_show):
    img = reconst_images[i].squeeze()
    sub = fig.add_subplot(2, n_to_show, i+n_to_show+1)
    sub.axis('off')
    sub.imshow(img)
First it validates all the images found in the data folder; from those it randomly selects 10 images to pass through the VAE, and the output is the reconstructed images.
The top row shows the original images and the bottom row shows the reconstructions once they have passed through the encoder and decoder.
Figure: Reconstructed faces, after passing through the encoder and decoder
We can see that the VAE has successfully captured the key features of each face: the angle of the head, the hairstyle, the expression, etc. Some of the fine detail is missing, but it is important to remember that the aim of building variational autoencoders isn't to achieve perfect reconstruction loss. Our end goal is to sample from the latent space in order to generate new faces. So this is our required output.
Latent Space Distribution
The latent space is a 200-dimensional multivariate standard normal distribution. We cannot view all the dimensions simultaneously, so instead we check the distribution of each latent dimension individually.
Here is the code for plotting the latent space distribution:
z_test = vae.encoder.predict_generator(data_flow_generic, steps = 20, verbose = 1)

x = np.linspace(-3, 3, 100)

fig = plt.figure(figsize=(20, 20))
fig.subplots_adjust(hspace=0.6, wspace=0.4)

for i in range(50):
    ax = fig.add_subplot(5, 10, i+1)
    ax.hist(z_test[:,i], density=True, bins = 20)
    ax.axis('off')
    ax.text(0.5, -0.35, str(i), fontsize=10, ha='center', transform=ax.transAxes)
    ax.plot(x, norm.pdf(x))

plt.show()
The first 50 dimensions of our latent space are shown in this output (Figure 3). There aren't any distributions that stand out as being significantly different from the standard normal. If any dimensions were significantly different from a standard normal distribution, we should reduce the reconstruction loss factor, since that would mean the KL divergence term isn't having enough effect. Everything looks fine here, so we can move on to generating some new faces.
Figure 3: Distributions of points for the first 50 dimensions in the latent space
Generating New Faces from the Latent Space
n_to_show = 30

znew = np.random.normal(size = (n_to_show, vae.z_dim))   # 1

reconst = vae.decoder.predict(np.array(znew))            # 2

fig = plt.figure(figsize=(18, 5))
fig.subplots_adjust(hspace=0.4, wspace=0.4)

for i in range(n_to_show):
    ax = fig.add_subplot(3, 10, i+1)
    ax.imshow(reconst[i, :, :, :])                        # 3
    ax.axis('off')

plt.show()
1. We sample 30 points from a standard normal distribution with 200 dimensions...
2. ...then pass these points to the decoder.
3. Each resulting output is a 128 × 128 × 3 image that we can view. The output is shown in Figure 4.
Figure 4: New generated faces
Amazingly, the VAE is able to take the set of points that we sampled and convert each into a convincing image of a person's face. While the images are not perfect, they are a giant leap forward from the Naive Bayes model.
The Naive Bayes model faced the problem of not being able to capture dependency between adjacent pixels, since it had no notion of higher-level features such as sunglasses or brown hair. The VAE doesn't suffer from this problem, since the convolutional layers of the encoder are designed to translate low-level pixels into high-level features, and the decoder is trained to perform the opposite task of translating the high-level features in the latent space back into raw pixels.
Latent Space Arithmetic
Let's see some theory first. One benefit of mapping images into a lower-dimensional space is that we can perform arithmetic on vectors in this latent space that has a visual analogue when decoded back into the original image domain.
For example, suppose we want to take an image of somebody who looks sad and give them a smile. To do this we first need to find a vector in the latent space that points in the direction of increasing smile, i.e., a smile vector. Adding this vector to the encoding of the original image in the latent space gives us a new point which, when decoded, should give us a more smiley version of the original image.
So how can we find the smile vector? Each image in the CelebA dataset is labelled with attributes, one of which is Smiling. If we take the average position in the latent space of encoded images that have the attribute Smiling and subtract the average position of encoded images that do not have the attribute Smiling, we obtain the vector that points from not smiling to smiling, which is exactly what we need.
Conceptually, we are performing the following vector arithmetic in the latent space:
z_new = z + alpha * feature_vector
where alpha is a factor that determines how much of the feature vector is added or subtracted.
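A minimal sketch of this arithmetic, assuming z is one encoded face of shape (200,) and smiling_vec is the unit feature vector produced by get_vector_from_label further below:
import numpy as np

# Decode the same face at several strengths of the feature vector.
alphas = [-4, -2, 0, 2, 4]
z_new = np.array([z + alpha * smiling_vec for alpha in alphas])
faces = vae.decoder.predict(z_new)   # one 128x128x3 image per alpha value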
Let's see this in action. Figure 5 shows images that have been encoded into the latent space. We then add or subtract multiples of a certain vector (e.g., smile, blonde, male, eyeglasses) to obtain different versions of the image, with only the relevant feature changed.
Figure 5: Adding and subtracting features to and from faces
It is quite remarkable that even though we are moving the point a significantly large distance in the latent space, the core image barely changes, except for the one feature that we want to manipulate. This demonstrates the power of variational autoencoders for capturing and adjusting high-level features in images.
Let us now build the functions for the feature vectors.
First, a function that finds the feature vector corresponding to a given label:
def get_vector_from_label(label, batch_size):

    data_flow_label = imageLoader.build(att, batch_size, label = label)

    # initiating vectors
    origin = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_sum_POS = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_n_POS = 0
    current_mean_POS = np.zeros(shape = vae.z_dim, dtype = 'float32')

    current_sum_NEG = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_n_NEG = 0
    current_mean_NEG = np.zeros(shape = vae.z_dim, dtype = 'float32')

    current_vector = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_dist = 0

    print('label: ' + label)
    print('images : POS move : NEG move : distance : 𝛥 distance')

    while(current_n_POS < 10000):
        batch = next(data_flow_label)
        im = batch[0]
        attribute = batch[1]

        z = vae.encoder.predict(np.array(im))

        z_POS = z[attribute==1]
        z_NEG = z[attribute==-1]

        # running mean of images that have the attribute
        if len(z_POS) > 0:
            current_sum_POS = current_sum_POS + np.sum(z_POS, axis = 0)
            current_n_POS += len(z_POS)
            new_mean_POS = current_sum_POS / current_n_POS
            movement_POS = np.linalg.norm(new_mean_POS - current_mean_POS)

        # running mean of images that do not have the attribute
        if len(z_NEG) > 0:
            current_sum_NEG = current_sum_NEG + np.sum(z_NEG, axis = 0)
            current_n_NEG += len(z_NEG)
            new_mean_NEG = current_sum_NEG / current_n_NEG
            movement_NEG = np.linalg.norm(new_mean_NEG - current_mean_NEG)

        current_vector = new_mean_POS - new_mean_NEG
        new_dist = np.linalg.norm(current_vector)
        dist_change = new_dist - current_dist

        print(str(current_n_POS)
              + ' : ' + str(np.round(movement_POS, 3))
              + ' : ' + str(np.round(movement_NEG, 3))
              + ' : ' + str(np.round(new_dist, 3))
              + ' : ' + str(np.round(dist_change, 3))
              )

        current_mean_POS = np.copy(new_mean_POS)
        current_mean_NEG = np.copy(new_mean_NEG)
        current_dist = np.copy(new_dist)

        if np.sum([movement_POS, movement_NEG]) < 0.08:
            current_vector = current_vector / current_dist
            print('Found the ' + label + ' vector')
            break

    return current_vector
Now we will build a function that adds a feature vector to images:
def add_vector_to_images(feature_vec):

    n_to_show = 5
    factors = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

    example_batch = next(data_flow_generic)
    example_images = example_batch[0]
    example_labels = example_batch[1]

    z_points = vae.encoder.predict(example_images)

    fig = plt.figure(figsize=(18, 10))

    counter = 1

    for i in range(n_to_show):
        img = example_images[i].squeeze()
        sub = fig.add_subplot(n_to_show, len(factors) + 1, counter)
        sub.axis('off')
        sub.imshow(img)

        counter += 1

        for factor in factors:
            changed_z_point = z_points[i] + feature_vec * factor
            changed_image = vae.decoder.predict(np.array([changed_z_point]))[0]

            img = changed_image.squeeze()
            sub = fig.add_subplot(n_to_show, len(factors) + 1, counter)
            sub.axis('off')
            sub.imshow(img)

            counter += 1

    plt.show()
Now we will generate the feature vectors from the images.
# respective attribute vectors
BATCH_SIZE = 500

# here we are generating 8 feature vectors from the labels
attractive_vec = get_vector_from_label('Attractive', BATCH_SIZE)
mouth_open_vec = get_vector_from_label('Mouth_Slightly_Open', BATCH_SIZE)
smiling_vec = get_vector_from_label('Smiling', BATCH_SIZE)
lipstick_vec = get_vector_from_label('Wearing_Lipstick', BATCH_SIZE)
young_vec = get_vector_from_label('High_Cheekbones', BATCH_SIZE)
male_vec = get_vector_from_label('Male', BATCH_SIZE)
eyeglasses_vec = get_vector_from_label('Eyeglasses', BATCH_SIZE)
blonde_vec = get_vector_from_label('Blond_Hair', BATCH_SIZE)
This will take some time; wait until it completes.
Now all the feature vectors have been generated. Let's start: we will take 5 images and apply the feature vectors to them.
First we add the Attractive feature vector to the labelled images:
print('Attractive Vector')
add_vector_to_images(attractive_vec)

Similarly, the Mouth Open vector finds the mouth in the image and transforms it into an open mouth:
print('Mouth Open Vector')
add_vector_to_images(mouth_open_vec)

Similarly, the Smiling vector adds a smile to the face:
print('Smiling Vector')
add_vector_to_images(smiling_vec)

The Lipstick vector adds lipstick to the faces; it is not done perfectly, but the feature is clearly being added from the latent space:
print('Lipstick Vector')
add_vector_to_images(lipstick_vec)

print('Young Vector')
add_vector_to_images(young_vec)

print('Male Vector')
add_vector_to_images(male_vec)

print('Eyeglasses Vector')
add_vector_to_images(eyeglasses_vec)

The Blonde vector transforms the hair into blonde:
print('Blond Vector')
add_vector_to_images(blonde_vec)
Even though we are moving the point a significantly large distance in the latent space, the core image barely changes, except for the one feature that we want to manipulate. This is latent space arithmetic: it picks out the key features, and it demonstrates the power of variational autoencoders for capturing and adjusting high-level features in images.
We can use a similar idea to morph between two faces.
Morphing Between Faces
Imagine two points in the latent space, A and B, that represent two images. If you started at point A and walked toward point B in a straight line, decoding each point on the line as you went, you would see a gradual transition from the starting face to the end face.
Mathematically, we are traversing a straight line, which can be described by the following equation:
z_new = z_A * (1 - alpha) + z_B * alpha
Here, z_A is the latent space point of image A, z_B is the latent space point of image B, and alpha is a number between 0 and 1 that determines how far along the line we are, away from point A.
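A minimal sketch of this interpolation, assuming z_A and z_B are the encoded latent points (each of shape (200,)) of the two images:
import numpy as np

# Decode points at regular intervals along the straight line from z_A to z_B.
alphas = np.arange(0, 1.1, 0.1)
z_path = np.array([z_A * (1 - alpha) + z_B * alpha for alpha in alphas])
frames = vae.decoder.predict(z_path)   # a gradual transition from face A to face B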
Figure 6 shows this process in action. We take two images, encode them into the latent space, and then decode points along the straight line between them at regular intervals.
Figure 6: Morphing between two faces
First we have to build the face-morphing function for two images. This function takes two image files together with their attributes:
def morph_faces(start_image_file, end_image_file):

    factors = np.arange(0, 1, 0.1)

    att_specific = att[att['image_id'].isin([start_image_file, end_image_file])]
    att_specific = att_specific.reset_index()
    data_flow_label = imageLoader.build(att_specific, 2)

    example_batch = next(data_flow_label)
    example_images = example_batch[0]
    example_labels = example_batch[1]

    z_points = vae.encoder.predict(example_images)

    fig = plt.figure(figsize=(18, 8))

    counter = 1

    img = example_images[0].squeeze()
    sub = fig.add_subplot(1, len(factors)+2, counter)
    sub.axis('off')
    sub.imshow(img)

    counter += 1

    for factor in factors:
        changed_z_point = z_points[0] * (1-factor) + z_points[1] * factor
        changed_image = vae.decoder.predict(np.array([changed_z_point]))[0]

        img = changed_image.squeeze()
        sub = fig.add_subplot(1, len(factors)+2, counter)
        sub.axis('off')
        sub.imshow(img)

        counter += 1

    img = example_images[1].squeeze()
    sub = fig.add_subplot(1, len(factors)+2, counter)
    sub.axis('off')
    sub.imshow(img)

    plt.show()
Let's pass two image IDs/filenames to this face-morphing function:
start_image_file = '000238.jpg'
end_image_file = '000193.jpg'  # glasses

morph_faces(start_image_file, end_image_file)

The first image is the starting image and the last is the end image; the faces in between show the morphing from one to the other.
start_image_file = '000112.jpg'
end_image_file = '000258.jpg'

morph_faces(start_image_file, end_image_file)

start_image_file = '000230.jpg'
end_image_file = '000712.jpg'

morph_faces(start_image_file, end_image_file)

start_image_file = '002230.jpg'
end_image_file = '004713.jpg'

morph_faces(start_image_file, end_image_file)
It is worth noting the smoothness of the transition—even where there are multiple features to change simultaneously (for e.g., removal of glasses, hair color, gender), the VAE manages to achieve this fluidly, showing that the latent space of the VAE is truly a continuous space that can be traversed and explored to generate a multitude of different human faces.
Summary
In this tutorial, we have seen how variational autoencoders are a powerful tool in the generative modelling toolbox. Using a VAE, high-level features can be extracted from the individually uninformative pixels. By introducing randomness into the model and constraining the points in the latent space, we can simply choose points from a standard normal distribution to generate new faces. Moreover, by performing vector arithmetic within the latent space, we can achieve some amazing effects, such as face morphing and feature manipulation. With these features, it is easy to see why VAEs have become a prominent technique for generative modelling in recent years.
I hope you understood these topics. Let's meet in the next tutorial.