Thursday, May 7, 2020

How to Generate Faces Using VAE with Keras?

A Variational Autoencoder (VAE) can do many amazing things once we increase the latent space from 2 dimensions to a higher-dimensional space, for example generating faces.

In the previous tutorial, we learned how to build a VAE, trained it on the MNIST handwritten digits dataset, and analysed it on the test data. Please go through it once before continuing.

Welcome to aiRobott, I am Kishor Kumar Vajja. In this tutorial we will learn how to generate celebrity faces using a VAE with Keras. We will:
  • generate new faces by sampling from the latent space,
  • explore the multivariate standard normal distribution of latent space points, and
  • perform latent space arithmetic and morph between faces.



We will use the CelebFaces Attributes dataset (CelebA) to train our variational autoencoder. It is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter.
CelebA has large diversity, large quantity, and rich annotations, including
  • 10,177 identities,
  • 202,599 face images, and
  • 5 landmark locations and 40 binary attribute annotations per image.
The dataset can be employed as training and test sets for computer vision tasks such as
  • face attribute recognition,
  • face detection,  
  • landmark (or facial part) localization, and 
  • face editing & synthesis. 
Each image carries labels such as eyeglasses, wearing a hat, bangs, wavy hair, smiling, and so on. A few example images are shown in the figure below.


For training the VAE, we don’t need the labels, but these will be useful later when we start exploring how these features are captured in the multidimensional latent space. Once our VAE is trained, we can sample from the latent space to generate new examples of celebrity faces.

Before training the VAE, we need to make some improvements to two modules: VAE.py and loaders.py. These changes are shown in the coding part below.

The network architecture for the faces model is similar to the handwritten digits example, with a few improvements:

1. Here we are using colour images, so the data now has three input channels (RGB) instead of one (grayscale). This means we need to change the number of channels in the final convolutional transpose layer of the decoder to 3.

2. Since faces are much more complex than digits, we increase the dimensionality of the latent space so that the network can encode a satisfactory amount of detail from the images. Therefore, we use a latent space with 200 dimensions instead of 2.

3. To speed up the training process, batch normalization layers are used after each convolution layer. Even though each batch takes longer to run, the number of batches required to reach the same loss is greatly reduced. Dropout layers are also added.

4. We increase the reconstruction loss factor to 10,000. This is a parameter that requires tuning; for this dataset and architecture this value was found to generate good results.

5. We use a generator to feed images to the VAE from a folder, rather than loading all the images into memory up front. Since the VAE trains in batches, there is no need to load all the images into memory first, so instead we use the built-in fit_generator method that Keras provides to read in images only when they are required for training.

This is the full architecture of the encoder that we are going to build:





Here I have used: 1 input layer, 4 convolutional layers, 4 batch normalization layers, 4 LeakyReLU layers, 4 dropout layers, 1 flatten layer, 1 mu layer, 1 log_var layer, and 1 lambda layer. The input image shape is 128 x 128 with 3 channels, and the output (latent) shape is 200.

This is the full architecture of the decoder:
its input is the output of the lambda layer, followed by 1 dense layer, 1 reshape layer, 4 convolutional transpose layers, 3 batch normalization layers, 3 LeakyReLU layers, 3 dropout layers, and 1 final sigmoid activation.
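As a quick check on these shapes, here is a minimal sketch (assuming the 128 x 128 x 3 input, four stride-2 convolutions with 'same' padding, and the filter counts used in the training notebook later in this tutorial) of how the spatial dimensions shrink through the encoder:

# minimal sketch: spatial shape progression through the encoder
input_size = 128
encoder_filters = [32, 64, 64, 64]

size = input_size
for f in encoder_filters:
    size = (size + 1) // 2  # 'same' padding with stride 2 gives ceil(size / 2)
    print("conv ->", size, "x", size, "x", f)

print("flatten ->", size * size * encoder_filters[-1])  # 8 * 8 * 64 = 4096
print("mu / log_var ->", 200)                           # z_dim
# the decoder mirrors this: Dense(4096) -> Reshape((8, 8, 64)) -> four stride-2
# transposed convolutions back up to 128 x 128 x 3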

In the previous tutorial, we created 3 modules: VAE.py, loaders.py, and callbacks.py. We will add some code to the first two modules only.

For the VAE.py module, we need to add batch normalization and dropout layers:

Open the Jupyter notebook, then open the VAE.py module that was previously saved in the models folder. Now we will add the required code to this module. (Video link)

# importing required packages
import numpy as np
import os
import pickle
import json

# Importing required packages from keras
from keras.layers import Input, Conv2D, Flatten, Dense, Conv2DTranspose, Reshape, Lambda, Activation, LeakyReLU, BatchNormalization, Dropout

from keras.models import Model
from keras import backend as K
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint
from keras.utils import plot_model

# this is callbacks module to be created next
from models.callbacks import CustomCallback, step_decay_schedule


class VariationalAutoencoder():
           
    def __init__(self
                 , input_dim
                 , encoder_conv_filters
                 , encoder_conv_kernel_size
                 , encoder_conv_strides
                 , decoder_conv_t_filters
                 , decoder_conv_t_kernel_size
                 , decoder_conv_t_strides
                 , z_dim
                 , use_batch_norm = False
                 , use_dropout= False
                ):
                       
        self.name = 'variational_autoencoder'

        self.input_dim = input_dim
        self.encoder_conv_filters = encoder_conv_filters
        self.encoder_conv_kernel_size = encoder_conv_kernel_size
        self.encoder_conv_strides = encoder_conv_strides
        self.decoder_conv_t_filters = decoder_conv_t_filters
        self.decoder_conv_t_kernel_size = decoder_conv_t_kernel_size
        self.decoder_conv_t_strides = decoder_conv_t_strides
        self.z_dim = z_dim

        self.use_batch_norm = use_batch_norm
        self.use_dropout = use_dropout

        self.n_layers_encoder = len(encoder_conv_filters)
        self.n_layers_decoder = len(decoder_conv_t_filters)

        self._build()

    def _build(self):               
        ### THE ENCODER
        encoder_input = Input(shape=self.input_dim, name='encoder_input')

        x = encoder_input

        for i in range(self.n_layers_encoder):
           
            conv_layer = Conv2D(
                filters = self.encoder_conv_filters[i]
                , kernel_size = self.encoder_conv_kernel_size[i]
                , strides = self.encoder_conv_strides[i]
                , padding = 'same'
                , name = 'encoder_conv_' + str(i)
                )

            x = conv_layer(x)

            if self.use_batch_norm:
                x = BatchNormalization()(x)
           
            x = LeakyReLU()(x)

            if self.use_dropout:
                x = Dropout(rate = 0.25)(x)

           
        shape_before_flattening = K.int_shape(x)[1:]

        x = Flatten()(x)
        self.mu = Dense(self.z_dim, name='mu')(x)
        self.log_var = Dense(self.z_dim, name='log_var')(x)

        self.encoder_mu_log_var = Model(encoder_input, (self.mu, self.log_var))

        # 'sampling' implements the reparameterization trick: draw epsilon ~ N(0, 1)
        # and shift/scale it by the predicted mu and log variance
        def sampling(args):
            mu, log_var = args
            epsilon = K.random_normal(shape=K.shape(mu), mean=0., stddev=1.)
            return mu + K.exp(log_var / 2) * epsilon

        encoder_output = Lambda(sampling, name='encoder_output')([self.mu, self.log_var])

        self.encoder = Model(encoder_input, encoder_output)
       
       

        ### THE DECODER
        decoder_input = Input(shape=(self.z_dim,), name='decoder_input')

        x = Dense(np.prod(shape_before_flattening))(decoder_input)
        x = Reshape(shape_before_flattening)(x)

        for i in range(self.n_layers_decoder):           
            conv_t_layer = Conv2DTranspose(
                filters = self.decoder_conv_t_filters[i]
                , kernel_size = self.decoder_conv_t_kernel_size[i]
                , strides = self.decoder_conv_t_strides[i]
                , padding = 'same'
                , name = 'decoder_conv_t_' + str(i)
                )

            x = conv_t_layer(x)
           
            if i < self.n_layers_decoder - 1:
                if self.use_batch_norm:
                    x = BatchNormalization()(x)
                x = LeakyReLU()(x)
                if self.use_dropout:
                    x = Dropout(rate = 0.25)(x)
            else:
                x = Activation('sigmoid')(x)
           

        decoder_output = x

        self.decoder = Model(decoder_input, decoder_output)

        ### THE FULL VAE
        model_input = encoder_input
        model_output = self.decoder(encoder_output)

        self.model = Model(model_input, model_output)


    def compile(self, learning_rate, r_loss_factor):       
        self.learning_rate = learning_rate

        ### COMPILATION
        def vae_r_loss(y_true, y_pred):
            r_loss = K.mean(K.square(y_true - y_pred), axis = [1,2,3])
            return r_loss_factor * r_loss

        def vae_kl_loss(y_true, y_pred):
            kl_loss =  -0.5 * K.sum(1 + self.log_var - K.square(self.mu) - K.exp(self.log_var), axis = 1)
            return kl_loss

        def vae_loss(y_true, y_pred):
            r_loss = vae_r_loss(y_true, y_pred)
            kl_loss = vae_kl_loss(y_true, y_pred)
            return  r_loss + kl_loss

        optimizer = Adam(lr=learning_rate)
        self.model.compile(optimizer=optimizer, loss = vae_loss,  metrics = [vae_r_loss, vae_kl_loss])


    def save(self, folder):

        if not os.path.exists(folder):
            os.makedirs(folder)
            os.makedirs(os.path.join(folder, 'viz'))
            os.makedirs(os.path.join(folder, 'weights'))
            os.makedirs(os.path.join(folder, 'images'))

        with open(os.path.join(folder, 'params.pkl'), 'wb') as f:
            pickle.dump([               
                self.input_dim
                , self.encoder_conv_filters
                , self.encoder_conv_kernel_size
                , self.encoder_conv_strides
                , self.decoder_conv_t_filters
                , self.decoder_conv_t_kernel_size
                , self.decoder_conv_t_strides
                , self.z_dim
                , self.use_batch_norm
                , self.use_dropout               
                ], f)

        self.plot_model(folder)


    def load_weights(self, filepath):
        self.model.load_weights(filepath)

    def train(self, x_train, batch_size, epochs, run_folder, print_every_n_batches = 100, initial_epoch = 0, lr_decay = 1):

        custom_callback = CustomCallback(run_folder, print_every_n_batches, initial_epoch, self)
        lr_sched = step_decay_schedule(initial_lr=self.learning_rate, decay_factor=lr_decay, step_size=1)
       
        checkpoint_filepath=os.path.join(run_folder, "weights/weights-{epoch:03d}-{loss:.2f}.h5")
        checkpoint1 = ModelCheckpoint(checkpoint_filepath, save_weights_only = True, verbose=1)
        checkpoint2 = ModelCheckpoint(os.path.join(run_folder, 'weights/weights.h5'), save_weights_only = True, verbose=1)

        callbacks_list = [checkpoint1, checkpoint2, custom_callback, lr_sched]

        self.model.fit(    
            x_train
            , x_train
            , batch_size = batch_size
            , shuffle = True
            , epochs = epochs
            , initial_epoch = initial_epoch
            , callbacks = callbacks_list
        )
       
    
    def train_with_generator(self, data_flow, epochs, steps_per_epoch, run_folder, print_every_n_batches = 100, initial_epoch = 0, lr_decay= 1, ):

        custom_callback = CustomCallback(run_folder, print_every_n_batches, initial_epoch, self)
        lr_sched = step_decay_schedule(initial_lr=self.learning_rate, decay_factor=lr_decay, step_size=1)

        checkpoint_filepath=os.path.join(run_folder, "weights/weights-{epoch:03d}-{loss:.2f}.h5")
        checkpoint1 = ModelCheckpoint(checkpoint_filepath, save_weights_only = True, verbose=1)
        checkpoint2 = ModelCheckpoint(os.path.join(run_folder, 'weights/weights.h5'), save_weights_only = True, verbose=1)

        callbacks_list = [checkpoint1, checkpoint2, custom_callback, lr_sched]

        self.model.save_weights(os.path.join(run_folder, 'weights/weights.h5'))
               
        self.model.fit_generator(
            data_flow
            , shuffle = True
            , epochs = epochs
            , initial_epoch = initial_epoch
            , callbacks = callbacks_list
            , steps_per_epoch=steps_per_epoch
            )
     
   
    def plot_model(self, run_folder):       
        plot_model(self.model, to_file=os.path.join(run_folder ,'viz/model.png'), show_shapes = True, show_layer_names = True)
        plot_model(self.encoder, to_file=os.path.join(run_folder ,'viz/encoder.png'), show_shapes = True, show_layer_names = True)
        plot_model(self.decoder, to_file=os.path.join(run_folder ,'viz/decoder.png'), show_shapes = True, show_layer_names = True)
       

There is no need to edit the callbacks.py module; let's keep it as it is.
Now we will edit the loaders.py module. The required packages are added at the beginning of the module. (Video link)

import numpy as np
import pickle
import os
import h5py

# let's import pandas; it is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
import pandas as pd

# SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
import scipy

# glob uses Unix shell rules to find filenames matching a pattern.
from glob import glob

#Python method walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up.
# Python method getcwd() returns current working directory of a process.
from os import walk, getcwd

# The module pdb is an interactive source code debugger for Python programs. It supports setting (conditional) breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, and evaluation of arbitrary Python code in the context of any stack frame. It also supports post-mortem debugging and can be called under program control.
import pdb

from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator, load_img, save_img
# or, equivalently (either import gives the same result):
from keras_preprocessing.image import ImageDataGenerator, load_img, img_to_array

# If you want the Keras modules you write to be compatible with both Theano (th) and TensorFlow (tf), you have to write them via the abstract Keras backend API. You can import the backend module via:
from keras import backend as K

from keras.utils import to_categorical # Converts a class vector (integers) to binary class matrix.


class ImageLabelLoader():   
    def __init__(self, image_folder, target_size):
        self.image_folder = image_folder
        self.target_size = target_size

    def build(self, att, batch_size, label = None):

        data_gen = ImageDataGenerator(rescale=1./255)
        if label:
            data_flow = data_gen.flow_from_dataframe(
                att
                , self.image_folder
                , x_col='image_id'
                , y_col=label
                , target_size=self.target_size
                , class_mode='other'
                , batch_size=batch_size
                , shuffle=True
            )
        else:
            data_flow = data_gen.flow_from_dataframe(
                att
                , self.image_folder
                , x_col='image_id'
                , target_size=self.target_size
                , class_mode='input'
                , batch_size=batch_size
                , shuffle=True
            )

        return data_flow

class DataLoader():   
   
    def __init__(self, dataset_name, img_res=(256, 256)):       
        self.dataset_name = dataset_name
        self.img_res = img_res

    def load_data(self, domain, batch_size=1, is_testing=False):
        data_type = "train%s" % domain if not is_testing else "test%s" % domain
        path = glob('./data/%s/%s/*' % (self.dataset_name, data_type))

        batch_images = np.random.choice(path, size=batch_size)

        imgs = []
        for img_path in batch_images:
            img = self.imread(img_path)
            if not is_testing:
                img = scipy.misc.imresize(img, self.img_res)

                if np.random.random() > 0.5:
                    img = np.fliplr(img)
            else:
                img = scipy.misc.imresize(img, self.img_res)
            imgs.append(img)

        imgs = np.array(imgs)/127.5 - 1.

        return imgs

    def load_batch(self, batch_size=1, is_testing=False):
        data_type = "train" if not is_testing else "val"
        path_A = glob('./data/%s/%sA/*' % (self.dataset_name, data_type))
        path_B = glob('./data/%s/%sB/*' % (self.dataset_name, data_type))

        self.n_batches = int(min(len(path_A), len(path_B)) / batch_size)
        total_samples = self.n_batches * batch_size

        # Sample n_batches * batch_size from each path list so that model sees all
        # samples from both domains
        path_A = np.random.choice(path_A, total_samples, replace=False)
        path_B = np.random.choice(path_B, total_samples, replace=False)

        for i in range(self.n_batches-1):
            batch_A = path_A[i*batch_size:(i+1)*batch_size]
            batch_B = path_B[i*batch_size:(i+1)*batch_size]
            imgs_A, imgs_B = [], []
            for img_A, img_B in zip(batch_A, batch_B):
                img_A = self.imread(img_A)
                img_B = self.imread(img_B)

                img_A = scipy.misc.imresize(img_A, self.img_res)
                img_B = scipy.misc.imresize(img_B, self.img_res)

                if not is_testing and np.random.random() > 0.5:
                        img_A = np.fliplr(img_A)
                        img_B = np.fliplr(img_B)

                imgs_A.append(img_A)
                imgs_B.append(img_B)

            imgs_A = np.array(imgs_A)/127.5 - 1.
            imgs_B = np.array(imgs_B)/127.5 - 1.

            yield imgs_A, imgs_B

    def load_img(self, path):
        img = self.imread(path)
        img = scipy.misc.imresize(img, self.img_res)
        img = img/127.5 - 1.
        return img[np.newaxis, :, :, :]

    def imread(self, path):
        return scipy.misc.imread(path, mode='RGB').astype(np.float)

   
   
def load_model(model_class, folder):
   
    with open(os.path.join(folder, 'params.pkl'), 'rb') as f:       
        params = pickle.load(f)

    model = model_class(*params)

    model.load_weights(os.path.join(folder, 'weights/weights.h5'))

    return model

# this is for loading mnist handwritten dataset, but it is not required here
def load_mnist():   
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.astype('float32') / 255.
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.astype('float32') / 255.
    x_test = x_test.reshape(x_test.shape + (1,))

    return (x_train, y_train), (x_test, y_test)

# loading images from data folder as training images
def load_celeb(data_name, image_size, batch_size):
    data_folder = os.path.join("./data", data_name)

    data_gen = ImageDataGenerator(preprocessing_function=lambda x: (x.astype('float32') - 127.5) / 127.5)

    x_train = data_gen.flow_from_directory(data_folder
                                            , target_size = (image_size,image_size)
                                            , batch_size = batch_size
                                            , shuffle = True
                                            , class_mode = 'input'
                                            , subset = "training"
                                                )

    return x_train


Now we will start training our VAE. Let's open a Jupyter notebook; I have named it VAE_FACES-TRAINING.ipynb. Let us first build the VAE model and train it with the CelebA dataset:

VAE Training - Faces dataset 

imports 

# importing the required packages and our VAE.py module
import os
from glob import glob
import numpy as np

from models.VAE import VariationalAutoencoder

from keras.preprocessing.image import ImageDataGenerator
# or, equivalently:
#from keras_preprocessing.image import ImageDataGenerator
# either import gives the same result here, so you can use either one.

# run params; this produces the folder run/vae/001_faces with images, viz, and weights subfolders
section = 'vae'
run_id = '001'
data_name = 'faces'
RUN_FOLDER = 'run/{}/'.format(section)
RUN_FOLDER += '_'.join([run_id, data_name])

if not os.path.exists(RUN_FOLDER):
    os.makedirs(RUN_FOLDER)
    os.mkdir(os.path.join(RUN_FOLDER, 'viz'))
    os.mkdir(os.path.join(RUN_FOLDER, 'images'))
    os.mkdir(os.path.join(RUN_FOLDER, 'weights'))

mode =  'build'

# download the CelebA dataset and the list_attr_celeba.csv file from
# https://www.kaggle.com/jessicali9530/celeba-dataset into the data folder
# (the download takes a while, please be patient). After downloading, unzip
# the archive in the data folder.
DATA_FOLDER = './data/celeb/'
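# As a quick sanity check (a sketch; these paths assume the Kaggle archive was
# unzipped into ./data/celeb/img_align_celeba/ with list_attr_celeba.csv in ./data/celeb/),
# confirm that the dataset is where the rest of this notebook expects it:
print(os.path.exists(os.path.join(DATA_FOLDER, 'list_attr_celeba.csv')))   # expect True
print(len(glob(os.path.join(DATA_FOLDER, 'img_align_celeba', '*.jpg'))))   # expect about 202,599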

Data

INPUT_DIM = (128,128,3)
BATCH_SIZE = 32 # images for a batch
# adding images path
filenames = np.array(glob(os.path.join(DATA_FOLDER, '*/*.jpg')))

NUM_IMAGES = len(filenames)

print("Total Images: ", NUM_IMAGES)
print("Shape of Images: ", INPUT_DIM[:2])
print("Data Folder Path: ", DATA_FOLDER)

Data Flow

# scaling the images and creating data_flow for training the network
data_gen = ImageDataGenerator(rescale=1./255)

data_flow = data_gen.flow_from_directory(DATA_FOLDER
                                         , target_size = INPUT_DIM[:2]
                                         , batch_size = BATCH_SIZE
                                         , shuffle = True
                                         , class_mode = 'input'
                                         , subset = "training"
                                            )

Architecture

# building the model; it will be saved in the run folder
vae = VariationalAutoencoder(
                input_dim = INPUT_DIM
                , encoder_conv_filters=[32,64,64, 64]
                , encoder_conv_kernel_size=[3,3,3,3]
                , encoder_conv_strides=[2,2,2,2]
                , decoder_conv_t_filters=[64,64,32,3]
                , decoder_conv_t_kernel_size=[3,3,3,3]
                , decoder_conv_t_strides=[2,2,2,2]
                , z_dim=200
                , use_batch_norm=True
                , use_dropout=True)


if mode == 'build':
    vae.save(RUN_FOLDER)
else:
    vae.load_weights(os.path.join(RUN_FOLDER, 'weights/weights.h5'))

The model diagrams (encoder, decoder, and the full model) are saved in the viz folder.

vae.encoder.summary() # you can also view the summary of the encoder with this code
vae.decoder.summary() # and similarly for the decoder

Compilation & Training

# parameters
LEARNING_RATE = 0.0005
R_LOSS_FACTOR = 10000 # this parameter requires tuning; 10,000 was found to work well for this dataset and architecture
EPOCHS = 5  # I am running only a few epochs here; for the best results train for around 200 epochs
PRINT_EVERY_N_BATCHES = 100 # one new preview image is saved for every 100 batches of input images
INITIAL_EPOCH = 0

# now compile the model

vae.compile(LEARNING_RATE, R_LOSS_FACTOR)

# Let's start training our VAE

vae.train_with_generator(    
    data_flow
    , epochs = EPOCHS
    , steps_per_epoch = NUM_IMAGES / BATCH_SIZE
    , run_folder = RUN_FOLDER
    , print_every_n_batches = PRINT_EVERY_N_BATCHES
    , initial_epoch = INITIAL_EPOCH
)

Now the first epoch runs over 6331 batches of images. In the images folder, output images are saved as training progresses: one new image is produced for every 100 batches. The first image, img_001_0.jpg, is saved at the start; after 100 batches the next image, img_001_100.jpg, is saved; and so on, one new image for every 100 batches, continuing until the end of training.

Observe the output: I have run only 2 epochs here, but for the best results it is worth training for up to 200 epochs. Our dataset has 202,599 images, and with 32 images per batch that gives 6331 batches per epoch. For every 100 batches, our VAE saves one new image, so about 64 new images are saved per epoch, or 128 for 2 epochs. All these images are saved in the images folder.
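As a small arithmetic check of the numbers above (a sketch, not part of the training code):

num_images = 202599
batch_size = 32
batches_per_epoch = num_images // batch_size        # 6331
previews_per_epoch = batches_per_epoch // 100 + 1   # saved at batch 0, 100, 200, ... -> 64
print(batches_per_epoch, previews_per_epoch)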

Now our VAE is trained. Let us analyse it:


Let's open a new notebook; I have named it 'VAE_FACES-ANALYSIS'.
VAE Analysis – Faces dataset

# importing the required packages, loading the celeb dataset with labels and model.

Imports

import numpy as np
import matplotlib.pyplot as plt
import os
from scipy.stats import norm
import pandas as pd

from models.VAE import VariationalAutoencoder
from models.loaders import load_model, ImageLabelLoader

# data folder and image folder paths
# run params,
section = 'vae'
run_id = '001'
data_name = 'faces'
RUN_FOLDER = 'run/{}/'.format(section)
RUN_FOLDER += '_'.join([run_id, data_name])

DATA_FOLDER = './data/celeb/'
IMAGE_FOLDER = './data/celeb/img_align_celeba/'

Data
INPUT_DIM = (128, 128, 3)
# reading the list of celebrity attributes from data folder.
att = pd.read_csv(os.path.join(DATA_FOLDER, 'list_attr_celeba.csv'))
# loading images with labels
imageLoader = ImageLabelLoader(IMAGE_FOLDER, INPUT_DIM[:2])

 

# let's see all the attributes of the images; there are 40 attributes for each image

print(att)

# to see just the first 5 rows, use the Pandas head() method

att.head()

 Architecture

# now we will rebuild the model from the run folder, using the saved parameters and weights

vae = load_model(VariationalAutoencoder, RUN_FOLDER)

Reconstructing Faces

Now we will reconstruct original faces with the VAE. Let's pass 10 sample images through the VAE and look at the reconstructed images.


n_to_show = 10

data_flow_generic = imageLoader.build(att, n_to_show)

example_batch = next(data_flow_generic)
example_images = example_batch[0]

z_points = vae.encoder.predict(example_images)

reconst_images = vae.decoder.predict(z_points)

fig = plt.figure(figsize=(15, 3))
fig.subplots_adjust(hspace=0.4, wspace=0.4)

for i in range(n_to_show):
    img = example_images[i].squeeze()
    sub = fig.add_subplot(2, n_to_show, i+1)
    sub.axis('off')       
    sub.imshow(img)

for i in range(n_to_show):
    img = reconst_images[i].squeeze()
    sub = fig.add_subplot(2, n_to_show, i+n_to_show+1)
    sub.axis('off')
    sub.imshow(img)

First the data flow finds and validates all the images in the data folder, then randomly selects 10 of them to pass through the VAE; the output is the reconstructed images we want.

The top row shows the original images and the bottom row shows the reconstructions once they have passed through the encoder and decoder.


                     Figure : Reconstructed faces, after passing through the encoder and decoder

We can see that the VAE has successfully captured the key features of each face: the angle of the head, the hairstyle, the expression, etc. Some of the fine detail is missing, but it is important to remember that the aim of building a variational autoencoder isn't to achieve perfect reconstructions. Our end goal is to sample from the latent space in order to generate new faces, so this is the output we need.

Latent Space distribution


Here the latent space should follow a multivariate standard normal distribution across its 200 dimensions. We cannot view all the dimensions simultaneously, so instead we check the distribution of each latent dimension individually.

Here is the code for plotting the latent space distributions:
z_test = vae.encoder.predict_generator(data_flow_generic, steps = 20, verbose = 1)

x = np.linspace(-3, 3, 100)

fig = plt.figure(figsize=(20, 20))
fig.subplots_adjust(hspace=0.6, wspace=0.4)

for i in range(50):
    ax = fig.add_subplot(5, 10, i+1)
    ax.hist(z_test[:,i], density=True, bins = 20)
    ax.axis('off')
    ax.text(0.5, -0.35, str(i), fontsize=10, ha='center', transform=ax.transAxes)
    ax.plot(x,norm.pdf(x))

plt.show()


The first 50 dimensions of our latent space are shown in this output (Figure 3 below). None of the distributions stand out as significantly different from the standard normal. If we did see dimensions that differ significantly from a standard normal distribution, we should reduce the reconstruction loss factor, since the KL divergence term isn't having enough effect in that case. As things stand, we can move on to generating some new faces.


               Figure 3 : Distributions of points for the first 50 dimensions in the latent space
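To complement the histograms, here is a small numerical check (a sketch using the z_test array computed above): for a latent space close to a standard normal, the per-dimension means should be near 0 and the standard deviations near 1.

print(np.round(np.mean(z_test, axis=0)[:10], 2))   # means of the first 10 dimensions
print(np.round(np.std(z_test, axis=0)[:10], 2))    # standard deviations of the first 10 dimensions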

Generating New Faces from the latent space.

n_to_show = 30

znew = np.random.normal(size = (n_to_show, vae.z_dim))  # 1

reconst = vae.decoder.predict(np.array(znew)) # 2

fig = plt.figure(figsize=(18, 5))
fig.subplots_adjust(hspace=0.4, wspace=0.4)
for i in range(n_to_show):
    ax = fig.add_subplot(3, 10, i+1)
    ax.imshow(reconst[i, :,:,:])  # 3
    ax.axis('off')

plt.show()
1.    We sample 30 points from a standard normal distribution with 200 dimensions…
2.     …then pass these points to the decoder.
3.    Each resulting output is a 128 × 128 × 3 image that we can view; the output is shown in Figure 4.



                                       Figure 4: New generated faces

Amazingly, the VAE is able to take the set of points that we sampled and convert each into a convincing image of a person's face. While the images are not perfect, they are a giant leap forward from the Naive Bayes model.

The Naive Bayes model faced the problem of not being able to capture dependency between adjacent pixels, since it had no notion of higher-level features such as sunglasses or brown hair.

The VAE doesn’t suffer from this problem, since the convolutional layers of the encoder are designed to translate low-level pixels into high-level features and the decoder is trained to perform the opposite task of translating the high-level features in the latent space back to raw pixels.

Latent Space Arithmetic

Let's see some theory first.
One benefit of mapping images into a lower-dimensional space is that we can perform arithmetic on vectors in this latent space that has a visual analogue when decoded back into the original image domain.

For example, suppose we want to take an image of somebody who looks sad and give them a smile. To do this, we first need to find a vector in the latent space that points in the direction of increasing smile, i.e. a smile vector. Adding this vector to the encoding of the original image in the latent space gives us a new point which, when decoded, should produce a more smiley version of the original image. So how can we find the smile vector? Each image in the CelebA dataset is labelled with attributes, one of which is Smiling. If we take the average position in the latent space of encoded images that have the attribute Smiling and subtract the average position of encoded images that do not have the attribute Smiling, we obtain the vector that points from not smiling to smiling, which is exactly what we need.
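In numpy terms, this boils down to two operations (a sketch only; the get_vector_from_label function we build below does the same thing incrementally, batch by batch, and the array names here are illustrative):

# z_smiling     : latent encodings of images labelled Smiling ==  1
# z_not_smiling : latent encodings of images labelled Smiling == -1
smile_vector = np.mean(z_smiling, axis=0) - np.mean(z_not_smiling, axis=0)
smile_vector = smile_vector / np.linalg.norm(smile_vector)   # normalise to unit length, as in the function below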

Conceptually, we are performing vector arithmetic in the latent space
                             z_new = z + alpha * feature_vector  
where alpha is a factor that determines how much of the feature vector is added or subtracted:


Let's see this in action. Figure 5 shows several images that have been encoded into the latent space. We then add or subtract multiples of a certain vector (e.g., smile, blonde, male, eyeglasses) to obtain different versions of the image, with only the relevant feature changed.
                                           Figure 5: Adding and subtracting features to and from faces

It is quite remarkable that even though we move the point a significantly large distance in the latent space, the core image barely changes, except for the one feature that we want to manipulate. This demonstrates the power of variational autoencoders for capturing and adjusting high-level features in images.
Let us now build the functions for the feature vectors.

Functions for getting the vector for a corresponding label

Now let us build a function that computes a feature vector from a label:

def get_vector_from_label(label, batch_size):   

    data_flow_label = imageLoader.build(att, batch_size, label = label)   
     
     
    # initiating vectors
    origin = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_sum_POS = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_n_POS = 0
    current_mean_POS = np.zeros(shape = vae.z_dim, dtype = 'float32')

    current_sum_NEG = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_n_NEG = 0
    current_mean_NEG = np.zeros(shape = vae.z_dim, dtype = 'float32')

    current_vector = np.zeros(shape = vae.z_dim, dtype = 'float32')
    current_dist = 0

    print('label: ' + label)
    print('images : POS move : NEG move :distance : 𝛥 distance')
    while(current_n_POS < 10000):       
       
        batch = next(data_flow_label)
        im = batch[0]
        attribute = batch[1]

        z = vae.encoder.predict(np.array(im))

        z_POS = z[attribute==1]
        z_NEG = z[attribute==-1]

        # update the running mean of encodings that have the attribute (POS)
        if len(z_POS) > 0:           
            current_sum_POS = current_sum_POS + np.sum(z_POS, axis = 0)
            current_n_POS += len(z_POS)
            new_mean_POS = current_sum_POS / current_n_POS
            movement_POS = np.linalg.norm(new_mean_POS-current_mean_POS)

        # update the running mean of encodings that do not have the attribute (NEG)
        if len(z_NEG) > 0:
            current_sum_NEG = current_sum_NEG + np.sum(z_NEG, axis = 0)
            current_n_NEG += len(z_NEG)
            new_mean_NEG = current_sum_NEG / current_n_NEG
            movement_NEG = np.linalg.norm(new_mean_NEG-current_mean_NEG)

        current_vector = new_mean_POS-new_mean_NEG
        new_dist = np.linalg.norm(current_vector)
        dist_change = new_dist - current_dist       
       

        print(str(current_n_POS)
              + '    : ' + str(np.round(movement_POS,3))
              + '    : ' + str(np.round(movement_NEG,3))
              + '    : ' + str(np.round(new_dist,3))
              + '    : ' + str(np.round(dist_change,3))
             )

        current_mean_POS = np.copy(new_mean_POS)
        current_mean_NEG = np.copy(new_mean_NEG)
        current_dist = np.copy(new_dist)

        if np.sum([movement_POS, movement_NEG]) < 0.08:           
            current_vector = current_vector / current_dist
            print('Found the ' + label + ' vector')
            break

    return current_vector  

Now we will build a function for adding a feature vector to images:
def add_vector_to_images(feature_vec):   

    n_to_show = 5
    factors = [-4,-3,-2,-1,0,1,2,3,4]

    example_batch = next(data_flow_generic)
    example_images = example_batch[0]
    example_labels = example_batch[1]

    z_points = vae.encoder.predict(example_images)

    fig = plt.figure(figsize=(18, 10))

    counter = 1

    for i in range(n_to_show):

        img = example_images[i].squeeze()
        sub = fig.add_subplot(n_to_show, len(factors) + 1, counter)
        sub.axis('off')        
        sub.imshow(img)

        counter += 1

        for factor in factors:

            changed_z_point = z_points[i] + feature_vec * factor
            changed_image = vae.decoder.predict(np.array([changed_z_point]))[0]

            img = changed_image.squeeze()
            sub = fig.add_subplot(n_to_show, len(factors) + 1, counter)
            sub.axis('off')
            sub.imshow(img)

            counter += 1

    plt.show()

Now we will generate the feature vectors from the images-
# Respective attribute vectors
BATCH_SIZE = 500
# Here we are generating 8 feature vectors from labels
attractive_vec = get_vector_from_label('Attractive', BATCH_SIZE)
mouth_open_vec = get_vector_from_label('Mouth_Slightly_Open', BATCH_SIZE)
smiling_vec = get_vector_from_label('Smiling', BATCH_SIZE)
lipstick_vec = get_vector_from_label('Wearing_Lipstick', BATCH_SIZE)
young_vec = get_vector_from_label('High_Cheekbones', BATCH_SIZE)
male_vec = get_vector_from_label('Male', BATCH_SIZE)
eyeglasses_vec = get_vector_from_label('Eyeglasses', BATCH_SIZE)
blonde_vec = get_vector_from_label('Blond_Hair', BATCH_SIZE)

This will take some time; wait until it completes.
Now that all the feature vectors have been generated, let's start: we will take 5 images and apply the feature vectors to them.
First, we add the attractive feature vector to the images:

print('Attractive Vector')
add_vector_to_images(attractive_vec)



Here the first image in each row is the original; the remaining images are transformed to look more or less attractive as the factor changes.

Similarly, the mouth open vector transforms the faces to have an open mouth.

print('Mouth Open Vector')
add_vector_to_images(mouth_open_vec)


Similarly, the smiling vector adds a smile to the faces.
print('Smiling Vector')
add_vector_to_images(smiling_vec)


The lipstick vector adds lipstick to the faces; it is not perfect, but the feature is clearly being added from the latent space.
print('Lipstick Vector')
add_vector_to_images(lipstick_vec)


The young vector (built here from the High_Cheekbones label) transforms the faces to look younger.
print('Young Vector')
add_vector_to_images(young_vec)


The male vector transforms female faces into male faces.
print('Male Vector')
add_vector_to_images(male_vec)

The eyeglasses vector adds eyeglasses to the faces.
print('Eyeglasses Vector')
add_vector_to_images(eyeglasses_vec)


The blonde vector transforms the hair colour to blonde.
print('Blond Vector')
add_vector_to_images(blonde_vec)

Even though we are moving the point a significantly large distance in the latent space, the core image barely changes, except for the one feature that we want to manipulate. This is latent space arithmetic: it isolates key features and demonstrates the power of variational autoencoders for capturing and adjusting high-level features in images.

We can use a similar idea to morph between two faces.
Morphing Between Faces
Imagine two points in the latent space, A and B, that represent two images. If you started at point A and walked toward point B in a straight line, decoding each point on the line as you went, you would see a gradual transition from the starting face to the end face.
Mathematically, we are traversing a straight line, which can be described by the following equation:
                z_new = z_A * (1- alpha) + z_B * alpha
Here, z_A is the latent space point of A, z_B is the latent space point of B, and alpha is a number between 0 and 1 that determines how far along the line we are, away from point A.
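As a minimal numpy sketch of this interpolation (the morph_faces function below wraps it with image loading and plotting; z_A and z_B stand for the encoder outputs of the two images):

alphas = np.arange(0, 1, 0.1)
z_path = [z_A * (1 - alpha) + z_B * alpha for alpha in alphas]
# each intermediate point can then be decoded, e.g. vae.decoder.predict(np.array([z_new]))[0]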

Figure 6 shows this process in action. We take two images, encode them into the latent space, and then decode points along the straight line between them at regular intervals.
                                                        Figure 6: Morphing between two faces

First we have to build the face morphing function; it takes two image filenames and looks up their attributes.

def morph_faces(start_image_file, end_image_file):

    factors = np.arange(0,1,0.1)

    att_specific = att[att['image_id'].isin([start_image_file, end_image_file])]
    att_specific = att_specific.reset_index()
    data_flow_label = imageLoader.build(att_specific, 2)

    example_batch = next(data_flow_label)
    example_images = example_batch[0]
    example_labels = example_batch[1]

    z_points = vae.encoder.predict(example_images)


    fig = plt.figure(figsize=(18, 8))

    counter = 1

    img = example_images[0].squeeze()
    sub = fig.add_subplot(1, len(factors)+2, counter)
    sub.axis('off')       
    sub.imshow(img)

    counter+=1


    for factor in factors:

        changed_z_point = z_points[0] * (1-factor) + z_points[1]  * factor
        changed_image = vae.decoder.predict(np.array([changed_z_point]))[0]

        img = changed_image.squeeze()
        sub = fig.add_subplot(1, len(factors)+2, counter)
        sub.axis('off')
        sub.imshow(img)

        counter += 1

    img = example_images[1].squeeze()
    sub = fig.add_subplot(1, len(factors)+2, counter)
    sub.axis('off')       
    sub.imshow(img)


    plt.show()

Let's pass the filenames of two images to this face morphing function.
start_image_file = '000238.jpg'
end_image_file = '000193.jpg' #glasses

morph_faces(start_image_file, end_image_file)
The first image is the starting image, the last is the end image, and the images in between show the morphing from one to the other.


Similarly, for these two images:
start_image_file = '000112.jpg'
end_image_file = '000258.jpg'

morph_faces(start_image_file, end_image_file)
And the output:


A few more examples:
start_image_file = '000230.jpg'
end_image_file = '000712.jpg'

morph_faces(start_image_file, end_image_file)

start_image_file = '002230.jpg'
end_image_file = '004713.jpg'

morph_faces(start_image_file, end_image_file)

It is worth noting the smoothness of the transition: even where there are multiple features to change simultaneously (e.g., removal of glasses, hair color, gender), the VAE manages to achieve this fluidly, showing that the latent space of the VAE is truly a continuous space that can be traversed and explored to generate a multitude of different human faces.

Summary
In this tutorial, we have seen how variational autoencoders are a powerful tool in the generative modelling toolbox. A VAE can extract high-level features from individually uninformative pixels. By introducing randomness into the model and constraining how points are distributed in the latent space, we can simply sample points from a standard normal distribution to generate new faces. Moreover, by performing vector arithmetic within the latent space, we can achieve some amazing effects, such as face morphing and feature manipulation. With these capabilities, it is easy to see why VAEs have become a prominent technique for generative modelling in recent years.

I hope you understood these topics. Let's meet in the next tutorial.
