Recognition Problems
There are plenty of applications where we need to recognize a face, a species, or a car plate from a single image. This is challenging because we must get it right after seeing that specific object only once; training a classifier with one sample per object would not work either. What we can do instead is learn a similarity function between two images, which gives the likelihood that the same object appears in both. We can then decide whether the object is in our reference data by comparing the similarity score against a threshold.
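The decision step above can be sketched as follows. This is a minimal illustration, assuming embeddings already produced by some encoder; `cosine_similarity`, `same_object`, and the 0.8 threshold are hypothetical names and values, not part of any specific library:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def same_object(emb_query, emb_reference, threshold=0.8):
    """Decide whether two images likely contain the same object,
    by comparing the similarity of their encodings to a threshold."""
    return cosine_similarity(emb_query, emb_reference) >= threshold
```

In practice the threshold is tuned on a validation set to trade off false accepts against false rejects.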
Siamese Networks and the Triplet Loss
To compute that similarity function, siamese networks tend to work fairly well. These networks encode each image into a lower-dimensional vector, from which we can compute a distance metric (cosine, Euclidean...). Training such networks requires a specific loss function, the triplet loss. In the triplet loss we have an anchor (the image to evaluate), a positive sample, and a negative sample. The positive sample corresponds to the same object (product, face, species...) and the negative to a different one.
We want the encoded representations of the anchor and the positive sample to be close to each other, and those of the anchor and the negative sample to be far apart.
Formally, you iterate over triplets of images (anchor, positive, negative) from your training set:
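For each triplet, the standard loss uses the squared Euclidean distance between embeddings plus a margin. A minimal sketch (the function name is illustrative; `alpha=0.2` is a common margin choice, but it is a tunable hyperparameter):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) triple.
    f_a, f_p, f_n are the encoder's embedding vectors; alpha is the margin."""
    d_pos = np.sum((f_a - f_p) ** 2)  # squared distance anchor-positive
    d_neg = np.sum((f_a - f_n) ** 2)  # squared distance anchor-negative
    # Loss is zero once the negative is at least alpha farther than the positive
    return float(max(d_pos - d_neg + alpha, 0.0))
```

Note the `max(..., 0)`: once the negative is sufficiently farther away than the positive, the triplet contributes nothing, so in practice training focuses on "hard" triplets that still violate the margin.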
Given the amount of images required to reach state-of-the-art performance, one may want to use transfer learning from a pretrained model, for example: https://pypi.org/project/facenet-pytorch/
It is also possible to train the siamese network as a binary classification problem (anchor vs. positive = 1, anchor vs. negative = 0). That simplifies the data preparation for training. The Deep Learning Specialization does not make clear which approach to prefer, as apparently both work well.
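One common form of this binary head is a logistic regression on the element-wise absolute difference of the two embeddings. A sketch under that assumption (`pair_probability`, `w`, and `b` are illustrative names; in a real network `w` and `b` would be learned jointly with the encoder):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_probability(f_1, f_2, w, b):
    """Binary-classification head for a siamese pair: logistic
    regression on the element-wise |difference| of the embeddings.
    Returns P(same object) for the pair."""
    return float(sigmoid(np.dot(w, np.abs(f_1 - f_2)) + b))
```

With this formulation each pair is a labeled training example, so no triplet mining is needed.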
Neural Style Transfer
By visualizing the inputs that maximally activate a network's units, we can get an intuition of how neural nets learn the features of an image. The following paper shows that earlier layers capture shapes and contours that are quite detailed at a small scale, while deeper layers extract more complex and abstract patterns (Zeiler & Fergus, "Visualizing and Understanding Convolutional Networks", LNCS 8689).
The content cost is based on how different the activations of a chosen layer are between the content image C and the generated image G: the more different they are, the higher the cost. The chosen layer should represent the content of the image more than its style. By the same line of reasoning, for the style we try to minimize the style distance between the generated image G and the "inspiration" (style) image S. Empirically, it is recommended to use multiple layers for the style cost function, as that leads to the best results.
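The two cost terms can be sketched numerically, assuming the layer activations have already been extracted from the network and reshaped to (channels, height × width). The style term uses the Gram matrix of channel correlations; the normalization constants follow Gatys et al.'s formulation, but exact constants vary between implementations:

```python
import numpy as np

def content_cost(a_C, a_G):
    """Squared difference between the activations of a chosen layer
    for the content image C and the generated image G."""
    return float(0.5 * np.sum((a_C - a_G) ** 2))

def gram_matrix(A):
    """Style ('Gram') matrix: correlations between channels.
    A has shape (n_channels, height * width)."""
    return A @ A.T

def style_cost_layer(a_S, a_G):
    """Squared difference between the Gram matrices of the style
    image S and the generated image G at one layer, normalized
    by the layer's size."""
    n_c, n_hw = a_S.shape
    G_S, G_G = gram_matrix(a_S), gram_matrix(a_G)
    return float(np.sum((G_S - G_G) ** 2) / (4 * n_c ** 2 * n_hw ** 2))
```

The total style cost is then a weighted sum of `style_cost_layer` over several layers, and the overall objective minimized with respect to the pixels of G is a weighted combination of the content and style costs.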