In the following post, we cover the essentials of creating custom learners, along with two architectures that enable very interesting tasks, such as segmentation and image generation problems. For Generative Adversarial Networks, I will briefly explain the concept, the support fastai provides at the moment, and how to create your own GAN. Let's get started.
Vision Learner
Fastai makes transfer learning particularly simple: we just need to specify the architecture we want to use from their meta, then decide how much to cut to create a new "head" (the part that cares about the downstream task). Once the new head is defined, we are ready to train our model. Importantly, we may want to use discriminative learning rates for the different layers, and also unfreeze the pretrained layers (the body of the network) at some point. The following code snippet does the job for the whale detection problem I worked on using siamese networks:
class SiameseModel(Module):
    def __init__(self, encoder, head):
        self.encoder, self.head = encoder, head
    def forward(self, x1, x2):
        ftrs = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(ftrs)

encoder = create_body(resnet34, cut=-2)
head = create_head(512*2, 2, ps=0.5)
model = SiameseModel(encoder, head)
Please refer to the following notebook for more details of this implementation.
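The 512*2 input size passed to create_head comes from concatenating the two encoder outputs along the feature dimension. A quick library-free sketch, with plain lists standing in for feature tensors (the fake_encoder helper is hypothetical, purely for illustration), shows the arithmetic:

```python
def fake_encoder(image_id):
    # Stand-in for the resnet34 body: 512 dummy features per image,
    # matching resnet34's final feature width.
    return [0.0] * 512

# Concatenating the two encoder outputs doubles the feature dimension,
# which is why the head is created with in_features = 512 * 2.
ftrs = fake_encoder("img1") + fake_encoder("img2")
print(len(ftrs))  # 1024, i.e. 512 * 2
```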
UNET Learner for segmentation or pixel-level classification
Segmentation problems are useful not only for traffic-camera applications, but also for many medical applications where a prediction is required at the pixel level. As we have seen in the convolution deep dive, the size of the output shrinks, particularly with a stride greater than 1. Some researchers tried inverse (transposed) convolutions, but the problem is that the input is already too small to generate, for example, a 224x224 image. The solution is again to add skip connections, together with convolutions that increase the output size until we get back to the original image size. Since the architecture must be adapted to the image size, the unet_learner from fastai comes in handy, as it adapts the architecture for us automatically. This is all the code we need to create the data loaders and train the U-Net architecture:
path = untar_data(URLs.CAMVID_TINY)
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path/"images"),
    label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes=np.loadtxt(path/'codes.txt', dtype=str)
)
learn = unet_learner(dls, resnet34)
learn.fine_tune(8)
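The shrinking-and-growing behaviour described above follows directly from the standard convolution output-size formulas, which a few self-contained lines can check (kernel, stride, and padding values chosen purely for illustration):

```python
def conv_out(n, k, s, p):
    # Output size of a convolution: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def tconv_out(n, k, s, p):
    # Output size of a transposed convolution: (n - 1) * s - 2p + k
    return (n - 1) * s - 2 * p + k

print(conv_out(224, k=3, s=2, p=1))   # 112: a stride-2 conv halves the map
print(tconv_out(112, k=4, s=2, p=1))  # 224: a transposed conv grows it back
```

This is why a U-Net needs its upsampling path matched to the input size: each stride-2 step down must be mirrored by a step up to recover the original resolution.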
Please take a look at the following paper for a detailed explanation of the network: UNET paper
GANs
GANs allow us to generate new images with surprising realism. They can be used for multiple image tasks, such as image correction, resolution improvement, colorization, style transfer... In our work, we used them to generate images from sketches with stunning results, and you will probably find use cases not named here.
The concept of a GAN is that to generate realistic images you need a very good generator (of fake images) and a very good discriminator (to detect them). This pull-and-push learning process ensures we end up with images that are hard to discern from the real ones, as both networks improve and learn through the process.
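In practice, this pull-and-push dynamic is implemented as an alternating update schedule. Here is a library-free sketch of the alternation with the actual model updates stubbed out (the function names are hypothetical, not fastai API):

```python
log = []

def train_discriminator():
    # A real step would backpropagate the loss on real vs. generated
    # batches, pushing D(real) up and D(G(z)) down.
    log.append("D")

def train_generator():
    # A real step would update the generator to fool the discriminator,
    # pushing D(G(z)) up.
    log.append("G")

for step in range(4):
    train_discriminator()  # the discriminator learns to spot fakes...
    train_generator()      # ...then the generator learns to fool it

print("".join(log))  # DGDGDGDG
```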
Vanilla GANs can be found in fastai, but for very specific applications you may want to leverage fastai's callbacks functionality and implement your own. Here you can find two examples of people who used fastai for non-standard implementations, for image restoration and image colorization.
Before going into any link, please take a look at the original paper; and if you would like a detailed review of a simple approach to translating sketches to images, see those three posts of mine.
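Under the hood, a callback system is just a training loop that exposes hooks at fixed points. A minimal library-free sketch of the pattern (the class, `fit` helper, and hook name are modelled loosely on fastai's style but are hypothetical here):

```python
class TrackingCallback:
    # A callback is any object exposing the hooks the loop knows about.
    def before_batch(self, i):
        self.last = i  # record which batch we are about to process

def fit(n_batches, callbacks):
    events = []
    for i in range(n_batches):
        for cb in callbacks:
            cb.before_batch(i)  # hook: runs before each batch
        events.append(i)        # stand-in for the actual forward/backward pass
    return events

cb = TrackingCallback()
events = fit(3, [cb])
print(events, cb.last)  # [0, 1, 2] 2
```

Custom GAN schedules (like the alternating generator/discriminator updates) can be expressed by adding behaviour in hooks like this instead of rewriting the loop.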
Concluding remarks
We briefly covered how to adapt a pretrained image model to your desired downstream task, even when that goes beyond simple per-image classification or regression. You can also leverage fastai for image segmentation, and with more work on your end (using callbacks) you can implement any GAN or state-of-the-art image model yourself.
Note that I did not talk about the NLP and tabular cases, and the reason for that is that I cover fairly generic, real-problem implementations in the following posts:
- NLP: NLP finetuning, text creation
- Tabular: Categorical embeddings , Pump failure detection
In the following post I briefly cover Optimizers and Callbacks, and, believe it or not, that will be the last section of the FastAI course. What a journey!