FastAI Deep Learning Journey Part 12: MidLevel API applied to Siamese Networks - Whale tracking application

Data scientists, particularly strong coders, are reluctant to rely on high-level APIs unless they know how to customize the framework for their specific applications. When one sees that fastai can perform so many steps (data loaders, shuffling, augmentations, transfer learning...) in a few lines of code, one is both amazed and terrified. How do I know what is going on under the hood?

If, like me, you want to understand what is going on without reading the source code line by line, the documentation and tutorials will be enough. If you want to be able to debug every single line of code, or need to customize fastai for your problem, do not worry: the mid-level and low-level APIs are for you.

In this post, I will show you how to create your own transforms, pipelines, tensors, data loaders and learners for both text and computer vision problems. To make things more spicy and real, I show how fastai can be used to build siamese networks with much less effort, and great performance, than the typical triplet-loss and complicated batch-sampling frameworks one can find out there. Let's get started.

Going a little deeper: transforms, pipelines, TfmdLists and Datasets

In the previous post, FastAI_NLP, we showed how to create a datablock and data loader on the movies data set, which we applied also to the books dataset as follows:

imdb_clas = DataBlock(
    blocks=(TextBlock.from_df('book_desc', seq_len=72,vocab=dls_lm.vocab), CategoryBlock),
    get_x=ColReader('text'), get_y=ColReader('book_rating'))

dls_clas = imdb_clas.dataloaders(df, bs=64)
dls_clas.show_batch(max_n=2)

We explained how, behind the scenes, TextBlock takes care of tokenization, numericalization and building batches of fixed size... all under the hood, never made explicit. In what follows we will replicate those steps with the mid-level API, explicitly defining and checking our transformations on both the independent and the target variable. Here I summarize the key ideas, while the code details are in this notebook: mid api notebook

Transforms

Transforms contain the transformations applied to our raw data, both the input variables and the target variable, before we proceed with training (they can be applied before or after batching, and even at test time).

We can create any transform we like, such as a normalization transform, by subclassing Transform and defining encodes, decodes and setups methods. Here is what it would look like for our example:

class NormalizeMean(Transform):
    def setups(self, items): self.mean = sum(items)/len(items); self.std = np.std(items)
    def encodes(self, x): return (x-self.mean)/self.std
    def decodes(self, x): return (x*self.std) + self.mean

tfm = NormalizeMean()
tfm.setup([1,2,3,4,5])
start = 2
y = tfm(start)
z = tfm.decode(y)
tfm.mean,y,z

(3.0, -0.7071067811865475, 2.0)

This is a very convenient class, as decode lets us get back the original value, which is usually easier to interpret visually: you can recover the original pixels or text.
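The printed values can be checked by hand. The following is a minimal standard-library sketch, not fastai code: it assumes the population standard deviation, which is what np.std computes by default, and the helper names encode and decode are only illustrative.

```python
import math
import statistics

items = [1, 2, 3, 4, 5]
mean = sum(items) / len(items)        # 3.0
std = statistics.pstdev(items)        # population std, the np.std default (ddof=0)

def encode(x):
    # same arithmetic as NormalizeMean.encodes
    return (x - mean) / std

def decode(y):
    # same arithmetic as NormalizeMean.decodes
    return y * std + mean

y = encode(2)
z = decode(y)
print(mean, y, z)
```

Since the standard deviation of 1..5 is the square root of 2, encoding 2 gives -1 divided by that value, and decoding brings us back to 2.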

We will normally need several transforms, such as tokenization and numericalization for NLP, and hence we need the Pipeline class.

Pipeline

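The Pipeline class lets us compose transforms in sequential order:

tok = Tokenizer.from_folder(path)
tok.setup(txts)        # txts: the raw texts (assumed here to be an L of strings)
toks = txts.map(tok)   # tokenized texts, needed to build the vocab

num = Numericalize()
num.setup(toks)

tfms = Pipeline([tok, num])
t = tfms(txts[0]); t[:20]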

We can call decode on the result of the pipeline to walk back through the transforms (note that numericalization is reversible but tokenization is not, so we get the tokenized text back rather than the raw input):

tfms.decode(t)[:100]

xxbos i got this as a turkey movie and was i not disappointed . \n\n xxmaj acting - overall even thoug
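The compose-then-decode behaviour can be sketched in plain Python. This is a toy illustration under my own assumptions, not the fastai implementation: MiniPipeline, Lower and ToIds are hypothetical names, and the tokenizer is a simple lowercase-and-split.

```python
class Lower:
    """Toy tokenizer: lossy, because decoding cannot restore capitalisation."""
    def encode(self, s): return s.lower().split()
    def decode(self, toks): return " ".join(toks)

class ToIds:
    """Toy numericalizer: fully reversible through the vocab."""
    def __init__(self, vocab):
        self.vocab = vocab
        self.o2i = {w: i for i, w in enumerate(vocab)}
    def encode(self, toks): return [self.o2i[t] for t in toks]
    def decode(self, ids): return [self.vocab[i] for i in ids]

class MiniPipeline:
    def __init__(self, tfms): self.tfms = tfms
    def __call__(self, x):
        for t in self.tfms:              # apply transforms first to last
            x = t.encode(x)
        return x
    def decode(self, x):
        for t in reversed(self.tfms):    # undo them last to first
            x = t.decode(x)
        return x

pipe = MiniPipeline([Lower(), ToIds(["a", "great", "movie"])])
ids = pipe("A Great Movie")
text = pipe.decode(ids)
print(ids, text)
```

Decoding returns the lowercase, space-joined tokens and not the original string, which mirrors why the decoded review above still shows special tokens such as xxbos.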

What is still missing from the Pipeline is the setup, for which we will use TfmdLists.

TfmdLists

With TfmdLists we can pass in the input files or raw data together with the split into training and validation sets.

cut = int(len(files)*0.8)
splits = [list(range(cut)), list(range(cut,len(files)))]
tls = TfmdLists(files, [Tokenizer.from_folder(path), Numericalize], 
                splits=splits)

tls.valid[0][:20]

tls.train[0][:20]
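To make the role of the splits concrete, here is a small plain-Python sketch. It assumes, as I understand the library to do by default, that setup only sees the training split, so words that appear only in the validation set map to an unknown token. The toy data and the numericalize helper are illustrative.

```python
items = ["good film", "bad film", "good plot", "bad plot", "odd ending"]

cut = int(len(items) * 0.8)                          # first 80% for training
splits = [list(range(cut)), list(range(cut, len(items)))]
train = [items[i] for i in splits[0]]
valid = [items[i] for i in splits[1]]

# "setup": build the vocab from the training split only; index 0 is the unknown token
vocab = ["xxunk"] + sorted({w for s in train for w in s.split()})
o2i = {w: i for i, w in enumerate(vocab)}

def numericalize(s):
    return [o2i.get(w, 0) for w in s.split()]

train_ids = [numericalize(s) for s in train]
valid_ids = [numericalize(s) for s in valid]
print(vocab, valid_ids)
```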

Note that we need to do the same for the target variable, in our case:

lbls = files.map(parent_label)

cat = Categorize()

cat.setup(lbls)

tls_y = TfmdLists(files, [parent_label, Categorize()])

To get both transformation pipelines into one single object, we use Datasets:

x_tfms = [Tokenizer.from_folder(path), Numericalize]

y_tfms = [parent_label, Categorize()]

dsets = Datasets(files, [x_tfms, y_tfms], splits=splits)

A Datasets object can be decoded too and, most importantly, converted into a dataloader.

dls = dsets.dataloaders(dl_type=SortedDL,bs=64, before_batch=pad_input)

Here we specify SortedDL as the loader type, which groups texts of similar length into the same batch, and pad_input, which pads the texts in each batch to a common length so they can be stacked for training.

With that we are back where we started with the DataBlock:
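The idea of batching by length and padding can be sketched without fastai. This is a simplified illustration: the real loader also shuffles during training, and the function names and the pad_id value are assumptions of mine.

```python
def sorted_batches(seqs, bs):
    """Group sequences of similar length together by sorting on length, longest first."""
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    return [[seqs[i] for i in order[k:k + bs]] for k in range(0, len(order), bs)]

def pad_batch(batch, pad_id=1):
    """Pad every sequence in the batch to the length of the longest one."""
    longest = max(len(s) for s in batch)
    return [s + [pad_id] * (longest - len(s)) for s in batch]

seqs = [[5, 6, 7, 8], [9], [2, 3, 4], [7, 7]]
batches = [pad_batch(b) for b in sorted_batches(seqs, bs=2)]
print(batches)
```

Because neighbours in a batch have similar lengths, only a few padding tokens are needed per batch, which wastes less computation than padding everything to the longest text in the data set.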

path = untar_data(URLs.IMDB)

dls = DataBlock(
    blocks=(TextBlock.from_folder(path),CategoryBlock),
    get_y = parent_label,
    get_items=partial(get_text_files, folders=['train', 'test']),
    splitter=GrandparentSplitter(valid_name='test')
).dataloaders(path)

Personally, I think the following:

I believe most problems in industry, and many in research, can be handled with the DataBlock, for both regression and classification problems, not to mention the language model setup.
I struggled to make that implementation take a dataframe as input, so I recommend using a folder structure such as is_valid/target/text_id.txt, which is quite generic and lets you use both the high-level and mid-level APIs effortlessly.

Let's now move on to a rather custom problem, one where we have to identify many classes with one or few observations per class. We will implement a siamese network almost from scratch and correctly classify 80% of 800 whales with no more than 8 pics per whale.

Using mid level API to track threatened whales, a siamese network application

I always try to use what I am learning on a real problem, particularly on topics related to biodiversity protection and mitigating human-induced climate change. I picked this interesting use case from the wonderful competition page: whale tracking challenge. The idea is to build a system that can track and identify beluga whales, given that we have around 800 whales detected but only around 7 pics of each. This is a good chance to use a siamese network, the method applied in face recognition when we have many classes (faces) but only one or very few samples of each class. We will first create a transform and then train the model.

I have good news for you: we do not need triplet loss or complicated sampling streams to get very accurate results (without any fine-tuning I got more than 80% accuracy in a few minutes of training). I recommend taking a look at the following siamese networks posts, both to appreciate how much easier our lives become with the mid-level API from fastai and to get familiar with the concept, if my intro is not enough.

Problem statement

Our goal here is to state which whale corresponds to each top-view image. As we have very few samples, we are not likely to succeed if we train a classifier with 7 samples per class on 800 classes. Instead we can frame the problem differently, asking the neural network whether, given an image of a whale, another image belongs to the same whale or to a different one. This comparison can be run quickly and accurately over all candidates to find which whale the picture most likely belongs to. The approach is heavily used for face recognition, and one-shot image recognition of this kind works well whenever we have few samples and many classes.
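The identification step described above can be sketched as follows. This is only an illustration under stated assumptions: same_prob stands in for the trained siamese model, the images are replaced by plain numbers, and scoring each whale by its best-matching reference picture is my own choice of aggregation.

```python
def identify(query, gallery, same_prob):
    """Return the whale id whose reference pictures best match the query.

    gallery:   dict mapping whale id -> list of reference images
    same_prob: callable (img_a, img_b) -> probability that both show the same whale
    """
    scores = {wid: max(same_prob(query, ref) for ref in refs)
              for wid, refs in gallery.items()}
    best = max(scores, key=scores.get)
    return best, scores

# toy stand-ins: "images" are numbers, similar numbers mean the same whale
def toy_same_prob(a, b):
    return 1 / (1 + abs(a - b))

gallery = {"whale_a": [1.0, 1.2], "whale_b": [5.0], "whale_c": [9.0, 8.5]}
best, scores = identify(5.3, gallery, toy_same_prob)
print(best, scores)
```

With 800 whales and a handful of pictures each, this means a few thousand pairwise comparisons per query, which is cheap compared with training.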

First Step: create a siamese image object

In order to train the model, we will create an image where our anchor (the selected image) is concatenated with another one (of the same or another whale). We will draw a black line in between and attach a label that is True when both pics correspond to the same whale and False otherwise.

class SiameseImage(fastuple):
    def show(self, ctx=None, **kwargs): 
        img1,img2,same_breed = self #inputs are the images and the label
        if not isinstance(img1, Tensor): # PIL images: match their sizes, then convert them to tensors
            if img2.size != img1.size: img2 = img2.resize(img1.size)
            t1,t2 = tensor(img1),tensor(img2)
            t1,t2 = t1.permute(2,0,1),t2.permute(2,0,1) # channels first: (H,W,C) -> (C,H,W)
        else: t1,t2 = img1,img2
        line = t1.new_zeros(t1.shape[0], t1.shape[1], 10) # this creates the black line
        return show_image(torch.cat([t1,line,t2], dim=2),  # we return the image tuple with the line and the title = label
                          title=same_breed, ctx=ctx)

The curious reader will spot that we subclass fastuple; this allows us to apply any transform, such as normalization, resizing or augmentation, to each image of the tuple.

Next, we will build the transform that pairs up the images properly before we create our data loader.

Second step: Create the Custom Siamese transform

For each image we want our transform to pick, with 50% probability, another sample with either the same label or a different one. We will therefore have a balanced distribution of negative and positive cases, independent of the class distribution in the data. We will redo the draw each time a training image is loaded, to get more variety of samples, but only once for the validation data set.

class SiameseTransform(Transform):
    def __init__(self, files, label_func, splits):
        self.labels = files.map(label_func).unique() # the set of possible labels
        # dictionary mapping each label to the files that carry it
        self.lbl2files = {l: L(f for f in files if label_func(f) == l) 
                          for l in self.labels}
        self.label_func = label_func 
        # draw the partner image once per validation file, so validation pairs stay fixed
        self.valid = {f: self._draw(f) for f in files[splits[1]]}
        
    def encodes(self, f):
        f2,t = self.valid.get(f, self._draw(f)) 
        img1,img2 = PILImage.create(f),PILImage.create(f2)
        return SiameseImage(img1, img2, t)
    
    def _draw(self, f):
        same = random.random() < 0.5 # pick same with 50% chance
        cls = self.label_func(f)
        if not same: # when not same pick one with different label
            cls = random.choice(L(l for l in self.labels if l != cls)) 
        return random.choice(self.lbl2files[cls]),same 
        # get file selected and label

With that we can create our transform, using a random splitter for the training and validation sets:

splits = RandomSplitter()(files)
tfm = SiameseTransform(files, label_func, splits)
tfm(files[0]).show();
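The balancing claim can be checked with a standalone sketch of the draw logic. It uses made-up file names and a hypothetical label_func that reads the whale id from the file name; only the sampling rule is taken from the transform above.

```python
import random
from collections import defaultdict

def build_index(files, label_func):
    """Map each label to the list of files carrying it."""
    idx = defaultdict(list)
    for f in files:
        idx[label_func(f)].append(f)
    return dict(idx)

def draw(f, lbl2files, label_func, rng):
    """Pick a partner for f: same whale with 50% chance, another whale otherwise."""
    same = rng.random() < 0.5
    cls = label_func(f)
    if not same:
        cls = rng.choice([l for l in lbl2files if l != cls])
    return rng.choice(lbl2files[cls]), same

files = ["w1_a", "w1_b", "w2_a", "w2_b", "w3_a"]   # illustrative names
label_func = lambda f: f.split("_")[0]
lbl2files = build_index(files, label_func)

rng = random.Random(0)
pairs = [(f, *draw(f, lbl2files, label_func, rng)) for f in files * 400]
share_same = sum(same for _, _, same in pairs) / len(pairs)
print(round(share_same, 3))
```

Note that, as in the transform itself, a "same" draw may return the anchor file again when a whale has very few pictures.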

As the label and the images are in the same tuple, we can use TfmdLists instead of Datasets:

tls = TfmdLists(files, tfm, splits=splits)

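Finally we can get our dataloader, adding some required transformations on each item, such as resizing and conversion to tensor (Resize and ToTensor, passed as after_item), and on each batch, such as conversion to float tensors and normalization (IntToFloatTensor and Normalize, passed as after_batch). The high-level DataBlock API normally adds these by default for images.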
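What the two batch-level steps do to a single pixel can be sketched in plain Python. This assumes normalization with the usual ImageNet channel statistics, which is a common choice for a pretrained ResNet but is not stated above; the function names are illustrative.

```python
import math

# commonly used ImageNet channel statistics (assumed here)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def int_to_float(pixel):
    """Scale 0..255 integer channels to 0..1 floats."""
    return tuple(c / 255 for c in pixel)

def normalize(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the channel mean and divide by the channel std."""
    return tuple((c - m) / s for c, m, s in zip(pixel, mean, std))

def denormalize(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Inverse of normalize, which is what decoding uses when showing a batch."""
    return tuple(c * s + m for c, m, s in zip(pixel, mean, std))

rgb = (255, 128, 0)
x = normalize(int_to_float(rgb))
back = denormalize(x)
print(x, back)
```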

Training the siamese network for whale detection

For the modeling part I will be leveraging lesson 15 from the fastai book.

We will use the mid-level API here as well to create our learner. As we will be doing transfer learning on ResNet-34, we need to cut off the layers of resnet34 that are specific to the ImageNet classification problem, which are the last two, and add our own. Our siamese model needs the encoder (resnet34 minus the ImageNet-specific head) and a siamese-specific head.

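# Pretrained ResNet-34 without its ImageNet head (the last two layers).
# Assumes torchvision >= 0.13; a bare resnet34() would start from random weights.
encoder = create_body(resnet34(weights="IMAGENET1K_V1"), cut=-2)
head = create_head(512*2, 2, ps=0.5)  # 512 features per image, two images

class SiameseModel(Module):
    def __init__(self, encoder, head):
        self.encoder,self.head = encoder,head

    def forward(self, x1, x2):
        # encode both images with the same (shared) encoder and concatenate the features
        ftrs = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(ftrs)

model = SiameseModel(encoder, head)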

To be able to freeze and unfreeze layers at will, we need to create our own parameter splitter:

def siamese_splitter(model):
    return [params(model.encoder), params(model.head)]
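The splitter returns two parameter groups, and freezing acts on those groups. The sketch below mimics that with plain dictionaries instead of tensors; freeze_to here is my own simplified stand-in, and as far as I know the library additionally keeps batch-norm layers trainable by default, which the sketch ignores.

```python
def freeze_to(groups, n):
    """Freeze every parameter in groups[:n]; keep the remaining groups trainable.

    groups is a list of lists of dicts with a 'requires_grad' flag (stand-ins for tensors).
    """
    cut = n if n >= 0 else len(groups) + n
    for i, group in enumerate(groups):
        for p in group:
            p["requires_grad"] = i >= cut
    return groups

encoder_params = [{"name": "conv1", "requires_grad": True},
                  {"name": "layer4", "requires_grad": True}]
head_params = [{"name": "linear", "requires_grad": True}]
groups = [encoder_params, head_params]     # same order as siamese_splitter

freeze_to(groups, -1)                      # freeze: all groups except the last
frozen_state = [[p["requires_grad"] for p in g] for g in groups]

freeze_to(groups, 0)                       # unfreeze: nothing stays frozen
unfrozen_state = [[p["requires_grad"] for p in g] for g in groups]
print(frozen_state, unfrozen_state)
```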

Now we are ready to create our own learner:

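# assumed loss (not shown in the original): cross-entropy on the two outputs; targets are booleans, hence .long()
def loss_func(out, targ): return nn.CrossEntropyLoss()(out, targ.long())
learn = Learner(dls, model, loss_func=loss_func, 
                splitter=siamese_splitter, metrics=accuracy)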

We will first freeze the pretrained weights and train only the head:

learn.freeze()

learn.fit_one_cycle(1, 0.005)

We will then unfreeze and train all layers with discriminative learning rates (lower for the encoder, higher for the head).

learn.unfreeze()
learn.fit_one_cycle(50, slice(1e-6,1e-3))
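How a slice of learning rates maps onto the parameter groups can be sketched as below. This reflects my understanding that the library spaces the rates geometrically between the two ends of the slice; spread_lrs is a hypothetical helper written for illustration.

```python
import math

def spread_lrs(lo, hi, n_groups):
    """Geometrically spaced learning rates, lowest for the first group, highest for the last."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

two_groups = spread_lrs(1e-6, 1e-3, 2)     # encoder, head
three_groups = spread_lrs(1e-6, 1e-3, 3)   # what a three-group splitter would get
print(two_groups, three_groups)
```

With our two-group splitter the encoder is therefore trained at about 1e-6 and the head at about 1e-3, so the pretrained features change slowly while the new head adapts faster.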

Without any fine-tuning, and without the tricks that can be easily implemented as described in the post How to train SOTA computer vision models, we reached >80% accuracy identifying 800 whales with fewer than 7 samples per whale on average (see below the snapshot after 50 epochs).

Concluding remarks

The mid-level API from fastai allows for customized data sets, such as the one required for a siamese network, while letting us debug each step of the data creation and modeling. In many applications that will not be necessary, but as shown here, with some object-oriented programming knowledge one can get very strong results with less complexity while still benefiting from the high-level functionality of fastai.

Alan Fortuny Sicart
