
FastAI Deep Learning Journey Part 6: Protecting the Amazon rain forest with a multilabel image classifier

So far, we have been able to classify a range of multiclass problems very accurately; in the last post we reached 85% validation accuracy detecting wild animals in very challenging conditions (movement, light, low image quality). In this post we will show how to leverage fastai for problems where each image has multiple labels or none at all. On the real case study of detecting artifacts in Amazon satellite imagery, we achieve 96% accuracy across 17 possible classes with only 3 epochs, 1 GPU and 1 hour of training.



One of the first relevant messages to share is that multilabel classifiers are very generic: they are not forced to output a class for images that contain an unseen class, or no class at all, which makes them very appealing for production applications.

Another important point is that turning fastai learners from multiclass to multilabel only requires changing the loss function and the metrics, making it very straightforward to use.

This section shows some additional details about the following:

  • how to create a datablock from scratch on a real data set
  • which loss function is used for multilabel problems and why
  • how to use partial to inherit parameters from a function while creating a new one
  • how to define a threshold to make predictions maximizing overall accuracy

Understanding the problem we are trying to solve

The application at hand aims to assess the extent to which we can track changes in land use from high-resolution satellite data (3-5 meters) to detect human activities like cultivation, roads and settlements, but also natural elements such as rivers, forests and clouds. As one can imagine, the same image will contain several of these elements, in very different forms, so the challenge is served. The winning model has been deployed to detect deforestation as it happens and to inform policy responses.

We are interested in the accuracy of getting all the expected labels on each image right, so we will use a multilabel version of accuracy: every label decision on every image is scored as correct (1) or incorrect (0), and the results are averaged.

How did I create the data block

In the application I built, we have tens of thousands of images and a separate CSV file mapping each image file to its labels. For the sake of the exercise, and since the competition expired some years ago, we focus only on the train data set and evaluate our model on a held-out validation set.

In order to construct the datablock it is important to understand which components are needed:

  • two blocks, the ImageBlock (our x) and the MultiCategoryBlock (our y). It is important to point out that the MultiCategoryBlock one-hot encodes the labels, as this is the only way to measure the loss when we care not about the softmax winner of a single class, but about all the classes considered to be in the image. Choosing this block also ensures the right loss function is selected.
  • get_x: the part that took me most of the time. Note that we need a pathlib.Path indicating the image folder, in my case train-jpg, and we indicate that the first column (0) of our data frame contains the file name to use.
  • get_y: here we also look at the CSV file loaded as a pandas DataFrame, but this time at the second column (1), as it contains our multiple labels, separated by spaces.
  • We resize all images to 224x224 and apply the default augmentations at the batch level.

import pathlib
from fastai.vision.all import *

# df is assumed to be the labels CSV already loaded as a pandas DataFrame:
# column 0 holds the image file names (inside train-jpg), column 1 the space-separated labels
path = pathlib.Path('/content/gdrive/MyDrive/Amazon/kaggle/planet')

block = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),        # x: images, y: one-hot encoded labels
                  get_x=ColReader(0, pref=path/"train-jpg"),      # image paths built from the first column
                  get_y=ColReader(1, label_delim=' '),            # labels split on spaces from the second column
                  splitter=RandomSplitter(valid_pct=0.1, seed=42),
                  item_tfms=Resize(224),
                  batch_tfms=aug_transforms())
dls = block.dataloaders(df)

Note that we call the dataloaders method on df, our DataFrame with the filenames and the labels. This last step applies our DataBlock structure in batches to all our data and automatically splits it between training and validation (90%/10% with the RandomSplitter above). I keep the parameters to a minimum to show how a simple configuration can reach state-of-the-art results.
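
As a quick sanity check (my own habit, not a required step), fastai's standard show_batch call displays a few images together with their full set of labels, so you can verify the multilabel encoding looks right:

dls.show_batch(max_n=9)  # grid of sample images, each shown with all of its labels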

Now we are ready to train the model, but first let's understand why we need another loss function.

Why do we need another loss function: Binary cross entropy


Contrary to the multiclass problem, we do not want to pick the single class with the highest probability, with all probabilities summing to one. Instead, we want to know for which classes/labels the model is confident that they are in the image. So instead of using softmax to push for a single winner, we treat the problem as a binary classification for each label: we apply a sigmoid to each activation, take the probability the model assigns to the correct outcome (the prediction where the target is 1, one minus the prediction where it is 0), and average the negative log of these probabilities over the batch.

import torch

def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()  # independent probability per label
    # probability assigned to the true outcome, negative log, averaged over the batch
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()


fastai will automatically use this loss (its BCEWithLogitsLossFlat) when we create a MultiCategoryBlock; for those using PyTorch directly, nn.BCEWithLogitsLoss() provides the same binary cross entropy with the sigmoid built in.
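
As a quick sanity check (my own addition, not part of the training code), the hand-written loss above should agree with PyTorch's nn.BCEWithLogitsLoss, which fuses the sigmoid and the binary cross entropy into one numerically stable call. Here fake_acts and fake_targs are just random made-up data:

import torch
import torch.nn as nn

fake_acts  = torch.randn(4, 17)                     # fake activations: batch of 4 images, 17 labels
fake_targs = torch.randint(0, 2, (4, 17)).float()   # fake multilabel targets (0/1 per label)

print(binary_cross_entropy(fake_acts, fake_targs))  # hand-written version above
print(nn.BCEWithLogitsLoss()(fake_acts, fake_targs))  # PyTorch's fused sigmoid + BCE, same value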


Why are we changing our metric

When we make predictions, we may output anywhere from 0 to n labels per image, so we cannot calculate accuracy against a single actual label. Instead, after applying the sigmoid, we need to define the confidence threshold above which we predict a 1. This threshold is a hyperparameter that conditions our accuracy metric.

def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()
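
A tiny made-up example (toy numbers, not from the data set) shows what the threshold does: with two images and three labels, raising the threshold from 0.5 to 0.7 flips one "present" decision and the accuracy drops accordingly.

inp  = torch.tensor([[0.9, 0.1, 0.6], [0.2, 0.8, 0.4]])      # already-sigmoided confidences
targ = torch.tensor([[1, 0, 1], [0, 1, 0]])
print(accuracy_multi(inp, targ, thresh=0.5, sigmoid=False))  # 1.0: all 6 label decisions correct
print(accuracy_multi(inp, targ, thresh=0.7, sigmoid=False))  # ~0.83: the 0.6 no longer counts as present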

Training a model and selecting the threshold

In this post the goal is to analyze the differences with respect to multiclass problems, so we simply fine-tune a resnet50: 4 epochs with the pretrained layers frozen, followed by 3 epochs with all layers unfrozen. The exact recipe does not really matter; you can choose other architectures and numbers of epochs.

learn = vision_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)



Please note that the loss function does not need to be stated explicitly, as fastai selects binary cross entropy by default for multilabel blocks, and that we use partial to derive a version of accuracy_multi with a predefined threshold of 0.2.
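
For readers less familiar with partial (from Python's standard functools module), here is what it does in isolation:

from functools import partial

# a new function that behaves like accuracy_multi, but with thresh fixed to 0.2
accuracy_02 = partial(accuracy_multi, thresh=0.2)
# accuracy_02(inp, targ) is now equivalent to accuracy_multi(inp, targ, thresh=0.2)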

This gives us 95.84% accuracy on the validation set and took less than an hour on 1 GPU for the full data set. Note that we use a learning rate without any tuning, and a threshold that could also be optimized.


The following code allows you to find the optimal threshold on the validation set, which in our case is anywhere between 0.3 and 0.7, a reasonable range. As the results are already very strong, I decided not to change the threshold: I want to report results without having looked a priori at validation performance, so what you see is the model configuration from the course, with no fine-tuning on my side. As a curiosity, optimizing the threshold would have given us around 97%.

preds, targs = learn.get_preds()  # validation predictions; get_preds already applies the sigmoid
xs = torch.linspace(0.05, 0.95, 29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs, accs);
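
If you do want to read off the best threshold from this sweep (I kept 0.2, as argued above), the maximum is straightforward to extract:

best_thresh = xs[torch.stack(accs).argmax()]  # threshold with the highest validation accuracy
print(best_thresh, torch.stack(accs).max())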



Conclusions

Any real-life application of image classification is potentially a multilabel problem, as we cannot be fully certain:

  • that only one label will be present in the image
  • that no new/untrained labels appear in the image
  • that any of the trained labels are in the image at all
For these reasons, and also because in many cases we want to predict multiple labels per image, this is a very generic approach for image classification tasks.

We showed how to quickly create a datablock/dataloaders and train a model to more than 95% accuracy within an hour on 1 GPU. Only a few changes are required to perform multilabel classification with state-of-the-art results: the block definition, the loss function, the accuracy metric and the threshold definition.

In the next post we will use the fastai API to predict urban density as a case study for image regression problems. Stay tuned!

