Why the present and future of image representation is self supervised (Part 2)

SimCLR: a simple framework for contrastive learning of visual representations

As in the previous post, this one summarizes the following paper. In the next post we cover MoCo and its benefits from a computational standpoint. Let's get started:

https://arxiv.org/pdf/2002.05709.pdf

Introduction

Visual representations can be learned without labels using generative methods (VAEs or GANs) or discriminative methods (such as SimCLR or MoCo).

Generative methods are expensive to compute and have other issues, as discussed in part 1 of this series.

In this paper we see how a discriminative approach, which replicates a supervised task on self-generated labels, achieves state-of-the-art results. Not only that, but SimCLR is also more intuitive and requires a much simpler architecture.

The major components of that framework are:

contrastive tasks built on compositions of data augmentations
a learnable non-linear transformation between the representation and the contrastive loss, which improves the quality of the representation
large batch sizes and long training schedules

This framework can also compete with supervised baselines: it performs on par with or better than a supervised ResNet-50 on 10 of the 12 transfer datasets evaluated.

Method

The objective function in SimCLR maximizes agreement between two augmented (distorted) views of the same anchor image and minimizes agreement with the augmented views of the other images in the batch, which act as negative samples.
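To make the objective concrete, here is a minimal sketch of a normalized, temperature-scaled cross-entropy loss of the kind the paper uses, written in plain Python. The function names, the convention that consecutive vectors form a positive pair, and the default temperature of 0.5 are assumptions made for illustration, not details taken from this post.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(z, temperature=0.5):
    """Contrastive loss over 2N projection vectors.

    Assumption for this sketch: z[2k] and z[2k + 1] are the two
    augmented views of the same image (a positive pair); every other
    vector in the batch is treated as a negative.
    """
    n = len(z)
    if n < 2 or n % 2 != 0:
        raise ValueError("expected an even number of vectors (two views per image)")
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the other view of the same image
        # Similarities to every other vector, positive included.
        logits = [cosine_similarity(z[i], z[k]) / temperature
                  for k in range(n) if k != i]
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        positive_logit = cosine_similarity(z[i], z[j]) / temperature
        total += log_denominator - positive_logit
    return total / n


# Two images, two views each: aligned pairs should give a lower loss.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(nt_xent_loss(aligned) < nt_xent_loss(misaligned))
```

The loss drops when the two views of an image point in the same direction and the views of other images point elsewhere, which is exactly the agreement/disagreement trade-off described above.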

The framework provides an augmentation module that applies transformations such as crops, rotations and random noise to the images. The encoding can be done with several networks, such as a ResNet, and ultimately a projection layer maps the representations to the space where the contrastive/similarity loss is computed.

The selected data augmentations are random crop and resize, color distortions and Gaussian blurring. A ResNet-50 was used as the encoder, followed by a 2-layer MLP projection head that maps the representation to a 128-dimensional latent vector, on which the contrastive loss is computed.
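The augmentation step described above can be sketched in plain Python as follows. This is a toy illustration under stated assumptions: the image is a nested list of RGB tuples with values in [0, 1], the color distortion is reduced to a single random brightness factor, Gaussian blur is left out for brevity, and every function name and parameter (min_scale, strength, out_size) is hypothetical rather than taken from the paper's code.

```python
import random


def random_crop_and_resize(image, out_size, rng, min_scale=0.5):
    """Crop a random rectangle and resize it to out_size x out_size
    using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    crop_h = rng.randint(max(1, int(h * min_scale)), h)
    crop_w = rng.randint(max(1, int(w * min_scale)), w)
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [
        [image[top + (r * crop_h) // out_size][left + (c * crop_w) // out_size]
         for c in range(out_size)]
        for r in range(out_size)
    ]


def color_distortion(image, rng, strength=0.4):
    """Simplified stand-in for color jitter: one brightness factor per image."""
    factor = 1.0 + rng.uniform(-strength, strength)
    return [
        [tuple(min(1.0, max(0.0, channel * factor)) for channel in pixel)
         for pixel in row]
        for row in image
    ]


def two_views(image, out_size=8, seed=None):
    """Return two independently augmented views of the same image."""
    rng = random.Random(seed)

    def augment():
        cropped = random_crop_and_resize(image, out_size, rng)
        return color_distortion(cropped, rng)

    return augment(), augment()


# A 16x16 toy image with a horizontal and vertical gradient.
image = [[(x / 15, y / 15, 0.5) for x in range(16)] for y in range(16)]
view_a, view_b = two_views(image, out_size=8, seed=0)
print(len(view_a), len(view_a[0]))
```

In the full pipeline, both views would then go through the same encoder and projection head, and the resulting pair of vectors would be treated as a positive pair by the contrastive loss.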

The algorithm requires relatively large batch sizes, up to ~8k samples, which yields ~16k negative samples for each positive pair. Because the two views of a positive pair are computed on the same device, the model could exploit leaked local information (picking up our distribution logic instead of learning representations). Methods such as global batch normalization or shuffling data across devices are needed to avoid this.
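The relation between batch size and number of negatives is simple arithmetic: a batch of N images produces 2N augmented views, and for a given positive pair the remaining 2(N - 1) views act as negatives. A small sketch (the function name is made up for illustration):

```python
def negatives_per_positive_pair(batch_size):
    """Number of in-batch negatives for one positive pair when each of
    the batch_size images contributes two augmented views."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return 2 * (batch_size - 1)


for n in (256, 4096, 8192):
    print(n, negatives_per_positive_pair(n))
# 256 510
# 4096 8190
# 8192 16382
```

This is why the number of negatives grows directly with the batch size, and why large batches matter so much for this method.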

Evaluation was done on well-known datasets such as ImageNet and CIFAR, plus other labeled datasets for transfer learning. Note that the final results came after extensive research into the optimal combination of augmentations (read section 3 of the paper for more details). One of the conclusions of the paper is that, to achieve great results, contrastive learning needs stronger augmentation schemes than supervised learning does.

Results

SimCLR provides simple yet state-of-the-art performance in image representation. Multiple datasets and tasks show that it can even compete with supervised methods such as ResNet-50 on the same tasks.

Its main limitation for widespread use is that the long training schedule and large batch size make it computationally expensive.

In the next post we explore the MoCo framework as a good tradeoff between performance and computational time.

The following post follows the same line of reasoning as ours, proposing MoCo v2 (not in the benchmark above) as the lightweight version:

https://medium.com/analytics-vidhya/simclr-with-less-computational-constraints-moco-v2-in-pytorch-3d8f3a8f8bf2


Alan Fortuny Sicart
