Why the present and future of image representation is self-supervised (Part 2)

SimCLR: a simple framework for contrastive learning of visual representations


As in the previous post, this one summarizes the following paper. In the next post we cover MoCo and its benefits from a computational standpoint. Let's get started:


Introduction

When labels are not available, visual representations can be learned with generative methods (VAEs or GANs) or discriminative methods (like SimCLR or MoCo).

Generative methods are expensive to compute and have some issues, as discussed in part 1 of this series.
In this paper we see how a discriminative approach, which replicates a supervised task on self-generated labels, achieves state-of-the-art results. Not only that, but SimCLR is also more intuitive and requires a much simpler architecture.

The major components of that framework are:

  • contrastive tasks based on data augmentations
  • a learnable non-linear projection between the representation and the contrastive loss, which improves the quality of the representation
  • large batch sizes and long training schedules

This framework can compete even with deep supervised baselines such as ResNet-50, performing on par or better on 10 of the 12 data sets evaluated.

Method

The objective function in SimCLR tries to maximize agreement between augmented (distorted) views of the same anchor image and to minimize agreement with augmented views of the other images in the batch, which act as negative samples.
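
To make the objective concrete, here is a minimal pure-Python sketch of the loss the paper calls NT-Xent (normalized temperature-scaled cross entropy). The batch layout (views stored in consecutive pairs), the function names and the default temperature of 0.5 are assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(z, temperature=0.5):
    """Average NT-Xent loss over a batch of 2N projected vectors.

    Assumed layout (illustrative): z[2k] and z[2k + 1] are the two
    augmented views of the same image, so they form a positive pair.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i + 1 if i % 2 == 0 else i - 1  # index of the positive partner
        # Scaled similarities of view i to every other view (k != i).
        logits = {
            k: cosine_similarity(z[i], z[k]) / temperature
            for k in range(n)
            if k != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        total += -(logits[j] - log_denominator)
    return total / n


# Positive pairs that point the same way should give a lower loss
# than positive pairs that are orthogonal to each other.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(nt_xent_loss(aligned) < nt_xent_loss(mixed))  # True
```

Every other view in the batch is treated as a negative, which is why the batch size matters so much for this loss.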

The framework provides an augmentation module that applies transformations such as crops, rotation and random noise to the images. The encoding can be done with several networks such as ResNet, and finally a projection layer maps the representations to the space where the contrastive/similarity loss is computed.
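
As a rough illustration of the augmentation module, the sketch below produces two random views of the same image using only a random crop-and-resize on a nested list of pixel values. The helper names and the `min_scale` parameter are hypothetical, and real pipelines would use an image library and more transformations.

```python
import random


def random_crop_and_resize(image, rng, min_scale=0.5):
    """Crop a random rectangle and resize it back (nearest neighbour)."""
    height, width = len(image), len(image[0])
    crop_h = rng.randint(max(1, int(height * min_scale)), height)
    crop_w = rng.randint(max(1, int(width * min_scale)), width)
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [
        [
            image[top + (r * crop_h) // height][left + (c * crop_w) // width]
            for c in range(width)
        ]
        for r in range(height)
    ]


def two_views(image, seed=None):
    """Return two independently augmented views of the same image."""
    rng = random.Random(seed)
    return random_crop_and_resize(image, rng), random_crop_and_resize(image, rng)


# A toy 4x4 "image" whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
view_a, view_b = two_views(image, seed=0)
print(len(view_a), len(view_a[0]))  # 4 4
```

The two views form a positive pair; views of any other image in the batch serve as negatives.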











The selected data augmentations are random crop and resize, color distortions and Gaussian blurring. A ResNet-50 was used as the encoder, followed by a 2-layer MLP projection head that maps the encoder representation to a 128-dimensional latent space where the contrastive loss is applied.
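
The projection head can be sketched in a few lines of plain Python. It follows the form g(h) = W2 · ReLU(W1 · h) described in the paper; the toy dimensions, the random weight initialization and the omission of biases are assumptions made to keep the example small and fast.

```python
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def linear(weights, inputs):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def make_projection_head(in_dim, hidden_dim, out_dim, seed=0):
    """Build g(h) = W2 * relu(W1 * h) with small random weights (no biases)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def project(h):
        return linear(w2, relu(linear(w1, h)))

    return project


# Toy sizes for speed; in the setup described above the encoder is a
# ResNet-50 and the projection output has 128 dimensions.
project = make_projection_head(in_dim=32, hidden_dim=32, out_dim=8)
rng = random.Random(1)
h = [rng.random() for _ in range(32)]  # stand-in for an encoder output
z = project(h)
print(len(z))  # 8
```

The contrastive loss is computed on z, while the encoder output h is what gets reused for downstream tasks.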






The algorithm relies on relatively big batch sizes, up to ~8k samples, which yields ~16k negative examples for each positive pair. With distributed training we also need to avoid information leakage (the model picking up on how the samples are distributed across devices instead of learning representations); methods such as global batch normalization or shuffling data across devices address this.
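
The ~16k figure follows from simple counting: a batch of N images is augmented into 2N views, and for a given positive pair the remaining 2(N - 1) views act as negatives. A small sketch (the function name is only illustrative):

```python
def negatives_per_positive_pair(batch_size):
    """Number of negative views for one positive pair.

    Each of the batch_size images is augmented twice, giving
    2 * batch_size views; excluding the two views of the positive
    pair leaves 2 * (batch_size - 1) negatives.
    """
    return 2 * (batch_size - 1)


print(negatives_per_positive_pair(8192))  # 16382
```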






Evaluation was done on well-known data sets such as ImageNet and CIFAR, plus other labeled data sets for transfer learning. Note that the final results were obtained after extensive research into the optimal combination of augmentations (see section 3 of the paper for more details). One of the conclusions from the paper is that strong augmentation schemes matter more for achieving great results here than they do in supervised tasks.


Results

SimCLR is a simple framework that achieves state-of-the-art performance in image representation learning. Results on multiple data sets and tasks show that it can even compete with deep supervised methods such as ResNet-50 on the same tasks.

The main limitation to its widespread use is that the long training and large batch sizes make it computationally expensive.



In the next post we explore the MoCo framework as a good tradeoff between performance and computational time.

The following post follows the same line of reasoning as ours, proposing MoCo v2 (not in the benchmark above) as the lightweight version:

