
Why the present and future of image representation is self supervised (Part 3)

Momentum Contrast for Unsupervised Visual Representation Learning (MoCo v2)

In the previous post, we presented SimCLR as one of the most promising self-supervised image representation methods. Since its caveats of long training and large batch sizes can be prohibitive for many applications, this post explores MoCo v2 and its subsequent versions as an alternative.

Introduction

Self-supervised methods are the norm in NLP, and are largely how modern language models have been built. Language tasks have been made more and more efficient through the use of dictionaries of queries and keys in attention models, where memory is managed very efficiently over very long texts.

The continuous, high-dimensional nature of image data should not be a blocker for bringing these NLP learnings to self-supervised tasks and representations in computer vision. MoCo is a great proof of how it can be done.

If we reframe queries and keys as groups of images that are similar to the same query and dissimilar to other queries, we are doing exactly that.



Momentum contrast allows us to build large dictionaries that encode queries and to properly model the similarity between keys. Think of the dictionary as a queue of samples that is dynamically updated. By adapting the key encoder slowly, we ensure that learning is consistent and that memory remains (old key encodings do not fade away).

MoCo provides representations that are similar for augmentations of the same anchor image and that differ for negative samples. Using a pretext task very similar to the one in SimCLR, it achieves competitive results on ImageNet and also on real-world datasets such as Instagram data.


Method

The contrastive learning task at hand can be understood as an image dictionary lookup. The contrastive loss is low for the positive key (an augmented view of the same image) and high for the other keys (augmentations of other images).
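This dictionary-lookup view corresponds to the InfoNCE loss used by MoCo. Here is a minimal NumPy sketch for a single query, not the paper's implementation; the function name, the temperature default, and the toy dimensions are assumptions for illustration:

```python
import numpy as np

def info_nce_loss(q, k_pos, k_negs, temperature=0.07):
    """InfoNCE loss for one query (hypothetical helper, NumPy sketch).

    q: query embedding, shape (d,)
    k_pos: positive key (augmented view of the same image), shape (d,)
    k_negs: negative keys from the dictionary queue, shape (K, d)
    """
    # Normalize so dot products become cosine similarities.
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    k_negs = k_negs / np.linalg.norm(k_negs, axis=1, keepdims=True)

    # Logits: the positive similarity first, then the K negatives,
    # all scaled by the temperature.
    logits = np.concatenate([[q @ k_pos], k_negs @ q]) / temperature

    # Cross-entropy with the positive at index 0 (stabilized softmax).
    logits -= logits.max()
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

The loss is small when the query is close to its positive key and far from the queued negatives, which is exactly the lookup behavior described above.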

The expectation here is that a good representation comes from processing a large dictionary of negative samples while the encoder is updated slowly as samples pass through. Older mini-batches are not stored; their learning is kept in the key encoder, which changes slowly with new queries.

This has implications for the amount of memory required: since we do not store previous samples in a memory-bank fashion, we can train on very large datasets with less memory than other dictionary-management approaches.
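The dictionary itself is just a fixed-size FIFO queue of key embeddings: each step enqueues the newest mini-batch of keys and discards the oldest. A minimal sketch (function name and toy shapes are assumptions):

```python
import numpy as np

def update_queue(queue, new_keys):
    """FIFO dictionary update (sketch).

    queue: current key embeddings, shape (K, d)
    new_keys: keys from the latest mini-batch, shape (B, d)
    Returns a queue of the same size K with the oldest B keys dropped.
    """
    K = queue.shape[0]
    return np.concatenate([queue, new_keys])[-K:]
```

Because only the K most recent embeddings are kept (not the images or activations of past batches), the memory cost stays constant regardless of dataset size.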




As in SimCLR, augmentations are performed as part of the algorithm and a ResNet model is used as the encoder, with a final global pooling layer followed by a 128-dimensional head that encodes the image information in a smaller form. Avoiding information leakage during batch creation is as relevant as in SimCLR, and batch normalization is implemented per GPU.
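The 128-dimensional head can be pictured as a projection of the pooled backbone features followed by L2 normalization, so that embeddings live on the unit sphere where cosine similarity applies. A minimal NumPy sketch, where `W` and `b` stand in for hypothetical learned parameters and 2048 is the usual ResNet-50 feature width:

```python
import numpy as np

def projection_head(features, W, b):
    """Project pooled backbone features to a 128-d embedding (sketch).

    features: pooled ResNet features, shape (batch, 2048)
    W, b: hypothetical learned projection parameters
    Returns L2-normalized embeddings of shape (batch, 128).
    """
    z = features @ W + b
    return z / np.linalg.norm(z, axis=1, keepdims=True)
```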

Results


The MoCo v2 paper provides a direct comparison with SimCLR, including memory requirements:

MoCo achieves similar or better results with smaller batch sizes, fewer epochs, and lower GPU memory requirements. This is why we go with MoCo moving forward.


