
Why the present and future of image representation is self supervised (Part 3)

Momentum Contrast for Unsupervised Visual Representation Learning (MoCo v2)

In the previous post, we presented SimCLR as one of the most promising self-supervised image representation methods. Since its caveat of long training runs and large batch sizes can be too much for many applications, MoCo v2 and its subsequent versions are worth exploring.

Introduction

Self-supervised methods are the norm in NLP, and are largely how its models have been built. Language tasks have been made more and more efficient through the use of dictionaries and keys in attention models, where memory is managed very efficiently over very long texts.

The continuous, high-dimensional nature of image data should not be a blocker for bringing these learnings from NLP to self-supervised tasks and representations in computer vision. MoCo is a great proof of how it can be done.

If we reframe keys and tokens as groups of images that are similar within the same query and dissimilar to other queries, we are doing exactly that.



Momentum contrast allows us to build large dictionaries that encode queries and properly model the similarity between keys. Think of the dictionary as a queue of samples that is dynamically updated. By adapting the key encoder slowly, we ensure that learning and memory persist (old key encodings do not fade away).
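A minimal sketch of that slow adaptation, assuming PyTorch; the names `encoder_q`, `encoder_k` and the momentum value `m=0.999` are illustrative, not taken from the post:

```python
import copy
import torch

def build_key_encoder(encoder_q):
    # The key encoder starts as a copy of the query encoder and is never
    # updated by gradients, only by the momentum rule below.
    encoder_k = copy.deepcopy(encoder_q)
    for p in encoder_k.parameters():
        p.requires_grad = False
    return encoder_k

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # theta_k <- m * theta_k + (1 - m) * theta_q
    # A large m makes the key encoder change slowly, so keys already in the
    # dictionary stay consistent with freshly encoded ones.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```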

MoCo learns representations in which augmentations of the same anchor image stay close while negative samples (other images and their augmentations) are pushed apart. Using a pretext task very similar to the one in SimCLR, we are able to get competitive results on ImageNet and also on real-world data sets like Instagram data.


Method

The contrastive learning task at hand can be understood as an image dictionary lookup task. The contrastive loss function will be low for the positive key (the same image, augmented) and high for the other keys (other images, with their augmentations).
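A minimal sketch of this lookup-style contrastive (InfoNCE) loss, assuming PyTorch and L2-normalized embeddings; the argument names and the temperature value are illustrative:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, k_pos, queue, temperature=0.07):
    # q:      (N, C) query embeddings
    # k_pos:  (N, C) positive key embeddings (augmented views of the same images)
    # queue:  (C, K) dictionary of K negative keys from previous batches
    l_pos = torch.einsum("nc,nc->n", q, k_pos).unsqueeze(-1)   # (N, 1)
    l_neg = torch.einsum("nc,ck->nk", q, queue)                # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature    # (N, 1+K)
    # The "correct entry" of the lookup is always index 0 (the positive key),
    # so the loss is low when q matches its positive and high otherwise.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```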

The expectation here is that a good representation comes from processing large dictionaries of negative samples, where the encoder is updated slowly as samples pass through. Older mini-batches are not stored; their contribution is kept in the encoder, which changes slowly with new queries.

This has implications for the amount of memory required: since we do not store previous samples in a memory-bank fashion, we can train on very large data sets with less memory than other dictionary management approaches.
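A minimal sketch of the dictionary as a queue, assuming PyTorch; the dimension and queue size are illustrative defaults, not values quoted in the post:

```python
import torch
import torch.nn.functional as F

class KeyQueue:
    def __init__(self, dim=128, size=65536):
        # Start with random normalized keys; they are overwritten as training runs.
        self.queue = F.normalize(torch.randn(dim, size), dim=0)
        self.ptr = 0
        self.size = size

    @torch.no_grad()
    def enqueue_dequeue(self, keys):
        # keys: (N, C) key embeddings from the current mini-batch.
        # New keys overwrite the oldest entries, so only the current batch
        # needs to live in memory; older batches persist only through the
        # slowly moving key encoder.
        n = keys.size(0)
        assert self.size % n == 0  # simplifying assumption for this sketch
        self.queue[:, self.ptr:self.ptr + n] = keys.t()
        self.ptr = (self.ptr + n) % self.size
```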




As in SimCLR, augmentations are performed as part of the algorithm and a ResNet model is used as the encoder, with a final global pooling layer followed by a projection to a 128-dimensional embedding that encodes the image in a smaller form. Avoiding information leakage during batch creation is as relevant as in SimCLR, and batch normalization is performed independently for each GPU.
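A minimal sketch of such an encoder, assuming torchvision's ResNet-50; replacing the classifier head with a 2-layer MLP projection to 128 dimensions follows the MoCo v2 variant, and the function name is illustrative:

```python
import torch.nn as nn
import torchvision.models as models

def build_encoder(feature_dim=128):
    # ResNet-50 backbone; its global average pooling feeds the final `fc` layer.
    resnet = models.resnet50()
    hidden_dim = resnet.fc.in_features          # 2048 for ResNet-50
    # Replace the classification head with a small projection head
    # that outputs the 128-dimensional embedding used for contrastive learning.
    resnet.fc = nn.Sequential(
        nn.Linear(hidden_dim, hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, feature_dim),
    )
    return resnet
```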

Results


The MoCo v2 paper provides a direct comparison with SimCLR, including memory requirements:

MoCo provides similar or better results with smaller batch sizes, fewer epochs, and lower GPU memory requirements. This is why we go with MoCo moving forward.


