Computer Vision: Visual Representation with Sketches (Part 1)

Our physical reality, and our products, increasingly have a digital counterpart. The production lead times of sports retail make it challenging to have product samples when we need to encode visual product information for our advanced analytics use cases, as we need to make decisions seasons before the product is in the store.

One of the most reliable sources of information that we have at this point in time is the product sketch, a realistic abstraction of how the product will look when produced. Textures and some details may be lost, but a lot of attributes such as colors, silhouette, design elements, patterns and technologies can be picked up by deep learning computer vision networks. It is therefore essential to be able to extract visual embeddings of the highest quality from this digital source. On our own data, that improves coverage by 77% for future seasons (around 14k products could have visual embeddings).

In the following posts, I review what we understood to be the most promising approaches to generate robust representations of our products from a visual standpoint, independently of the availability of images, 3D renderings or sketches.

Can I reuse the same encoder that I built for my images?

The answer, as with most complex topics, is: it depends. In our case, we first tried it on a carefully curated data set where the sketches had high resolution, the same orientation and no background.

For those cases, we encoded a sample of 60 image and sketch pairs with ResNet50 and ResNet34 and compared the resulting representations using cosine similarity. For this sample we achieved ~70% top-1 accuracy and ~85% in retrieving, for each sketch, its actual image as the most similar one. That was quite strong, but when we scaled to the whole data set of sketches (>70K) we got much weaker results.
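As a rough illustration of this kind of evaluation, the sketch below computes a top-k retrieval accuracy from cosine similarities between sketch and image embeddings. It assumes that row i of each matrix belongs to the same product; the function names, the 512-dimensional size and the random stand-in embeddings are illustrative assumptions, not our actual pipeline or data.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def cosine_similarity_matrix(sketch_emb, image_emb):
    """Entry (i, j) is the cosine similarity between sketch i and image j."""
    return l2_normalize(sketch_emb) @ l2_normalize(image_emb).T

def top_k_accuracy(sketch_emb, image_emb, k=1):
    """Share of sketches whose true image (same row index) is among
    the k most similar images."""
    sims = cosine_similarity_matrix(sketch_emb, image_emb)
    top_k = np.argsort(-sims, axis=1)[:, :k]       # best k image indices per sketch
    targets = np.arange(sims.shape[0])[:, None]    # assumption: pair i <-> i
    return float((top_k == targets).any(axis=1).mean())

# Stand-in data: random vectors instead of real encoder outputs (assumption).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(60, 512))
sketch_emb = image_emb + 0.5 * rng.normal(size=(60, 512))
print("top-1:", top_k_accuracy(sketch_emb, image_emb, k=1))
print("top-5:", top_k_accuracy(sketch_emb, image_emb, k=5))
```

In practice the two matrices would come from the pooled features of the pretrained encoder, computed once for the sketches and once for the images.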

The reason is that the sketches are too heterogeneous (20% of the sketches contain other objects, are rotated, show different levels of detail...) and many are quite far from the actual image, suggesting that we need to train a model that generalizes well to a more complex set of sketch types. While preprocessing the sketches did help, it was not sufficient to reach the results obtained with the curated data set. We decided to investigate modelling approaches specific to this type of problem.

Finding a good sketch-to-image translator

The task at hand is therefore to find a good encoder that makes every asset type comparable across domains, or ready to be translated into the other. One way to do that is to learn from existing sketch and image pairs, so that we can train a sketch encoder that generates realistic images, which our discriminator tries to tell apart from real ones. This allows us to translate any sketch into a realistic image, from which we can get embeddings that can be used together with embeddings extracted directly from images.

Our first attempt is the pix2pix model, which follows the architecture shown above. As we are dealing with high-resolution images, it is possible that pix2pixHD will work better for us.
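To make the training signal more concrete, here is a minimal NumPy sketch of the two losses described in the pix2pix paper: an adversarial term plus an L1 reconstruction term for the generator (weighted by lambda = 100 in the paper), and a real/fake classification loss for the discriminator. The function names and toy arrays are assumptions for illustration; this is not the reference implementation and not the code we trained with.

```python
import numpy as np

def bce_with_logits(logits, target):
    """Numerically stable binary cross-entropy on raw discriminator logits."""
    loss = np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(loss))

def generator_loss(disc_logits_fake, fake_image, real_image, lambda_l1=100.0):
    """Adversarial term (fool the discriminator) + weighted L1 distance
    between the generated image and the paired real image."""
    adversarial = bce_with_logits(disc_logits_fake, np.ones_like(disc_logits_fake))
    l1 = float(np.mean(np.abs(fake_image - real_image)))
    return adversarial + lambda_l1 * l1

def discriminator_loss(disc_logits_real, disc_logits_fake):
    """Real (sketch, image) pairs should score 1, generated pairs 0.
    The 0.5 factor slows the discriminator relative to the generator."""
    real = bce_with_logits(disc_logits_real, np.ones_like(disc_logits_real))
    fake = bce_with_logits(disc_logits_fake, np.zeros_like(disc_logits_fake))
    return 0.5 * (real + fake)

# Toy values standing in for network outputs (assumption).
rng = np.random.default_rng(0)
real_image = rng.uniform(-1, 1, size=(1, 3, 8, 8))
fake_image = real_image + 0.1 * rng.normal(size=real_image.shape)
logits_fake = rng.normal(size=(1, 1, 2, 2))   # patch-level scores
logits_real = rng.normal(size=(1, 1, 2, 2))
print("G loss:", generator_loss(logits_fake, fake_image, real_image))
print("D loss:", discriminator_loss(logits_real, logits_fake))
```

The L1 term is what ties the generated image to the specific paired photo, while the adversarial term pushes it toward looking realistic.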

The papers and open source code for those models can be found below:

pix2pix paper: arXiv:1611.07004 (arxiv.org)

pix2pix code: GitHub - junyanz/pytorch-CycleGAN-and-pix2pix: Image-to-Image Translation in PyTorch

pix2pixHD paper: arXiv:1711.11585 (arxiv.org)

In the following post, we will dive deep into the details of the method, as well as give a brief introduction to GANs.


Alan Fortuny Sicart
