
FastAI Deep Learning journey part 2: Creating a brand classifier

Introduction and key lessons learned

In the second lesson of the course Practical Deep Learning for Coders with fastai, we get our hands dirty with our first image classifier, on the data and topic we like. There are already many good learnings and insights from this lesson that I would like to share before I jump into explaining the workflow:

  1. The first finding is that the following workflow works for a wide range of problems; I got very good results classifying tree species, mushroom types and running shoes.
  2. Second, we do not need to label the data manually; we can leverage, for example, the Bing API for that purpose, although we need to be very accurate and specific with the search term.
  3. Third, we can get very good results even with around 150 images per class, thanks to the fact that we are doing transfer learning and that we perform data augmentations on the data.
  4. Fourth, one can increase the training size by a factor of up to 10 by performing data augmentations, which also help the model generalize better.
  5. In order to clean the data, it is better to have a model first and then look at the cases where the model is least confident; a great deal of them are bad images or wrongly labelled images.
  6. We can store the model and use it in application form by creating buttons with ipywidgets, which allows non-developers to experiment with our model.
  7. Despite trying quite hard for a full week, the tools proposed to create a web app (Binder, Voilà, Streamlit) did not work for me. My repo includes the requirements file, the app.ipynb file and the app.py to support app creation. I would appreciate any input from others on this.

In the following notebook one can see the detailed explanations and code to create your own classifier and turn it into an app:

https://github.com/afortuny/DeepLearningFastAI/blob/main/02_production.ipynb

If you replicate the code, note that you will need an Azure Bing Search API key, which I explain how to obtain in the notebook.

Problem statement

The very first part of creating a classifier is to pick a potential application that you are interested in. From my work at Adidas, I was curious to see whether such a fast model could already pick up key differences between brands and also be able to classify adidas, Nike and Puma models it was never trained on. The model I created could be used to properly crawl competitor and internal products, in apps such as Runtastic, or for range positioning goals. Note that the probabilities given for a specific article can be interpreted as how genuinely different an article is from its competitors. As I want to explore the full fastai course, I decided to keep the problem simple and focus only on three brands for that purpose, and three adidas franchises in another notebook. After one or two days of fine tuning, I achieved ~95% accuracy on the test data for brand classification and around 82% accuracy for franchise classification. I used only one free GPU for that purpose, the code takes ~7 minutes to run, and there was only a little label review on my end (20 mislabelled cases detected by the algorithm). This shows how fastai can be used for some standard image classification problems with very little code and compute, with some learnings that can be transferred to the image representations we build at Adidas.

Gathering Data

That was the first time I interacted with the Bing Image Search API, and I am very impressed. As long as one types very concrete search terms, such as a product model, animal species or car model, one should expect to gather high quality (although diverse) images that are correctly labelled. I can only encourage using it both for research and commercial purposes, as one often lacks the necessary data for a use case. The service is free for small-scale requests, but it is absolutely worth paying for if your use case depends on having such data.
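Below is a minimal sketch of that download step, following the fastbook helper used in the notebook; the folder layout (one directory per brand) and the placeholder key are my assumptions:

```python
# Sketch: pull labelled images with the Bing Image Search API via the
# fastbook helper. Replace `key` with your own Azure Bing Search API key
# (the notebook explains how to obtain one).
from fastbook import search_images_bing
from fastai.vision.all import download_images, get_image_files, verify_images
from pathlib import Path

key = 'XXX'  # your Azure Bing Search API key
path = Path('shoes')

for brand in ['adidas', 'nike', 'puma']:
    dest = path/brand                      # one folder per class (= label)
    dest.mkdir(parents=True, exist_ok=True)
    # Very specific search terms give cleaner labels
    results = search_images_bing(key, f'{brand} running shoes')
    download_images(dest, urls=results.attrgot('contentUrl'))

# Drop files that did not download as valid images
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
```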

Managing Data

Image data is high dimensional (one has 3 color channels and many pixels) and often heterogeneous, as each image can come in a different resolution. fastai uses DataLoaders, which process your images properly to make them ready for training and inference.
The data loader we use here pulls the images and gets the labels as we specify (that part is specific to your image source, but you can leverage the same code for Bing-pulled images), separates the data into training and validation sets, and resizes all the images to the same size.

In the Jupyter notebook one can find explanations of what each line of code means and how to download the data locally, but one can already explore the pulled images and their labels visually, for example like this:
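A sketch of the DataLoaders definition, assuming the one-folder-per-brand layout from the download step (folder name = label):

```python
# DataBlock: the four things fastai needs to build DataLoaders
from fastai.vision.all import (DataBlock, ImageBlock, CategoryBlock,
                               get_image_files, RandomSplitter,
                               parent_label, Resize)

shoes = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, targets are categories
    get_items=get_image_files,                        # how to list the items
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 80/20 train/validation split
    get_y=parent_label,                               # label = name of the parent folder
    item_tfms=Resize(128))                            # resize every image to the same size

dls = shoes.dataloaders(path)
dls.show_batch(max_n=8)  # visually inspect images and their labels
```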



Data Augmentations

The following section is essential for two reasons:

  • As we have "only" 150 images per class, we need to find ways to increase our training data
  • We want the model to generalize as well as possible
With these two goals in mind, we will perform the recommended augmentations on our data set, such as rotation, cropping and flipping, which increase our effective training data size by a factor of 8.
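A sketch of these augmentations, assuming the `shoes` DataBlock from above; fastai's default `aug_transforms` cover rotation, flipping, warping and lighting changes:

```python
# Random crops per item plus standard batch-level augmentations
from fastai.vision.all import RandomResizedCrop, aug_transforms

shoes = shoes.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),  # a different random crop each epoch
    batch_tfms=aug_transforms())                      # rotate, flip, warp, adjust lighting
dls = shoes.dataloaders(path)
dls.train.show_batch(max_n=8, unique=True)  # the same image under 8 augmentations
```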




Training your model

In the following, the model is trained for several epochs, using in each batch different augmented versions of the same images, hence learning different views of the same product. I have trained the model twice: the first time for around 4 epochs, to have a baseline model from which to detect potentially mislabelled data, and the second time for 10-20 epochs to get a very accurate model. Note that during the first iteration I achieved around 80% accuracy without cleaning, but exploring the cases where the model was not confident, I spotted that, especially among adidas Adizero products, there were actually many Nike products. The amazing fact is that the model was almost always right in flagging that those cases were not adidas products. I used the following API for cleansing:
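A sketch of this two-pass approach, assuming the `dls` built above; the course chapter uses a resnet18 backbone, and `ImageClassifierCleaner` sorts images by loss so the least confident (often mislabelled) cases surface first:

```python
# Quick baseline via transfer learning, then interactive cleaning
import shutil
from fastai.vision.all import cnn_learner, resnet18, error_rate
from fastai.vision.widgets import ImageClassifierCleaner

learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)  # baseline pass, enough to surface label problems

cleaner = ImageClassifierCleaner(learn)
cleaner  # widget: per class and split, relabel or delete the highest-loss images

# After reviewing in the widget, apply the decisions:
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
for idx, cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
```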




The image displayed was taken after cleaning on the franchise classifier, so you will not spot issues, as I had already cleaned up the mislabelled cases. What is interesting here is that one can go through each class, for both the training and the validation set, and reassign or delete images that are wrongly labelled or out of domain (showing a shirt, for example).

After doing the cleaning and training for 10 epochs, we reach 92% validation accuracy, which held up on the test data set:
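One way to inspect where the remaining errors sit after the longer run, using fastai's interpretation tools:

```python
# Confusion matrix and highest-loss cases for the trained learner
from fastai.vision.all import ClassificationInterpretation

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()  # which brands get confused with which
interp.plot_top_losses(9)       # the cases the model is least sure about
```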



Deploying your model

Once I stored the model as a pickle file, I used ipywidgets to create buttons that allow any non-technical user to test the algorithm with new shoes. The result could be used as a simple brand classifier app as follows:
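A minimal sketch of that notebook app, following the course notebook (ipywidgets 7 API); the widget names are my own: export the trained model, reload it, and wire a file-upload widget to predictions.

```python
# Export model + transforms, reload for inference, and build the upload UI
from fastai.vision.all import load_learner, PILImage
import ipywidgets as widgets
from IPython.display import display

learn.export('export.pkl')             # pickle with model and data transforms
learn_inf = load_learner('export.pkl')

btn_upload = widgets.FileUpload()
out = widgets.Output()
lbl_pred = widgets.Label()

def on_upload(change):
    img = PILImage.create(btn_upload.data[-1])  # read the uploaded image
    out.clear_output()
    with out: display(img.to_thumb(128, 128))
    pred, pred_idx, probs = learn_inf.predict(img)
    lbl_pred.value = f'Prediction: {pred}; probability: {probs[pred_idx]:.04f}'

btn_upload.observe(on_upload, names=['data'])
display(widgets.VBox([widgets.Label('Select your shoe!'), btn_upload, out, lbl_pred]))
```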



To make it run, one has to click the upload button and the code will pick the image, resize it and provide the most likely class and the confidence in terms of probability. Note that the labels are based on the Bing search (I trained the model only on adidas Adios shoes), but it was able to tell that the following adidas Agravic shoe is from adidas. Remarkable, right? I tried with many other models and it gets it right most of the time. For Puma and Nike shoes that are very similar to adidas, it still classifies them correctly, but the confidence of the model gets lower.

The workflow I showed has many key components that can be used for our real applications:

  • search APIs to increase data size or add competitor data to our pipelines
  • simple data augmentation functions to increase the training data size with the same amount of images and improve generalization
  • transfer learning to leverage pre-trained models and fine-tune them on our own data in a few minutes
  • a strong data cleansing API to fix label quality and improve model performance
  • ipywidgets to showcase our developments more interactively
The next section includes my personal answer to the questions raised during the course.

Answering the questions of the second chapter of the course

1.     Where do text models currently have a major deficiency? Text models try to predict the most likely next text or find text used in similar contexts. It could be that the text created by the models is realistic but very wrong. There are very strong results in translation, but in generation there is still huge room for improvement.

2.     What are the possible societal implications of text generation models? As generated text can look quite human, it is quite easy to spread misinformation at scale.

3.     In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process? Keeping a manual step with human supervision and tracking is important to avoid harmful errors, as is adding checks when the prediction is used in more critical contexts (driving, diagnosis) versus less critical ones (movie or song recommendations).

4.     What kind of tabular data is deep learning particularly good at? Tabular data can benefit from deep learning when it contains high-cardinality categorical columns (a lot of distinct levels) or columns with long texts.

5.     What’s a key downside of directly using a deep learning model for recommendation systems? Deep learning can find a list of potential products the consumer may like, but not necessarily which ones would be most useful at a certain point in time.

6.     What are the steps of the Drivetrain Approach? The Drivetrain Approach helps you ensure you have the key ingredients for a successful deep learning application. You need to have clear objectives and goals, understand which levers the application can pull, gather the data needed, and train/deploy/track a model that can improve the objective function.

7.     How do the steps of the DriveTrain approach map to a recommendation system?

a.     Goal: maximize customer lifetime value and engagement (defined as constant purchase value over time)

b.     Levers: email communication with consumers, recommendations on the ecom page, discounts and invitations to events

c.      Input: clickstream behaviour, purchase history, product features…

d.     Model: estimate customer lifetime value and retention based on observed behaviour and interactions with consumers

8.     Create an image recognition model using data you curate, and deploy it on the web. Here I will summarize what I am planning to do. I would like to build an app that tells which family member my daughter looks most similar to. We will gather pictures and classes of front-facing photos of family members to train and validate the model. In the app, the pretrained model will be used with my daughter’s face picture and the inference will tell which is the most likely family member.

9.     What is a DataLoader? A DataLoader is a fastai object that allows you to load and process your data in an efficient way for your deep learning work. Depending on the data type (image, text, tabular…), different data loaders should be used. Data loaders have multiple parameters that allow the user to define the type of labels, input, transformations, normalization…, where both user-defined and predefined approaches can be used.

10.   What four things do we need to tell fastai to create DataLoaders?

a.     The type of data

b.     How to list items

c.      How to label those items

d.     How to split the train and validation set

11.   What does the splitter parameter to DataBlock do?

a.     It defines how the items are split into training and validation sets, e.g. the % of data held out for validation and the random seed to reproduce results

12.   How do we ensure a random split always gives the same validation set? By using the same random seed.
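For instance (a one-line sketch, reusing the splitter from the main workflow):

```python
# Fixing the seed makes the validation split reproducible across runs
from fastai.vision.all import RandomSplitter
splitter = RandomSplitter(valid_pct=0.2, seed=42)
```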

13.   What letters are often used to signify the dependent and independent variables? The dependent is normally Y and the independent X.

14.   What’s the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?

a.     Crop: picks a portion of the original image

b.     Squish: resizes (squeezes or stretches) the image to the desired size

c.      Pad: fills with zeros the areas that contain no data after resizing

              Crop is adequate when losing a part of the image does not affect the capacity of the model to learn. Cropping dogs’ heads could be very bad for classifying dogs, and for cancer detection it could be too risky. When we have smaller objects or multiple important details, cropping can be a problem.

              Squish allows us to make sure all the image space is filled with content, at the expense of distortions.

              Padding keeps the original image intact, but adds unnecessary computation on pixels that are now zeros.
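The three strategies as fastai exposes them (a sketch, reusing the `shoes` DataBlock from the main workflow):

```python
# Crop, squish or pad every image to the same size
from fastai.vision.all import Resize, ResizeMethod

shoes.new(item_tfms=Resize(128, ResizeMethod.Crop))                   # take a portion
shoes.new(item_tfms=Resize(128, ResizeMethod.Squish))                 # distort to fit
shoes.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))  # pad with zeros
```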

15.   What is data augmentation? Data augmentation is a technique to increase the number of samples and help the algorithm generalize by applying alterations such as rotation, noise, cropping and color modification. It reduces the need for data and helps the model generalize better to new data.

16.   Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data. The photos pulled from the Bing API are quite clear and the bear is clearly visible, while real applications might have partially visible bears, very bad light or moving objects. The model is limited to high-resolution, daylight pictures with one single bear and not many objects in between.

17.   What is the difference between item transforms and batch transforms? Item transforms are applied to each item individually, for example resizing all input images to the same size, which is required before batching; batch transforms are applied to whole batches at once (typically on the GPU), for example the data augmentations.

18.   What is a confusion matrix? It is a matrix that allows you to see which classes you predict well and where the errors are located.

19.   What does export save? A pickle file with the model we have trained, including the definition of how to transform the data for inference.

20.   When we make predictions with a trained model, instead of training it, this is called inference.

21.   What are IPython widgets? A library that allows us to create simple GUIs (buttons, upload widgets and so on) inside notebooks with Python code.

22.   CPUs for deployment are better when inference does not require computing a large number of inputs at once. For high-volume inference that can be batched, a GPU may be a good option.

23.   What are the downsides of deploying your app on a server instead of on mobile? You need to have a running server always available to provide fast feedback, and users need network connectivity to reach it.

24.   Deploying a bear warning system would require a model trained probably on video data, with much lower quality, movement and many objects in between.

25.   Out-of-domain data is data that differs from what the model saw in training, on which the model is likely to perform poorly, as it includes aspects that make inference much harder.

26.   Domain shift occurs when the distribution and behaviour of the input change significantly, making past data obsolete or less representative of the actual domain.

27.   The deployment of deep learning should at first be mainly manual, later partially automated with human supervision, and only when sufficient reliability exists fully automated.

 







