Introduction and key lessons learned
In the second lesson of the course Practical Deep Learning for Coders with fastai, we get our hands dirty building our first image classifier on a dataset and topic of our choice. There are already many good learnings and insights from this lesson I would like to share before I jump into explaining the workflow:
- The first finding is that the following workflow works for a wide range of problems: I got very good results classifying tree species, mushroom types and running shoes.
- Second, we do not need to label the data manually; we can leverage, for example, the Bing API for that purpose, although we need to be very accurate and specific with the search term.
- Third, we can get very good results even with only around 150 images per class, thanks to the fact that we are doing transfer learning and that we perform data augmentations on the data.
- Fourth, one can increase the training set size by a factor of 10 by performing data augmentations, which help the model generalize better.
- In order to clean the data, it is better to train a model first and then look at the cases where the model is least confident; a great deal of them are bad or wrongly labelled images.
- We can store the model and use it in an application by creating buttons with ipywidgets, which allows non-developers to experiment with our model.
- Despite trying quite hard for a full week, the tools proposed to create a web app (Binder, Voilà, Streamlit) did not work for me. My repo includes the requirements file, the app.ipynb file and the app.py to support app creation. I appreciate any input from others on this.
Problem statement
The very first part of creating a classifier is to pick a potential application that you are interested in. From my work at Adidas, I was curious to see whether such a fast model could already pick up key differences between brands and also be able to classify adidas, Nike and Puma models that it was never trained on. The model I created could be used to properly crawl competitor and internal products, in apps such as Runtastic, or for range positioning goals. Note that the probabilities given for a specific article can be interpreted as how genuinely different an article is from its competitors. As I want to explore the full fastai course, I decided to keep the problem simple and focus only on three brands for this purpose, and on three adidas franchises in another notebook. After one or two days of fine tuning, I achieved ~95% accuracy on the test data for brand classification and around 82% accuracy for franchise classification. I only used one free GPU for that purpose, the code takes ~7 min to run, and there was only a little label review on my end (20 mislabeled cases detected by the algorithm). This shows how fastai can be used for some standard image classification problems with very little code and compute, with some learnings that can be transferred to the image representations we build at Adidas.
Gathering Data
Managing Data
Data Augmentations
- As we have "only" 150 images per class, we need to find ways to increase our training data
- We want the model to generalize as well as possible
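To make the idea concrete, here is a toy sketch in plain Python (not fastai's actual augmentation pipeline) where an "image" is just a 2-D list of pixel values; two flips and a rotation turn one image into four training samples without collecting any new photos:

```python
# Toy data augmentation on 2-D lists of pixel values.

def hflip(img):
    """Mirror the image left-to-right."""
    return [list(reversed(row)) for row in img]

def vflip(img):
    """Mirror the image top-to-bottom."""
    return list(reversed([row[:] for row in img]))

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(img))]

def augment(img):
    """Return the original image plus three transformed copies."""
    return [img, hflip(img), vflip(img), rot90(img)]

img = [[1, 2],
       [3, 4]]
samples = augment(img)
print(len(samples))  # 4 training samples from 1 image
```

In a real fastai pipeline the same effect comes from randomized batch transforms (rotations, crops, lighting changes), so every epoch effectively sees different variants of each image.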
Training your model
Deploying your model
- search APIs to increase the data size or add competitor data to our pipelines
- simple data augmentation functions to increase the training data size with the same number of images and improve generalization
- transfer learning to leverage pre-trained models but fine-tune them on our own data in a few minutes
- a strong data cleansing API to fix label quality and improve model performance
- ipywidgets to showcase our developments more interactively
Answering the questions of the second chapter of the course
1. Where do text models currently have a major deficiency? Text models try to predict the most likely text or find text used in a similar context. It could be that the text created by the models is realistic but very wrong. There are very strong results in translation, but in generation there is still huge room for improvement.
2. What are the possible societal implications of text generation models? As generated text can look quite human, it is quite easy to spread misinformation at scale.
3. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process? Keeping a manual step with human supervision and tracking is important to avoid harmful errors, as well as adding checks when the prediction is used in more critical contexts (driving, diagnosis) versus less critical ones (movie or song recommendations).
4. What kind of tabular data is deep learning particularly good at? Tabular data can benefit from deep learning when it contains high-cardinality categorical columns (a lot of distinct levels) and columns with large texts.
5. What's a key downside of directly using a deep learning model for recommendation systems? Deep learning can find a list of potential products the consumer may like, but not necessarily which ones would be the most useful at a certain point in time.
6. What are the steps of the Drivetrain approach? The Drivetrain approach allows you to ensure you have the key ingredients for a successful deep learning application. You need to have clear objectives and goals, understand which levers the application can affect, identify the data needed, and train/deploy/track a model that can affect the objective function.
7. How do the steps of the Drivetrain approach map to a recommendation system?
a. Goal: maximize customer lifetime value and engagement (defined as constant purchase value over time)
b. Levers: email communication with consumers, recommendations on the ecom page, discounts and invites to events
c. Inputs: clickstream behaviour, purchase history, product features…
d. Model: estimate customer lifetime value and retention based on observed behaviour and interactions with consumers
8. Create an image recognition model using data you curate, and deploy it on the web. Here I will summarize what I am planning to do. I would like to build an app that tells which family member my daughter is most similar to. We will gather pictures and classes of front-facing photos of family members to train and validate the model. In the app, the pretrained model will be fed a picture of my daughter's face and the inference will tell which family member is the most likely match.
9. What is a DataLoader? A DataLoader is a fastai object that allows you to load and process your data in an efficient way for your deep learning work. Depending on the data type (image, text, tabular…), different data loaders should be used. Data loaders have multiple parameters that allow the user to define the type of labels, input, transformations, normalization…, where both user-defined and predefined approaches can be used.
10. What four things do we need to tell fastai's data block API?
a. The type of data
b. How to list the items
c. How to label those items
d. How to split the train and validation set
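Those four things map one-to-one onto the arguments of fastai's DataBlock. A hedged configuration sketch (it assumes the fastai library and a local `path` folder with one subfolder of images per class; shown for orientation, not run here):

```python
from fastai.vision.all import *

shoes = DataBlock(
    blocks=(ImageBlock, CategoryBlock),  # 1. type of data: image in, category out
    get_items=get_image_files,           # 2. how to list the items
    get_y=parent_label,                  # 3. how to label them (parent folder name)
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 4. train/validation split
    item_tfms=Resize(224),
)
# dls = shoes.dataloaders(path)
```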
11. What does the splitter parameter to DataBlock do? It defines the % of data for train/validation and the random seed to reproduce results.
12. How do we ensure a random split always gives the same validation set? By using the same seed.
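A minimal sketch of the idea in plain Python (the same principle fastai's RandomSplitter follows, not its actual implementation): seeding the shuffle makes the split reproducible across runs:

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed so the validation set is reproducible."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    cut = int(len(items) * valid_pct)
    valid = [items[i] for i in idxs[:cut]]
    train = [items[i] for i in idxs[cut:]]
    return train, valid

files = [f"img_{i}.jpg" for i in range(10)]
t1, v1 = random_split(files, seed=42)
t2, v2 = random_split(files, seed=42)
print(v1 == v2)  # True: same seed, same validation set
```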
13. What letters are often used to signify the dependent and independent variables? The dependent variable is normally y and the independent x.
14. What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?
a. Crop: picks a portion of the original image
b. Squish: resizes the image to the desired size, distorting the aspect ratio
c. Pad: fills with zeros the areas that have no data after resizing
Crop is adequate when losing a part of the image does not affect the capacity of the model to learn. Cropping dogs' heads could be very bad for classifying dogs, and for cancer detection it could be too risky; it is also a problem when we have smaller objects or multiple important details. Squish allows us to make sure all the image space is filled with content, at the expense of distortions. Padding keeps the original image intact, but adds unnecessary computation for pixels that are now zeros.
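To illustrate the geometry of the three approaches, here is a toy sketch in plain Python on 2-D lists (real pipelines use PIL/fastai transforms; this only shows what each strategy does to the pixels):

```python
def center_crop(img, size):
    """Crop: keep only a central size x size window, discarding the rest."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def squish(img, size):
    """Squish: nearest-neighbor resample to size x size, distorting proportions."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def pad_to_square(img, fill=0):
    """Pad: keep every original pixel, filling the extra area with zeros."""
    h, w = len(img), len(img[0])
    side = max(h, w)
    out = [[fill] * side for _ in range(side)]
    top, left = (side - h) // 2, (side - w) // 2
    for r in range(h):
        for c in range(w):
            out[top + r][left + c] = img[r][c]
    return out

img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]  # a wide 2x4 "image"
print(center_crop(img, 2))   # loses the left and right columns
print(squish(img, 2))        # keeps coverage, distorts spacing
print(pad_to_square(img))    # keeps all pixels, adds zero rows
```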
15. What is data augmentation? Data augmentation is a technique to increase the number of samples and help the algorithm generalize by applying alterations such as rotation, noise, cropping and color modification. It reduces the need for data and helps the model generalize better to new data.
16. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data. The photos pulled from the Bing API are quite clear and the bear is clearly visible, while real applications might have partially visible bears, very bad light or moving objects. The training data is limited to high-resolution, daylight pictures with one single bear and few objects in between.
17. What is the difference between item transforms and batch transforms? Item transforms are applied to each individual item, for example resizing all input images to the same size, which is required for learning; batch transforms are then applied to a whole batch of uniformly sized items at once, typically on the GPU.
18. What is a confusion matrix? It is a matrix that allows you to see which classes you predict well and where the errors are located.
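A minimal sketch of how such a matrix is built, using the brand labels from this post as made-up example data (fastai builds this for you via ClassificationInterpretation; this just shows the bookkeeping):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

actual    = ["adidas", "adidas", "nike", "puma", "nike"]
predicted = ["adidas", "nike",   "nike", "puma", "nike"]
labels = ["adidas", "nike", "puma"]

cm = confusion_matrix(actual, predicted, labels)
# The diagonal holds correct predictions; off-diagonal cells are the errors,
# e.g. cm[0][1] counts adidas shoes mistaken for nike.
for row in cm:
    print(row)
```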
19. What does the export method save? A pickle file with the model we have trained.
20. When we are making predictions instead of training, it is called inference.
21. What are IPython widgets? A library that allows us to create apps with Python code.
22. CPUs for deployment are better when inference does not require computing a large number of inputs at once. For large-scale, low-latency inference, a GPU may be a good option.
23. What are the downsides of deploying your app on a server instead of on mobile? You need to have a running server always available to provide fast feedback.
24. Deploying a bear warning system would require a model trained probably on video data, with much lower quality, movement and many objects in between.
25. Out-of-domain data is data on which the model is likely to perform poorly, as it includes different aspects that make inference much harder.
26. Domain shift occurs when the distribution and behaviour of the input changes significantly, making past data obsolete or less representative of the actual domain.
27. The deployment of a deep learning model should at first be mostly manual, later partially supervised, and only when sufficient reliability exists fully automated.