FastAI Deep Learning Journey Part 1: A first deep learning model to classify pets

FastAI for Deep Learning

Deep learning is crushing it: feature extraction, product recommendations, cancer detection and fire detection are just a few of the applications it makes possible. Without debating whether that wording is appropriate, in this series of posts I share my journey with fastai.

Fastai is a high-level framework for training and deploying deep learning models with a small computational burden and very few lines of code. It is built on PyTorch, one of the most popular deep learning frameworks today. We have been using deep learning for some years, but experimentation and deployment have taken far too long, and fastai can help solve many of those problems. I particularly like the idea of a framework that requires neither a lot of computing power nor a long learning curve.

Let me be very honest: I struggled a lot with setting up the environment and getting the code to run, but I persisted, and I can only tell you how worthwhile it is.

You can find all the notebooks in the following repo:

https://github.com/afortuny/DeepLearningFastAI.git

Why should you use my repo?

It is in sync with the latest developments of fastai (unlike the course)
Every code snippet is commented and simplified
It works on fairly small instances
I add interesting analyses that help to better understand key concepts

Where did my code run?

Google Colab, on one free instance with a single GPU

How can I run the code? Please follow the instructions:

https://course.fast.ai/start_colab

Notes about my first deep learning model

In a previous post I wrote about convolutional neural networks and transfer learning. If you are not familiar with them, please take a look here: https://alanfortunysicart.blogspot.com/2021/12/convolutional-neural-networks-part-1.html

The goal here is to be able to manipulate image data, for now with some labels, and to create a classifier. The first notebook I review lets us leverage fastai functions to take a resnet34 pretrained on the ImageNet dataset and do transfer learning on a single GPU.
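As a rough sketch of what such a notebook does (not necessarily the exact code in my repo), the snippet below assumes a recent fastai v2 install where vision_learner is available (older versions call it cnn_learner). The helper names breed_from_filename and train_pet_classifier, the regular expression, the image size and the seed are illustrative assumptions. In the Oxford-IIIT Pet dataset the breed is encoded in the file name, so the labels can be extracted with a regular expression.

```python
import re

# Assumed file naming of the pet images: "<breed>_<number>.jpg"
BREED_PATTERN = r"(.+)_\d+\.jpg$"


def breed_from_filename(filename):
    """Return the breed encoded in a pet image file name (hypothetical helper)."""
    match = re.search(BREED_PATTERN, filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    return match.group(1)


def train_pet_classifier(epochs=4):
    """Fine-tune an ImageNet-pretrained resnet34 on the pets dataset.

    fastai is imported here so the labelling helper above works without it.
    Calling this function downloads the data and is slow without a GPU.
    """
    import fastai.vision.all as fv

    path = fv.untar_data(fv.URLs.PETS)
    dls = fv.ImageDataLoaders.from_name_re(
        path,
        fv.get_image_files(path / "images"),
        pat=BREED_PATTERN,
        valid_pct=0.2,             # hold out 20% of the images for validation
        seed=42,                   # illustrative seed, for a reproducible split
        item_tfms=fv.Resize(224),
    )
    learn = fv.vision_learner(dls, fv.resnet34, metrics=fv.accuracy)
    learn.fine_tune(epochs)        # transfer learning from the pretrained weights
    return learn


print(breed_from_filename("great_pyrenees_173.jpg"))
```

Calling train_pet_classifier() in a Colab GPU runtime should reproduce the general workflow, although the exact accuracy will depend on the split and the fastai version.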

After only 4 epochs, we reach 96% accuracy across the 37 possible breed classes, with fairly balanced performance between them. Note that fastai offers very nice functionality to learn where the model is failing, such as plotting the top losses, plotting the confusion matrix and listing the most confused classes.
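In fastai these diagnostics come from ClassificationInterpretation.from_learner(learn), which offers plot_top_losses, plot_confusion_matrix and most_confused. To make the idea behind the "most confused" report concrete, here is a plain Python sketch of it. It is my own simplification, not fastai's implementation, and the toy predictions are invented for illustration.

```python
from collections import Counter


def most_confused(actual, predicted, min_count=1):
    """List (actual, predicted, count) for wrong predictions, most frequent first."""
    errors = Counter(
        (a, p) for a, p in zip(actual, predicted) if a != p
    )
    return [
        (a, p, n) for (a, p), n in errors.most_common() if n >= min_count
    ]


# Toy validation results, invented purely for illustration
actual = ["beagle", "beagle", "pug", "pug", "pug", "bengal"]
predicted = ["beagle", "basset_hound", "pug", "boxer", "boxer", "bengal"]

print(most_confused(actual, predicted))
# [('pug', 'boxer', 2), ('beagle', 'basset_hound', 1)]
```

Each tuple reads as "images of the first breed were predicted as the second breed this many times", which quickly shows which pairs of breeds the model mixes up.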

Fastai also lets you decide flexibly whether to use transfer learning (starting with pretrained weights) or to train from scratch, and helps you find a good learning rate. We show here that learning from scratch is a waste of time: even after 4 epochs it does not catch up with the transfer-learned model. Using the learning rate suggested by the finder, and increasing it towards the last layers, gives the best results. The reason for the latter is that the last layers capture the abstractions we want to classify, and hence those are the features we need to customize the most.
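To illustrate what a learning rate that increases towards the last layers can look like, here is a small sketch that spreads learning rates geometrically across layer groups. In fastai this is usually requested by passing something like slice(1e-6, 1e-4) as the learning rate; as far as I understand, the values are then spread multiplicatively between the two ends, but treat the function below as my own illustration with made-up values rather than fastai's code.

```python
def spread_learning_rates(lowest, highest, n_groups):
    """Geometrically spaced learning rates, from the first layer group to the last."""
    if n_groups == 1:
        return [highest]
    ratio = (highest / lowest) ** (1 / (n_groups - 1))
    return [lowest * ratio ** i for i in range(n_groups)]


# Illustrative values: three layer groups between 1e-6 and 1e-4
for group, lr in enumerate(spread_learning_rates(1e-6, 1e-4, 3)):
    print(f"layer group {group}: lr = {lr:.1e}")
```

The early layers, which hold generic features such as edges, barely move, while the last layers are updated about a hundred times faster.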

Please use the following notebook to replicate my findings (the whole code took around 10 minutes in the free version of Google Colab with 1 GPU):

https://github.com/afortuny/DeepLearningFastAI/blob/main/DL_Classifying_Pets_with_Fastai.ipynb

Answers to the questions of the first lesson

Do you need these for deep learning?
- Lots of math: False, basic linear algebra is enough
- Lots of data: False, you can do transfer learning
- Lots of expensive computers: False, I am working in Colab for free
- A PhD: False, common sense and perseverance are enough
Name five areas where deep learning is now the best in the world. Classifying objects, language translation, Q&A, Chatbots, Predicting Sentiment...
What was the name of the first device that was based on the principle of the artificial neuron? Mark I Perceptron
Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?

A set of processing units
A state of activation
An output function for each unit
A pattern of connectivity among units
A propagation rule for propagating patterns of activities through the network of connectivities
An activation rule for combining the inputs impinging on a unit with the current state of that unit to produce an output for the unit
A learning rule whereby patterns of connectivity are modified by experience
An environment within which the system must operate

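What were the two theoretical misunderstandings that held back the field of neural networks? First, Minsky and Papert showed in their book Perceptrons that a single layer of artificial neurons cannot learn simple functions such as XOR, and the field largely overlooked that the same book showed that adding layers removes this limitation. Second, in theory two layers are enough to approximate any function, but in practice such networks are too big and too slow to be useful, so many more layers are needed. On top of that, the lack of sufficient data and hardware explains the poor performance of the time, both in computation and in accuracy.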
What is a GPU? The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in creative production and artificial intelligence (AI).

GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more.
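Why is it hard to use a traditional computer program to recognize images in a photo? Because we cannot write down the exact steps we follow when we recognize an object, so there are no explicit rules to program. The alternative is to learn those rules from examples, and only when GPUs were used could we process large amounts of images at sufficiently great speed to do so.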
What did Samuel mean by "weight assignment"? Weights are just variables, and a weight assignment is a particular choice of values for those variables. The program's inputs are values that it processes in order to produce its results—for instance, taking image pixels as inputs, and returning the classification "dog" as a result. The program's weight assignments are other values that define how the program will operate.
What term do we normally use in deep learning for what Samuel called "weights"? Parameters. The term weights is now reserved for a particular type of model parameter.
Draw a picture that summarizes Samuel's view of a machine learning model. Input -> Model (weights) -> Output
Why is it hard to understand why a deep learning model makes a particular prediction? Because the weights make little sense to us. They are heavily entangled with each other.
What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy? A mathematical proof called the universal approximation theorem shows that a neural network can solve any problem to any level of accuracy, in theory. The fact that neural networks are so flexible means that, in practice, they are often a suitable kind of model, and you can focus your effort on the process of training them, that is, on finding good weight assignments.
What do you need in order to train a model? Input data, labels (output data), a loss function, a learning rate and gradients.
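To see how those ingredients fit together, here is a minimal sketch in plain Python, with no deep learning library involved. It fits a single weight with gradient descent; the data, the learning rate and the number of epochs are values I picked for illustration.

```python
def train_linear_model(inputs, labels, learning_rate=0.05, epochs=200):
    """Fit y = w * x with plain gradient descent on the mean squared error."""
    w = 0.0  # the single weight, starting from an arbitrary value
    n = len(inputs)
    for _ in range(epochs):
        # Gradient of the loss mean((w*x - y)^2) with respect to w
        gradient = sum(2 * (w * x - y) * x for x, y in zip(inputs, labels)) / n
        # Step against the gradient, scaled by the learning rate
        w -= learning_rate * gradient
    return w


inputs = [1.0, 2.0, 3.0, 4.0]
labels = [2.0, 4.0, 6.0, 8.0]  # generated by the rule y = 2x

weight = train_linear_model(inputs, labels)
print(f"learned weight: {weight:.3f}")
```

The loop recovers a weight very close to 2, the rule used to generate the labels. A deep learning model does the same thing with millions of weights, and the gradients are computed automatically.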
Do we always have to use 224×224-pixel images with the cat recognition model? This is the standard size for historical reasons (old pretrained models require this size exactly), but you can pass pretty much anything. If you increase the size, you'll often get a model with better results (since it will be able to focus on more details), but at the price of speed and memory consumption; the opposite is true if you decrease the size.
What will fastai do if you don't provide a validation set? It will create one automatically, by default holding out a random 20% of the data (valid_pct=0.2). fastai will always show you your model's accuracy using only the validation set, never the training set. This is absolutely critical, because if you train a large enough model for a long enough time, it will eventually memorize the label of every item in your dataset! The result will not actually be a useful model, because what we care about is how well our model works on previously unseen images. That is always our goal when creating a model: for it to be useful on data that the model only sees in the future, after it has been trained.
Can we always use a random sample for a validation set? Why or why not? No. Think of time series data: a random sample would mix future observations into the training set, so the validation set should contain the most recent data instead.
What is a metric? How does it differ from "loss"? A metric is any measure you use to validate the model and is meant for us to read; the loss is what drives the learning of the model during training.
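For the time series case, a time-based split can be sketched in a few lines of plain Python. The function name, the 20% fraction and the toy records are my own assumptions for illustration.

```python
def time_based_split(records, valid_fraction=0.2):
    """Split (timestamp, value) records so validation holds the latest ones."""
    ordered = sorted(records, key=lambda record: record[0])
    cut = len(ordered) - round(len(ordered) * valid_fraction)
    return ordered[:cut], ordered[cut:]


# Ten invented daily observations, deliberately out of order: (day, value)
records = [(day, day * 1.5) for day in range(10, 0, -1)]
train, valid = time_based_split(records)

print("train days:", [day for day, _ in train])
print("valid days:", [day for day, _ in valid])
```

The model is then validated only on days that come after everything it was trained on, which is how it will be used in practice.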
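A small sketch in plain Python can make the difference tangible. The two sets of predicted probabilities below are invented: both get every label right, so the metric (accuracy) is identical, but the loss (cross-entropy) still tells the model that more confident correct answers are better.

```python
import math


def accuracy(probabilities, labels):
    """Metric: fraction of items whose most likely class is the true class."""
    hits = sum(
        1 for probs, label in zip(probabilities, labels)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return hits / len(labels)


def cross_entropy(probabilities, labels):
    """Loss: mean negative log-probability assigned to the true class."""
    return -sum(
        math.log(probs[label]) for probs, label in zip(probabilities, labels)
    ) / len(labels)


labels = [0, 1]
hesitant = [[0.6, 0.4], [0.4, 0.6]]    # right, but not confident
confident = [[0.9, 0.1], [0.1, 0.9]]   # right and confident

for name, probs in [("hesitant", hesitant), ("confident", confident)]:
    print(name, accuracy(probs, labels), round(cross_entropy(probs, labels), 3))
```

Accuracy only changes when a prediction flips from wrong to right, while the loss changes smoothly with every small change in the weights, which is why the loss is what gradient descent optimizes.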
What is the "head" of a model? When using a pretrained model, cnn_learner will remove the last layer, since that is always specifically customized to the original training task (i.e. ImageNet dataset classification), and replace it with one or more new layers with randomized weights, of an appropriate size for the dataset you are working with. This last part of the model is known as the head.
What kinds of features do the early layers of a CNN find? How about the later layers? Early layers find edges and corners; later layers find abstractions and objects (cats, dogs...).
Are image models only useful for photos? No. Maps, video and sketches can be used too.
What is an "architecture"? The way the neural network relates the different layers, weights and activations; in other words, a general template for how that kind of model works internally.
What is segmentation? Segmentation is the task of classifying the content of every individual pixel in an image.
What is y_range used for? When do we need it? If we're predicting a continuous number, rather than a category, we have to tell fastai what range our target has, using the y_range parameter.
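My understanding is that fastai applies y_range by squashing the raw model output with a sigmoid and rescaling it to the requested interval. The plain Python sketch below shows that idea; the 0.5 to 5.5 range is an illustrative choice (for example, ratings), not something taken from the notebook.

```python
import math


def sigmoid_range(x, low, high):
    """Map any real number x into the interval (low, high) with a sigmoid."""
    return 1 / (1 + math.exp(-x)) * (high - low) + low


# Illustrative target range, e.g. ratings between 0.5 and 5.5
for raw in (-10.0, 0.0, 10.0):
    print(raw, round(sigmoid_range(raw, 0.5, 5.5), 3))
```

However large or small the raw output is, the prediction stays inside the range, so the model never has to learn the limits of the target by itself.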
What are "hyperparameters"? Parameters that affect what the model does and that are not learned during training.
What's the best way to avoid failures when using AI in an organization?

Understand what deep learning can do well
Ensure there is data for the problem
Define the right loss function
Ensure the model does not overfit the data or underfit it
Choose the right architecture type
Leverage pre trained models when possible
Have a baseline
Have a business owner which will use the

Why is a GPU useful for deep learning? How is a CPU different, and why is it less effective for deep learning? We saw that the computationally intensive part of a neural network is made up of multiple matrix multiplications. So how can we make it faster? We can do all the operations at the same time instead of one after the other. A CPU has a few powerful cores that mostly work through operations in sequence, whereas a GPU has thousands of simpler cores that work in parallel. This is, in a nutshell, why we use a GPU (graphics processing unit) instead of a CPU (central processing unit) for training a neural network.
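To see why matrix multiplication parallelizes so well, here is a plain Python sketch (it runs sequentially, of course, and is only meant to show the structure of the computation).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Every output cell depends only on one row of a and one column of b,
    # so all rows * cols cells could be computed at the same time.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))
# [[19, 22], [43, 50]]
```

No output cell needs the result of any other cell, so a GPU can hand each one to a different core.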

Alan Fortuny Sicart
