Alan Fortuny Sicart

Posts

Showing posts from December, 2021

Convolutional Neural Networks (Part 3)

The previous parts can be plugged in directly for image classification, that is, for answering whether or not an object is in the image. If we want to know where the object is, or to detect multiple objects, we need to adjust our networks to solve the localization problem. The intuition behind localization: the idea is simple. On top of classifying the object, we also predict its center point, height and width, which together define a bounding box. The label y then contains, for each object, the center point coordinates (bx, by), the height (bh), the width (bw) and the class (c). As you can see, every image needs 4 more data points than before. The loss function depends on the error. Here we may want to use the squared error, since we are no longer simply classifying the class: we want to minimize the distance between the actual and the predicted points when there is an object. When there is not an object, we can simply consider the classification error on...
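To make the localization target concrete, here is a minimal Python sketch of the squared-error loss described above. It assumes a label layout with an extra presence flag p_c in front of the box and one-hot class entries, [p_c, bx, by, bh, bw, c1, ..., ck]. That layout is a common convention and an assumption here, not something stated in this excerpt. When p_c = 1 every entry is penalized; when p_c = 0 only the presence flag counts.

```python
from typing import Sequence


def localization_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared-error loss for a single-object localization label.

    Assumed (hypothetical) layout: [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_k],
    where p_c = 1 if an object is present and 0 otherwise.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if y_true[0] == 1:
        # Object present: penalize presence, box coordinates and class entries.
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # No object: box and class entries are "don't care"; only p_c is penalized.
    return (y_true[0] - y_pred[0]) ** 2
```

For example, a prediction that is off by 0.1 in p_c and 0.1 in bx gives a loss of about 0.02, while on an empty image a predicted p_c of 0.2 gives about 0.04, no matter what the box entries say.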

Convolutional Neural Networks (Part 2)

In this section, we cover practical implementations of different deep convolutional neural networks. Rather than covering each one, I find it particularly useful to point out some of the patterns shared by most of the architectures: Architectures tend to increase the number of layers, and hence the number of parameters required. That has been possible thanks to the spread of multi-GPU training in both research and industry. The basic principles of the layers remain: most architectures start with a series of convolution, activation and pooling layers, and end with dense layers and ultimately a softmax for multiclass problems. Many architectures obtain very good results with small, simple convolution and pooling layers while keeping a large number of channels. ResNets, or residual nets, and why do they work so well? ResNets are built from blocks called residual blocks. These blocks let some intermediate activations skip ahead instead of going through the whole network before they can be used by layers deeper to the ri...
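The skip connection is easier to see in code. Below is a minimal pure-Python sketch of a residual block of the form a[l+2] = g(z[l+2] + a[l]), using two small fully connected layers with ReLU to keep it short. Real ResNets use convolutions (and usually batch normalization) in the main path, and the function names here are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    # Element-wise ReLU activation.
    return [max(0.0, x) for x in v]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    # z = W x + b for a single fully connected layer.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def residual_block(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Two-layer residual block (hypothetical, dense instead of convolutional)."""
    # Main path: layer l+1 with activation, then the linear part of layer l+2.
    a1 = relu(linear(w1, b1, x))
    z2 = linear(w2, b2, a1)
    # Shortcut: add the block input a[l] before the final activation.
    return relu([z + xi for z, xi in zip(z2, x)])
```

If the main-path weights and biases are all zero, the block returns relu(x), which for the non-negative activations coming out of a previous ReLU is just x. That is one common explanation of why stacking residual blocks rarely hurts: the identity function is easy for the block to learn.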

Convolutional Neural Networks (Part 1)

A great deal of our perception of the world comes from vision, and it would be fair to state that a great deal of communication depends on visual language too. The digitalization of many businesses makes images of products and services one of the key touchpoints with consumers, and hence a very important data point for managing our business and for making decisions as consumers. Computer vision is the science of making computers process images in much the same way we do. In this blog post, I am going to summarize the main learnings I got from the Deep Learning Specialization, and more concretely from the Convolutional Neural Networks course from https://www.deeplearning.ai. Understanding convolutions: Convolutions are at the core of many computer vision algorithms. The idea is simple: in order to identify key features such as vertical or horizontal edges, we slide a filter (also called a convolution kernel) over the image, multiplying it element-wise with each patch of pixels and summing the result. The following image shows how we identify the edges with...
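Here is a minimal pure-Python sketch of the edge-detection idea: slide a 3x3 vertical-edge filter over a small grayscale image in "valid" mode (stride 1, no padding). As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is cross-correlation. The filter is the classic vertical-edge example; the function name and test image are made up for illustration.

```python
from typing import List

Grid = List[List[float]]

# Classic vertical-edge filter: bright-to-dark transitions from left to right
# give a large positive response.
VERTICAL_EDGE: Grid = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]


def convolve2d(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2-D convolution as used in CNNs (no kernel flip), stride 1."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out: Grid = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Element-wise product of the kernel and the current patch, summed.
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out
```

On a 6x6 image whose left half is bright (10) and right half is dark (0), the output is 4x4, and every row is [0, 30, 30, 0]: the two middle columns light up exactly where the vertical edge sits.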

How to structure Machine Learning Projects? (inspired by the deeplearning.ai Deep Learning Specialization)

Machine Learning projects, like any other projects, can benefit from proper frameworks. The heavy engineering design principles of Deep Learning make frameworks all the more appropriate, so you can ask yourself the following questions as a starting point: Is my model fitting my training set well without overfitting it? Are my predictions reliable on the test set? Can I be confident that my model will work in the real world? Block 1 - Core principles, what we all need. Single-number evaluation metric: ideally, we need one single number to agree on whether we are making progress across our long backlog of experiments. For classification problems this could be the F1 score (https://towardsdatascience.com/the-f1-score-bec2bbc38aa6), and for regression problems the Mean Squared Error (https://medium.com/nothingaholic/understanding-the-mean-squared-error-df41e2c87958). Note that this does not include computation cost or business metrics; a score combining ML, business and computation/complexity metrics would be ideal. ...
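Both single-number metrics mentioned above are easy to compute by hand. Below is a minimal pure-Python sketch, assuming binary labels (1 = positive class) for F1. Returning 0.0 when there are no true positives is a convention chosen here. In practice, scikit-learn provides `f1_score` and `mean_squared_error` in `sklearn.metrics`.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall, for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with labels [1, 1, 0, 0, 1] and predictions [1, 0, 0, 1, 1], precision and recall are both 2/3, so F1 is 2/3. A single number like this makes it quick to rank experiments against each other.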