The previous parts could be directly plug in for image classification, or answering whether an object is or not in the image. If we want to know where the object is, or to detect multiple objects, we will need to adjust our networks to solve the localization problem. The intuition behind localization The idea is simple, on top of the class classification, we need to add the prediction of the center point of the object, the hight and width to create a bounding box of the object. Note that y will have then (coordinates center point (bx,by), height(bh), width(bw) and class (c), for each object. As you have seen, for every image, we will need to have 4 more data points than before. The l oss function dependts on the error, here we may want to use square error as we are not simply classifing the class. We would like to minimize the distance between the actual points and the predicted, when there is an object . When there is not an object, we can simply consider the classification error on
A critical but constructive blog about AI and post growth social systems