
FastAI Deep Learning Journey Part 7: Calculating crowd size using image regression, a potential application for train use


In the previous post we showed a more general approach for the case when images may have one, multiple, or no labels at all. In this post, we will show how few changes are required to apply computer vision deep learning methods to regression problems.

To make things less theoretical, we picked a very interesting dataset containing 2000 images of people in a shopping mall. Each picture has been carefully labelled with the number of people in it, which ranges from 12 to 60. This could be a very interesting application for public transport, for example: with flat-rate monthly tickets it is very hard to track the actual usage of each train, bus, or other service in real life.

We will show that with only 3 epochs on a single GPU we managed to get an MAE (mean absolute error) of roughly 2, or in other words, to get the count wrong by about ±2 people, which is really not a lot considering an image can contain 60 people. Let's see in detail what needs to change, and discuss a potential use for public transport: tagging usage to plan service frequency and estimate demand.

Data assembly: very little or no change

In the case we analysed, we can leverage a very similar data block, after some data wrangling to ensure that the labels csv file and the image names are consistent.
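As a rough sketch of that wrangling step (the csv file name and column layout below are assumptions, not necessarily the dataset's actual ones), it could look like this:

import pandas as pd
import pathlib

# Hypothetical file and column names, just to illustrate the consistency check
path = pathlib.Path('/content/gdrive/MyDrive/Crowd')
df = pd.read_csv(path/'labels.csv')  # assumed: one row per image with its person count

# Keep only rows whose image file actually exists under frames/, so csv and folder stay consistent
df = df[df.iloc[:, 0].map(lambda f: (path/'frames'/f).exists())].reset_index(drop=True)
df = df.rename(columns={df.columns[1]: 'label'})  # the DataBlock below expects a 'label' column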

After doing that we can create the following datablock:

from fastai.vision.all import *
import pathlib

def get_y(df): return df['label']  # read the person count from the 'label' column

path = pathlib.Path('/content/gdrive/MyDrive/Crowd')
block = DataBlock(blocks=(ImageBlock, RegressionBlock),  # note we change to RegressionBlock
                  get_x=ColReader(0, pref=path/"frames"),  # similar approach as with multi-label
                  get_y=get_y,
                  splitter=RandomSplitter(valid_pct=0.2, seed=42),
                  item_tfms=Resize(224),
                  batch_tfms=aug_transforms())

dls = block.dataloaders(df)

Note that the only main difference is that we use a RegressionBlock; the rest remains almost identical. The following batch illustrates the challenge at hand.
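A batch like the one shown below can be rendered with fastai's standard call:

# Show a sample batch: each mall image is displayed with its person count as the target
dls.show_batch(max_n=6)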



Training a model with different metrics and loss function

As the task is regression, the loss function and the metrics are going to be different. For the loss function we let fastai's RegressionBlock default to MSE, and for the metric we use MAE, as it is more interpretable.
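As a sketch of how the learner can be built (the architecture and the y_range are assumptions, not values stated in the post; fastai picks an MSE-based loss automatically for a RegressionBlock):

# resnet34 and y_range=(0, 80) are illustrative choices; fastai infers MSELossFlat from RegressionBlock
learn = vision_learner(dls, resnet34, y_range=(0, 80), metrics=mae)
learn.loss_func  # should show an MSE-based flattened loss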

Just out of curiosity, since it is a different loss function, we check whether the learning rate needs to be very different using the learning rate finder, and then train for 3 epochs.
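A minimal version of that step could look like this (the suggested learning rate is whatever the finder returns, not a fixed value):

# Run the learning rate finder; .valley is the default suggestion in recent fastai versions
lr = learn.lr_find().valley
learn.fine_tune(3, base_lr=lr)  # 3 epochs, as in the post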




One can see that after a few epochs we get a fairly low MAE of ~2.4, which means we are off by roughly ±2 people on average, an error of no more than about 10% except in the very low-density cases. This is certainly better than the current blindness about usage that we have in most public transport systems without ticket validation.
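To get a count for a new image, a prediction call would look roughly like this (the file name is just a placeholder):

# Estimate the crowd size for a single frame
img = PILImage.create(path/'frames'/'frame_0001.jpg')
pred, _, _ = learn.predict(img)
print(f"Estimated number of people: {pred}")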

It is remarkable that we are able to move from classification to regression with so few changes and, most importantly, with very little compromise on performance.

A potential application for public transport

As we speak, Germany has created a monthly ticket for most public transport that does not require validation (as long as an inspector does not check, which could be the case roughly two thirds of the time).

This means that there is no way to know how many people are in a station, on a platform, or in a train or bus at any given time. If we are to provide low-price but high-comfort transportation, we need to match demand and supply very well.

If tickets like the 9-euro ticket in Germany, or other flat-rate monthly tickets worldwide, become common, then together with inflation in energy prices and in general, higher use of public transport is to be expected, and therefore more saturation at peak times.

To avoid delays, overcrowding and incorrect allocation of time slots, our model can be used to track usage in real time, which could support both short-term and long-term planning decisions. With sufficient visibility of demand, we should be able to predict it better, or at least know the routes and times that require more capacity. This could be critical to improve the current public transport service and to ensure that more of the population uses it as their main mode of transportation.

It is not sufficient to make it cheap; we need to make it reliable, comfortable and safe, which requires very good demand and supply planning. Given the availability of cameras on platforms and in many vehicles, it should be possible to implement this in production.

I hope this post inspires public transport planners and policy makers to consider image regression as an ally for the affordable, punctual, safe, clean public transport we all want to have.



