How hard is it to write a program that can recognize if a photo of a city was taken during the day or during the night?
It turns out that’s pretty simple. The process can be broken down into three steps:
- defining a feature vector;
- training a classifier;
- testing the classifier on unknown images.
Defining a feature vector
A feature vector is a vector that summarizes the features of an object that we want to classify.
Obviously, photos taken by night have more dark pixels than photos taken by day. So we can use the number of dark/midrange/light pixels as a feature vector to classify photos.
However, simply counting the number of pixels of each color would make the feature vector dependent on the size of each photo. So it makes more sense to store in the feature vector the ratio of the number of pixels of each color to the total number of pixels.
An additional problem is that, so far, the feature vector would be larger than necessary: the RGB color space has \(\left(2^8\right)^3 = 16{,}777{,}216\) elements, so a city with a dark sky would look substantially different from a city whose sky is a slightly different shade of dark blue.
We can reduce the feature vector size by mapping each of the 256 possible values of each color channel to a smaller set of values, for example 4.
The final problem is that most machine learning libraries assume the feature vector to be a 1-dimensional vector, while the RGB color space is 3-dimensional. For this reason we can simply map the \(4^3 = 64\) cells of the RGB color space to a 1-dimensional vector with 64 slots.
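To make the arithmetic concrete, here is a minimal pure-Python sketch of the reduction and flattening described above. The names `cell_index` and `feature_vector` are illustrative assumptions, and the base-4 slot ordering is just one reasonable choice; any fixed one-to-one mapping from cells to slots would do.

```python
BINS_PER_CHANNEL = 4                   # levels kept per color channel, as in the text
BIN_WIDTH = 256 // BINS_PER_CHANNEL    # 64 consecutive channel values per level


def cell_index(r, g, b):
    """Map an 8-bit RGB triple to one of the 4**3 = 64 slots."""
    r_bin, g_bin, b_bin = r // BIN_WIDTH, g // BIN_WIDTH, b // BIN_WIDTH
    # Read the three levels as the digits of a base-4 number.
    return (r_bin * BINS_PER_CHANNEL + g_bin) * BINS_PER_CHANNEL + b_bin


def feature_vector(pixels):
    """Return the fraction of pixels that falls into each of the 64 slots."""
    counts = [0] * BINS_PER_CHANNEL ** 3
    total = 0
    for r, g, b in pixels:
        counts[cell_index(r, g, b)] += 1
        total += 1
    if total == 0:
        raise ValueError("cannot build a feature vector from zero pixels")
    # Dividing by the pixel count makes the vector independent of photo size.
    return [count / total for count in counts]
```

With this ordering, very dark pixels such as `(10, 20, 30)` land in slot 0 and very bright ones such as `(250, 250, 250)` land in slot 63.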
Let’s use Pillow to read images and start writing some Python code that given an image file path or an image URL calculates its feature vector:
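A minimal sketch of such a function could look like the following. It is an illustration under stated assumptions rather than the original listing: the names `load_image` and `extract_features` are made up for this example, URLs are fetched with the standard library's `urlopen`, and every image is converted to RGB before counting.

```python
from io import BytesIO
from urllib.request import urlopen

from PIL import Image

BINS_PER_CHANNEL = 4                   # levels kept per color channel
BIN_WIDTH = 256 // BINS_PER_CHANNEL    # 64 channel values per level
SLOTS = BINS_PER_CHANNEL ** 3          # 64 slots in the flattened vector


def load_image(source):
    """Open an image from a local file path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            return Image.open(BytesIO(response.read()))
    return Image.open(source)


def extract_features(image):
    """Return the share of pixels that falls into each of the 64 color slots."""
    rgb = image.convert("RGB")  # normalize grayscale, palette and RGBA inputs
    width, height = rgb.size
    total = width * height
    counts = [0] * SLOTS
    # getcolors yields (count, (r, g, b)) pairs; with maxcolors=total it
    # cannot exceed the limit, so it never returns None.
    for count, (r, g, b) in rgb.getcolors(maxcolors=total):
        r_bin, g_bin, b_bin = r // BIN_WIDTH, g // BIN_WIDTH, b // BIN_WIDTH
        slot = (r_bin * BINS_PER_CHANNEL + g_bin) * BINS_PER_CHANNEL + b_bin
        counts[slot] += count
    return [count / total for count in counts]
```

Calling `extract_features(load_image(path_or_url))` then yields a list of 64 ratios that sum to 1, whatever the size of the photo.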
Just to have an idea of what we get, this is the feature vector plot of a city by day:
And this is the feature vector of a city by night:
Exactly as expected: night photos have plenty of dark pixels.
Another good quality of this approach is that feature values are already normalized (each one is a ratio between 0 and 1), which makes most classifiers work better.
Training a classifier
First, what is a classifier? It is a “thing” that, given a feature vector, returns its class. In our example, it should return “1” given the feature vector of a picture of a city by day, and “0” for a city by night. The procedure that teaches the classifier which feature vectors belong to which classes is called training.
So go collect pictures of cities by day and by night, I’ll wait. Once you have them, put them in two separate folders. This will be our training set, which we’ll use to train a classifier.
Assume that we have no clue which machine learning algorithm we should use. scikit-learn provides a useful cheat sheet to guide us: http://scikit-learn.org/stable/tutorial/machine_learning_map/. In our case, it suggests that we should use C-Support Vector Classification (SVC), a kind of Support Vector Machine (SVM).
In very simple terms, an SVM puts all objects in the training set in an n-dimensional space (n can even be infinite!), and then looks for the hyperplane that best divides objects of type A from objects of type B.
So far, this would work only if these objects are linearly separable. However, with one weird trick (mathematicians hate it) SVMs work even on non-linearly separable classes (it’s actually called the kernel trick).
SVC has plenty of parameters; the most important are C (the penalty parameter of the error term), kernel (the type of kernel to use), and gamma (the kernel coefficient).
Even if you have a good knowledge of SVMs, it is not straightforward to choose these parameters. The simplest approach to solve this dilemma is to try all combinations of a set of candidate values for these parameters and pick the classifier that works best. scikit-learn automates this with the GridSearchCV class.
Putting together the pieces of the puzzle we have to:
- gather training data using the code shown in the previous section;
- define the parameter search space to find a good classifier;
- return the classifier.
Here’s the code that does it:
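A minimal sketch of these steps, assuming the feature vectors have already been computed and are passed in together with their labels (0 for night, 1 for day). The function name `train_classifier` and the candidate values in `PARAM_GRID` are illustrative assumptions, not tuned recommendations.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed search space: a few candidate values for C, kernel and gamma.
PARAM_GRID = [
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
    {"kernel": ["rbf"], "C": [1, 10, 100, 1000], "gamma": [0.01, 0.001, 0.0001]},
]


def train_classifier(feature_vectors, labels):
    """Grid-search an SVC over PARAM_GRID and return the best estimator.

    feature_vectors: one 64-element vector per training image.
    labels: 0 (night) or 1 (day) for each vector, in the same order.
    """
    # Every parameter combination is scored with 5-fold cross-validation,
    # so each class needs at least 5 training images.
    search = GridSearchCV(SVC(), PARAM_GRID, cv=5)
    search.fit(feature_vectors, labels)
    # By default GridSearchCV refits the best combination on all the data.
    return search.best_estimator_
```

The returned object is a regular fitted `SVC`, so it can be used directly for prediction.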
There are several other techniques to properly train and evaluate a classifier, such as cross-validation. Read about them in the official scikit-learn documentation.
Test the classifier on unknown images
We got the training data, we got the classifier, we only need to test it:
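A minimal sketch of the testing step follows. To keep it self-contained, it fits a stand-in classifier on hypothetical toy vectors instead of real photos; `toy_vector`, `classify` and `LABELS` are names invented for this illustration.

```python
from sklearn.svm import SVC

LABELS = {0: "night", 1: "day"}


def toy_vector(dark_fraction):
    """Hypothetical stand-in for a real 64-slot feature vector."""
    vector = [0.0] * 64
    vector[0] = dark_fraction          # share of very dark pixels
    vector[63] = 1.0 - dark_fraction   # share of very bright pixels
    return vector


def classify(classifier, feature_vector):
    """Return "day" or "night" for a single feature vector."""
    # predict expects one row per sample, hence the extra list.
    label = classifier.predict([feature_vector])[0]
    return LABELS[int(label)]


# Stand-in for the trained classifier: two "night" and two "day" samples.
training_vectors = [toy_vector(d) for d in (0.9, 0.8, 0.2, 0.1)]
training_labels = [0, 0, 1, 1]
classifier = SVC(kernel="linear").fit(training_vectors, training_labels)

print(classify(classifier, toy_vector(0.85)))  # a mostly dark image
print(classify(classifier, toy_vector(0.15)))  # a mostly bright image
```

With real data, the feature vector passed to `classify` would come from an image that was not part of the training set.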
And this is an example of how it works:
The classifier actually returned 1 for a photo taken by day and 0 for a picture taken by night!
Here is the full source code:
This binary classifier works quite well if you feed it enough training data. Although I chose daytime vs. nighttime photos as an example, it works for all images that have reasonably different color distributions, e.g. photos of tigers vs. elephants, landscapes vs. portraits, sea vs. meadow, and so on.
Moreover, it is quite easy to modify it so that it works with multiple classes. Of course, this is left as an exercise for the reader.