How hard is it to write a program that can recognize if a photo of a city was taken during the day or during the night?

Turns out that’s pretty simple. The process can be broken down into three steps:

  1. defining a feature vector;
  2. training a classifier;
  3. testing the classifier on unknown images.

Defining a feature vector

A feature vector is a vector that summarizes the features of the object we want to classify.

Seattle by day by gaensler@flickr. Sydney by night by NickiMM@flickr.


Obviously, photos taken at night have more dark pixels than photos taken during the day. So we can use the number of dark/midrange/light pixels as a feature vector to classify photos.

However, simply counting the number of pixels of each color would make the feature vector dependent on the size of each photo. So, it makes more sense to put in the feature vector the ratio of pixels of each color to the total number of pixels.

An additional problem is that the feature vector would still be larger than necessary: the RGB color space has \(\left(2^8\right)^3 = 16{,}777{,}216\) elements, so a city with a dark sky would produce a substantially different vector from a city whose sky has a slightly different shade of dark blue.

We can reduce the feature vector size by mapping each of the 256 possible values of each color channel to a smaller set of values, for example 4.

RGB color cube. Courtesy of SharkD@wikipedia. Original: https://en.wikipedia.org/wiki/RGB_color_space#mediaviewer/File:RGB_Cube_Show_lowgamma_cutout_b.png


The final problem is that most machine learning libraries expect the feature vector to be 1-dimensional, while the RGB color space is 3-dimensional. For this reason we can simply map the \(4^3 = 64\) cells of the RGB color space to a 1-dimensional vector with 64 slots.
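To make this mapping concrete, here is the index computation for a single pixel as a standalone sketch (the helper name is ours, not part of the program below):

```python
def rgb_bin_index(r, g, b, blocks=4):
    """Map one RGB pixel to its slot in the flattened blocks**3 histogram."""
    width = 256 // blocks  # each channel bin covers 64 values when blocks=4
    ridx, gidx, bidx = r // width, g // width, b // width
    return ridx + gidx * blocks + bidx * blocks * blocks

print(rgb_bin_index(10, 5, 0))       # a very dark pixel -> slot 0
print(rgb_bin_index(255, 255, 255))  # pure white -> the last slot, 63
```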

Let’s use Pillow to read images and start writing some Python code that, given an image file path or an image URL, calculates its feature vector:

from __future__ import division
from __future__ import print_function
from PIL import Image
from StringIO import StringIO
import urllib2
from urlparse import urlparse
import sys
import os


def process_directory(directory):
    '''Returns an array of feature vectors for all the image files in a
    directory (and all its subdirectories). Symbolic links are ignored.

    Args:
      directory (str): directory to process.

    Returns:
      list of list of float: a list of feature vectors.
    '''
    training = []
    for root, _, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            img_feature = process_image_file(file_path)
            if img_feature:
                training.append(img_feature)
    return training


def process_image_file(image_path):
    '''Given an image path it returns its feature vector.

    Args:
      image_path (str): path of the image file to process.

    Returns:
      list of float: feature vector on success, None otherwise.
    '''
    image_fp = StringIO(open(image_path, 'rb').read())
    try:
        image = Image.open(image_fp)
        return process_image(image)
    except IOError:
        return None


def process_image_url(image_url):
    '''Given an image URL it returns its feature vector

    Args:
      image_url (str): url of the image to process.

    Returns:
      list of float: feature vector.

    Raises:
      Any exception raised by urllib2 requests.

      IOError: if the URL does not point to a valid file.
    '''
    parsed_url = urlparse(image_url)
    request = urllib2.Request(image_url)
    # set a User-Agent and Referer to work around servers that block
    # atypical user agents and hotlinking. Sorry, it's for science!
    request.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux ' \
            'x86_64; rv:31.0) Gecko/20100101 Firefox/31.0')
    request.add_header('Referer', parsed_url.netloc)
    # Wrap network data in StringIO so that it looks like a file
    net_data = StringIO(urllib2.build_opener().open(request).read())
    image = Image.open(net_data)
    return process_image(image)


def process_image(image, blocks=4):
    '''Given a PIL Image object it returns its feature vector.

    Args:
      image (PIL.Image): image to process.
      blocks (int, optional): number of blocks to subdivide the RGB space into.

    Returns:
      list of float: feature vector if successful. None if the image is not
      RGB.
    '''
    if not image.mode == 'RGB':
        return None
    feature = [0] * blocks * blocks * blocks
    pixel_count = 0
    for pixel in image.getdata():
        ridx = int(pixel[0]/(256/blocks))
        gidx = int(pixel[1]/(256/blocks))
        bidx = int(pixel[2]/(256/blocks))
        idx = ridx + gidx * blocks + bidx * blocks * blocks
        feature[idx] += 1
        pixel_count += 1
    return [x/pixel_count for x in feature]

Just to get an idea of what we obtain, this is the feature vector plot of a city by day:

Feature vector city by day

And this is the feature vector of a city by night:

Feature vector city by night

Exactly as expected: night photos have plenty of dark pixels.

Another good quality of this approach is that the feature values are already normalized, which makes most classifiers work better.
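Both properties are easy to sanity-check without a real photo. The sketch below re-implements the histogram over a synthetic list of pixels (a stand-in for `image.getdata()`) and shows that a mostly-dark image piles its mass into the darkest cell, with all values summing to 1:

```python
def color_histogram(pixels, blocks=4):
    """Normalized blocks**3 color histogram over an iterable of RGB tuples."""
    feature = [0] * (blocks ** 3)
    width = 256 // blocks
    count = 0
    for r, g, b in pixels:
        feature[r // width + (g // width) * blocks + (b // width) * blocks ** 2] += 1
        count += 1
    return [x / count for x in feature]

# 90 near-black pixels plus a few bright ones: a caricature of a night photo.
night = [(12, 10, 30)] * 90 + [(250, 240, 180)] * 10
hist = color_histogram(night)
print(hist[0])     # 0.9 -- the darkest cell dominates
print(sum(hist))   # 1.0 -- already normalized
```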

Training a classifier

I chose scikit-learn as the machine learning library, but you are free to choose the one that excites you the most.

First, what is a classifier? It is a “thing” that, given a feature vector, returns its class. In our example, it should return “1” given the feature vector of a picture of a city by day, and “0” for a city by night. The procedure that teaches the classifier which feature vectors belong to which classes is called training.

So go collect pictures of cities by day and by night; I’ll wait. Once you have them, put them in two separate folders. This will be our training set, which we’ll use to train a classifier.

Assume that we have no clue which machine learning algorithm we should use. scikit-learn provides a useful cheat sheet to guide us: http://scikit-learn.org/stable/tutorial/machine_learning_map/. In our case, it suggests C-Support Vector Classification (also called a Support Vector Machine, or SVM).

SVMs are all-around awesome and one of those algorithms that are pretty much always worth trying, because they work well in a wide range of settings. Go read about them.

In very simple terms, an SVM puts all the objects in the training set in an n-dimensional space (n can even be infinite!), and then looks for the hyperplane that best separates objects of type A from objects of type B.

So far, this would work only if the objects are linearly separable. However, with one weird trick (mathematicians hate it!) SVMs work even on non-linearly separable classes (it’s actually called the kernel trick).
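A toy picture of why lifting to a higher-dimensional space helps (this only illustrates the idea; the real kernel trick never computes the lifted coordinates explicitly): points inside a circle and points on a surrounding ring are not linearly separable in 2D, but adding the coordinate \(z = x^2 + y^2\) makes a flat threshold on z separate them perfectly.

```python
import math
import random

random.seed(0)
# Class A: points close to the origin; class B: points on a radius-2 ring.
inside = [(random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5))
          for _ in range(50)]
outside = [(2 * math.cos(i * 0.1), 2 * math.sin(i * 0.1)) for i in range(50)]

lift = lambda x, y: x * x + y * y  # the extra dimension z

# No straight line separates the classes in 2D, but in the lifted space
# the plane z = 1 does so perfectly.
assert all(lift(x, y) < 1 for x, y in inside)
assert all(lift(x, y) > 1 for x, y in outside)
print('linearly separable after lifting')
```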

The SVM implementation of scikit-learn is available in the sklearn.svm module. As the cheat sheet suggests, we are going to use SVC.

SVC has plenty of parameters; the most important are C (the error penalty), kernel (the type of kernel to use), and gamma (the kernel coefficient).

Even with a good knowledge of SVMs, it is not straightforward to choose these parameters. The simplest way out of this dilemma is to try all combinations of these parameters and pick the classifier that works best. scikit-learn automates this with the GridSearchCV class.

Putting together the pieces of the puzzle we have to:

  1. gather training data using the code shown in the previous section;
  2. define the parameter search space to find a good classifier;
  3. return the best classifier found.

Here’s the code that does it:

def train(training_path_a, training_path_b, print_metrics=True):
    '''Trains a classifier. training_path_a and training_path_b should be
    directory paths and each of them should not be a subdirectory of the other
    one. training_path_a and training_path_b are processed by
    process_directory().

    Args:
      training_path_a (str): directory containing sample images of class A.
      training_path_b (str): directory containing sample images of class B.
      print_metrics  (boolean, optional): if True, print statistics about
        classifier performance.

    Returns:
      A classifier (sklearn.svm.SVC).
    '''
    if not os.path.isdir(training_path_a):
        raise IOError('%s is not a directory' % training_path_a)
    if not os.path.isdir(training_path_b):
        raise IOError('%s is not a directory' % training_path_b)
    training_a = process_directory(training_path_a)
    training_b = process_directory(training_path_b)
    # data contains all the training data (a list of feature vectors)
    data = training_a + training_b
    # target is the list of target classes for each feature vector: a '1' for
    # class A and '0' for class B
    target = [1] * len(training_a) + [0] * len(training_b)
    # split training data into a train set and a test set. The test set will
    # contain 20% of the total
    x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,
            target, test_size=0.20)
    # define the parameter search space
    parameters = {'kernel': ['linear', 'rbf'], 'C': [1, 10, 100, 1000],
            'gamma': [0.01, 0.001, 0.0001]}
    # search for the best classifier within the search space and return it
    clf = grid_search.GridSearchCV(svm.SVC(), parameters).fit(x_train, y_train)
    classifier = clf.best_estimator_
    if print_metrics:
        print()
        print('Parameters:', clf.best_params_)
        print()
        print('Best classifier score')
        print(metrics.classification_report(y_test,
            classifier.predict(x_test)))
    return classifier

There are several other techniques to properly train a classifier, such as cross-validation. Read about them in the official scikit-learn documentation.
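One caveat for readers on newer scikit-learn versions: the cross_validation and grid_search modules used above were later merged into sklearn.model_selection, so the same search looks roughly like this (a sketch on a tiny synthetic dataset standing in for the real feature vectors):

```python
from sklearn import svm
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy stand-in for the real histograms: class 0 is "dark", class 1 is "bright".
data = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
        [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]] * 5
target = [0, 0, 0, 1, 1, 1] * 5

x_train, x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.20, random_state=0)
parameters = {'kernel': ['linear', 'rbf'], 'C': [1, 10, 100, 1000],
              'gamma': [0.01, 0.001, 0.0001]}
clf = GridSearchCV(svm.SVC(), parameters).fit(x_train, y_train)
print(clf.best_estimator_.score(x_test, y_test))
```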

Testing the classifier on unknown images

We got the training data, we got the classifier, we only need to test it:

def main(training_path_a, training_path_b):
    '''Main function. Trains a classifier and allows using it on images
    downloaded from the Internet.

    Args:
      training_path_a (str): directory containing sample images of class A.
      training_path_b (str): directory containing sample images of class B.
    '''
    print('Training classifier...')
    classifier = train(training_path_a, training_path_b)
    while True:
        try:
            print("Input an image url (enter to exit): ")
            image_url = raw_input()
            if not image_url:
                break
            features = process_image_url(image_url)
            print(classifier.predict(features))
        except (KeyboardInterrupt, EOFError):
            break
        except:
            exception = sys.exc_info()[0]
            print(exception)

And this is an example of how it works:

Training classifier...

Parameters: {'kernel': 'linear', 'C': 10, 'gamma': 0.01}

Best classifier score
             precision    recall  f1-score   support

          0       1.00      1.00      1.00         3
          1       1.00      1.00      1.00         5

avg / total       1.00      1.00      1.00         8


Input an image url (enter to exit): 
https://upload.wikimedia.org/wikipedia/commons/9/99/Qu%C3%A9bec-City-Skyline.jpg
[1]
Input an image url (enter to exit): 
http://upload.wikimedia.org/wikipedia/commons/d/d4/New_York_City_at_night_HDR_edit1.jpg
[0]

Yay!

The classifier actually returned 1 for a photo taken by day and 0 for a picture taken by night!

Here is the full source code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''Images binary classifier based on scikit-learn SVM classifier.
It uses the RGB color space as feature vector.
'''

from __future__ import division
from __future__ import print_function
from PIL import Image
from sklearn import cross_validation
from sklearn import grid_search
from sklearn import svm
from sklearn import metrics
from StringIO import StringIO
from urlparse import urlparse
import urllib2
import sys
import os


def process_directory(directory):
    '''Returns an array of feature vectors for all the image files in a
    directory (and all its subdirectories). Symbolic links are ignored.

    Args:
      directory (str): directory to process.

    Returns:
      list of list of float: a list of feature vectors.
    '''
    training = []
    for root, _, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            img_feature = process_image_file(file_path)
            if img_feature:
                training.append(img_feature)
    return training


def process_image_file(image_path):
    '''Given an image path it returns its feature vector.

    Args:
      image_path (str): path of the image file to process.

    Returns:
      list of float: feature vector on success, None otherwise.
    '''
    image_fp = StringIO(open(image_path, 'rb').read())
    try:
        image = Image.open(image_fp)
        return process_image(image)
    except IOError:
        return None


def process_image_url(image_url):
    '''Given an image URL it returns its feature vector

    Args:
      image_url (str): url of the image to process.

    Returns:
      list of float: feature vector.

    Raises:
      Any exception raised by urllib2 requests.

      IOError: if the URL does not point to a valid file.
    '''
    parsed_url = urlparse(image_url)
    request = urllib2.Request(image_url)
    # set a User-Agent and Referer to work around servers that block
    # atypical user agents and hotlinking. Sorry, it's for science!
    request.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux ' \
            'x86_64; rv:31.0) Gecko/20100101 Firefox/31.0')
    request.add_header('Referer', parsed_url.netloc)
    # Wrap network data in StringIO so that it looks like a file
    net_data = StringIO(urllib2.build_opener().open(request).read())
    image = Image.open(net_data)
    return process_image(image)


def process_image(image, blocks=4):
    '''Given a PIL Image object it returns its feature vector.

    Args:
      image (PIL.Image): image to process.
      blocks (int, optional): number of blocks to subdivide the RGB space into.

    Returns:
      list of float: feature vector if successful. None if the image is not
      RGB.
    '''
    if not image.mode == 'RGB':
        return None
    feature = [0] * blocks * blocks * blocks
    pixel_count = 0
    for pixel in image.getdata():
        ridx = int(pixel[0]/(256/blocks))
        gidx = int(pixel[1]/(256/blocks))
        bidx = int(pixel[2]/(256/blocks))
        idx = ridx + gidx * blocks + bidx * blocks * blocks
        feature[idx] += 1
        pixel_count += 1
    return [x/pixel_count for x in feature]


def show_usage():
    '''Prints how to use this program
    '''
    print("Usage: %s [class A images directory] [class B images directory]" %
            sys.argv[0])
    sys.exit(1)


def train(training_path_a, training_path_b, print_metrics=True):
    '''Trains a classifier. training_path_a and training_path_b should be
    directory paths and each of them should not be a subdirectory of the other
    one. training_path_a and training_path_b are processed by
    process_directory().

    Args:
      training_path_a (str): directory containing sample images of class A.
      training_path_b (str): directory containing sample images of class B.
      print_metrics  (boolean, optional): if True, print statistics about
        classifier performance.

    Returns:
      A classifier (sklearn.svm.SVC).
    '''
    if not os.path.isdir(training_path_a):
        raise IOError('%s is not a directory' % training_path_a)
    if not os.path.isdir(training_path_b):
        raise IOError('%s is not a directory' % training_path_b)
    training_a = process_directory(training_path_a)
    training_b = process_directory(training_path_b)
    # data contains all the training data (a list of feature vectors)
    data = training_a + training_b
    # target is the list of target classes for each feature vector: a '1' for
    # class A and '0' for class B
    target = [1] * len(training_a) + [0] * len(training_b)
    # split training data into a train set and a test set. The test set will
    # contain 20% of the total
    x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,
            target, test_size=0.20)
    # define the parameter search space
    parameters = {'kernel': ['linear', 'rbf'], 'C': [1, 10, 100, 1000],
            'gamma': [0.01, 0.001, 0.0001]}
    # search for the best classifier within the search space and return it
    clf = grid_search.GridSearchCV(svm.SVC(), parameters).fit(x_train, y_train)
    classifier = clf.best_estimator_
    if print_metrics:
        print()
        print('Parameters:', clf.best_params_)
        print()
        print('Best classifier score')
        print(metrics.classification_report(y_test,
            classifier.predict(x_test)))
    return classifier


def main(training_path_a, training_path_b):
    '''Main function. Trains a classifier and allows using it on images
    downloaded from the Internet.

    Args:
      training_path_a (str): directory containing sample images of class A.
      training_path_b (str): directory containing sample images of class B.
    '''
    print('Training classifier...')
    classifier = train(training_path_a, training_path_b)
    while True:
        try:
            print("Input an image url (enter to exit): ")
            image_url = raw_input()
            if not image_url:
                break
            features = process_image_url(image_url)
            print(classifier.predict(features))
        except (KeyboardInterrupt, EOFError):
            break
        except:
            exception = sys.exc_info()[0]
            print(exception)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        show_usage()
    main(sys.argv[1], sys.argv[2])

Wrap up

This binary classifier works quite well if you feed it enough training data. Although as an example I chose daytime vs. nighttime photos, it works for any pair of image classes with reasonably different color distributions, e.g. photos of tigers vs. elephants, landscapes vs. portraits, sea vs. meadow, and so on.

Moreover, it is quite easy to modify it so that it works with multiple classes. Of course, this is left as an exercise for the reader.