Finding Differences in Images with PHP

I recently stumbled across a fascinating question: how could I tell whether an image had changed significantly? As PHP developers, the most troublesome image problem we have to deal with is how to resize an upload with an acceptable loss of quality.

Two similar images with hidden differences

In the end I discovered what many before me have – that this problem becomes relatively simple given the application of some fundamental mathematical principles. Come along with me as we learn about them…

You can find the code for this tutorial at https://github.com/undemanding/difference.

Bitmaps

There are two popular ways of thinking about images. The first is as a grid of individual pixels, composed of varying levels of color and contrast. Commonly, we break these colors down into their constituent red, green, and blue values. We could also think of them as hue, saturation, and lightness.

The second way of thinking about images is in terms of vectors. A line isn’t the pixels in between but rather a starting point and an ending point, with some meta data that describes a stroke in between. We’re going to focus on bitmaps because they’ll make the whole process easier.

We can break any image down into this bitmap grid, with code resembling:

$image = imagecreatefrompng($path);
$width = imagesx($image);
$height = imagesy($image);

$map = [];

for ($y = 0; $y < $height; $y++) {
    $map[$y] = [];

    for ($x = 0; $x < $width; $x++) {
        $color = imagecolorat($image, $x, $y);

        $map[$y][$x] = [
            "r" => ($color >> 16) & 0xFF,
            "g" => ($color >> 8) & 0xFF,
            "b" => $color & 0xFF
        ];
    }
}

Given the width and height of the image, we can use a function called imagecolorat (on an image resource) to get a single integer value for the red, green, and blue at that pixel. We can then use bit shifting and masking to get the individual values of each from the single integer value.

Each red, green, and blue value is in a range from 0 to 255. In binary, this range can be expressed as 00000000 to 11111111. A single integer value can represent 3 sets of these binary values and bit shifting is a way to get the binary values from the first, second, or third group of 8 bits.

If we create these grids from a couple of images; how can we compare them? We got the answer 2300 years ago…

Distance In Three Dimensions

Can you remember how to calculate the length of a line? Any line you can draw on paper can be thought of as the hypotenuse of a triangle (the long side). To measure it, we can square the horizontal and vertical sides of the right-angle triangle the hypotenuse makes, and work out their combined square root:

$start = [$x = 10, $y = 15];
$end = [$x = 20, $y = 30];

$width = $end[0] - $start[0];
$width *= $width;

$height = $end[1] - $start[1];
$height *= $height;

$distance = sqrt($width + $height); // ≈ 18.03

If the line was three-dimensional, we’d have to add a third component to the equation. There’s a general mathematical principle for this kind of distance measurement, called Euclidean distance. Older forms of it were called Pythagorean metric, because of the close relationship to the hypotenuse calculation we just did.

The formula expands to as many dimensions as we want it to, but we only need it for three:

$first = [$red = 100, $green = 125, $blue = 150];
$second = [$red = 125, $green = 150, $blue = 175];

$red = $second[0] - $first[0];
$red *= $red;

$green = $second[1] - $first[1];
$green *= $green;

$blue = $second[2] - $first[2];
$blue *= $blue;

$distance = sqrt($red + $green + $blue); // ≈ 43.30

We can apply this principle to every pixel of the bitmaps, until we have a third bitmap of just the differing values. Let’s try it…

Simple Image Differences

We can apply these principles with very little code. Let’s make a class to load images, create their bitmaps and calculate a map of pixel differences:

class State
{
    private $width;
    private $height;
    private $map = [];

    public function __construct($width, $height)
    {
        $this->width = $width;
        $this->height = $height;
    }
}

Each map has a defined width and height. To populate the map, we need to load images into memory:

private static function createImage($path)
{
    $image = null;

    $info = getimagesize($path);
    $type = $info[2];

    if ($type == IMAGETYPE_JPEG) {
        $image = imagecreatefromjpeg($path);
    }
    if ($type == IMAGETYPE_GIF) {
        $image = imagecreatefromgif($path);
    }
    if ($type == IMAGETYPE_PNG) {
        $image = imagecreatefrompng($path);
    }

    if (!$image) {
        throw new InvalidArgumentException("image invalid");
    }

    return $image;
}

We can use the GD image library to read a number of image formats. This function tries to load a file path that could be a JPEG, a GIF, or a PNG. If none of these work, we can just raise an exception. Why is this method static? We’re going to use it in another static method:

public static function fromImage($path)
{
    if (!file_exists($path)) {
        throw new InvalidArgumentException("image not found");
    }

    $image = static::createImage($path);
    $width = imagesx($image);
    $height = imagesy($image);

    $map = [];

    for ($y = 0; $y < $height; $y++) {
        $map[$y] = [];

        for ($x = 0; $x < $width; $x++) {
            $color = imagecolorat($image, $x, $y);

            $map[$y][$x] = [
                "r" => ($color >> 16) & 0xFF,
                "g" => ($color >> 8) & 0xFF,
                "b" => $color & 0xFF
            ];
        }
    }

    $new = new static($width, $height);
    $new->map = $map;

    return $new;
}

This static function allows us to create new image states (or maps) from a static call to State::fromImage("/path/to/image.png"). If the path doesn’t exist, we can raise another exception. Then we have the same image grid construction logic we saw previously. Finally, we create a new State, with a defined width, height and the map we constructed.

Now, let’s make a way to compare multiple images:

public function withDifference(State $state, callable $method)
{
    $map = [];

    for ($y = 0; $y < $this->height; $y++) {
        $map[$y] = [];

        for ($x = 0; $x < $this->width; $x++) {
            $map[$y][$x] = $method(
                $this->map[$y][$x],
                $state->map[$y][$x]
            );
        }
    }

    return $this->cloneWith("map", $map);
}

private function cloneWith($property, $value)
{
    $clone = clone $this;
    $clone->$property = $value;

    return $clone;
}

We go through each pixel, in each image. We pass them to a difference function and assign the resulting value to a new map. Finally, we create a clone of the current State, so it keeps the width and height, but sets a new state. This ensures we don’t modify existing State instances, but rather have access to a new instance with its own map.

What does that difference function look like?

class EuclideanDistance
{
    public function __invoke(array $p, array $q)
    {
        $r = $p["r"] - $q["r"];
        $r *= $r;

        $g = $p["g"] - $q["g"];
        $g *= $g;

        $b = $p["b"] - $q["b"];
        $b *= $b;

        return sqrt($r + $g + $b);
    }
}

It turns out we can use classes as functions, if we give them an __invoke method. In this class we can put the Euclidian distance logic we saw previously. We can put all of these pieces together like this:

$state1 = State::fromImage("/path/to/image1.png");
$state2 = State::fromImage("/path/to/image2.png");

$state3 = $state1->withDifference(
    $state2,
    new EuclideanDistance()
);

This method works great for images that are almost identical. When we try to work out the differences in very similar photos or even lossy versions of almost identical images, we’re presented with many slight differences.

Standard Deviation

To get around this problem, we need to remove the noise and focus only on the biggest problem. We can do that by working out how spread out the differences are. This measure of how spread out numbers are is called Standard Deviation.

Standard deviation graph

Image from Wikipedia

Most of the small differences between images are within the standard deviation (or the dark blue band between -1σ and 1σ). If we eliminate all the small differences within the standard deviation, then we should be left with the big differences. To work out which pixels are within the standard deviation, we need to work out the average pixel value:

public function average()
{
    $average = 0;

    for ($y = 0; $y < $this->height; $y++) {
        for ($x = 0; $x < $this->width; $x++) {
            if (!is_numeric($this->map[$y][$x])) {
                throw new LogicException("pixel is not numeric");
            }

            $average += $this->map[$y][$x];
        }
    }

    $average /= ($this->width * $this->height);

    return $average;
}

Averages are easy! Just add together everything and divide by the number of things you added together. An average of two things is their total value divided by two. An average of 400 x 300 things is their total value divided by 120,000.

Notice how we’re expecting numeric values during the calculation? That means we first need to generate the purely numeric state ($state3 in the above example), or a state generated by the EuclideanDistance difference function.

public function standardDeviation()
{
    $standardDeviation = 0;
    $average = $this->average();

    for ($y = 0; $y < $this->height; $y++) {
        for ($x = 0; $x < $this->width; $x++) {
            if (!is_numeric($this->map[$y][$x])) {
                throw new LogicException(
                    "pixel is not numeric"
                );
            }

            $delta = $this->map[$y][$x] - $average;
            $standardDeviation += ($delta * $delta);
        }
    }

    $standardDeviation /= (($this->width * $this->height) - 1);
    $standardDeviation = sqrt($standardDeviation);

    return $standardDeviation;
}

This calculation is a little similar to the distance calculation we did before. The basic idea is that we work out the average of all the pixels. We then work out how far each is from that average, and then work out the average of all of these distances.

We can apply this back to the State:

public function withReducedStandardDeviation()
{
    $map = array_slice($this->map, 0);
    $deviation = $this->standardDeviation();

    for ($y = 0; $y < $this->height; $y++) {
        for ($x = 0; $x < $this->width; $x++) {
            if (abs($map[$y][$x]) < $deviation) {
                $map[$y][$x] = 0;
            }
        }
    }

    return $this->cloneWith("map", $map);
}

This way, if differences are inside the standard deviation band, we exclude them from a new State. The result is a cleaner picture of changes.

public function boundary()
{
    $ax = $this->width;
    $bx = 0;
    $ay = $this->width;
    $by = 0;

    for ($y = 0; $y < $this->height; $y++) {
        for ($x = 0; $x < $this->width; $x++) {
            if ($this->map[$y][$x] > 0) {
                if ($x > $bx) {
                    $bx = $x;
                }

                if ($x < $ax) {
                    $ax = $x;
                }

                if ($y > $by) {
                    $by = $y;
                }

                if ($y < $ay) {
                    $ay = $y;
                }
            }
        }
    }

    if ($ax > $bx) {
        throw new LogicException("ax is greater than bx");
    }

    if ($ay > $by) {
        throw new LogicException("ay is greater than by");
    }

    $ax = ($ax / $this->width) * $this->width;
    $bx = ((($bx + 1) / $this->width) * $this->width) - $ax;
    $ay = ($ay / $this->height) * $this->height;
    $by = ((($by + 1) / $this->height) * $this->height) - $ay;

    return [
        "left" => $ax,
        "top" => $ay,
        "width" => $bx,
        "height" => $by
    ];
}

The final piece of the puzzle is a function to work out where the boundaries of the changes are. We can use this function as a way to draw a box around the changes from one image to another. The box starts on the edges of the grid and slowly moves inward until it reaches changes in the grid.

Conclusion

I started this experiment to find the differences between two images. You see, I wanted to take screenshots of an interface, during automated testing, and tell if something significant had changed.

Euclidian distance, applied to bitmaps, told me about every changed pixel. Then I wanted to allow for slight changes (like small text or colour changes), so I applied Standard Deviation noise removal, so that only significant changes would come through. Finally, I could work out exactly how many pixels were different as a percentage of the total number of pixels on the screen. I could tell if one screenshot was within 10% or 20% of a test fixture image.

Perhaps you have need of this for something wildly different. Perhaps you have ideas of how this can be improved. Let us know in the comments!

Frequently Asked Questions (FAQs) about Finding Differences in Images with PHP

How does PHP compare two images for similarity?

PHP uses the GD library and the Imagick extension to compare two images for similarity. The GD library is used to create and manipulate image files in a variety of different image formats including GIF, PNG, JPEG, WBMP, and XPM. On the other hand, the Imagick extension is a native PHP extension to create and modify images using the ImageMagick API. It provides methods for comparing images, including the compareImages() function which compares an image to a reconstructed image.

What is the role of the GD library in PHP image comparison?

The GD library in PHP is used to create and manipulate image files. It provides functions for loading images into memory, manipulating them, and saving them again. When comparing images, the GD library can be used to load the images, convert them to a common format if necessary, and then compare the pixel data.

How does the Imagick::compareImages() function work?

The Imagick::compareImages() function compares an image to a reconstructed image and returns an array of two images. The first image in the array is a reconstructed image, while the second one is the difference image, highlighting the areas where the two images differ. The function also returns the calculated mean error per pixel computed over the entire image.

Can PHP compare images of different formats?

Yes, PHP can compare images of different formats. The GD library provides functions for loading images in various formats, including GIF, PNG, JPEG, WBMP, and XPM. Once the images are loaded into memory, they can be compared regardless of their original format.

How can I use PHP to find differences in images?

You can use PHP to find differences in images by using the GD library or the Imagick extension. These provide functions for loading images into memory, manipulating them, and comparing them. The Imagick extension also provides a compareImages() function that returns a difference image highlighting the areas where the two images differ.

What is the ImageMagick API and how is it used in PHP?

ImageMagick is a software suite used to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats. The Imagick extension is a native PHP extension to create and modify images using the ImageMagick API. It provides methods for comparing images, including the compareImages() function.

Can PHP handle large images for comparison?

Yes, PHP can handle large images for comparison. However, the amount of memory required will depend on the size of the images. Large images may require a significant amount of memory to load into PHP’s memory space. You may need to increase the memory limit setting in your PHP configuration if you are working with very large images.

How accurate is PHP image comparison?

The accuracy of PHP image comparison depends on the methods used. The Imagick::compareImages() function, for example, calculates the mean error per pixel over the entire image, providing a quantitative measure of the difference between the images. However, it may not always reflect perceived visual differences, especially in complex images.

Can PHP compare more than two images at once?

PHP does not provide a built-in function to compare more than two images at once. However, you can write a custom function to do this. The function would need to load each image into memory and then compare them pair by pair.

How can I optimize the performance of PHP image comparison?

There are several ways to optimize the performance of PHP image comparison. One way is to reduce the size of the images before comparison. Smaller images require less memory to load and compare. Another way is to use efficient comparison methods. For example, the Imagick::compareImages() function is more efficient than comparing pixel data directly.