
Building Microsoft’s What-Dog AI in under 100 Lines of Code

Bruno Skvorc

Recently, Microsoft released an app that uses AI to detect a dog’s breed. When I tested it on my beagle, though…

The app identifies a beagle as a Saluki

Hmm, not quite, app. Not quite.

In my non-SitePoint time, I also work for Diffbot, the startup you may have heard of over the past few weeks, which also dabbles in AI. To see how the two compare, in this tutorial we’ll recreate Microsoft’s application using Diffbot’s technology and find out whether it does a better job of recognizing the adorable beasts we throw at it!

We’ll build a very primitive single-file “app” for uploading images and outputting the information about the breed under the form.

Prerequisites

If you’d like to follow along, please register for a free 14-day token at Diffbot.com, if you don’t have an account there yet.

To install the client, we use the following composer.json file:

{
    "require": {
        "swader/diffbot-php-client": "^2",
        "php-http/guzzle6-adapter": "^1.0"
    },
    "minimum-stability": "dev",
    "prefer-stable": true,
    "require-dev": {
        "symfony/var-dumper": "^3.0"
    }
}

Then, we run composer install.

The minimum-stability flag is there because a part of the Puli package is still in beta, and it’s a dependency of the PHP HTTP project now. The prefer-stable directive makes sure the highest stable version of a package is used when one is available. We also need an HTTP client, and in this case I opted for Guzzle6, though the Diffbot PHP client supports any modern HTTP client via HTTPlug, so feel free to use your own favorite.

Once these items have been installed, we can create an index.php file, which will contain all of our application’s logic. But first, bootstrapping:

<?php

require 'vendor/autoload.php';

$token = 'my_token';

The Upload

Let’s build a primitive upload form above the PHP content of our index.php file.

<form action="/" method="post" enctype="multipart/form-data">
    <h2>Please either paste in a link to the image, or upload the image directly.</h2>
    <h3>URL</h3>
    <input type="text" name="url" id="url" placeholder="Image URL">
    <h3>Upload</h3>
    <input type="file" name="file" id="file">
    <input type="submit" value="Analyze">
</form>

<?php

...

We’re focusing on the PHP side only here, so we’ll leave out the CSS. I apologize to your eyes.

Ugly form

We’ll be using Imgur to host the images, so that we don’t have to host the application in order to make the calls to Diffbot (the images will be public even if our app isn’t, saving us hosting costs). Let’s first register an application on Imgur via this link:

Imgur registration

This will produce a client ID and a secret, though we’ll only be using the client ID (anonymous uploads), so we should add it to our file:

$token = 'my_token';
$imgur_client = 'client';

Analyzing the Images

So, how will the analysis happen, anyway?

As described in the docs, Diffbot’s Image API accepts a URL and scans the page at that address for images. Every image it finds is then analyzed, and some data about each one is returned.

The data we need are the tags Diffbot attaches to the image entries. tags is an array of JSON objects, each of which contains a tag label, and a link to http://dbpedia.org for the related resource. We won’t be needing these links in this tutorial, but we will be looking into them in a later piece. The tags array takes a form similar to this:

"tags": [
        {
          "id": 4368,
          "label": "Beagle",
          "uri": "http://dbpedia.org/resource/Beagle"
        },
        {
          "id": 2370241,
          "label": "Treeing Walker Coonhound",
          "uri": "http://dbpedia.org/resource/Treeing_Walker_Coonhound"
        }
      ]

As you can see, each tag has the aforementioned values. If there’s only one tag, only one object will be present. By default, Diffbot returns up to 5 tags per entry – so each image can have up to 5 tags, and they don’t have to be directly related (e.g. submitting an image of a running shoe might return both the tag Nike and the tag shoe).

It is these tag labels we’ll be using as suggested guesses of dog breeds. Once the request goes through and returns the tags in the response, we’ll print the suggested labels below the image.
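
As a rough sketch of that idea, here is how the labels could be pulled out of a decoded tags array with plain PHP. The helper name extractLabels and the sample JSON are assumptions made for illustration; in the app itself the Diffbot client hands us the tags directly.

```php
<?php
// Sample payload shaped like the "tags" array shown above (illustrative data only).
$json = '{"tags":[
    {"id":4368,"label":"Beagle","uri":"http://dbpedia.org/resource/Beagle"},
    {"id":2370241,"label":"Treeing Walker Coonhound","uri":"http://dbpedia.org/resource/Treeing_Walker_Coonhound"}
]}';

/**
 * Hypothetical helper: return the tag labels, at most $limit of them.
 * Entries without a "label" key are skipped by array_column().
 */
function extractLabels(array $tags, $limit = 5)
{
    $labels = array_column($tags, 'label');
    return array_slice($labels, 0, $limit);
}

$entry = json_decode($json, true);
$labels = extractLabels(isset($entry['tags']) ? $entry['tags'] : []);

echo implode(', ', $labels), PHP_EOL; // Beagle, Treeing Walker Coonhound
```

The limit of 5 simply mirrors Diffbot's default described above; it is not something the API requires us to enforce on our side.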

Processing Submissions

To process the form, we’ll add some basic logic below the token declaration:

if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    // An uploaded file takes precedence over a pasted URL.
    if (!empty($_FILES['file']['tmp_name'])) {
        $filename = $_FILES['file']['tmp_name'];

        // Upload the image to Imgur anonymously, using only the client ID.
        $c = new GuzzleHttp\Client();
        $response = $c->request('POST', 'https://api.imgur.com/3/image.json', [
            'headers' => [
                'authorization' => 'Client-ID ' . $imgur_client
            ],
            'form_params' => [
                'image' => base64_encode(file_get_contents($filename))
            ]
        ]);

        $body = json_decode($response->getBody()->getContents(), true);
        $url = isset($body['data']['link']) ? $body['data']['link'] : null;
        if (empty($url)) {
            echo "<h2>Upload failed</h2>";
            die(isset($body['data']['error']) ? $body['data']['error'] : 'Unknown error');
        }
    }

    // No upload: fall back to the URL field.
    if (!isset($url) && isset($_POST['url'])) {
        $url = $_POST['url'];
    }

    if (empty($url)) {
        die("That's not gonna work.");
    }

    $d = new Swader\Diffbot\Diffbot($token);
    /** @var Image $imageDetails */
    $imageDetails = $d->createImageAPI($url)->call();
    $tags = $imageDetails->getTags();

    // Escape the URL, since it may come straight from user input.
    echo "<img width='500' src='" . htmlspecialchars($url, ENT_QUOTES) . "'>";

    switch (count($tags)) {
        case 0:
            echo "<h4>We couldn't figure out the breed :(</h4>";
            break;
        case 1:
            echo "<h4>The breed is probably " . labelSearchLink($tags[0]['label']) . "</h4>";
            echo iframeSearch($tags[0]['label']);
            break;
        default:
            echo "<h4>The breed could be any of the following:</h4>";
            echo "<ul>";
            foreach ($tags as $tag) {
                echo "<li>" . labelSearchLink($tag['label']) . "</li>";
            }
            echo "</ul>";
            echo iframeSearch($tags[0]['label']);
            break;
    }
}

We first check if a file was selected for upload. If so, it takes precedence over a link-based submission. The image is uploaded to Imgur, and the URL Imgur returns is then passed to Diffbot. If only a URL was provided, it’s used directly.
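
To make that precedence rule easier to see in isolation, here is a minimal sketch that takes the two superglobals as plain arrays. The function name resolveImageSource and its return shape are hypothetical and not part of the app above.

```php
<?php
/**
 * Hypothetical helper mirroring the precedence described above.
 * Returns ['upload', $tmpPath] when a file was submitted,
 * ['url', $url] when only a link was given, or null when neither was.
 */
function resolveImageSource(array $files, array $post)
{
    if (!empty($files['file']['tmp_name'])) {
        return ['upload', $files['file']['tmp_name']];
    }
    if (!empty($post['url'])) {
        return ['url', $post['url']];
    }
    return null;
}

// Both fields filled in: the upload wins.
$source = resolveImageSource(
    ['file' => ['tmp_name' => '/tmp/phpA1b2C3']],
    ['url' => 'http://example.com/dog.jpg']
);

echo $source[0], PHP_EOL; // upload
```

Passing $_FILES and $_POST in as arguments, rather than reading them inside the function, is only there so the rule can be exercised without a real form submission.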

We call Guzzle directly for the Imgur upload because it’s already installed as the HTTP client the Diffbot PHP client uses to make its API calls.
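
If the upload options feel cramped inline, they could be built separately before being handed to Guzzle. The sketch below only assembles the array and sends nothing; buildImgurOptions is a hypothetical name, and the temporary file merely stands in for $_FILES['file']['tmp_name'].

```php
<?php
/**
 * Hypothetical helper: build the options array for an anonymous Imgur upload.
 * The result is meant to be passed as the third argument of Guzzle's request().
 */
function buildImgurOptions($filename, $clientId)
{
    $contents = file_get_contents($filename);
    if ($contents === false) {
        throw new RuntimeException("Could not read {$filename}");
    }

    return [
        'headers' => ['Authorization' => 'Client-ID ' . $clientId],
        // Imgur accepts the image as a base64-encoded form field.
        'form_params' => ['image' => base64_encode($contents)],
    ];
}

// Demo with a throwaway file instead of a real upload.
$tmp = tempnam(sys_get_temp_dir(), 'dog');
file_put_contents($tmp, 'not really an image');
$options = buildImgurOptions($tmp, 'client');
unlink($tmp);

echo $options['headers']['Authorization'], PHP_EOL; // Client-ID client
```

Reading the file with file_get_contents also avoids leaving an open file handle behind, which the fopen/fread pairing would do.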

After the image data is returned, we grab the tags from the Image object and output them on the screen, along with a link to Bing search results for that breed, and an iframe displaying those results right then and there.

The functions building the search-link and iframe HTML element are below:

// Build a link to Bing image search results for the given label.
function labelSearchLink($label) {
    return '<a href="http://www.bing.com/images/search?q='.urlencode($label).'&qs=AS&pq=treein&sc=8-6&sp=1&cvid=92698E3A769C4AFE8C6CA1B1F80FC66D&FORM=QBLH" target="_blank">'.htmlspecialchars($label).'</a>';
}

// Build an iframe showing the same results inline (iframes need an explicit closing tag).
function iframeSearch($label) {
    return '<iframe width="100%" height="400" src="http://www.bing.com/images/search?q='.urlencode($label).'&qs=AS&pq=treein&sc=8-6&sp=1&cvid=92698E3A769C4AFE8C6CA1B1F80FC66D&FORM=QBLH"></iframe>';
}
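
The long query strings in those functions were copied from a browser session. A leaner variant might build the URL from the label alone, as sketched below. This assumes Bing's image search needs only the q parameter, and both function names are made up for this example.

```php
<?php
/**
 * Hypothetical variant: build the search URL from the label alone.
 * Assumption: Bing's image search works with just the "q" parameter.
 */
function bingImageSearchUrl($label)
{
    return 'http://www.bing.com/images/search?' . http_build_query(['q' => $label]);
}

/**
 * Hypothetical variant of the link builder that escapes both the URL
 * and the visible label before placing them in the HTML.
 */
function safeSearchLink($label)
{
    $href = htmlspecialchars(bingImageSearchUrl($label), ENT_QUOTES, 'UTF-8');
    $text = htmlspecialchars($label, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $href . '" target="_blank">' . $text . '</a>';
}

echo safeSearchLink('Treeing Walker Coonhound'), PHP_EOL;
```

http_build_query takes care of URL-encoding the label, so no separate urlencode call is needed.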

Beagle slightly misidentified

Again, please excuse the design of both the code and the web page – as this is just a quick prototype, CSS and frameworks would have been distracting.

Testing and Comparison

As we can see from the image above, Diffbot has misidentified the hound as well – but not as grossly as Microsoft. In this case, my beagle really does look more like a treeing walker coonhound than a typical beagle.

Let’s see some more examples.

Diffbot fails, MS succeeds

Ah, curses! Microsoft wins this round – Diffbot thought it had a better chance of guessing between a basset hound and a treeing walker coonhound, but missed on both. How about another?

Bingo!

Bingo! Both are spot on, though Diffbot is playing it safe by, again, suggesting the walker as an alternative. Okay, that one was a bit too obvious – how about a hard one?

Derps

Hilariously, this derpy image seems to remind both AIs of a Welsh corgi!

What if there’s more than one dog in the image, though?

Whoops, Diffbot got it very wrong

Adorable, Diffbot, but no cigar – well done Microsoft!

Okay, last one.

Sleeping beagle

Excellent work on both fronts. Obviously, the “dog detecting AI” is maxed out! Granted, Diffbot does have a small advantage in that it is also able to detect faces, text, brands, other animal types and more in images, but their “dog recognition” is toe to toe.

Conclusion

In this tutorial, we saw how easy it is to harness the power of modern AI to identify dog breeds at least somewhat accurately. While both engines have much room to improve, the more content we feed them, the better they’ll become.

This was a demonstration of the ease of use of powerful remote machine learning algorithms, and an introduction into a more complex topic we’ll be exploring soon – stay tuned!