How to Make Your Web App Smarter with Image Recognition

Clarifai is an API which provides image and video recognition that is incredibly simple to use and a whole lot of fun to implement. In this article, we will explore dragging and dropping images from around the web into a simple web app that will read them and tell us what it believes they are.

In this demo, we will be using Node.js for the server and a relatively basic front end that uses jQuery for AJAX requests. If you aren’t strong in Node.js, that should be okay as long as you are at a level where you are comfortable running npm install to pull in modules and node app.js in the command line to get your web app going. You won’t need to customize too much within it and might learn a thing or two in the end by getting the existing code running!

The Code

All of the sample code for this demo is available on GitHub.

Getting Started

To get started, we go to the Clarifai home page and click the “Sign up now” button on the top right:

We want to create a new application, so we head to the application screen by clicking the “Applications” menu item on the left.

Clarifai won’t allow us to create an application just yet, as we need to choose a plan:

Let's choose a plan so we can get things going. For our demo, the free plan should be more than suitable. We can upgrade later if needed:

We are now allowed to create an application. To do so, we can either click the “Applications” menu item on the left or the “create an Application” link:


Click the “Create a New Application” button:

We give our new application a name (e.g. “Image Recognizer”), leave the default model as is and set our language (we have kept it on English, but you may prefer a different language!). To finish, click “Create Application”:

Our new application details should now appear. The two most important bits we will want to copy somewhere safe are our “Client ID” and “Client Secret” — we will need these to access Clarifai on our server that we will set up next.

Setting Up Our Node.js Server

Clarifai has a Node.js client, available on GitHub, that we can use to interface with its service. Download the repo to your computer. In particular, we want the clarifai_node.js file.

Create a directory for your Node server and add the `clarifai_node.js` JavaScript file into the root directory.

Our Node.js server functions will be within a JavaScript file called app.js. This is where we will manage our Clarifai powered image recognition requests. app.js has the following JavaScript:

var Clarifai = require("./clarifai_node.js"),
    express = require("express"),
    app = express(),
    server = require("http").Server(app),
    bodyParser = require("body-parser"),
    port = process.env.PORT || 5000;

app.use(bodyParser.json());

Clarifai.initAPI("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET");

function identifyClarifaiError(err) {
  // Default error function from Clarifai we won't go into but you can find it in the GitHub download of this code!
}

app.post("/examineImage", function(req, resp) {
  var imageURL = req.body.imageRequested;
  console.log("Response was ", imageURL);

  Clarifai.tagURL(imageURL, "Image from browser", commonResultHandler);

  function commonResultHandler(err, res) {
    if (err != null) {
      identifyClarifaiError(err);
    }
    else {
      if (typeof res["status_code"] === "string" && 
        (res["status_code"] === "OK" || res["status_code"] === "PARTIAL_ERROR")) {

        if (res["results"][0]["status_code"] === "OK") {
          var tags = res["results"][0].result["tag"]["classes"];
          console.log("Tags found were: ", tags);
          resp.send(tags);
        }
        else {
          console.log("We had an error... Details: " +
            " docid=" + res.results[0].docid +
            " local_id=" + res.results[0].local_id + 
            " status_code="+res.results[0].status_code +
            " error = " + res.results[0]["result"]["error"]);

          resp.send("Error: " + res.results[0]["result"]["error"]);
        }
      }    
    }
  }
});

app.get("/", function(request, response) {
  response.sendFile(__dirname + "/public/index.html");
});

app.get(/^(.+)$/, function(req, res) {
  res.sendFile(__dirname + "/public/" + req.params[0]);
});

app.use(function(err, req, res, next) {
  console.error(err.stack);
  res.status(500).send("Something broke!");
});

server.listen(port, function() {
  console.log("Listening on " + port);
});

A large proportion of the code is basic Node Express server functionality which we won’t cover in this article. If you aren’t quite sure what these parts mean, you can leave them as is and just enjoy a running Node server.

The bits which relate specifically to Clarifai begin with our line of code that includes our clarifai_node.js file:

var Clarifai = require("./clarifai_node.js"),

The next line which uses Clarifai starts our initialization of the API. It gives us access to the API using the client ID and client secret which we copied somewhere safe earlier. Paste them into the appropriate spots:

Clarifai.initAPI("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET");
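If you would rather not paste the credentials into a file that might end up in version control, one option is to read them from environment variables. This is only a minimal sketch: the helper name readClarifaiCredentials and the variable names CLARIFAI_CLIENT_ID and CLARIFAI_CLIENT_SECRET are assumptions made for illustration, not something Clarifai's client requires.

```javascript
// Hypothetical helper: the environment variable names are assumptions for this sketch.
function readClarifaiCredentials(env) {
  var clientId = env.CLARIFAI_CLIENT_ID,
      clientSecret = env.CLARIFAI_CLIENT_SECRET;

  // Fail early with a readable message instead of a confusing API error later on.
  if (!clientId || !clientSecret) {
    throw new Error("Set CLARIFAI_CLIENT_ID and CLARIFAI_CLIENT_SECRET before starting the server");
  }

  return { clientId: clientId, clientSecret: clientSecret };
}

// In app.js this could replace the hard-coded strings:
//   var credentials = readClarifaiCredentials(process.env);
//   Clarifai.initAPI(credentials.clientId, credentials.clientSecret);
```

The server would then be started with both variables set in the shell that runs node app.js.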

We then have a POST request which the Node server will look out for and respond to. This request expects to receive a web URL for an image within our POST body called imageRequested when accessed via /examineImage. It logs whatever URL it finds into our console:

app.post("/examineImage", function(req, resp) {
  var imageURL = req.body.imageRequested;
  console.log("Response was ", imageURL);
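The route trusts whatever arrives in imageRequested. If the dropped content contained no image, the front end ends up sending no URL at all, so it may be worth checking the value before handing it to Clarifai. The sketch below assumes a Node version that provides the global URL class; isHttpUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: returns true only for strings that parse as http(s) URLs.
function isHttpUrl(value) {
  if (typeof value !== "string") return false;

  try {
    var parsed = new URL(value); // throws on strings that are not valid URLs
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch (e) {
    return false;
  }
}

// Inside the route, before calling Clarifai, this could be used as:
//   if (!isHttpUrl(imageURL)) {
//     return resp.status(400).send("Error: imageRequested must be an http(s) URL");
//   }
```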

We then run a function from the Clarifai Node API Client called tagURL(). This function takes three parameters: the image URL we want Clarifai to examine, a name we give the image, and the callback function to run once it has finished. To keep things simple we use the same generic name for every image, but you could derive a name from the URL if you wanted:

Clarifai.tagURL(imageURL, "Image from browser", commonResultHandler);
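If you did want a per-image name rather than the generic one, a rough sketch could take the last path segment of the URL. The helper localIdFromUrl is hypothetical, and the sketch assumes Clarifai accepts any string as the name, which the sample response later in this article suggests but does not guarantee.

```javascript
// Hypothetical helper: derives a name for the image from the last segment of its URL path.
function localIdFromUrl(imageURL, fallback) {
  var defaultName = fallback || "Image from browser";

  try {
    var segments = new URL(imageURL).pathname.split("/").filter(Boolean);
    return segments.length
      ? decodeURIComponent(segments[segments.length - 1])
      : defaultName;
  } catch (e) {
    // Invalid URL or malformed percent-encoding: fall back to the generic name.
    return defaultName;
  }
}

// Possible use in the route:
//   Clarifai.tagURL(imageURL, localIdFromUrl(imageURL), commonResultHandler);
```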

Within commonResultHandler(), we react to what Clarifai returns to us. If it returns an error, we pass it to the identifyClarifaiError() function which we can leave as is (you can find that function in the GitHub download above). It contains a series of checks for status codes which come from Clarifai. For our purposes in this basic demo, we won’t cover all of what it does as you shouldn’t need to adjust it.

function commonResultHandler(err, res) {
  if (err != null) {
    identifyClarifaiError(err);
  }
  // Continues further

If we do not have a clear error returned, we double check that Clarifai’s returned data does not also contain error statuses within its res["status_code"]:

else {
  if (typeof res["status_code"] === "string" && 
    (res["status_code"] === "OK" || res["status_code"] === "PARTIAL_ERROR")) {

Clarifai returns an array of results within res["results"], with one item for each image it is given. As we are only providing a single image, we only need to retrieve the first item in that array. Each item is a JSON object holding the data Clarifai has for that image. The JSON returned looks like so:

{
  "docid": 6770681588539017000,
  "url": "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQSoU65AMIOpJ2rwtvdJyuSExIjcwQfuIup8sm6tesdWwtCEajzVw",
  "status_code": "OK",
  "status_msg": "OK",
  "local_id": "Image from browser",
  "result": {
    "tag": {
      "concept_ids": [
        "ai_l8TKp2h5",
        "ai_VPmHr5bm"
      ],
      "classes": [
        "people",
        "adult"
      ],
      "probs": [
        0.9833399057388306,
        0.9695020318031311
      ]
    }
  },
  "docid_str": "c009c46cf0c7b68b5df64b083c2547b4"
}

The most important bits for us are within the result object's tag property. It contains three arrays: one lists the Clarifai concept IDs for the elements it has found, one lists the “classes” for them (the human-readable names for each concept) and one lists the probability of each being correct. The arrays share the same order, so in the example above, the concept ID "ai_l8TKp2h5" is known as "people" and Clarifai is about 98.3 percent sure (a probability of 0.9833399057388306) that there are people in this image.
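Because the three arrays line up by index, they can be zipped into one object per tag. The following is a small sketch using the sample values from the JSON above; pairTags is a hypothetical helper and is not part of the Clarifai client.

```javascript
// Hypothetical helper: combines the parallel arrays into one object per tag.
function pairTags(tag) {
  return tag.classes.map(function(name, i) {
    return {
      name: name,
      conceptId: tag.concept_ids[i],
      probability: tag.probs[i]
    };
  });
}

// Sample data copied from the response shown above.
var sampleTag = {
  concept_ids: ["ai_l8TKp2h5", "ai_VPmHr5bm"],
  classes: ["people", "adult"],
  probs: [0.9833399057388306, 0.9695020318031311]
};

pairTags(sampleTag).forEach(function(t) {
  // Probabilities are fractions between 0 and 1, so multiply by 100 for a percentage.
  console.log(t.name + ": " + (t.probability * 100).toFixed(1) + "%");
});
// prints "people: 98.3%" then "adult: 97.0%"
```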

Using this data, we can list these classes to show what Clarifai has uncovered. In the code below, we check the status code in this result is "OK" and then send the array of tags as a response to the front end’s AJAX request.

if (res["results"][0]["status_code"] === "OK") {
  var tags = res["results"][0].result["tag"]["classes"];
  console.log("Tags found were: ", tags);
  resp.send(tags);
}

Otherwise, if the status code isn’t "OK", we log the details of the error and send that back to our web app instead:

else {
  console.log("We had an error... Details: " +
    " docid=" + res.results[0].docid +
    " local_id=" + res.results[0].local_id + 
    " status_code="+res.results[0].status_code +
    " error = " + res.results[0]["result"]["error"]);

  resp.send("Error: " + res.results[0]["result"]["error"]);
}
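One thing to be aware of in the handler as written: when err is set, or when the top-level status code is neither "OK" nor "PARTIAL_ERROR", nothing is sent back, so the browser's AJAX request is left waiting. A possible way around this is to work out a reply for every branch first and send it in one place. The sketch below is an assumption-laden variation rather than the original code: buildTagResponse is a hypothetical helper, and the 502 status for failures is a choice made here (the original sends its error string with the default 200 status).

```javascript
// Hypothetical helper: maps Clarifai's callback arguments to a reply for the browser.
function buildTagResponse(err, res) {
  if (err != null) {
    return { status: 502, body: "Error: the Clarifai request failed" };
  }

  var topLevelOk = res &&
    (res.status_code === "OK" || res.status_code === "PARTIAL_ERROR");
  var first = topLevelOk && res.results && res.results[0];

  if (!first) {
    return { status: 502, body: "Error: unexpected response from Clarifai" };
  }

  if (first.status_code === "OK") {
    return { status: 200, body: first.result.tag.classes };
  }

  return { status: 502, body: "Error: " + (first.result && first.result.error) };
}

// Possible use inside commonResultHandler(err, res):
//   if (err != null) identifyClarifaiError(err);
//   var reply = buildTagResponse(err, res);
//   resp.status(reply.status).send(reply.body);
```

With a non-2xx status, jQuery's error callback on the front end fires, so the page can at least report that something went wrong.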

Our Front End JavaScript

Much of the front end can be made however you’d like. In our example, the front end is going to be a relatively simple one which allows for an image to be dragged onto the app from elsewhere on the web. We read its URL, send it to our Node server above and then await a list of tags to show.

Our full front end JavaScript file looks like so:

var baseUrl = window.location.origin,
    dropArea = document.getElementById("dropArea");

dropArea.addEventListener("drop", imageDropped, false);

function imageDropped(evt) {
  evt.stopPropagation();
  evt.preventDefault(); 

  var imageHTML = evt.dataTransfer.getData("text/html"),
      dataParent = $("<div>").append(imageHTML),
      imageRequested = $(dataParent).find("img").attr("src"),
      $imageFound = $("#imageFound");
  
  console.log(imageRequested);

  $imageFound.attr("src", imageRequested);

  $.ajax({
    type: "POST",
    url: baseUrl + "/examineImage",
    contentType: "application/json; charset=utf-8",
    dataType: "json",
    data: JSON.stringify({"imageRequested": imageRequested}),

    success: function(data) {
      console.log(data);
      var tags = "";
      for (var i = 0; i < data.length; i++) {
        tags += data[i];
        if (i != data.length - 1) tags += ", ";
      }
      $(dropArea).html(tags);
    },
    error: function() {
      console.log("We had an error!");
    }
  });
}

The initial line of code reads the origin (protocol, host and port) of the URL in the browser bar, as this is also the address of our server:

var baseUrl = window.location.origin,

We then tell JavaScript to keep an eye on the #dropArea element and add an event listener that will run imageDropped() if we drop something onto it:

dropArea = document.getElementById("dropArea");

dropArea.addEventListener("drop", imageDropped, false);

imageDropped() starts by preventing the usual behavior that will happen when a file is dragged into the browser (it usually will load that file into the browser window you dragged it into):

function imageDropped(evt) {
  evt.stopPropagation();
  evt.preventDefault();

Once we are sure that the usual functionality of dragging and dropping by the browser has been avoided, we get the HTML from the event’s dropped data. The data should typically include an <img> tag, but sometimes has other tags that come along with it like a <meta> tag and other <div> tags. To ensure we’ve always got a parent element to look inside, we append whatever data we’ve got into a <div>. Then we find the <img> within it, read its src attribute and put this value into a variable called imageRequested:

var imageHTML = evt.dataTransfer.getData("text/html"),
    dataParent = $("<div>").append(imageHTML),
    imageRequested = $(dataParent).find("img").attr("src")
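If the dropped HTML contains no <img> tag, imageRequested ends up undefined. A possible fallback, sketched here as an assumption rather than part of the original demo, is to look at the drop's text/uri-list data, which browsers may also provide. Note that for an image wrapped in a link this may be the link's address rather than the image's. The helper name firstUriFromDrop is hypothetical.

```javascript
// Hypothetical helper: returns the first URI in the drop's text/uri-list data, or null.
function firstUriFromDrop(dataTransfer) {
  var uriList = dataTransfer.getData("text/uri-list") || "",
      lines = uriList.split(/\r?\n/);

  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].trim();
    // In the text/uri-list format, lines starting with "#" are comments.
    if (line && line.charAt(0) !== "#") return line;
  }

  return null;
}

// Possible use inside imageDropped(), after reading the src attribute:
//   imageRequested = imageRequested || firstUriFromDrop(evt.dataTransfer);
```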

Our HTML has an <img> tag with an ID of imageFound. We set its src to the dragged image’s URL so that we can see the image underneath our results. We also log the URL of the image for debugging (you can remove the console.log if you’d prefer):

$imageFound = $("#imageFound");

console.log(imageRequested);

$imageFound.attr("src", imageRequested);

With the image URL stored in imageRequested, we send it to our Node server’s /examineImage address within a JSON object in the format {"imageRequested": "http://www.somewebsite.com/yourimage.jpg"}. On successful retrieval of tags (Clarifai calls them classes), we change them into a comma separated string and place that string into our HTML’s #dropArea element. If there is an error, we log that an error has occurred.

$.ajax({
  type: "POST",
  url: baseUrl + "/examineImage",
  contentType: "application/json; charset=utf-8",
  dataType: "json",
  data: JSON.stringify({"imageRequested": imageRequested}),

  success: function(data) {
    console.log(data);
    var tags = "";
    for (var i = 0; i < data.length; i++) {
      tags += data[i];
      if (i != data.length - 1) tags += ", ";
    }
    $(dropArea).html(tags);
  },
  error: function() {
    console.log("We had an error!");
  }
});
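The loop in the success callback builds the same string that Array.prototype.join produces, so it could be shortened. This is a small sketch with a hypothetical helper name, formatTags. It also suggests jQuery's text() rather than html(), so that tag names are inserted as plain text instead of being parsed as markup.

```javascript
// Hypothetical helper: turns the array of tags into a comma separated string.
function formatTags(data) {
  // Guard against a non-array response so the page never shows "undefined".
  return Array.isArray(data) ? data.join(", ") : "";
}

// Possible use in the success callback:
//   $(dropArea).text(formatTags(data));
```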

I won’t cover the HTML in detail as it isn’t too exciting and could definitely be optimized! It looks like so:

<!doctype html>
<html>
<head>
  <title>Image recognition tester</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <script src="//code.jquery.com/jquery-1.12.0.min.js"></script>
  <link href="https://fonts.googleapis.com/css?family=Lora" rel="stylesheet" type="text/css"/>
  <style type="text/css">
    #dropArea {
      border: 1px solid #fff;
      bottom: 10%;
      color: #fff;
      display: flex;
      justify-content: center;
      flex-direction: column;
      font-family: "Lora", Arial, sans-serif;
      font-size: 30px;
      left: 10%;
      position: absolute;
      right: 10%;
      text-align: center;
      text-shadow: 0 0 10px rgba(0,0,0,0.5);
      top: 10%;
    }
    #imageFound {
      background-size: 100% cover;
      background: none 0 0 no-repeat #000;
      height: 100%;
      left: 0;
      position: absolute;
      top: 0;
      width: 100%;
    }
  </style>
</head>
<body>
  <img src="" id="imageFound" />
  <div id="dropArea" ondragover="return false;">Drop your image from the web into here!</div>
  <script src="./main.js"></script>
</body>
</html>

In Action

If we run our Node server locally, we can access it via localhost:5000, so run the server using node app.js and visit the page in your web browser.

Visit another website in a separate window and drag in an image from that window to this one:

Once Clarifai has identified the image, our app shows a list of tags for what it believes the image contains, ordered from most likely to least likely:

Conclusion

Clarifai has a lot of potential with its image recognition capabilities. This service’s API could be added into a range of AI applications to give our AI a nice bit of visual understanding of the world around it. For example, we could add this functionality to a Siri-style personal assistant like the one we built in the articles on How to Build Your Own AI Assistant Using Api.ai and Customizing Your Api.ai Assistant with Intent and Context. You could add it to a Nodebot or any other web enabled application. Clarifai’s service can also do video recognition which brings a whole new level of potential!

Where do you plan on using Clarifai’s image recognition? I’d love to hear about where this API gets used! Let me know in the comments below, or get in touch with me on Twitter at @thatpatrickguy.

Frequently Asked Questions (FAQs) on Web App Image Recognition

How does image recognition work in web applications?

Image recognition in web applications works by using algorithms and neural networks to interpret and understand digital images. The process begins with the input of an image into the system. The image is then broken down into a series of pixel data, which is analyzed by the algorithm. The algorithm identifies patterns, shapes, and colors in the pixel data and uses this information to recognize and categorize the image. This technology can be used in a variety of applications, from identifying objects in photos to recognizing faces in a crowd.

What are the benefits of using image recognition in web applications?

Image recognition can greatly enhance the functionality and user experience of a web application. It can be used to automate tasks that would otherwise require manual input, such as sorting through large amounts of image data. It can also be used to provide more personalized experiences for users, such as recommending products based on images they’ve uploaded or interacted with. Additionally, image recognition can improve accessibility for users with visual impairments by providing descriptive text for images.

What are some common uses of image recognition in web applications?

Image recognition is used in a wide range of web applications. Social media platforms use it to identify and tag individuals in photos. E-commerce sites use it to recommend products based on images users have interacted with. Image recognition is also used in security applications, such as facial recognition for user authentication. Additionally, it’s used in accessibility tools to provide descriptive text for images.

How can I implement image recognition in my web application?

Implementing image recognition in a web application typically involves using an API (Application Programming Interface) from a service that specializes in image recognition. These services provide the complex algorithms and neural networks needed for image recognition, and they expose these capabilities through an API that your web application can interact with. You’ll need to send image data to the API, and it will return the results of the image recognition process.

What are some challenges in implementing image recognition in web applications?

Implementing image recognition in a web application can present several challenges. One of the main challenges is ensuring the accuracy of the image recognition process. This requires a large amount of high-quality training data and a well-tuned algorithm. Another challenge is handling the large amounts of data involved in image recognition. This can require significant computational resources and efficient data management strategies. Additionally, privacy and security concerns can arise when handling sensitive image data.

How can I improve the accuracy of image recognition in my web application?

Improving the accuracy of image recognition in a web application can involve several strategies. One strategy is to use high-quality training data. The more varied and representative the training data is, the better the algorithm will be at recognizing images. Another strategy is to fine-tune the algorithm. This involves adjusting the parameters of the algorithm to optimize its performance. Additionally, using a service that specializes in image recognition can help ensure accuracy, as these services often have access to large amounts of training data and advanced algorithms.

What are some popular image recognition APIs I can use in my web application?

There are several popular image recognition APIs available for use in web applications. These include the Google Cloud Vision API, the Microsoft Azure Computer Vision API, and the IBM Watson Visual Recognition API. These APIs provide a range of image recognition capabilities, including object detection, facial recognition, and text extraction.

How can I ensure the privacy and security of image data in my web application?

Ensuring the privacy and security of image data in a web application involves several steps. First, it’s important to encrypt image data both in transit and at rest. This can prevent unauthorized access to the data. Second, it’s important to have clear privacy policies in place that outline how image data is used and stored. Finally, using a reputable image recognition service can help ensure privacy and security, as these services often have robust security measures in place.

Can image recognition be used in mobile web applications?

Yes, image recognition can be used in mobile web applications. Many image recognition APIs are designed to work with both desktop and mobile applications. This allows for a range of uses, from scanning barcodes with a mobile camera to identifying landmarks in photos taken on a mobile device.

How does image recognition contribute to the future of web applications?

Image recognition is a key technology that’s driving the future of web applications. As image recognition algorithms become more advanced, they’re enabling new types of applications and features. For example, augmented reality (AR) relies heavily on image recognition to overlay digital information onto the real world. Additionally, as more devices become equipped with cameras, the potential uses for image recognition in web applications will continue to grow.