How to Build Your Own AI Assistant Using Api.ai

By Patrick Catanzariti

Building an AI assistant with Api.ai

The world of artificially intelligent assistants is growing — Siri, Cortana, Alexa, Ok Google, Facebook M — all the big players in technology have their own. However, it’s easier than many developers realise to build your own AI assistant too! You can customise it to your own needs, your own IoT connected devices, your own custom APIs — the sky is the limit.

Late last year, I put together a guide on five simple ways to build artificial intelligence in 2016, where I covered a few of the simple options out there for building an AI assistant. In this article, I’d like to look at one particular service that makes it incredibly simple to get quite a fully featured AI assistant with very little initial setup — Api.ai.

What is Api.ai?

Api.ai is a service that allows developers to build speech-to-text, natural language processing, artificially intelligent systems that you can train with your own custom functionality. It has a range of existing knowledge bases, called “Domains”, that systems built with Api.ai can automatically understand — this is what we will be focusing on in this article. Domains provide a whole knowledge base of encyclopaedic knowledge, language translation, weather and more. In future articles, we will cover some of the more advanced aspects of Api.ai that allow us to personalise our assistant further.

Getting Started With Api.ai

To get started, we will head to the Api.ai website and click either the “Get Started for Free” button or the “Sign Up Free” button in the top right hand corner.

We are then taken to a registration form which is pretty straightforward — enter your name, email and password and click “Sign up”. For those avoiding yet another set of login credentials, you can also sign up with your GitHub or Google account using the buttons to the right:

Signing up to Api.ai

Once we have signed up, we will be taken straight to the Api.ai interface where we can create our virtual AI assistant. Each assistant we create and teach specific skills to is called an “agent” in Api.ai. So, to begin, we create our first agent by clicking the “Create Agent” button on the top left hand side:

Creating new agent in Api.ai

On the next screen, we enter in our agent’s details, including:

  • Name: This is just for your own reference to differentiate agents in the Api.ai interface. You could call the agent anything you would like, either a person’s name (I chose Barry) or a name that represents the tasks they are helping out with (e.g. light-controller).
  • Description: A human readable description so you can remember what the agent’s responsible for. This is optional and might not be needed if your agent’s name is self-explanatory.
  • Language: The language which the agent works in. This cannot be changed once you’ve chosen it — so choose wisely! For this tutorial, we will be choosing English as English has access to the most Api.ai domains. You can see which domains are available for each language in the Languages table in the Api.ai docs.
  • Compatibility with other services: You can also choose to make your agent compatible with Microsoft’s Cortana and/or with Assistant.ai (the AI assistant app built by the creators of Api.ai). For now, we will leave those unchecked as you can check these later on in the agent’s settings.

When you have input your agent’s settings, choose “Save” next to the agent’s name to save everything:

Setting for your new agent in Api.ai

The Test Console

Once your agent has been created, you should see a test console appear on the right. This is a sign that your agent is ready to go! To showcase the power of domains in Api.ai, ask it to identify a celebrity and hit Enter. For example, I asked it “Who is Steve Nash?” to see whether it would recognise one of my favourite NBA basketball stars. If it recognised your query, your results should appear below it:

The Api.ai test console

If you scroll down on the right hand side of results, you will see more details for how Api.ai interpreted your request. Below that, we have a button called “Show JSON”. Click that to see how the API will return this sort of response to us in our app:

Finding the Show JSON option

Api.ai will open up the JSON viewer and show you a JSON response that looks similar to this one:

{
  "id": "389b5147-31c7-41d7-a7a7-5bbb8b324407",
  "timestamp": "2016-01-11T01:13:00.881Z",
  "result": {
    "source": "DuckDuckGo",
    "resolvedQuery": "Who is Steve Nash?",
    "action": "wisdom.person",
    "parameters": {
      "q": "Steve Nash",
      "request_type": "whatis"
    },
    "metadata": {},
    "fulfillment": {
      "speech": "Stephen John Nash, OC, OBC, is a Canadian retired professional basketball player who played in the National Basketball Association. The point guard was an eight-time NBA All-Star and a seven-time All-NBA selection. Twice Nash was named the NBA Most Valuable Player while playing for the Phoenix Suns. He currently serves as the general manager of the Canadian national team and as a player development consultant for the Golden State Warriors."
    }
  },
  "status": {
    "code": 200,
    "errorType": "success"
  }
}

Reading through that response, we can see that Api.ai got this information from DuckDuckGo, understood that I was asking for information on a person named “Steve Nash” and provided the response within result.fulfillment.speech. We have a status code section in the JSON object too which will alert us to any errors that occur during our requests.
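
To make this concrete, here is a minimal sketch (not part of the demo code) of how you might read the useful pieces out of a parsed response like the one above:

// "response" here stands for the parsed JSON object shown above
if (response.status.code === 200) {
  var answer = response.result.fulfillment.speech;
  var source = response.result.source; // e.g. "DuckDuckGo"
  console.log(source + " says: " + answer);
}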

Our agent is now ready for us to integrate into our own web app interface. To do so, we will need to get our API keys to give us remote access to our agent.

Finding your Api.ai API keys

The API keys we will need are further down on this agent page. Scroll down and you will find the “API keys” section. Copy and paste the “Subscription key” and “Client access token” somewhere safe. Those are what we will need to make queries to the Api.ai SDK:

Finding your Api.ai API keys

The Code

If you would like to take a look at the working code and play around with it, it is available on GitHub. Feel free to use it and expand on the idea for your own AI personal assistant.

If you would like to try it out, I have Barry running right here. Enjoy!

Connecting to Api.ai Using JavaScript

We have a working personal assistant that is running in Api.ai’s cloud somewhere. We now need a way to speak to our personal assistant from our own interface. Api.ai has a range of platform SDKs that work with Android, iOS, web apps, Unity, Cordova, C++ and more. For this example, we will be using HTML and JavaScript to make a simple personal assistant web app. My demo builds off the concepts Api.ai show in their HTML + JS gist.

Our app will do the following:

  • Accept a written command in an input field, submitting that command when we hit the Enter key.
  • OR — Using the HTML5 Speech Recognition API (this only works on Google Chrome 25 and above), if the user clicks “Speak”, they can speak their commands and have them written into the input field automatically.
  • Once the command has been received, we will be using jQuery to submit an AJAX POST request to Api.ai. Api.ai will return its knowledge as a JSON object as we saw above in the test console.
  • We will read in that JSON file using JavaScript and display the results on our web app.
  • If available, our web app will also use the Web Speech API (available in Google Chrome 33 and above) to respond back to us verbally.

The whole web app is available on GitHub at the link above. Feel free to refer to it to see how we have styled things and structured the HTML. We won’t explain every piece of how it is put together in this article; instead, we will focus on the Api.ai SDK side of things. I will also point out and briefly explain which bits use the HTML5 Speech Recognition API and the Web Speech API.

Our JavaScript contains the following variables:

var accessToken = "YOURACCESSTOKEN",
    subscriptionKey = "YOURSUBSCRIPTIONKEY",
    baseUrl = "https://api.api.ai/v1/",
    $speechInput,
    $recBtn,
    recognition,
    messageRecording = "Recording...",
    messageCouldntHear = "I couldn't hear you, could you say that again?",
    messageInternalError = "Oh no, there has been an internal server error",
    messageSorry = "I'm sorry, I don't have the answer to that yet.";

Here is what each of these is for:

  • accessToken and subscriptionKey – These are the two API keys which you copied over from the Api.ai interface. These give us permission to access the SDK and also say which agent it is that we are accessing. I want to access Barry, my personal agent.
  • baseUrl – This is the base URL for all calls to the Api.ai SDK. If a new version of the SDK comes out, we can update it here.
  • $speechInput – This stores our <input> element so we can access it in our JavaScript.
  • $recBtn – This stores our <button> element that we will be using for when the user wants to click and speak to the web app instead (a quick sketch of how both of these elements could be assigned follows this list).
  • recognition – We store our webkitSpeechRecognition() functionality in this variable. This is for the HTML5 Speech Recognition API.
  • messageRecording, messageCouldntHear, messageInternalError and messageSorry – These are messages to show when the app is recording the user’s voice, could not hear their voice, when we have an internal error and if our agent does not understand. We store these as variables so that we can change them easily at the top of our script and also so that we can specify which ones we do not want the app to speak out loud later on.
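
As a quick sketch of how $speechInput and $recBtn get their values, the assignment happens once the document is ready. The #speech-input and #rec-btn IDs here are assumptions; check the repo’s HTML for the actual selectors:

$(document).ready(function() {
  // Cache the text input and the record button so we can reuse them below.
  // The selectors are illustrative; see the GitHub repo for the real markup.
  $speechInput = $("#speech-input");
  $recBtn = $("#rec-btn");
});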

In these lines of code, we look for when the user presses the Enter key in the input field. If so, we run the send() function to send off the data to Api.ai:

$speechInput.keypress(function(event) {
  if (event.which == 13) { // 13 is the key code for the Enter key
    event.preventDefault();
    send();
  }
});

Next, we watch for the user clicking the recording button to ask the app to listen to them (or, if it is already listening, to pause listening). If they click it, we run the switchRecognition() function to switch from recording to not recording and vice versa:

$recBtn.on("click", function(event) {
  switchRecognition();
});

Finally, for our initial jQuery setup, we set up a button on the bottom right of our screen to show and hide the JSON response. This is just to keep things clean: most of the time we won’t want to see the JSON data that comes through, but every now and then, if something unexpected happens, we can click this button to toggle whether the JSON is viewable or not.

$(".debug__btn").on("click", function() {
  $(this).next().toggleClass("is-active");
  return false;
});

Using the HTML5 Speech Recognition API

As mentioned above, we will be using the HTML5 Speech Recognition API to listen to the user and transcribe what they say. This only works in Google Chrome at the moment.
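
If you want to guard against other browsers, a simple feature check like the following sketch (not part of the demo code) lets you fall back to text-only input when either API is missing:

// Detect support before wiring up the voice features.
var canRecognise = "webkitSpeechRecognition" in window;
var canSpeak = "speechSynthesis" in window;

if (!canRecognise) {
  // One possible fallback: hide the "Speak" button and rely on typed input.
  $recBtn.hide();
}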

Our startRecognition() function looks like so:

function startRecognition() {
  recognition = new webkitSpeechRecognition();

  recognition.onstart = function(event) {
    respond(messageRecording);
    updateRec();
  };
  recognition.onresult = function(event) {
    recognition.onend = null;
    
    var text = "";
    for (var i = event.resultIndex; i < event.results.length; ++i) {
      text += event.results[i][0].transcript;
    }
    setInput(text);
    stopRecognition();
  };
  recognition.onend = function() {
    respond(messageCouldntHear);
    stopRecognition();
  };
  recognition.lang = "en-US";
  recognition.start();
}

This is what runs the HTML5 Speech Recognition API. It all uses functions within webkitSpeechRecognition(). Here are a few pointers for what is going on:

  • recognition.onstart - Runs when recording from the user's microphone begins. We use a function called respond() to display our message that tells the user we are listening to them. We will cover the respond() function in more detail soon. updateRec() switches the text for our recording button from "Speak" to "Stop" (minimal sketches of updateRec() and setInput() follow this list).
  • recognition.onresult - Runs when we have a result from the voice recognition. We parse the result and set our text field to use that result via setInput() (this function just adds the text to the input field and then runs our send() function).
  • recognition.onend - Runs when the voice recognition ends. We set this to null in recognition.onresult to prevent it running if we have a successful result — this way, if recognition.onend runs, we know the voice recognition API has not understood the user. If the function does run, we respond to the user to tell them we did not hear them correctly.
  • recognition.lang - Sets the language we are looking for. In our demo's case, we are looking for US English.
  • recognition.start() - Starts that whole process!
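
Neither updateRec() nor setInput() is shown above, so here are minimal sketches of both based on how they are described; treat them as assumptions rather than the exact code from the repo:

// Swap the record button's label depending on whether we are recording.
function updateRec() {
  $recBtn.text(recognition ? "Stop" : "Speak");
}

// Put the transcribed text into the input field, then send it to Api.ai.
function setInput(text) {
  $speechInput.val(text);
  send();
}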

Our stopRecognition() function is much simpler. It stops our recognition and sets it to null. Then, it updates the button to show that we are not recording anymore.

function stopRecognition() {
  if (recognition) {
    recognition.stop();
    recognition = null;
  }
  updateRec();
}

switchRecognition() toggles whether we are starting or stopping recognition by checking the recognition variable. This lets our button toggle the recognition on and off:

function switchRecognition() {
  if (recognition) {
    stopRecognition();
  } else {
    startRecognition();
  }
}

Communicating With Api.ai

To send off our query to Api.ai, we use the send() function which looks like so:

function send() {
  var text = $speechInput.val();
  $.ajax({
    type: "POST",
    url: baseUrl + "query/",
    contentType: "application/json; charset=utf-8",
    dataType: "json",
    headers: {
      "Authorization": "Bearer " + accessToken,
      "ocp-apim-subscription-key": subscriptionKey
    },
    data: JSON.stringify({q: text, lang: "en"}),

    success: function(data) {
      prepareResponse(data);
    },
    error: function() {
      respond(messageInternalError);
    }
  });
}

This is a typical AJAX POST request using jQuery to https://api.api.ai/v1/query. We make sure we are sending JSON data to it and are expecting JSON data from it. We also need to set two headers — Authorization and ocp-apim-subscription-key to be our API keys for Api.ai. We send our data in the format {q: text, lang: "en"} to Api.ai and wait for a response.
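
For comparison only, here is a rough sketch of the same request using the Fetch API instead of jQuery. It reuses the endpoint, headers and payload described above, but it is not the code used in the demo:

// A rough fetch() equivalent of the jQuery AJAX call above (sketch only).
fetch(baseUrl + "query/", {
  method: "POST",
  headers: {
    "Content-Type": "application/json; charset=utf-8",
    "Authorization": "Bearer " + accessToken,
    "ocp-apim-subscription-key": subscriptionKey
  },
  body: JSON.stringify({q: text, lang: "en"})
})
  .then(function(response) { return response.json(); })
  .then(prepareResponse)
  .catch(function() { respond(messageInternalError); });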

When we receive a response, we run prepareResponse(). In this function, we format the JSON string we will put into our debug section of the web app and we take out the result.speech part of Api.ai’s response which provides us with our assistant’s text response. We display each message via respond() and debugRespond():

function prepareResponse(val) {
  var debugJSON = JSON.stringify(val, undefined, 2),
      spokenResponse = val.result.speech;

  respond(spokenResponse);
  debugRespond(debugJSON);
}

Our debugRespond() function puts text into our field for a JSON response:

function debugRespond(val) {
  $("#response").text(val);
}

Our respond() function has a few more steps to it:

function respond(val) {
  if (val == "") {
    val = messageSorry;
  }

  if (val !== messageRecording) {
    var msg = new SpeechSynthesisUtterance();
    var voices = window.speechSynthesis.getVoices();
    msg.voiceURI = "native";
    msg.text = val;
    msg.lang = "en-US";
    window.speechSynthesis.speak(msg);
  }

  $("#spokenResponse").addClass("is-active").find(".spoken-response__text").html(val);
}

At the beginning, we check to see if the response value is empty. If so, we set it to say that it isn’t sure of the answer to that question as Api.ai has not returned a valid response to us:

if (val == "") {
  val = messageSorry;
}

If we do have a message to output and it isn’t the one saying that we are recording, then we use the Web Speech API to say the message out loud using the SpeechSynthesisUtterance object. I found that without setting voiceURI and lang, my browser’s default voice was German! This made its speech rather tough to understand until I changed it. To actually speak the message, we use the window.speechSynthesis.speak(msg) function:

if (val !== messageRecording) {
  var msg = new SpeechSynthesisUtterance();
  msg.voiceURI = "native";
  msg.text = val;
  msg.lang = "en-US";
  window.speechSynthesis.speak(msg);
}
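
You may have noticed that respond() calls window.speechSynthesis.getVoices() without using the result. If you would rather pick an English voice explicitly instead of relying on voiceURI and lang, a sketch like this works (voice availability varies by browser and platform, so treat it as an assumption):

// Pick the first US English voice, if one is available (sketch only).
// Note: getVoices() can return an empty list until the voiceschanged event fires.
var voices = window.speechSynthesis.getVoices();
var englishVoice = voices.filter(function(voice) {
  return voice.lang === "en-US";
})[0];

if (englishVoice) {
  msg.voice = englishVoice;
}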

Note: It is important not to have it speak the “Recording…” bit of text — if we do, then the microphone will pick up that speech and add it into the recorded query.

Finally, we display our response box and add that text to it so that the user can read it too:

$("#spokenResponse").addClass("is-active").find(".spoken-response__text").html(val);

In Action

If we run the web app using my styles within the GitHub repo, it looks something like this:

Barry, my AI assistant in action

If we ask it a question by clicking “Speak” and saying “Who is Jim Davis?”, it initially shows that we are recording (you may need to give Chrome permission to access your microphone when you click that button — apparently this will happen every time unless you serve the page over HTTPS):

Our app whilst it is recording

It then responds visually with the response like so (and speaks it too, however that is difficult to show in a screenshot):

Our first response from Api.ai about Jim Davis

We can also click the button in the bottom right to see the JSON response Api.ai gave us, just in case we would like to debug the result:

Viewing our JSON result to debug

We can ask our new personal assistant any number of things. Here are a few other things I’ve asked Barry:

Asking our AI a math question

Finding out the weather

Asking our AI how are you?

Our AI performs some translation

Having Issues?

I found that occasionally, if the Web Speech API tries to say something too long, Chrome’s speech stops working. If this is the case for you, close the tab and open a new one to try again.
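
One possible workaround (not something the demo does, so treat it as a sketch) is to split long responses into sentences and queue each one as its own short utterance:

// Split long text into sentences and queue each as a separate utterance.
function speakInChunks(text) {
  var sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
  sentences.forEach(function(sentence) {
    var chunk = new SpeechSynthesisUtterance(sentence.trim());
    chunk.lang = "en-US";
    window.speechSynthesis.speak(chunk);
  });
}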

Conclusion

As I’m sure you can see, Api.ai is a really simple way to get quite a surprising amount of personal assistant functionality right out of the box.

In an upcoming article here at SitePoint, we will look at how to expand what your agent can understand so that they can answer questions specific to your needs. There is so much more that you can do!

If you build your own personal assistant using Api.ai, I’d love to hear about it! Did you name yours Barry too? What questions have you asked it successfully? Let me know in the comments below, or get in touch with me on Twitter at @thatpatrickguy.

Patrick Catanzariti
Meet the author
PatCat is the founder of Dev Diner, a site helping developers navigate the world of emerging tech. He is a SitePoint editor for the HTML/CSS Channel and a contributing editor for emerging tech such as the Internet of Things, virtual/augmented reality and more. He is an instructor at SitePoint Premium and O'Reilly, a Meta Pioneer and freelance web developer who loves every opportunity to tinker with something new in a tech demo.
  • Davi Marcondes Moreira (http://about.me/devdrops)

    This tutorial is just amazing! I’ve spent hours of fun playing with my new agent; it’s called Tupiniquim and speaks Brazilian Portuguese. Thank you Patrick!

    • Patrick Catanzariti

      Brilliance! That’s a pretty fancy sounding AI :) Really glad you enjoyed it, I’ll have another article coming soon helping you expand Tupiniquim even more ;)

  • Patrick Catanzariti

    I think each service will be good for different people. For me, Api.ai and Wit.ai are much easier to get started with, despite depending on one particular service. Sirius requires developers to set it up in Ubuntu and Melissa-AI works with OS X and Linux. That’s a much bigger barrier to entry than a ready made service that you can get started with straight away in the cloud.

    I’ve got plans to explore some of the other options which involve that level of set up in future demos but I’d prefer to cover some of these services first. Api.ai is great for its initial simplicity and huge amounts of functionality out of the box.

  • Pandora’s Paradise

    Yes to Sirius.clarity!

    • Giorgio Robino (http://www.twitter.com/solyarisoftware)

      Could you supply any link/doc to help understand the concept/model behind the Sirius approach to NLP? I’m asking because, reading the intro docs on the Sirius website, it’s NOT clear at all :(

  • Patrick Catanzariti

    I think these are default greetings that Api.ai comes with, that is part of one of the “Domains” that is loaded :) I believe you should be able to overwrite them with your own using the methods I speak about in the follow up article here – http://www.sitepoint.com/customizing-your-api-ai-assistant-with-intent-and-context/

  • Patrick Catanzariti

    Grazie Giorgio! I appreciate you taking the time to write :)

    Feel free to retweet and share the article to anyone who might be interested in getting into the basics of AI — that’s what these articles are for!

    I agree that articles can become a little long with my step-by-step approach, but I’ve chosen that to ensure it’s as clear as possible. I figure, if people understand some of the steps intuitively, they’ll scroll past them faster anyway. If someone is really confused about a step, sometimes having a lot of detail can really help. To counter it being too long, I’ve tried to separate each article into concepts (like how my next article was about customization). Hoping that helps prevent them getting *too* long!

    I might actually write an opinion piece on the different types of approaches to starting in AI on Dev Diner at some point, as it is a topic many have brought up. Some commenters don’t feel that Api.ai is truly AI at all, whilst others (like me) disagree and see it as a nice simple stepping stone to the more complicated stuff.

    Ah… the more complicated stuff. I’m actually really looking forward to exploring that in months to come here at SitePoint ;)

  • Jia Shern Tan

    Hey! Great tutorial. I’m trying to make the app work just by text without the voice but whenever I remove the speech synthesis part, it doesn’t give a response.

    • Patrick Catanzariti

      That’s quite strange! It should still work with text entry even if you keep the speech synthesis part in there. I’m not sure why it would stop working when removing the speech synthesis — unless you’re unintentionally removing something else that’s needed for it to run? Have you had any luck with it?

  • prodigyrick

    Can you please direct me to projects with Arduino or Raspberry Pi that I can create to use the project I built here, so that it may function as a home automation device?

  • bcv

    I understand how the API delivers most results, but what about something like “seafood restaurants in San Francisco” – how would it decide which options to show first?

    • Patrick Catanzariti

      I wouldn’t be able to tell you how it runs in the background. If it’s custom functionality you add to your own app, then it works however you set the server up.

  • Patrick Catanzariti

    I’m afraid I haven’t come across any good examples for doing so, but it’d be possible with things like api.ai — you’d just need to do the database lookups on your end (with api.ai parsing the query).

  • Patrick Catanzariti

    I’m actually not sure if there is an open source project like this, did you manage to find anything?
