We may be familiar with Siri, Google Now, and Cortana as the native voice-operated assistants of their respective mobile operating systems, but Apple has gone a step further and published an API that lets developers recognise speech and make use of it. iOS users are already used to using Siri to interact with apps and dictate text, and now developers have direct access to that text.
With great power comes great responsibility, as they say on the web. What are some new powers we are getting thanks to this technology, and what are some of the risks?
An evident advantage is that by combining the speech APIs with APIs like NSLinguisticTagger we have the tools we need to make our app understand a user’s intent. Rather than instructing the user to choose from a set of buttons they can tap on, we can let the user express their fine-grained wishes. The difference is similar to that between voice messages and text messages: the former tend to contain much more information we can use to understand a person and a context.
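As a rough illustration of that combination, the sketch below takes a transcript string (assumed here to come from something like `SFSpeechRecognitionResult.bestTranscription.formattedString`) and uses NSLinguisticTagger to pull out its verbs and nouns. The helper name `verbsAndNouns(in:)` and the sample sentence are hypothetical, and the tags you get back depend on Apple's language model, so treat this as a starting point rather than a full intent parser.

```swift
import Foundation

/// Hypothetical helper: collects the verbs and nouns found in a transcript.
/// Uses the unit-based enumeration API, which needs iOS 11 / macOS 10.13 or later.
func verbsAndNouns(in transcript: String) -> (verbs: [String], nouns: [String]) {
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = transcript

    // NSLinguisticTagger works with NSRange, so measure the string in UTF-16 units.
    let fullRange = NSRange(location: 0, length: transcript.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]

    var verbs: [String] = []
    var nouns: [String] = []

    tagger.enumerateTags(in: fullRange,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: options) { tag, tokenRange, _ in
        // Skip tokens the tagger could not classify.
        guard let tag = tag else { return }
        let word = (transcript as NSString).substring(with: tokenRange).lowercased()
        if tag == NSLinguisticTag.verb {
            verbs.append(word)
        } else if tag == NSLinguisticTag.noun {
            nouns.append(word)
        }
    }
    return (verbs, nouns)
}

// Assumed transcript; in an app this would come from the speech recognizer.
let words = verbsAndNouns(in: "Book a table for two at an Italian restaurant tonight")
print("verbs:", words.verbs, "nouns:", words.nouns)
```

From there, an app might map the verbs to actions and the nouns to the things those actions apply to. How well that works will depend on how predictable your users' phrasing is.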
But, as is also the case with voice messages, we often have noise or too much information, and writing algorithms to filter through it is not trivial. Another caveat is that speech recognition is carried out on Apple’s servers, and therefore, as Apple themselves advise, you should instruct the user not to send any sensitive information (such as health data or passwords) to the cloud.
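To make the permission step concrete, here is a minimal sketch of asking the user for authorization before any audio leaves the device, and then transcribing a recorded file. The helper name `transcribe(fileAt:completion:)` is hypothetical. The sketch also assumes the app's Info.plist contains an `NSSpeechRecognitionUsageDescription` entry, which is the text shown in the permission prompt and a natural place to warn users about sensitive information.

```swift
import Foundation
import Speech

/// Hypothetical helper: asks for permission, then transcribes a recorded audio file.
/// Calls `completion` with the final transcript, or nil if recognition is
/// not authorized, not available, or fails.
func transcribe(fileAt url: URL, completion: @escaping (String?) -> Void) {
    // The handler may run on a background queue; hop to the main queue before touching UI.
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(), // recognizer for the device's current locale
              recognizer.isAvailable else {
            completion(nil)
            return
        }

        let request = SFSpeechURLRecognitionRequest(url: url)
        // Ignore partial results and report only the final transcription.
        request.shouldReportPartialResults = false

        _ = recognizer.recognitionTask(with: request) { result, error in
            if let result = result, result.isFinal {
                completion(result.bestTranscription.formattedString)
            } else if error != nil {
                completion(nil)
            }
        }
    }
}
```

A real app would probably keep a reference to the recognition task so it can be cancelled, and would handle the `.denied` and `.restricted` statuses with a message of its own.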
That said, the risks are manageable. If you want to really listen to your users, put your code where your mouth is.