Translation and Text-to-Speech with Microsoft Translator

Text to speech is a popular technique used by many websites to present their content in an interactive way. The generation of an artificial human voice is known as speech synthesis. Even though it’s highly popular, there are very few speech synthesis services, especially free ones. Microsoft Translator is one service that offers speech synthesis with limited features. In this tutorial, we are going to look at how we can use the Microsoft Translator API to translate content and then generate audio files from the translated content.

You can download the entire source code of this project on GitHub.

Key Takeaways

Microsoft Translator API can be used to translate text and generate speech files, offering a practical solution for creating interactive content on websites.
The API requires setting up an application in Windows Azure Marketplace, obtaining client credentials, and subscribing to the translation service, which includes a free tier for initial testing.
Implementation involves generating access tokens, making authenticated requests using CURL, and handling translations and speech synthesis through specific API methods.
Despite its utility, Microsoft Translator’s text-to-speech service has limitations, such as a character limit for free usage, making it less suitable for large-scale applications compared to other services like Acapela.

Creating Windows Azure Marketplace Application

First, we have to create an application in Windows Azure Data Marketplace to subscribe to Microsoft Translator API. Let’s get started on the application creation process.

Prerequisites – Microsoft Email Account

Step 1 – Sign into your Azure Account

Sign into Azure Marketplace using your Microsoft email account.

Step 2 – Registering the application

Now we have to create an application to use the translation and speech service. This is similar to the applications we create for popular social networking sites such as Facebook, LinkedIn or Twitter. Click the Register Application link to create a new application. You will get a screen similar to the following.

Fill in all the details of the application form. You can define your own Client ID to be used for the application. The Client Secret will be automatically generated for you. Keep it unchanged and fill in the remaining details as necessary.

Click on the Create button and you will get a list of created applications as shown in the following screen.

Note down the Client ID and Client Secret to be used for the application.

Step 3 – Subscribe to Translation service

Next, we have to subscribe to Microsoft Translator to use the API. Navigate to Microsoft Translator and subscribe to one of the packages. The first 2,000,000 characters are free, so we will use the free package for testing purposes.

The following screenshot previews the subscription screen. Subscribe to the free package.

Now we have completed the prerequisites for using the Microsoft Translator API. Let’s get started on the development of the text to speech service.

Initializing Translation API Settings

Let’s get started by initializing the necessary settings for using the Translator API. Create a file called translation_api_initializer.php with the following code.

    <?php

        class TranslationApiInitializer {

            private $clientID;
            private $clientSecret;
            private $authUrl;
            private $grantType;
            private $scopeUrl;

            public function __construct() {
                $this->clientID = "Client ID";

                $this->clientSecret = "Client Secret";
                //OAuth Url.
                $this->authUrl = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13/";
                //Application Scope Url
                $this->scopeUrl = "http://api.microsofttranslator.com";
                //Application grant type
                $this->grantType = "client_credentials";
            }
        }    
    ?>

Here, we have configured the necessary settings for using the translation service. You have to use the Client ID and Client Secret generated in the application registration process. The other parameters contain the authentication URL, scope URL and grant type. We can keep them as they are for every application, unless they are changed officially by Microsoft.
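If you prefer not to hard-code the credentials in the class, one option is to read them from the environment. The following is a minimal sketch under that assumption; the helper name loadTranslatorConfig and the variable names TRANSLATOR_CLIENT_ID and TRANSLATOR_CLIENT_SECRET are hypothetical and not part of the original project.

```php
<?php
// Hypothetical helper: reads the two credentials from an environment-style
// array instead of hard-coding them. The key names are assumptions.
function loadTranslatorConfig(array $env)
{
    foreach (array('TRANSLATOR_CLIENT_ID', 'TRANSLATOR_CLIENT_SECRET') as $key) {
        if (empty($env[$key])) {
            throw new InvalidArgumentException("Missing setting: $key");
        }
    }

    return array(
        'clientID'     => $env['TRANSLATOR_CLIENT_ID'],
        'clientSecret' => $env['TRANSLATOR_CLIENT_SECRET'],
        // The remaining values are the ones used in the constructor above.
        'authUrl'      => 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13/',
        'scopeUrl'     => 'http://api.microsofttranslator.com',
        'grantType'    => 'client_credentials',
    );
}
```

The constructor could then assign its properties from the returned array, for instance by passing in the result of getenv() (which returns all variables as an array in PHP 7.1 and later).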

Generating Application Tokens

The next task is to generate tokens to access the Translator service. Tokens have a limited lifetime, so we have to generate them regularly. Let’s take a look at the implementation of the token generation function.

    /*
     * Get the access token.
     *
     * @param string $grantType    Grant type.
     * @param string $scopeUrl     Application Scope URL.
     * @param string $clientID     Application client ID.
     * @param string $clientSecret Application client secret.
     * @param string $authUrl      Oauth Url.
     *
     * @return string.
     */

    function getTokens($grantType, $scopeUrl, $clientID, $clientSecret, $authUrl) {
        try {
            //Initialize the Curl Session.
            $ch = curl_init();
            //Create the request Array.
            $paramArr = array(
                'grant_type' => $grantType,
                'scope' => $scopeUrl,
                'client_id' => $clientID,
                'client_secret' => $clientSecret
            );
            //Create an Http Query.
            $paramArr = http_build_query($paramArr);
            //Set the Curl URL.
            curl_setopt($ch, CURLOPT_URL, $authUrl);
            //Set HTTP POST Request.
            curl_setopt($ch, CURLOPT_POST, TRUE);
            //Set data to POST in HTTP "POST" Operation.
            curl_setopt($ch, CURLOPT_POSTFIELDS, $paramArr);
            //CURLOPT_RETURNTRANSFER- TRUE to return the transfer as a string of the return value of curl_exec().
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
            //CURLOPT_SSL_VERIFYPEER- Set FALSE to stop cURL from verifying the peer's certificate.
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            //Execute the  cURL session.
            $strResponse = curl_exec($ch);
            //Get the Error Code returned by Curl.
            $curlErrno = curl_errno($ch);
            if ($curlErrno) {
                $curlError = curl_error($ch);
                throw new Exception($curlError);
            }
            //Close the Curl Session.
            curl_close($ch);
            //Decode the returned JSON string.
            $objResponse = json_decode($strResponse);

            if (isset($objResponse->error)) {
                throw new Exception($objResponse->error_description);
            }
            return $objResponse->access_token;
        } catch (Exception $e) {
            echo "Exception-" . $e->getMessage();
        }
    }

Here, we are using a function called getTokens, which accepts all the settings as parameters. Inside the function, we make a curl request to the defined authentication URL by passing the remaining parameters. On successful execution, we can access the token using $objResponse->access_token.
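Since tokens expire, requesting a new one for every call is wasteful. The sketch below shows one way to reuse a token until shortly before it expires. It is not part of the original class: TokenCache is a hypothetical name, and it assumes the decoded token response also carries the standard OAuth 2.0 expires_in field (in seconds), which you should confirm against the actual response.

```php
<?php
// Minimal in-memory token cache (a sketch, not part of the original class).
// $fetcher is any callable returning
// array('access_token' => ..., 'expires_in' => ...); expires_in is assumed.
class TokenCache
{
    private $fetcher;
    private $token = null;
    private $expiresAt = 0;

    public function __construct(callable $fetcher)
    {
        $this->fetcher = $fetcher;
    }

    // $now is injectable so the expiry logic can be exercised without waiting.
    public function get($now = null)
    {
        $now = ($now === null) ? time() : $now;
        if ($this->token === null || $now >= $this->expiresAt) {
            $response = call_user_func($this->fetcher);
            $this->token = $response['access_token'];
            // Refresh 30 seconds early to avoid using a token about to expire.
            $this->expiresAt = $now + (int) $response['expires_in'] - 30;
        }
        return $this->token;
    }
}
```

To use it with the code above, the fetcher would wrap a variant of getTokens that returns the whole decoded response rather than only the access token.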

Implementing Reusable Curl Request

Once an access token is retrieved, we can access the translation functions by authorizing the request with the access token. Generally, we use curl for making requests to APIs, so let’s implement a reusable function for our curl request as shown in the following code.

    /*
     * Create and execute the HTTP CURL request.
     *
     * @param string $url HTTP Url.
     * @param string $authHeader Authorization Header string.
     * @param string $postData   Data to post.
     *
     * @return string.
     *
     */

    function curlRequest($url, $authHeader, $postData=''){
        //Initialize the Curl Session.
        $ch = curl_init();
        //Set the Curl url.
        curl_setopt($ch, CURLOPT_URL, $url);
        //Set the HTTP HEADER Fields.
        curl_setopt($ch, CURLOPT_HTTPHEADER, array($authHeader, "Content-Type: text/xml"));
        //CURLOPT_RETURNTRANSFER- TRUE to return the transfer as a string of the return value of curl_exec().
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        //CURLOPT_SSL_VERIFYPEER- Set FALSE to stop cURL from verifying the peer's certificate.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, False);
        if ($postData) {
            //Set HTTP POST Request.
            curl_setopt($ch, CURLOPT_POST, TRUE);
            //Set data to POST in HTTP "POST" Operation.
            curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
        }
        //Execute the  cURL session.
        $curlResponse = curl_exec($ch);
        //Get the Error Code returned by Curl.
        $curlErrno = curl_errno($ch);
        if ($curlErrno) {
            $curlError = curl_error($ch);
            throw new Exception($curlError);
        }
        //Close a cURL session.
        curl_close($ch);
        return $curlResponse;
    }

This function takes a request URL, request header and post data as the parameters and returns the curl response, or throws an exception if the request fails. Now all the necessary functions for accessing the translator service are ready.

Translating Content

The Microsoft Translator API provides a wide range of methods for translation related functionality. In this tutorial, we will be using the Translate and Speak methods. You can look at the complete set of API methods at http://msdn.microsoft.com/en-us/library/ff512419.aspx.

Let’s get started with the implementation of the translate function.

    /*
     * Get the translated text
     * 
     * @param   string $text Text content for translation
     * @param   string $from_lang   Language of the text
     * @param   string $to_lang  Translation language
     * @return  array  Result set
     */

    public function textTranslate($text, $from_lang, $to_lang) {


        try {
            //Get the Access token.
            $accessToken = $this->getTokens($this->grantType, $this->scopeUrl, $this->clientID, $this->clientSecret, $this->authUrl);

            //Create the authorization Header string.
            $authHeader = "Authorization: Bearer " . $accessToken;

            //URL-encode the input text once.
            $inputStr = urlencode($text);

            $from = $from_lang;
            $to = $to_lang;

            //HTTP Translate method URL.
            $detectMethodUrl = "http://api.microsofttranslator.com/V2/Http.svc/Translate?text=" . $inputStr .
            "&from=" . $from . "&to=" . $to."&contentType=text/plain";

            //Call the curlRequest.
            $strResponse = $this->curlRequest($detectMethodUrl, $authHeader);

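            //Interprets a string of XML into an object.
            $xmlObj = simplexml_load_string($strResponse);
            if ($xmlObj === false) {
                throw new Exception("Unable to parse the translation response.");
            }
            //The translated text is the content of the root element.
            $translated_str = (string) $xmlObj;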


            return array("status" => "success", "msg" => $translated_str);

        } catch (Exception $e) {
            return array("status" => "error", "msg" => $e->getMessage());
        }
    }

Translation requires the source language, the destination language and the text to be translated, so we use them as the parameters of the textTranslate function. Then we use the previously created getTokens function to retrieve the token and assign it to the request header using Authorization: Bearer.

Then we set the source and destination languages using $from_lang and $to_lang variables. Also we have to encode the text content using PHP’s urlencode function.

Now it’s time to start the translation process. Translator API provides a method called Translate and we can access it using the URL at http://api.microsofttranslator.com/V2/Http.svc/Translate

This method takes appId, text, to language and content type as mandatory parameters. Since we are using the authorization header, we don’t have to specify the appId. So we assign all the necessary parameters to the Translate API URL using the $detectMethodUrl variable.

Finally, we initialize the curl request by passing the translation API URL and the authorization header. On successful execution, we will get the translated data in XML format, so we use the simplexml_load_string function to load the XML string and extract the translated text.

Now we can translate text between any of the supported languages.
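The URL building and response parsing steps described above can be tried in isolation, without calling the service. The following is a sketch only: buildTranslateUrl and parseTranslateResponse are hypothetical helper names, and the sample XML is an assumed payload shaped like a single string element, not a captured response.

```php
<?php
// Build the Translate URL; http_build_query() encodes each value exactly once.
function buildTranslateUrl($text, $from, $to)
{
    $query = http_build_query(array(
        'text'        => $text,
        'from'        => $from,
        'to'          => $to,
        'contentType' => 'text/plain',
    ), '', '&');

    return 'http://api.microsofttranslator.com/V2/Http.svc/Translate?' . $query;
}

// Extract the translated text from the XML body.
function parseTranslateResponse($xml)
{
    $element = simplexml_load_string($xml);
    if ($element === false) {
        throw new RuntimeException('Response is not valid XML');
    }
    // Casting the root element to string yields its text content.
    return (string) $element;
}

// Assumed sample payload for illustration only.
$sample = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Bonjour le monde</string>';
```

Letting http_build_query handle the encoding avoids encoding the text by hand, which is an easy place to introduce mistakes such as encoding a value twice.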

Generating Speech Files

The final task of this tutorial is to generate speech in an mp3 file by using the translated text. We will be using a similar technique to do so. Create a function called textToSpeech with the following code.

    /**
     * Streams an mp3 file speaking the passed-in text in the desired language.
     * @param string $text Text to convert to speech
     * @param string $to_lang Language of the text
     * @return void
     */
    public function textToSpeech($text, $to_lang) {

        try {

            //Get the Access token.
            $accessToken = $this->getTokens($this->grantType, $this->scopeUrl, $this->clientID, $this->clientSecret, $this->authUrl);

            //Create the authorization Header string.
            $authHeader = "Authorization: Bearer " . $accessToken;

            //Set the params.
            $inputStr = urlencode($text);
            $language = $to_lang;

            $params = "text=$inputStr&language=$language&format=audio/mp3";

            //HTTP Speak method URL.
            $url = "http://api.microsofttranslator.com/V2/Http.svc/Speak?$params";
            //Set the Header Content Type.
            header('Content-Type: audio/mp3');
            header('Content-Disposition: attachment; filename=' . uniqid('SPC_') . '.mp3');

            //Call the curlRequest.
            $strResponse = $this->curlRequest($url, $authHeader);
            echo $strResponse;
        } catch (Exception $e) {
            echo "Exception: " . $e->getMessage() . PHP_EOL;
        }
    }

This function is similar in nature to the translate function. Here we use the Speak method URL at http://api.microsofttranslator.com/V2/Http.svc/Speak instead of the translate URL. We have to use text, language and format as the necessary parameters. Since we are using the translated text for generating speech, the language parameter should be equal to the $to_lang used in the textTranslate function.

Then we have to use the necessary headers to automatically download the speech file. Here, we have used audio/mp3 as the content type and the uniqid function is used to generate a unique file name. Now we have both the translate and speech functions ready for use.
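If you would rather keep the audio on the server than send it straight to the browser, the bytes can be written to disk instead. This is a sketch under the assumption that you have a variant of textToSpeech that returns the curl response instead of echoing it; saveSpeechFile is a hypothetical helper name.

```php
<?php
// Hypothetical helper: store the audio bytes returned by the Speak request
// in a directory instead of streaming them to the browser.
function saveSpeechFile($audioBytes, $directory)
{
    if ($audioBytes === '' || $audioBytes === false) {
        throw new InvalidArgumentException('No audio data to save');
    }
    // Same naming scheme as the download header: SPC_ prefix plus uniqid().
    $path = rtrim($directory, '/\\') . DIRECTORY_SEPARATOR . uniqid('SPC_') . '.mp3';
    if (file_put_contents($path, $audioBytes) === false) {
        throw new RuntimeException("Could not write $path");
    }
    return $path;
}
```

The function returns the path of the new file, so the caller can link to it or serve it later.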

Implementing Frontend Interface

So far, we implemented translate and speech functions in the backend of the application. Now we have to build a simplified frontend interface to access the speech file generation service. Let’s create a file called index.php with basic HTML form.

    <?php
        $bing_language_codes = array('da' => 'Danish',
        'de' => 'German', 'en' => 'English', 'fi' => 'Finnish',
        'fr' => 'French', 'nl' => 'Dutch', 'ja' => 'Japanese',
        'pl' => 'Polish', 'es' => 'Spanish', 'ru' => 'Russian');

    ?>
    <form action='' method='POST' >
    <table>
        <tr><td>Select From Language</td>
        <td><select name='from_lang' >
        <?php foreach($bing_language_codes as $code=>$lang){ ?>
            <option value='<?php echo $code; ?>'><?php echo $lang; ?></option>
        <?php } ?>
            </select>
            </td>
        </tr>
        <tr><td>Select To Language</td>
        <td><select name='to_lang' >
        <?php foreach($bing_language_codes as $code=>$lang){ ?>
            <option value='<?php echo $code; ?>'><?php echo $lang; ?></option>
        <?php } ?>
        </select>
        </td>
        </tr>
        <tr><td>Text</td>
        <td><textarea cols='50' name='text' ></textarea>
        </td>
        </tr>
        <tr><td></td>
        <td><input type='submit' value='Submit' />
        </td>
        </tr>
    </table>
    </form>

This form allows users to select the preferred source and destination languages and type the content to be converted into speech. I have used a few of the supported languages in an array called $bing_language_codes. Now we can move to the form submission handling process to generate the speech file as shown below.

    <?php

    include_once "translation_api_initializer.php";


    $bing_language_codes = array('da' => 'Danish',
    'de' => 'German','en'=> 'English','fi'=> 'Finnish',
    'fr'=> 'French', 'nl'=> 'Dutch','ja'=>'Japanese',
    'pl'=> 'Polish','es'=> 'Spanish','ru'=> 'Russian');

    if($_POST){

        $bing = new TranslationApiInitializer();

        $from_lang = isset($_POST['from_lang']) ? $_POST['from_lang'] : 'en';
        $to_lang = isset($_POST['to_lang']) ? $_POST['to_lang'] : 'fr';
        $text = isset($_POST['text']) ? $_POST['text'] : '';

        $result = $bing->textTranslate($text, $from_lang, $to_lang);

        //Only generate speech when the translation succeeded.
        if ($result['status'] == 'success') {
            $bing->textToSpeech($result['msg'], $to_lang);
        } else {
            echo "Translation failed: " . $result['msg'];
        }

    }

    ?>

We include the TranslationApiInitializer file created earlier in the process and execute the translation and speech functions in turn to generate the audio file. Once the code is completed, you will be able to generate audio files using this service.
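The handler above passes the submitted values to the API as they arrive. A small validation step can reject unknown language codes and empty text before any request is made. The following is a sketch; validateSpeechRequest is a hypothetical helper, and the defaults mirror the ones used in the handler.

```php
<?php
// Hypothetical validation step for the submitted form values.
function validateSpeechRequest(array $post, array $languageCodes)
{
    $from = isset($post['from_lang']) ? $post['from_lang'] : 'en';
    $to   = isset($post['to_lang']) ? $post['to_lang'] : 'fr';
    $text = isset($post['text']) ? $post['text'] : '';

    // Form fields can arrive as arrays; accept plain strings only.
    if (!is_string($from) || !is_string($to) || !is_string($text)) {
        return array('status' => 'error', 'msg' => 'Invalid input');
    }
    // Both codes must be keys of the language list shown in the form.
    if (!isset($languageCodes[$from]) || !isset($languageCodes[$to])) {
        return array('status' => 'error', 'msg' => 'Unsupported language code');
    }
    $text = trim($text);
    if ($text === '') {
        return array('status' => 'error', 'msg' => 'Text is empty');
    }
    return array('status' => 'success', 'from' => $from, 'to' => $to, 'text' => $text);
}
```

In the handler, it could be called as validateSpeechRequest($_POST, $bing_language_codes), continuing to textTranslate only when the returned status is success.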

You can take a look at a live demo at http://www.innovativephp.com/demo/spc_text_to_speech

Wrap Up

Throughout this tutorial we implemented translation and speech generation using the Microsoft Translator API. Even though it’s easy to use and free to some extent, there are also limitations. The speech service is only provided for a limited number of characters, roughly 2000, so it’s not possible to use this service for a larger block of text. I recommend you use Acapela for large scale speech generation services.
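For text that is only moderately over the limit, one possible workaround is to split it into smaller pieces and request speech for each piece separately, which yields several audio files rather than one. The sketch below shows only the splitting step; splitForSpeech is a hypothetical helper, and the default of 2000 is taken from the rough figure above rather than from a documented limit.

```php
<?php
// Split text into chunks of at most $limit bytes, breaking on whitespace.
// strlen() counts bytes, so chunks of multibyte text stay under the limit
// in characters as well. A single word longer than $limit is kept whole.
function splitForSpeech($text, $limit = 2000)
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = array();
    $current = '';
    foreach ($words as $word) {
        $candidate = ($current === '') ? $word : $current . ' ' . $word;
        if ($current !== '' && strlen($candidate) > $limit) {
            // Adding this word would exceed the limit: close the chunk.
            $chunks[] = $current;
            $current = $word;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

Each chunk can then be passed to the speech function in turn. Splitting on sentence boundaries instead of plain whitespace would probably sound more natural, at the cost of a little more code.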

I hope you enjoyed the tutorial, and I look forward to your comments and suggestions.

Frequently Asked Questions (FAQs) about Text-to-Speech Translation

How does Microsoft Translator’s text-to-speech function work?

Microsoft Translator’s text-to-speech function works by converting written text into spoken words. It uses advanced machine learning technologies to recognize and interpret text, and then translates it into the desired language. The translated text is then converted into speech using synthetic voice technology. This feature is particularly useful for those who are visually impaired or for situations where reading text is not feasible.

What languages are supported by Microsoft Translator’s text-to-speech feature?

Microsoft Translator supports a wide range of languages for its text-to-speech feature. This includes major languages such as English, Spanish, French, German, Chinese, Japanese, and many more. The full list of supported languages can be found on the Microsoft Translator website.

Can I use Microsoft Translator’s text-to-speech feature offline?

Yes, Microsoft Translator allows you to download language packs for offline use. This means you can use the text-to-speech feature even when you don’t have an internet connection. However, the accuracy and quality of translations may vary when used offline.

How accurate is Microsoft Translator’s text-to-speech feature?

Microsoft Translator uses advanced machine learning technologies to provide accurate translations. However, like any machine translation tool, it may not always perfectly capture the nuances and context of certain phrases or idioms. It’s always a good idea to have a human translator review the translations for critical or sensitive content.

Can I customize the voice in Microsoft Translator’s text-to-speech feature?

Yes, Microsoft Translator allows you to customize the voice used in the text-to-speech feature. You can choose from a variety of voices and adjust the speed and pitch to suit your preferences.

Is Microsoft Translator’s text-to-speech feature free to use?

Microsoft Translator offers a free tier for its text-to-speech feature, which is sufficient for casual or personal use. However, for heavy or commercial use, you may need to subscribe to a paid plan.

Can I use Microsoft Translator’s text-to-speech feature on my website or app?

Yes, Microsoft Translator provides APIs that developers can use to integrate the text-to-speech feature into their own websites or apps. This allows you to provide translation services to your users directly within your platform.

How does Microsoft Translator’s text-to-speech feature compare to other similar services?

Microsoft Translator is known for its accuracy and wide range of supported languages. It also offers offline functionality and customization options, which are not always available in other services. However, the best choice depends on your specific needs and preferences.

How can I improve the accuracy of Microsoft Translator’s text-to-speech feature?

To improve the accuracy of translations, make sure your text is clear and free of typos or grammatical errors. Also, avoid using slang or idiomatic expressions as they may not translate well. If possible, provide context to help the translator understand the meaning of your text.

What are the limitations of Microsoft Translator’s text-to-speech feature?

While Microsoft Translator is a powerful tool, it has some limitations. For instance, it may not always perfectly capture the nuances and context of certain phrases or idioms. Also, the quality and accuracy of translations may vary when used offline. Lastly, while it supports a wide range of languages, there may be some less common languages that are not supported.