How to Speed Up Your App’s API Consumption

Jacek Barecki

Introduction

When building a PHP application, you may reach a point where keeping it isolated from remote resources or services becomes a barrier to its development. To move the project forward, you may employ different API services to fetch remote data, connect with user accounts on other websites, or transform resources shared by your application.

The ProgrammableWeb website states that there are currently more than ten thousand APIs available all over the web, so you will probably find a lot of services that can extend your PHP app’s functionality. But using APIs incorrectly can quickly lead to performance issues and lengthen the execution time of your script. If you’re looking for a way to avoid that, consider implementing some of the solutions described in this article.

Make multiple requests at a time

When a typical PHP script is executed, its commands run one after the other. This seems logical, as you usually need the result of the previous operation (e.g. a database query or a variable manipulation) before moving to the next step of the script. The same rule applies when you make an API call: you send a request, wait for a response from the remote host, and only then can you do anything with the data you received. But if your app makes several API calls and you need the data from each source to move on, you don’t have to execute the requests one by one. Remember that the server responsible for handling API calls is prepared to work with several queries at a time. All you need is a script that executes API calls in parallel, not one after another. Fortunately, PHP offers a set of curl_multi functions designed to do exactly that.

Using curl_multi functions is similar to making typical requests in PHP with the cURL library. The only difference is that you prepare a set of requests (not just one) with the curl_init function and pass each of them to the curl_multi_add_handle function. Calling the curl_multi_exec function then executes the requests simultaneously, and curl_multi_getcontent lets you get the result of each API call. The PHP manual pages for these functions include code examples that implement the described logic.
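As a minimal sketch of the flow described above, the helper below fetches a set of URLs in parallel. The function name fetch_all, the shape of the returned array and the default timeout are assumptions made for illustration; only the curl_* calls come from PHP itself.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel with curl_multi.
// Returns one entry per input key with the HTTP status, body and cURL error.
function fetch_all(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    // Prepare one easy handle per URL and register it with the multi handle.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // cap the time a single slow call may take
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for activity on any connection instead of busy-looping.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the results and detach the handles.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = [
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => curl_multi_getcontent($ch),
            'error'  => curl_error($ch),
        ];
        curl_multi_remove_handle($multi, $ch);
    }

    return $results;
}
```

Calling fetch_all(['rates' => $ratesUrl, 'news' => $newsUrl]) would return an array with the same keys, so each response can be matched back to the request that produced it.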

If you want to employ curl_multi functions in your PHP application, there are some important points to remember. First of all, the curl_multi_exec function will take as long as the slowest API call in the set of requests passed to the curl_multi_add_handle function. Using curl_multi thus makes sense in cases where each of the API calls takes a similar amount of time. If there is one request that is significantly slower than the others in a curl_multi set, your script won’t be able to move on until that slowest request is finished.

It is also important to identify how many parallel requests can be executed at a time. Remember that if your site handles a lot of traffic and each user triggers simultaneous API calls to one remote server, the total number of requests being made at a time may quickly become high. Don’t forget to check the limits stated in the API documentation and find out how the service responds when you hit them. The remote server may send a specific HTTP response code or an error message in such a case. Your application should handle these cases properly or write them to a log, so that you can diagnose the issue and lower the number of requests.
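One possible way to respect such limits is to split the requests into batches and to check each response for a rate-limit signal. The sketch below is only an illustration: the limit of 5 and the function names are assumptions, and the HTTP 429 (Too Many Requests) status is a common convention that a particular API may or may not follow.

```php
<?php
// Assumed limit: replace it with the value stated in the API documentation.
const MAX_PARALLEL_REQUESTS = 5;

// Split a list of URLs into batches that never exceed the allowed size.
function split_into_batches(array $urls, int $batchSize = MAX_PARALLEL_REQUESTS): array
{
    return array_chunk($urls, max(1, $batchSize));
}

// 429 is a widely used "slow down" signal; some services use other codes
// or an error message in the response body instead.
function is_rate_limited(int $httpStatus): bool
{
    return $httpStatus === 429;
}

// Put the incident in the PHP error log so the issue can be diagnosed later.
function log_rate_limit(string $url, int $httpStatus): void
{
    error_log(sprintf('API limit reached for %s (HTTP %d)', $url, $httpStatus));
}
```

Each batch can then be passed to a curl_multi loop in turn, and any response for which is_rate_limited() returns true can be logged and retried later.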

Separate API calls from the app main flow

If you want to keep your web application responsive and avoid serving pages that load slowly, a large number of API calls to a remote server may make this task a lot more difficult. If all of the requests are made within the main app flow, the end user won’t see a rendered page until the PHP script receives the API responses and processes the data. Of course, there are plenty of API services that are hosted on fast servers and process requests quickly. Still, your app may occasionally be slowed down by connection lags or by random factors affecting the connection or the remote server itself.

If you want to protect the end user from such issues, you need to move the part of the application responsible for handling the requests out of the main flow and into a standalone script. This means that the API calls will be executed in a separate process that doesn’t interfere with the part of the code responsible for displaying the site.

To implement such a solution, you may just write a separate PHP script and execute it using the exec() function, just like you would execute any command line application. PHP frameworks often offer modules that simplify writing command line scripts and let you integrate them easily with existing application models or components. Check the Symfony2 or CakePHP console components to see some examples. Other PHP platforms, not only frameworks, may also offer tools that make writing command line scripts easier, like WP-CLI, a command line interface for WordPress.
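A minimal sketch of the exec() approach might look as follows. The function names and the script path in the usage note are hypothetical, and the code assumes a Unix-like shell and a php binary available on the PATH. Note that exec() only returns immediately when the output of the command is redirected and the command is sent to the background.

```php
<?php
// Build a shell command that runs a PHP script in the background.
// Every part is escaped so that arguments cannot break out of the command.
function build_background_command(string $script, array $args = [], string $phpBinary = 'php'): string
{
    $parts = [escapeshellarg($phpBinary), escapeshellarg($script)];
    foreach ($args as $arg) {
        $parts[] = escapeshellarg((string) $arg);
    }

    // Discard the output and append "&" so the caller does not wait
    // for the script to finish (Unix-like shells only).
    return implode(' ', $parts) . ' > /dev/null 2>&1 &';
}

// Start the script and return right away.
function run_in_background(string $script, array $args = []): void
{
    exec(build_background_command($script, $args));
}
```

For example, run_in_background('/path/to/fetch_rates.php', ['USD']) would start the hypothetical fetch_rates.php script with one argument while the page continues to render.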

If you’re looking for a more powerful way of handling API calls in a separate process, consider setting up a job server like Gearman. A job server is a complete solution that performs all the actions necessary to separate specific tasks (jobs) into independent processes. Read Alireza Rahmani Khalili’s article Introduction to Gearman to see how it works and how to implement it in PHP. If you work on the Zend Server platform, you can employ the Zend Job Queue component, which offers similar functionality. Its features and usage examples are described in the article Scheduling with Zend Job Queue by Alex Stetsenko.

No matter which way of separating API calls you choose, you have to think of a way for the different parts of your app to communicate with each other. First and foremost, you should put the data received from an API call in a place (e.g. a database table or a file) accessible to the whole app. You also have to share the execution status of the separate script. The main application has to know whether the externally executed API call is still in progress, has completed, or has failed. If you decide to employ a job server, it will probably offer functionality to monitor the job status. But if you want to stick with a simple PHP command line script, you will have to implement such logic yourself.
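For the simple command line script variant, the status sharing described above could be sketched with one JSON file per job. Everything here is an assumption made for illustration: the function names, the file layout and the status values ('in_progress', 'completed', 'failed', 'unknown'). A database table would work just as well.

```php
<?php
// Location of the status file for a given job (hypothetical layout).
function job_status_path(string $dir, string $jobId): string
{
    return rtrim($dir, '/') . '/' . md5($jobId) . '.json';
}

// Called by the background script whenever its state changes.
function write_job_status(string $dir, string $jobId, string $status, $data = null): void
{
    $payload = json_encode([
        'status'     => $status,
        'updated_at' => time(),
        'data'       => $data,
    ]);
    // LOCK_EX prevents two processes from writing the file at the same time.
    file_put_contents(job_status_path($dir, $jobId), $payload, LOCK_EX);
}

// Called by the main application to find out what the job is doing.
function read_job_status(string $dir, string $jobId): array
{
    $unknown = ['status' => 'unknown', 'updated_at' => null, 'data' => null];
    $path = job_status_path($dir, $jobId);
    if (!is_file($path)) {
        return $unknown;
    }
    $decoded = json_decode((string) file_get_contents($path), true);

    return is_array($decoded) ? $decoded : $unknown;
}
```

The background script would write 'in_progress' when it starts and 'completed' (with the fetched data) or 'failed' when it ends, while the main application only reads the file.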

Multiple HTTP requests or a separate process?

So which solution is better – employing curl_multi functions to execute several HTTP requests at a time or separating API calls from the main app flow? Well, it depends on the context in which the remote server is being queried. You may find that the whole script handling the API calls takes a long time not only because of the requests themselves. There may also be extensive code responsible for dealing with the received data, especially when it includes transforming files or making heavy database writes. In such cases, using the curl_multi functions probably won’t be sufficient to speed up your app. Running a separate process responsible for the whole operation, including processing the data received from a remote host, may give better results in terms of your app’s performance. On the other hand, if you need to execute a lot of simple API calls which don’t involve heavy data processing on your side, sticking with the curl_multi functions will probably be enough to make your app faster.

And of course there is a third solution – mixing the two approaches described above. You can run a separate process responsible for dealing with API calls and then make it run faster by making multiple requests at a time. This may be more efficient than executing a separate script for each request. But it may also require a deeper analysis of how to design the flow of the script, so that different script executions and different API calls executed at once don’t interfere with each other or duplicate each other’s work.
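One simple way to keep two executions of the same script from duplicating each other’s work is a lock file. The sketch below is an assumption about how this could be done with PHP’s flock() function; the function names and the lock file location are hypothetical.

```php
<?php
// Try to take an exclusive lock. Returns the file handle on success
// or null when the file cannot be opened or is already locked.
function acquire_job_lock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // create the file if needed, never truncate it
    if ($handle === false) {
        return null;
    }
    // LOCK_NB makes flock() return immediately instead of waiting for the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }

    return $handle;
}

// Release the lock once the job is done.
function release_job_lock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}
```

A background script would call acquire_job_lock() first and exit right away when it gets null, which means another execution is already handling the same set of API calls.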

Build a smart cache engine

Another solution to speed up an application that relies heavily on API usage is building a smart caching engine. It may prevent your script from making calls which are unnecessary as the content located on a different server hasn’t changed. Proper caching can also reduce the amount of data transferred between the servers in a single API call.

To write a cache engine that works properly and returns valid data, you need to identify the cases in which the response from a remote server doesn’t change, so there’s no need to fetch it every time. This will probably differ from one API service to another, but the general idea is to find a set of request parameters which give the same response within a given time period. For example, if you fetch daily currency exchange rates from a remote service, you can be sure that the exchange rate for a given currency (which is the parameter) stays the same for the whole day. So the cache key for storing data received from this particular API has to contain both the currency and the date. When your app needs this specific exchange rate again, it can refer to the data saved in the cache (e.g. in a database or a file) and avoid making an HTTP request.
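The exchange rate example could be sketched as follows. This is a simplified file-based cache under stated assumptions: the function names are hypothetical, $fetchRate stands for whatever code performs the real API call, and the currency code and date are assumed to be validated before they are used in a file name.

```php
<?php
// The cache key contains both parameters that determine the response.
function exchange_rate_cache_key(string $currency, string $date): string
{
    return 'rate_' . strtoupper($currency) . '_' . $date;
}

// Return the rate from the cache if present; otherwise call the API
// (through the $fetchRate callable) and store the result.
function get_exchange_rate(string $currency, string $date, callable $fetchRate, string $cacheDir): float
{
    $file = rtrim($cacheDir, '/') . '/' . exchange_rate_cache_key($currency, $date) . '.cache';

    if (is_file($file)) {
        return (float) file_get_contents($file); // cache hit: no HTTP request
    }

    $rate = (float) $fetchRate($currency, $date); // cache miss: ask the remote service
    file_put_contents($file, (string) $rate, LOCK_EX);

    return $rate;
}
```

With this in place, the remote service is queried at most once per currency per day, no matter how many times the rate is requested.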

The scenario described above assumes that your application takes all the responsibility for deciding when the data received from a remote service can be cached, so you need to implement proper caching logic yourself. But there are also cases in which an API service tracks changes in the data it shares and returns additional fields containing metadata linked with a certain resource. The metadata may include values such as the last modification date, a revision number or a hash computed from the resource content. Making use of such data can be a great way to improve the performance of your PHP application, especially when dealing with large amounts of data. Instead of fetching the whole resource each time you connect to the API, you just need to compare a timestamp or a hash with the value you received the last time. If they are equal, you can use the data fetched before, as the remote content hasn’t changed. Such a solution still requires a caching engine in your application, but you don’t need to worry whether the data stored in the cache is valid. As you rely on the metadata returned by the API service, you only need to compare the metadata values given by the remote server.
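The comparison logic could be sketched like this. The two callables are placeholders for real API calls (a cheap one returning only the revision or hash, and an expensive one returning the whole resource), and the in-memory array stands in for whatever persistent cache the application uses. All names are assumptions.

```php
<?php
// Return the resource, fetching it again only when its revision changed.
// $cache maps a resource id to ['revision' => ..., 'data' => ...].
function get_resource(string $id, callable $fetchRevision, callable $fetchResource, array &$cache)
{
    // Cheap call: ask only for the metadata (revision number, hash or timestamp).
    $remoteRevision = $fetchRevision($id);

    if (isset($cache[$id]) && $cache[$id]['revision'] === $remoteRevision) {
        return $cache[$id]['data']; // unchanged: reuse the data fetched before
    }

    // Expensive call: the content changed (or was never fetched), download it.
    $data = $fetchResource($id);
    $cache[$id] = ['revision' => $remoteRevision, 'data' => $data];

    return $data;
}
```

The expensive download happens only on the first call and after the remote revision changes; every other call costs just one small metadata request.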

Using remote resource metadata may be especially beneficial when working with a file hosting service API. Working with remote folders and files usually means transferring a lot of data, which may lead to performance issues. To give you an example of how to avoid this, let me describe the solutions used in the Dropbox API. The Dropbox API service returns specific data that should be used to check whether the remote files have changed. First of all, the response of the metadata method (which returns information about folders and files, like their names, sizes or paths) contains the hash field representing the hash value of the returned resource. If you provide a hash value from a previous request as a parameter in a new one and the remote data hasn’t changed between requests, the API will just return an HTTP 304 (Not Modified) response. The Dropbox API also offers the delta method, which was created exclusively for reporting changes in specific folders or files. Using the hash values and the delta method is recommended in the API documentation, as it may give your application a significant performance boost.

Last but not least: master the API documentation

It may sound obvious, but in some cases reading the API documentation thoroughly may provide you with specific solutions for making API calls more efficiently. The Dropbox API usage described above is a very clear example. But there may be other ways to reduce the amount of data transferred in a response (e.g. selecting only a few specific fields to be returned by the API instead of receiving the whole dataset). You can also check whether the actions you execute in separate requests can be performed at once. For example, the translate method of the Google Translate API (which is used for fetching text translations in different languages) may return more than one translation in one request. By passing several text strings to process in a single API call, you avoid making multiple requests, which will probably save some app execution time.
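To illustrate the idea of batching, the sketch below builds a single request URL that carries several text strings. It assumes that the translate method accepts a repeated q parameter, one per string; the endpoint address, the parameter names and the function name are assumptions to be verified against the current API documentation.

```php
<?php
// Assumed endpoint of the translate method; check the API documentation.
const TRANSLATE_ENDPOINT = 'https://www.googleapis.com/language/translate/v2';

// Build one URL that asks for translations of all the given strings.
// http_build_query() is not used here because it would encode the list
// as q[0]=...&q[1]=... instead of repeating the plain "q" parameter.
function build_translate_url(array $texts, string $target, string $apiKey): string
{
    $params = [
        'key=' . rawurlencode($apiKey),
        'target=' . rawurlencode($target),
    ];
    foreach ($texts as $text) {
        $params[] = 'q=' . rawurlencode($text);
    }

    return TRANSLATE_ENDPOINT . '?' . implode('&', $params);
}
```

Passing ['Hello world', 'Good morning'] produces one URL with two q parameters, so both translations can be requested with a single HTTP call instead of two.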

Summary

As you can see, there are many ways to improve the performance of a PHP application which relies heavily on remote APIs. You can execute multiple requests at once – either by using curl_multi functions or by running separate application processes. Another solution is to implement a caching engine which prevents you from making unnecessary API calls or lowers the amount of data transferred between servers. Finally, the methods offered by the API service may provide you with some out-of-the-box solutions for a performance boost, such as executing multiple actions in one request.

I hope this article gave you some insight into how to handle API requests efficiently. If you have any comments regarding the points presented here or any other tips on how to speed up working with APIs, feel free to post them below. You can also contact me directly through Google Plus.

  • http://pilotci.com/ cordoval

    nice article, i think Guzzle 4 supports multi curl already right? The other thing is about the processing no wonders why symfony latest developments are focusing on process component and why the interest from the symfony community to leave the console apps light in a sense that they don’t use DI.

  • Sp4cecat

    Although I’m yet to let it loose in any of my own applications, I hear good things about Beanstalkd for processing queues. Any experience with Gearman v. Beanstalkd?