Ajax changed the Web. Microsoft pioneered the technologies back in 1999, but an explosion of Web 2.0 applications appeared after Jesse James Garrett devised the acronym in 2005.
Although it makes web applications slicker and more responsive, Ajax-powered content cannot (always) be indexed by search engines. Crawlers are unable run JavaScript content-updating functions. In extreme cases, all Google might index is:
<html>
<head>
<title>My Ajax-powered page</title>
<script type="text/JavaScript" src="application.js"></script>
</head>
<body>
</body>
</html>
It’s enough to make an SEO expert faint.
Google to the rescue?
Google has devised a new technique to make Ajax applications crawlable. It’s a little complex, but here’s a summary of the implementation.
1. Tell the crawler about your Ajax links
Assume your application retrieves data via an Ajax call to this address:
www.mysite.com/ajax.php#state
You’d need to modify that URL and add an exclamation mark following the hash:
www.mysite.com/ajax.php#!state
Links in your HTML will be followed, but you should add these new links your sitemap if they’re normally generated within the JavaScript code.
2. Serving Ajax content
The Google crawler will translate your Ajax URL and request the following page:
www.mysite.com/ajax.php?_escaped_fragment_=state
Note that arguments are escaped; for example, ‘&’ will be passed as ‘%26’. You will therefore need to unescape and parse the string. PHP’s urldecode()
function will do nicely.
Your server must now return a snapshot of the whole page, that is, what the HTML code would look like after the Ajax call had executed. In other words, it’s a duplication of what you would see in Firebug’s HTML window following a dynamic update.
This should be relatively straightforward if your Ajax requests normally return HTML via a PHP or ASP.NET process. However, the situation becomes a little more complex if you normally return XML and JSON that’s parsed by JavaScript and inserted into the DOM. Google suggests creating static versions of your pages or using a system such as HtmlUnit to programmatically fetch the post-JavaScript execution HTML.
That’s the essence of the technique, but I’d recommend reading Google’s documentation — there are several other issues and edge cases.
Is this progress(ive)?
Contrary to popular opinion, sites using Ajax can be indexed by search engines … if you adopt progressive enhancement techniques. Pages that work without JavaScript will be indexed by Google and other search engines. You can then apply a sprinkling of Ajax magic to progressively enhance the user experience for those who have JavaScript enabled.
For example, assume you want to page through the SitePoint library ten books at a time. A standard HTML-only request/response page would be created with navigation links; for example:
library.php
<html> <head> <title>SitePoint library</title> </head> <body> <table id="library"> <?php // show books at appropriate page ?> </table> <ul> <li><a href="library.php?page=-1">BACK</a></li> <li><a href="library.php?page=1">NEXT</a></li> </ul> </body> </html>
The page can be crawled by Google and the links will be followed. Users without JavaScript can also use the system. JavaScript progressive enhancements could now be added to:
- check for the existence of a table with the ID of “library”
- add event handlers to the navigation links and, once clicked
- cancel the standard navigation event and start an Ajax call. This would retrieve new data and update the table without a full page refresh.
While I welcome Google’s solution, it seems like a complex sledgehammer to crack a tiny nut. If you’re confused by techniques such as progressive enhancement and Hijax, you’ll certainly have trouble implementing Google’s crawling system. However, it might help those with monolithic Ajax applications who failed to consider SEO until it was too late.