Google’s Flash Indexing Disaster

Kevin Yank

On July 1st, Google announced that, using technology provided by Adobe, it had enhanced the Google Search Engine to index the text embedded within Flash movies. What followed was bad advice from Google, second-guessing by web developers, and finally a few straight answers.

Google’s initial announcement was so incredibly vague as to render it all but useless. Developers came away knowing that Google was doing something different with their Flash content, but that’s about it.

While Google’s Dion Almaer suggested that search engines have always been black boxes and that it was up to us to discover what had changed through testing, just about everyone else was crying foul.

The announcement’s credibility was immediately in question due to the obviously bad advice it contained:

"If you prefer Google to ignore your less informative content, such as a "copyright" or "loading" message, consider replacing the text within an image, which will make it effectively invisible to us."

For the record, replacing fast-loading, accessible text content with a bulky image simply to hide it from search engines is never a good idea.

Google’s list of caveats in the announcement was similarly perplexing:

"Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed."

What types of JavaScript? Established best practice for publishing Flash content is to use the SWFObject JavaScript library to overcome bugs in older browsers, so was Google saying that it would only index Flash content that was authored using broken/outdated HTML-only techniques?

"We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file."

Any experienced Flash developer knows that if you are going to have any significant amount of text in your Flash content, your best bet is to stick it in an XML file and load it on the fly, so you don’t have to rebuild your Flash movie whenever you change the content.

Apparently, not only will Google not see Flash content authored this way, but it will track down the XML file anyway and index it as a separate page on your site! That’s right, Google will helpfully direct people searching for your content to the raw XML file that contains it, rather than your slick, Flash front-end.

All of this made so little sense that many developers questioned whether Google was actually able to index any Flash content of consequence. Within a few days, however, the Search Engine War blog was able to verify that Google was indeed indexing Flash content.

Finally, after several days of developer outcry, Google admitted it had left too many questions unanswered, and four days later, it posted a significant update that is well worth reading if you have any Flash content on your site.

Here’s a quick summary of what we now know:

  • The July 1st release didn’t index Flash content inserted with the SWFObject library’s dynamic publishing method, which writes the Flash content into the page entirely with JavaScript. The recommended static publishing method (where two nested <object> tags are included in the page) was indexed. Google is now deploying an update that supports the dynamic publishing method as well.
  • Text content loaded on the fly from an XML file is not yet indexed, but Google is working on fixing this in the near term.
  • Google will do its best to detect when duplicate content is there to provide an HTML alternative to Flash content, and will only display one of the two versions in the search results. No penalty is applied to a site’s search ranking due to duplicate content.
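To see why the static and dynamic publishing methods fared differently, here is a rough sketch of a toy crawler that reads a page's markup but never runs its scripts. It is only an illustration of the distinction, not a description of how Googlebot actually works. The file name intro.swf, the element id, the dimensions, and the findSwfReferences helper are all made up for the example.

```javascript
// Toy model of a crawler that reads markup but never executes scripts.
// Assumption: this is NOT how Googlebot works; it only illustrates the idea.

// Static publishing: the SWF URL is present in the HTML itself, using
// SWFObject's nested <object> pattern with alternative content inside.
const staticPage = `
<object id="movie" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" width="550" height="400">
  <param name="movie" value="intro.swf" />
  <!--[if !IE]>-->
  <object type="application/x-shockwave-flash" data="intro.swf" width="550" height="400">
  <!--<![endif]-->
    <p>Alternative HTML content for visitors without Flash.</p>
  <!--[if !IE]>-->
  </object>
  <!--<![endif]-->
</object>`;

// Dynamic publishing: the markup holds only a placeholder; the <object>
// element is created later by JavaScript running in the browser.
const dynamicPage = `
<div id="movie"><p>Alternative HTML content for visitors without Flash.</p></div>
<script type="text/javascript">
  swfobject.embedSWF("intro.swf", "movie", "550", "400", "9.0.0");
</script>`;

// Hypothetical helper: list the SWF files referenced directly in the markup.
function findSwfReferences(html) {
  // Drop script blocks entirely: this toy crawler cannot run them.
  const markupOnly = html.replace(/<script[\s\S]*?<\/script>/gi, "");
  const urls = new Set();
  // <object data="..."> is the form used for non-IE browsers.
  for (const m of markupOnly.matchAll(/<object\b[^>]*\bdata="([^"]+\.swf)"/gi)) {
    urls.add(m[1]);
  }
  // <param name="movie" value="..."> is the form used for Internet Explorer.
  for (const m of markupOnly.matchAll(/<param\b[^>]*\bname="movie"[^>]*\bvalue="([^"]+\.swf)"/gi)) {
    urls.add(m[1]);
  }
  return [...urls];
}

console.log(findSwfReferences(staticPage));  // ["intro.swf"]
console.log(findSwfReferences(dynamicPage)); // []
```

With static publishing, the movie's URL sits in plain markup where even a script-blind crawler can find it. With dynamic publishing, the only mention of the movie is inside a script, so a crawler has to execute or at least interpret that JavaScript to discover it. In both cases the alternative HTML content is in the page from the start.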

There are still unknowns here, but that will always be the case with the Google search engine. Though it took a few days, Google is answering what questions it can, and responding to developer concerns with enhancements.

Before very long, most of the text within Flash-based web sites will make its way into the Google search index. Nevertheless, uncertainty will remain over how deeply Google is able to probe Flash content for a while yet. Providing non-Flash alternative content will remain an effective means of guaranteeing your most important content a place in the Google index. It also gives users of non-Flash-enabled browsers (like the one on the iPhone) something to look at.

Though Google’s initial message was pretty half-baked, the follow-up has put most of my concerns to rest. How about yours?