Until now, it’s been difficult to get Flash content indexed. About the only thing you could do was to use Flash in conjunction with HTML text links to make a page crawlable by search engine spiders — a solution that also had the advantage of making Flash pages readable by users who didn’t have a Flash plug-in. And of course, HTML text links will remain a good idea for a while, as so many people continue to resist plug-ins. But it sure would be nice if Flash pages were indexed directly…
Well, now they are. FAST just announced AllTheWeb and its partner sites will index Macromedia Flash content and applications — a breakthrough that allows many popular sites to achieve expanded visibility through search engines.
Increased exposure for Flash sites is good news for designers and users alike, because it means that AllTheWeb, Lycos, InfoSpace, and other partner sites using FAST technology can readily display links to caches of Flash content that weren’t available to users before.
Advanced Search Options
AllTheWeb alone serves over 100 million users per month. Its users can now further refine their searches in Macromedia Flash content and applications by using the site’s Advanced Features section.
You’ll find a couple of new options here:
- A new Result Restrictions option, which lets you specify the document depth, at the directory level, for your search terms. That is, you can ask for search terms that appear at a given level and above or below it, from the home page level all the way down to the 10th-level directory of the site architecture. Alternatively, you can search "all document depths."
"This new, advanced search feature from FAST presents a winning solution for the millions of people looking to perform highly-specialized searches for relevant information contained within Macromedia Flash content on the millions of sites which include our technology," said Macromedia. It seems to me that these new advanced search options can certainly improve the user experience, provided that ordinary users become aware of them and use them.
When we first heard about this new functionality in our office, our Operations Manager wasn’t sure she had understood it correctly. So she queried Peter Gorman, Director, Corporate Communications for Fast Search & Transfer:
Question: Is FAST able to go into a site’s embedded .swf files to crawl and index the text within that file just like search engines do with HTML documents?
Answer: Yes, FAST crawls links as they appear within the document and treats Flash files like HTML when converted. FAST uses the Flash Search Engine SDK, which basically converts the Flash app into an HTML file.
Macromedia’s site has more details on the Flash Search Engine SDK.
We immediately downloaded the Flash Search Engine SDK from the Macromedia site and were able to convert .swf files into HTML. Not only does this tool allow FAST to index Flash .swf content, it may allow us to analyze our clients’ Flash files and assist them in optimizing their sites. This will become important as other search engines follow FAST’s lead in indexing Flash content.
Converting .swf Files to HTML
Once we downloaded the Macromedia app, we ran it on a couple of .swf Flash files downloaded from Macromedia’s Website. Below is a screen shot of one of those files — you can see the links and text that are viewable through a browser.
Now, when you run the swf2html application from a DOS prompt and type in the command to convert the .swf file to HTML, you get the following, which shows that there are other .swf files embedded within the first .swf file, along with text links and regular text. Since many .swf files are embedded in each other, we’re assuming FAST would just keep converting all of them until it finished and had completely indexed the site.
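To make that assumption concrete, here is a minimal Python sketch of how a crawler might keep following embedded .swf files until none are left. It is not FAST's code and it does not call swf2html. The `list_embedded` callback and the `EMBEDDED` sample data are hypothetical stand-ins for whatever the converter reports about each file.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_embedded(start: str,
                   list_embedded: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit a .swf file and every .swf file embedded in it, each exactly once.

    list_embedded is a hypothetical callback that returns the names of the
    .swf files found inside a given .swf file.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for child in list_embedded(current):
            # Skip files already queued, so circular embeds cannot loop forever.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


# Hypothetical sample data: which files embed which.
EMBEDDED = {
    "main.swf": ["menu.swf", "intro.swf"],
    "menu.swf": ["intro.swf"],
    "intro.swf": ["main.swf"],
}

order = crawl_embedded("main.swf", lambda name: EMBEDDED.get(name, []))
print(order)
```

The `seen` set is the important design choice: because .swf files can embed one another, a crawler that did not remember what it had already converted could end up converting the same files endlessly.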
We wondered how FAST pulled the TITLE and description for the search engine listings for Flash pages. We assumed these elements would come from the original HTML document’s <TITLE> and meta description tags, with any Flash on the page simply converted to HTML and added to the index alongside the HTML page content.
But what if the Flash site wasn’t embedded in an HTML document, and was just a .swf file? Where would FAST get its listing title and description?
We got answers from FAST engineer Rolf Michelsen, who confirmed our initial assumptions were correct. "When the Flash file is embedded as part of a HTML document, we use the document title and various heuristics to extract title and teaser for our search results. The heuristics for extracting a teaser may use the meta description tag if present," said Michelsen.
"When indexing a stand-alone Flash file, we extract title and teaser directly from the Flash file — basically trying to compose a teaser from the first few sentences of text extracted from the Flash file," concluded Michelsen.
So there you have it, straight from the horse’s mouth. The critical point is that designers should always embed their .swf Flash files in an HTML document and add <TITLE>, meta description, and meta keywords tags to ensure their Flash-based pages are indexed in the search engines with a title and description of which they approve. Otherwise, the listing may show "No Title," and the description will be the first text indexed in the file, which may not be advantageous.
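As a sketch of that advice, the Python helper below writes a simple HTML wrapper page around a .swf file. The function name `build_wrapper`, the sample values, the fixed dimensions, and the use of a plain `object` element are our own illustrative choices, not markup prescribed by FAST or Macromedia. The text link inside the `object` element serves visitors without the plug-in.

```python
from html import escape


def build_wrapper(swf_url, title, description, keywords):
    """Return an HTML page that embeds swf_url with title and meta tags."""
    keyword_list = ", ".join(keywords)
    # escape() also encodes quotes, so values are safe inside attributes.
    return f"""<html>
<head>
<title>{escape(title)}</title>
<meta name="description" content="{escape(description)}">
<meta name="keywords" content="{escape(keyword_list)}">
</head>
<body>
<object type="application/x-shockwave-flash" data="{escape(swf_url)}" width="550" height="400">
<a href="{escape(swf_url)}">{escape(title)}</a>
</object>
</body>
</html>
"""


wrapper = build_wrapper(
    "movie.swf",                      # hypothetical file name
    "Acme & Co",
    'Widgets "R" Us: hand-made widgets.',
    ["flash", "widgets"],
)
print(wrapper)
```

Escaping matters here because a title or description containing an ampersand or a quotation mark would otherwise break the very tags the search engine is expected to read.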