A step towards accessible Flash content: providing real-time Flash content for search engines, and bookmarking Flash Websites.
Part 1 – SEO for Flash, starts below.
Part 2 – Bookmarking Flash, starts here.
The State of Play
Today’s search engines have grown up in partnership with the traditional html document; not surprisingly, they are fantastic at indexing every last piece of information found within html files.
Great! But, what happens when your new Website has no html content at all? This occurs most often with Websites built in Macromedia Flash, where the content is locked up in a file that search engines completely ignore.
This question led me to think about what could be described as the holy grail of Flash Web development: achieving good search engine listings, based on current content, for Flash Websites that have no ‘surrounding’, relevant html content — in other words, what’s come to be known as the Flash site.
In this case study I’ll show you how to accomplish this task in a way that’s simple, scalable and transparent to your general Website visitor — no annoying redirects, refreshes or ‘hidden’ text to consider. It has the added benefit of allowing sections, including frames and scenes, of a Flash Website to be bookmarked for reference at a later date.
I’ll also illustrate alternate methods for providing Flash content to search engines.
This process is built using disparate concepts that, on the face of it, have nothing to do with Flash. Joined together, though, they provide the key to success.
Utilise this method, and you’ll have a Flash Website with content that is directly accessible via the URL or external hyperlinks. The knowledge that search engines can index Flash Websites effectively will allow you to promote Flash as a viable technology in which to develop and promote your clients’ Websites.
First Thoughts
The idea for developing a scalable and easy-to-maintain system to allow search engines to index Flash sites hit me when I was in the initial phase of building my own Website.
In 2000, I evaluated the options and decided to build the Website using Flash, knowing that the general consensus in the online world (some gurus included) was that this approach meant I’d be tossing site accessibility out the window. The stated crimes of an all-Flash site included ignoring the browser’s back and forward buttons*, and an inability to make the site available for search engine indexing (under the general umbrella of direct linking).
Being the questioning person that I am, I set as one of the objectives for my Website that it competently address the issue of search engine indexing. My site had to be fully indexed by search engines, enabling prospective clients to find me amid my contemporaries in the search result listings.
*Robert Penner, employing some lateral thinking, proves this assumption wrong using a frameset (a solution which may not be to everyone’s liking).
As I progressed, I realised that a number of concepts were involved in my achievement of this goal.
Concept 1: External Content is The Key
After reading many, many Flash resources, I saw that a key to accomplishing Flash site SEO was to separate the Website content from the Flash movie – to load it in from external sources. These sources could constitute anything from a database table or XML file, to a simple plain text file.
The benefit I saw in storing my text content outside the SWF file was that it could be used in a multitude of external devices. For instance, the text in a database table can be used in XHTML, Flash and, in looking to the future, an XML-ready device.
My choice for storing and retrieving content, which was made as a result of previous experience with the technology, was a MySQL database with PHP as the server side script. You could, however, conceivably use any database/script combination to accomplish this.
Now, storing content is one thing; reliably retrieving and serving it, so that it can be indexed by search engines and viewed in a Flash movie by humans, is another. Time for some search engine research…
Concept 2: Understanding Indexing
This is by no means an exhaustive look at how search engines get their content — that would take up more than a few dead trees! However, we still need to have a quick look at how Web content ends up in a search engine.
The well-known search engines get their content via bots that travel around the Web following links and sending the information they find back to their respective engines. Each bot is identified by a unique name, called a user agent string (string is code talk for a piece of text, and has nothing to do with a programmer’s shoelaces). In fact, nearly every visitor to a Website, whether bot, Web browser or otherwise, provides its own user agent string.
Let’s look at a few, to see what we’re dealing with:
- Googlebot/2.1 (+http://www.googlebot.com/bot.html)
- Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
- FAST-WebCrawler/3.3 (crawler@fast.no; http://fast.no/support.php?c=faqs/crawler)
- Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130
The first three user agent strings above identify searchbots from Google, Inktomi and FAST. The last is a standard Mozilla 1.x identifier.
The very fact that reputable search engines give consistent user agent strings that differ from our human visitors allowed me to divide traffic to the Website into two main categories, and one minor category:
- Searchbots
- Humans using Web browsers
- Unknown (those visitors too difficult to identify by the supplied user agent)
Searchbots, at their core, are stripped down versions of browsers. They ignore client side scripting such as JavaScript, and can ignore tricks like Meta tag refreshing and making text the same colour as the document background.
But what of server-side scripts? Searchbots, like their Web browser cousins, accept the page the server sends. Could I then get my Web server to send different pages based on the results of interrogation of the User Agent string? You bet!
Concept 3: Targeting Content From The URL
The starting point for targeting content in an HTML Website is the URL, and the same concept applies to Flash Websites. When a page is requested via the URL, the filename is sent to the server, along with any variables, which are appended to the end of the filename.
We’ve all seen something like this:
google.com/search/?q=flash+and+search+engines
In my case it was:
/index.php?go=4
That part of the filename after the ? character is known as a query string* and will be available to the script index.php when it runs. In addition to this, any variables that are sent to a PHP script like this will be available in an array (think of a cabinet where every drawer holds a piece of information) called $_GET (see http://www.php.net/manual/en/reserved.variables.php#reserved.variables.get).
In this case we’ll have $_GET['go']. (In the Google case above, we’d have $_GET['q'].)
*An alternative way to accomplish this involves rewriting the URL. For example, instead of this index.php?go=4 we might have /go/4/. This is seen to be friendlier to search engines, but has a steeper learning curve when compared to a simple query string. And yes, search engines like Google do index pages with query strings, as long as they are kept short (I’ve found through experience that I can get listings with two variables in the query string). I’ll leave the choice to you, but to keep it simple, I’ll stick with the query string for now.
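For readers curious about the rewritten-URL alternative mentioned in the footnote, here is a minimal sketch of how a path such as /go/4/ might be turned into the same content id. The function name go_from_path() is hypothetical, and the sketch assumes the Web server has already been configured (not shown here) to route such requests to index.php.

```php
<?php
// Hypothetical helper: extract the content id from a path such as "/go/4/".
// Returns the id as an integer, or -1 (the error content id used in this
// article) when the path does not match the expected pattern.
function go_from_path($path)
{
    // keep only the path component, dropping any query string
    $path = parse_url($path, PHP_URL_PATH);
    if (is_string($path) && preg_match('#^/go/(\d+)/?$#', $path, $matches)) {
        return (int) $matches[1];
    }
    return -1;
}

echo go_from_path('/go/4/'), "\n"; // prints 4
```

The result can then be checked against the database in the same way as a value taken from the query string.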
My first task here was to validate the contents of $_GET['go']. This is necessary because the URL can be changed by the Website visitor, and it’s a central part of good Web scripting not to trust anything that a script receives via the GET or POST methods.
So, let’s start by interrogating the variable that’s fallen into our script. I’ve fully commented the PHP script that does this (lines with // or surrounded by /** **/), so if you’re codaphobic, read these!
Let’s also assume that the database connection was successful.
<?php
/** - getparse.inc.php
---------- what we want to achieve here -----------
- check if the 'go' variable exists
- strip out any content that may be hazardous to the script
- convert 'go' to an integer for the sql query
- nb : all content kept in the database is identified by a unique integer.
- nb : error content is identified in the database by the id '-1'
**/
//first a function to parse the 'go' variable
//functions are not run by the script until called
function parsego($go_var)
{
    //check if the variable is an integer
    if(!is_int($go_var))
    {
        //the variable is not an integer, set it to error content.
        $go_var=-1;
    }
    //return the parsed variable.
    return $go_var;
}
//go is passed from the URL header.
//it is available in the super global array $_GET as element 'go'
//$go is the variable used to identify the content to be retrieved.
//first check if the variable exists,
//if not set it to retrieve default content (1).
//empty() avoids a PHP notice when 'go' is missing from the URL
if(empty($_GET['go']))
{
    $go=1;
    $title_add = "welcome";
}
//$_GET['go'] exists
else
{
    //check the 'go' variable for any 'hazardous' content
    //it needs to be converted to an integer using intval()
    $go = parsego(intval($_GET['go']));
    //check to see if the content requested exists.
    $checkGo_sql="SELECT id, header FROM sitecontent WHERE id=$go";
    //run the query
    //the @ symbol is used to suppress error messages from PHP
    $checkGo_result=@mysql_query($checkGo_sql);
    //check the number of rows returned.
    $checkGo_rows=@mysql_num_rows($checkGo_result);
    if(!$checkGo_result || $checkGo_rows==0 || !$checkGo_rows)
    {
        //if there is an error or if there are no matches,
        //set to retrieve error content
        $go=-1;
        $title_add = "content not found";
    }
    else
    {
        $go = @mysql_result($checkGo_result, 0, 'id');
        $title_add = @mysql_result($checkGo_result, 0, 'header');
    }
}
/** ----- results ------------------
- we have a valid value for $go that can now be
used to retrieve content from the database.
- it is targeting either the error content (-1) or existing content (>=1)
**/
?>
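One detail worth knowing about intval() is that it converts leniently: a value such as "4abc" becomes 4, and "abc" becomes 0. The database check still catches ids that don't exist, but if you'd rather reject malformed values outright, a stricter check might look like the sketch below. The function name validate_go() is hypothetical and is not part of getparse.inc.php; the return values 1 and -1 follow the default and error ids used in this article.

```php
<?php
// Hypothetical helper, not part of getparse.inc.php.
// Returns: 1 (default content) when 'go' is absent or empty,
//          the integer id when 'go' consists only of digits,
//          -1 (error content) for anything else.
function validate_go(array $params)
{
    if (!isset($params['go']) || $params['go'] === '') {
        return 1;
    }
    $raw = $params['go'];
    // ctype_digit() accepts only strings made up entirely of 0-9
    if (is_string($raw) && ctype_digit($raw)) {
        return (int) $raw;
    }
    return -1;
}

$go = validate_go($_GET);
```

Unlike the script above, this sketch passes ?go=0 through as the id 0 instead of falling back to the default content, so the database check is still needed afterwards.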
Concept 4: Checking for Search Bots
I now have a valid variable for sourcing content from the database table, so it’s time to check who or what is visiting the Website. Again, everything is fully commented. For this example, I’ve introduced a tool that checks to see that the script is functioning as it should. In a live Website, you’d want to remove the if-else statement at the start of the code.
<?php
/** ------ botcheck.inc.php -------
- determine if the user agent is a known search bot.
**/
//for spoofing the system so that it can be checked
//and validated from the URL.
//remove this when using live.
if(!empty($_GET['useragent']))
{
    $user_agent = $_GET['useragent'];
}
//if $_GET['useragent'] is not available, treat it as a real request.
else
{
    $user_agent = $_SERVER['HTTP_USER_AGENT'];
    //$_SERVER['HTTP_USER_AGENT']; is where
    //PHP holds the user agent string.
    //for PHP versions older than 4.3, use $HTTP_USER_AGENT;
}
//a list of terms found in some searchbot strings in my log files
//also some text only browsers thrown in. eg Lynx
//and some that should never be allowed near a Flash movie (web tv)
$searchbot_short_array = array("FAST-WebCrawler/", "Googlebot/",
    "Googlebot-Image/", "Ask Jeeves/Teoma", "Ask Jeeves",
    "Google WAP Proxy", "Slurp/", "Gigabot/", "Poodle predictor",
    "AlkalineBOT/", "Scooter-", "Scooter/", "ASPSeek/", "Sqworm/",
    "TurnitinBot/", "Lynx/", "Lycos_Spider", "appie", "walhello",
    "WebTV", "LinkWalker", "SurveyBot/", "suzaran", "polybot",
    "webcollage/", "Teleport Pro/", "search.ch", "LWP::Simple",
    "EasyDL", "Minerva", "RPT-HTTPClient", "IA_Archiver",
    "Spinne/", "Webster Pro", "MSProxy", "ZyBorg/",
    "Indy Library", "NPBot", "Girafabot",
    "Gulper Web Bot", "grub-client");
//assume a human visitor until a bot is matched
$bot = false;
//traverse the array and look at each element
//if a bot is found, set a variable for later
foreach($searchbot_short_array as $search_for)
{
    //attempt to match the array value against the user agent string.
    //eregi is a case insensitive regular expression matching function
    //in a live setting
    //replace $user_agent with $_SERVER['HTTP_USER_AGENT'];
    if(eregi($search_for, $user_agent))
    {
        $bot = true;
    }
}
?>
If, for instance, the Altavista searchbot, Scooter, hit the Website, the $bot variable would be set to true.
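A note for anyone running this on a newer server: the ereg family of functions, including eregi(), was deprecated in PHP 5.3 and removed in PHP 7.0. Since the bot list contains plain substrings and no real patterns, a case-insensitive substring search with stripos() should do the same job. The sketch below is one way to write it; the function name is_known_bot() is hypothetical, and the list is shortened for illustration.

```php
<?php
// Hypothetical helper: case-insensitive substring match against a list of
// known bot identifiers (a shortened copy of the list used in this article).
function is_known_bot($user_agent, array $needles)
{
    foreach ($needles as $needle) {
        // stripos() returns false when the needle is absent;
        // a match at position 0 is still a match, hence the strict comparison
        if (stripos($user_agent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$bots = array("Googlebot/", "Slurp/", "FAST-WebCrawler/", "Scooter/");
$bot = is_known_bot("Googlebot/2.1 (+http://www.googlebot.com/bot.html)", $bots);
// $bot is now true
```

A side benefit of a plain substring search is that characters such as the dot in "search.ch" are matched literally and not treated as regular expression wildcards.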
Concept 5: Deploying Content Based on the User Agent
There are now two important pieces of information in the script environment:
- a valid identifier for content in the database table ($go), and
- whether the Website visitor is a bot ($bot == true) or not/unknown ($bot is not set in the script environment, i.e. ‘false’).
It’s now time to branch the script depending on the visitor type, and source the requested content.
<?php
/** --- filtering.inc.php
- display text based content if a bot is found
- ignore if not.
**/
//$bot is set in botcheck.inc.php
//$abspath is assumed to be set in an earlier include
if(!empty($bot))
{
    //show searchbot friendly HTML content according to what
    //was decided in getparse.inc.php
    $getBotFriendly_sql = "SELECT header, content, id FROM sitecontent
        WHERE id=$go";
    //run the query
    $getBotFriendly_result=@mysql_query($getBotFriendly_sql);
    //set some results
    $content_header = @mysql_result($getBotFriendly_result,
        0, 'header');
    $content = @mysql_result($getBotFriendly_result,
        0, 'content');
    //get all the headers and id's for links that the search
    //bot can follow.
    $getAllHeaders_sql = "SELECT id, header FROM sitecontent";
    $getAllHeaders_result=@mysql_query($getAllHeaders_sql);
?>
<div id="vert">
    <div id="header">
        <b>From little things, big things grow :</b>
        <br />
        providing real-time Flash content for search engines &amp;
        bookmarking Flash Websites
        - <b><?php echo $content_header; ?></b>
    </div>
    <div id="content">
        <h4><?php echo $content_header; ?></h4>
<?php
    //convert newlines (\n, \r or \r\n) in the content to <br />
    echo nl2br($content);
?>
        <h4>site navigation headers from the database table...</h4>
<?php
    //show the links to other current content available for archiving
    //the search bot will follow these links.
    while($header_row = @mysql_fetch_array($getAllHeaders_result))
    {
        if($header_row['id'] != -1)
        {
            echo "» <a href=\"$abspath?go=$header_row[id]\" title=\"more on : $header_row[header]\">";
            echo "$header_row[header]</a>\n<br />\n";
        }
    }
?>
    </div>
    <div id="footer">
        &nbsp;
    </div>
</div>
<?php
    //include the html footer (</body></html>)
    include("htmlfoot.inc.php");
    // exit from the script so that the flash content is not
    // displayed to the bot.
    exit;
}
?>
Linking it All Together – Passing the Content Identifier into Flash
Finally, I need to set up a layout for the index.php file that will hold the above three inclusions, and pass the value of $go into the Flash movie. To accomplish this, we append the value to the end of the Flash filename.
<?php
/** ------- index.php ---------- **/
/** ----------- start includes ----------**/
//connect to mysql server and select the database
include("database.inc.php");
//parses the variables passed from the URL
// and checks go against the database
include("getparse.inc.php");
//include the html head content <html><head><title> etc
//also a <link .. /> to the stylesheet, meta tags
//and usage of the $title_add variable
include("htmlhead.inc.php");
//check to see if the user agent is a bot.
include("botcheck.inc.php");
//provide plain text content if the user agent is a bot.
include("filtering.inc.php");
/** --------- end includes -----------**/
//deploy the flash content for the humans among us
//note the go variable is being passed into the flash movie,
//available as _root.go.
//This is the key to successful bookmarking of Flash movies.
?>
<div id="wrapper">
    <!-- XHTML1.0 compliant code to deploy flash movie -->
    <!-- if you are seeing a textarea-like box
    you have a corrupt Flash Player ActiveX control.
    Try uninstalling Flash Player then installing the
    latest and greatest -->
    <object
        data="<?php echo $abspath; ?>index.swf?go=<?php echo $go; ?>"
        type="application/x-shockwave-flash"
        codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,0,0"
        width="740"
        height="420">
        <param name="movie" value="<?php echo $abspath; ?>index.swf?go=<?php echo $go; ?>" />
        <param name="menu" value="false" />
        <param name="quality" value="high" />
        <param name="bgcolor" value="#fefefe" />
        <a href="http://www.macromedia.com/go/getflashplayer"
            title="link to the Macromedia Flash Player download page">
            Get the latest Flash Player from Macromedia
        </a>
    </object>
</div>
<?php
//end the html
include("htmlfoot.inc.php");
?>
Making PHP talk to Flash
That’s it for the PHP scripting: three simple inclusions, and one that contains script that will combine everything together.
The code has successfully supplied a known searchbot with text content from the database table, and stopped it from indexing the Flash html code. The searchbot will now send back to its parent search engine the text and hyperlinks it finds, and the link to this content will soon appear in the search engine listings.
The second part of this process involves what happens when one of the hyperlinks that’s found in a search engine listing is activated by one of us, or when the URL is typed directly into the browser; that is, when $bot is not set. In this case, the PHP interpreter skips entirely the supplying of text content in filtering.inc.php, and moves on to the code that displays the Flash movie.
The PHP script now needs to pass the valid request to Flash, accomplished by adding the $go variable to the Flash movie filename. What this does is very simple yet powerful: it will pass the value of $go into the movie, making it available to ActionScript as the variable _root.go.
This is the linkage between the URL and the Flash movie and, most importantly, makes the whole process possible.
The ActionScript Starts
Once my Flash Website loads up, I can retrieve content from the database based on the value of _root.go, knowing that it has been validated by the previous PHP script. This will be the same content retrieved by a search engine using $go as the identifier.
There are many ways to retrieve dynamic content for Flash. One method uses LoadVars, a Flash Player 6-based object that can talk to a server side script like PHP. It won’t work with older versions of the Flash Player (lower than 6), so if your target Player is version 3 to 5 you should use the loadVariables function.
_root.go will be available to the Flash movie from frame 1, so it can instantly be added to a LoadVars object:
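//object that sends the request
contentFetcher = new LoadVars();
contentFetcher.id = _root.go;
//object that receives the reply; sendAndLoad needs a LoadVars target
contentDisplay = new LoadVars();
contentFetcher.sendAndLoad("requestcontent.php", contentDisplay, "POST");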
Here I transfer _root.go into a new LoadVars object called contentFetcher, then send the object content(s) to a PHP script called requestcontent.php. This PHP script retrieves content from the database table and prepares it for Flash in url-encoded format. Once the script is run successfully, the results are retrieved for display in the movie using an onLoad event handler.
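The requestcontent.php script itself isn't listed in this article, so what follows is only a rough sketch of the "url-encoded format" step: turning one row of content into name=value pairs that a LoadVars object can read. The function name to_loadvars() and the sample row are hypothetical; in the real script the row would come from the sitecontent table.

```php
<?php
// Hypothetical helper: format one row of content as url-encoded
// name=value pairs, joined with ampersands.
function to_loadvars(array $row)
{
    $pairs = array();
    foreach ($row as $name => $value) {
        // rawurlencode() escapes spaces as %20 and ampersands as %26
        $pairs[] = rawurlencode($name) . '=' . rawurlencode((string) $value);
    }
    return implode('&', $pairs);
}

// Example row shaped like the sitecontent table used in this article.
$row = array('id' => 4, 'header' => 'Contact us', 'content' => 'Email & phone');
echo to_loadvars($row);
// prints: id=4&header=Contact%20us&content=Email%20%26%20phone
```

Encoding every value matters because an unescaped ampersand inside the content would otherwise be read as the start of a new variable.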
Achieving the Target
My initial target of getting the search engines to index my Website was achieved! I soon found my Website in Google’s cache. There was my listing, with all the content from my database indexed and no Flash visible at all. Paradoxically for a Flash developer, this is what I wanted!
The beauty of this system can be seen in its:
- Simplicity: three basic scripts do all the work
- Scalability: the virtue of having the content for search engines update automatically whenever the ‘human readable’ content is updated
- Transparency: for both types of site visitors, bot and human, the other’s content is rendered invisible with some simple scripting
- Platform Independence: the scripts run on the server, so there are no problems with non-compliant browsers mangling the code
As with most things in life, achieving something worthwhile tends to have positive run-on effects. In this case, my tinkering with passing variables to Flash from the URL partly solved an accessibility bug that acted as a barrier for many who considered building Flash Websites — bookmarking.
For the second part of this case study I’ll belt out some code on how bookmarking can be achieved. We’ll look at some alternate methods you can use if you don’t have a database, outline some advantages and disadvantages of the system, along with possible future directions, and discuss a quick warning regarding cloaking.
Serendipity – Bookmarking
Yes! Bookmarking a Flash Website for direct return is possible. With the SEO solution we discussed in the first part of this article, once each piece of content was loaded into the Flash movie it could be identified via a unique id. And, with a bit of Flash — JavaScript interaction, I could provide a link to allow visitors to bookmark the content that was currently visible.
Internet Explorer 4+ for Windows allows a bookmark URL to be written to it using a little IE-specific JavaScript. Browsers such as Mozilla, Opera, and basically anything that’s not IE for Windows, do not allow this functionality, for some commonsense security reasons.
Flash can communicate with JavaScript if we add a function call into an HTML hyperlink and store this anchor in the content that’s retrieved. Here’s the ActionScript, and code for the hyperlink:
bkurl = "http://www.webqs.com/?go=" + contentDisplay.id;
bktitle = "webqs.com : " + contentDisplay.header;
<a href="javascript:setBookmark(bkurl, bktitle)">bookmark this content</a>
Simple! When clicked in Flash, this dials up a JavaScript function called “setBookmark” that can be placed within the <head> tags of the HTML page in question:
<script type="text/javascript">
<!--
//function performs the bookmark action
function setBookmark(url, pagetitle)
{
    //code for IE4+ on win32
    //appVersion starts with the version number, so compare it as a number
    if (navigator.appName == "Microsoft Internet Explorer" &&
        parseFloat(navigator.appVersion) >= 4 &&
        navigator.platform == "Win32")
    {
        window.external.AddFavorite(url, pagetitle);
    }
    //this code posts a prompt if not
    else
    {
        window.prompt('Hit Ctrl-Shift-D (Command-D on a Mac, Ctrl-T in Opera) ' +
            'to bookmark this page.\nCopy and paste this address and title ' +
            'for the bookmark :', url + ' ' + pagetitle);
    }
}
// -->
</script>
Note that, when compared to standard HTML pages, this method has the big disadvantage that the Website visitor can’t simply hit their bookmark keystrokes to store the URL for the current content (unless it is the URL for the content requested). Why not? Because, whenever new content is requested inside a Flash movie, the URL stays the same throughout the process — it doesn’t change to reflect the updated content.
Alternative — What if I Don’t Have a Database?
The method described above for having Flash content indexed by search engines is based on the assumption that we’re using a database as the storage medium. So, what happens if your content is stored in a plain text file or embedded in the actual Flash movie?
Retrieving content from plain text files with PHP is a simple task that uses the language’s built-in filesystem functions. Simply put all the content in an array, using the file() function, then traverse the array and print out each element. To match the text files with what has been entered in the URL, we must use a naming system such as 1.txt, 2.txt etc. Instead of SELECTing data from the database table using the value of $go, you could target a ‘$go.txt’ file for reading and printing to the screen.
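As a rough sketch of the text file approach just described, assuming the files live in a directory called content next to the script (that directory name and the function name read_content_file() are both hypothetical):

```php
<?php
// Hypothetical helper: load the lines of "<dir>/<go>.txt" using file().
// $go is cast to an integer, so it cannot carry path characters such as "../".
function read_content_file($dir, $go)
{
    $path = rtrim($dir, '/') . '/' . (int) $go . '.txt';
    if (!is_file($path) || !is_readable($path)) {
        return null; // the caller can fall back to the error content
    }
    // one array element per line, without the trailing newline
    return file($path, FILE_IGNORE_NEW_LINES);
}

// Usage: print each line as escaped HTML followed by a line break.
$lines = read_content_file(__DIR__ . '/content', 4);
if ($lines !== null) {
    foreach ($lines as $line) {
        echo htmlspecialchars($line), "<br />\n";
    }
}
```

A missing file plays the same role here as a query that returns no rows: it is the cue to serve the error content.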
If, for some reason, you have embedded text in the Flash movie, then make a copy of what has been embedded and place it in text files. It can then be retrieved using the method above. You will find that this has to be done every time the Flash content is updated, so it may be wise to think about migrating to a text file, XML file or database to store the content.
Advantages
Just the fact that search engines can index the most up-to-date Flash content on a site, without having to manually update the search engine content, is a huge boost to the accessibility of Flash Websites.
Furthermore, our clients, and we as developers, see the tangible benefits of a Flash Website achieving a good rank in the search engine listings, based on actual site content.
Your clients may also be interested in understanding which areas of content are more popular. Most log file analysers will recognise filenames with query strings as independent files. And, as such, we can provide our client with the relevant numbers and further tailor site content based on those results.
Most importantly, prospective clients can be informed that, yes, a Flash Website can be indexed by search engines — without seriously increasing the project’s cost.
Disadvantages
In diverting searchbots to plain text content, it’s worth noting that this system relies on your keeping the stored list of bots up to date and accurate, which is possibly the solution’s biggest weakness.
Of course, this issue’s not a script-breaker, as any bot that slips through the filtering will simply find the HTML tags that hold the Flash movie. It’s up to the developer, though, to keep this list up to date, just as other Webmasters use a robots.txt file to exclude certain bots from visiting their Website.
Beware Cloaking
Cloaking, as defined by search engines, occurs when a method such as this is used to supply a bot with content that does not match the actual content in the Website.
From the Google Webmaster’s FAQ:
“The term “cloaking” is used to describe a Website that returns altered Webpages to search engines crawling the site. In other words, the Webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they’ll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings.”
Altavista has a statement, too, although it’s less clear in its definition of cloaking.
As can be seen, search engines frown upon cloaking and some actually ban Websites that use these techniques to gain a better search ranking. As such, I would strongly advise against supplying content that does not match what your human visitors see. It’s not worth getting banned from engines like Google, which, apart from being one of the biggest search engines, also supplies search content to many other engines.
And anyway, you want your visitors to be met by the content they were searching for, right?
Future Directions
An interesting feature of Flash Player 6, and one that will hopefully be made more stable, is the Named Anchor. Named Anchors allow different frames and scenes in a movie to be targeted from a URL, without reloading the entire movie. The benefit of this capability is that, as each named keyframe is targeted, the URL changes to reflect its unique id.
This will allow visitors to capture the URL that relates to the content they’re currently viewing, by hitting the bookmark keystrokes. It will also permit users to navigate the site using a browser’s back and forward buttons.
Named Anchors are, at the present time, manually set on the timeline, which makes an ActionScript-based Flash site controlled from the URL pretty difficult to implement. This is something worth experimenting on, though, to better define its current capabilities and limits.
Conclusion
The code and its implementation discussed here are a step towards more accessible Flash content. This information can be used as a basis on which to build Flash Websites that are fully indexed by search engines, enhancing both Website traffic, and the awareness of Flash as a viable development tool.
As examined in this case study, flow-on effects include:
- Bookmarking for direct return to content within the Flash Website
- Scalability and stability to handle content as it is added to the Website, and
- Client confidence that a Flash Website can be indexed without major development and cost overheads.
You, the reader, are more than welcome to add and subtract from this process and implementation. Bringing about improvements on the original scheme, and making it more stable, can only assist developers and designers in the promotion of their Flash work.
Use it, build on it, make it better!
Further Resources
Robert Penner’s back/forward button implementation using a frameset:
http://www.robertpenner.com/experiments/backbutton/backbutton.html
Jakob Nielsen’s classic Flash diatribe from 2000. Unfortunately, this is what remains in the collective consciousness whenever Flash and accessibility are mentioned together. As the URL states ‘use it’ to provide another point of view:
Flazoom.com Flash usability white paper:
http://www.flazoom.com/usability/usability_toc.shtml
Flash — 99% Good — A first aid manual for Flash Websites
http://www.flash99good.com/index_flash.html
James owns Web firm Quantum Serendipity, based in Sydney, Australia. He has worked on projects for Clairol Australia, Gatorade, and most recently, the 2003 Melbourne International Comedy Festival.