There is no greater disappointment than failing to find what you’re looking for. From time to time, surfing the Web will lead us to dead ends that typically sport the titles "Error 404", "Not Found", or "The document could not be found".
The code 404, as you may already know, corresponds to "File Not Found", and is probably the most famous of all the status codes defined in the Hypertext Transfer Protocol (HTTP) specification. These codes correspond to different responses users can receive when they request a document from a Web server. In this case, whenever a file can’t be accessed by a user, the Web server will generate a 404 error.
Since the default mechanisms can’t provide any relevant information to our visitors, showing customized error messages is essential. However, before we delve into details about the usability aspects of their design, let’s explore how an Error 404 is produced and how can we avoid them before they occur.
Document was Not Found
Let’s examine all the possible scenarios for a response of this nature:
- Changes to Document Properties
- Mistyping
- Truly Nonexistent Documents
index.html
– Before the appearance of all the application servers out there, index.html was the default initial page for any Website. However, if you run PHP on your server, you’re most probably sticking to anindex.php
document as the default; the same might happen if you use ColdFusion (index.cfm
) or Microsoft ASP (default.asp
).favicon.ico
– Whenever a person visits our site and adds it to his/her list of Favorites or Bookmarks, the browser attempts to find an icon that distinguishes your page in the list. If none was specified in the document, the browser will try to get afavicon.ico
file from the document’s root folder. However, there is a quick fix: all you have to do is add the following line to your document, between the<head>
tags:robots.txt
– This is a text file that usually resides in the root directory of every Website and contains a series of instructions that indicate to "spiders" or "robots" coming from search engines and other indexing services which directories to scan and which ones to avoid. If therobots.txt
file can’t be found, all spiders will index everything in their path, which may be what you want, but will leave a 404 error behind. Quick fix: put a blankrobots.txt
in your root folder; it won’t do any harm.Ghost files
– If we use a second-hand domain name, we might receive some hits corresponding to the site that used to sit on that domain. In the same fashion, our hosting provider might assign us a server with an IP address that had previously been used directly to access some other Website. These 404 errors are easy to identify, as they should yield references to documents and directories that are completely unrelated to our site’s structure.
Reorganizations are common in Websites of all sizes: documents are renamed, deleted or moved within the site structure. Even though Webmasters might ensure that references to moved or deleted documents do not exist within the site, there’s nothing they can do about incorrect references from external sites, such as search engines, caching services, link exchanges, etc.
Let’s also consider what may happen after a server migration — especially one that involves changing operating systems. Although the use of lowercase letters to name Web documents is encouraged and considered a best practice, this rule isn’t often observed in work environments that make extensive or exclusive use of Microsoft Windows operating systems (in which file names are not case-sensitive).
In these cases, the best approach is to set redirections for the files being moved, renamed or deleted as part of the operations. Although configuring a Web server to provide customized error responses is beyond the scope of this article, you can find more information on it in the following documents:
If our Websites have an important role in our business activities, their URLs might have been printed in our business cards and any other stationery. If so, people will be typing our Web address into the browser’s address bar and — let’s face it — not everybody is an expert typist. Somebody’s fingers can easily slip and hit the wrong key.
Another issue associated to involuntary misspellings occurs when somebody makes a reference to a document on our site, but mistypes the URL in the link. You could always try to get a hold of the people maintaining the referring site and ask them to make the change (good luck with that!), or you can play it safe and solve the problem on your side, especially if that reference is important to you.
In these cases, you could set redirections for the most common mistakes. However, Apache offers a more elegant way out: spell checking. The Checkspelling
directive enables the correction of minor spelling mistakes within specific directories of your site. For more information on configuring Checkspelling
, refer to Setting Checkspelling on Apache.
Some documents are, in fact, not present in our Websites, but are assumed to be part of our collection for one reason or another. In some special cases, very particular files can be requested from our Web server, because most of the time they exist elsewhere. In most cases, redirecting is the way to go. Here’s a list of the document names that are most commonly presumed to exist on any site:
<link src="/images/mycompany.ico" rel="shortcut icon" />
Not only will this save you some 404 errors, but it’s also a good idea to make links to our site distinctive.
If none of these approaches can prevent an Error 404 from happening, then nothing will. This is when the design of a 404 page becomes crucial to the delivery of a good user experience.
Designing Usable 404 Pages
There’s no set of rules or recipes that will make your Error pages perfect. Instead, I would like to present a few recommendations that will help you deliver a better user experience, and take action when required. Remember that some of the suggestions that follow may need to be adjusted to your own reality, taking into consideration the demographic information you may have about your visitors.
- Do Not Redirect Without Permission
- Do Not Call it "Error 404"
- Do Not Assume it’s the Visitor’s Fault
- Do Offer a Site Map
- Do Make this Page Stand Out
- Do Offer a Search Form Whenever Possible
- Do Fix those Broken Links
Redirecting people to the homepage — or any other place, for that matter — without an explanation is simply impolite, and will only confuse your visitors. Such a simple implementation may seem like a good idea, but it’s really only a means to avoid the problem — not a real solution.
We all know what an "Error 404" stands for, but does anybody else? Let’s not forget that the Internet is still largely uncharted territory for some people. We don’t want Aunty Mary becoming lost if she follows a broken link offering a cookie recipe, do we? Of course not… This brings me to the next point.
With the "Error 404" out of the way, we should try offer some kind of explanation for what the user is experiencing. At this point, we have no way to know what went wrong, so a generic, friendly message is in order; something like:
"Ooops! We couldn’t find the page you’re looking for."
Or:
"Your document could not be found. We apologize for any inconvenience."
The trick is not to assume anything if you’re not sure why the error’s occurred. Write the message in language that exhibits a level of familiarity that’s appropriate for your site. Notice that we use present tense in referring to the issue (i.e. "you are" instead of "you were"), which makes the visitor feel that the issue isn’t over, and that you are working on it. This message, in combination with some of the following resources will make sure the user doesn’t leave your site — at least, not immediately.
Many Usability experts say that site maps have no place in a Website and that, if a site really needs to be mapped in order for users to get around, a design problem exists. I think that statement is questionable, but if there’s a good place for a site map, it’s at your 404 page. Think about it: this document would appear in the only place at which no real content was served. It’s the best place to offer a global view of the content structure of your site, as it provides hints to your visitors and will help them find what they’re looking for.
Although we may want to make our 404 page look as if it’s part of our collection, and have it share any branding we’ve used across the site as a whole, we need to let the visitor know that something unexpected is going on, at first glance. To accomplish both goals, we need to give the error page a minimalist design, based on the site’s global look and feel. The main message of the page should also be clear: give it a bigger font size than any other title on the site’s style sheet, for instance.
If you have an internal search engine, putting a search form in your 404 page is a no-brainer. But, what about taking it a step further? Depending on the server-side technology you use, you might be able to trigger an automatic search based on any data you can extract from the visitor’s query.
So far, we’ve managed to provide the visitor with helpful information and tools. Now, it’s time to take corrective measures.
When a reference to a missing document is followed by a user, there are two pieces of information we can gather: the alleged location of the missing file and the Referrer Web address. To illustrate this situation, let’s consider the following:
- www.mysite.com/path/to/missing.html is the document being incorrectly referenced
- www.othersite.com/referrer.html is the location at which the incorrect reference exists
We can easily split any of these URLs using the first forward slash (‘/’) to separate them in two parts: the host name and the location of the file within that host. By comparing these pieces of information, we can expect one of the following scenarios to have occurred:
A. User has an outdated bookmark or has mistyped the URL
If we can’t find a valid Referrer, it is safe to assume that the missing document is being called from someone’s list of Favorites or Bookmarks (within the browsers preferences), or that the visitor has made a mistake when writing the URL.
In this case, our generic message will be good enough, but let’s not forget to notify the Webmaster about this incident: an automated message can be sent to the Website administrator, containing the captured reference to the local document. If it is in fact an outdated bookmark (i.e. the file exists in another location within our site), a redirection should be put in place.
B. There’s a broken link in the site!
What if we find a Referrer? The first thing we have to do is compare its host name to our own: if there’s a match, we’re in trouble.
In this case, we have to admit the fault is ours, and deliver the appropriate message to the visitor. Next, we have to send an urgent message to the Webmaster, containing both the captured location of the missing document and the referring local URL.
C. There’s an incorrect reference from an external site
If the comparison between the hosts from the referring site and ours yields a negative result, then we’ve caught an external broken link.
A special case in this category would be outdated links from search engines, which bring with them extra data. In this case, we can use the search terms that made our site appear in the search engine list of results and try to deliver customized information using our own search capabilities.
Typically, a search engine query URL looks something like this:
www.searchengine.com/action?q=usable+error+pages&lang=en&safe=on
Here, 'q'
is a variable containing the search terms!
To take advantage of this information, all we have to do is determine if the referring page in fact belongs to a search engine, isolate the search terms and use them at our discretion.
The first part is easy, as we have already separated the host name from the rest of the referring URL; all that’s left is to find out whether or not it contains the name of a search engine we know. Probably the best way to do this is to use a function such as indexOf (which is available in many programming and scripting languages) and compare the host name against an array of terms such as that shown below:
List = ["google", "yahoo", "msn", "altavista", "hotbot", "lycos"]
The next step is to identify the variable that carries the search terms. In the example above, you can see that a search query contains not only the search terms but also extra information that’s relevant only to particular search engines. Google, MSN and Altavista, for example, use the variable 'q'
, while Hotbot and Lycos use a more verbose 'query'
; Yahoo likes 'p'
better for some reason.
As you might have notice already, when the search query contains two or more words, they are put together using a plus sign ('+'
). This makes it easy for us to split that string into words with which we can feed our internal search engine.
As you can see, there’s plenty of information we can show in place of a default 404 Error. Mixing the resources available to us with a little creativity lets our visitors know we care. We’re also doing ourselves a favor by not letting them slip away — we’re keeping them interested. It’s a great way to round out your overall site experience and one that should not be ignored.