We’ve all been there. Your wonderful website has been running successfully for months then — BAM — it disappears. Or, more often, certain features stop functioning. Despite your protests that nothing has changed, your client isn’t happy. Prepare yourself for a few frustrating hours of problem probing.
Step 1: Identify the Issue
This might sound obvious, but I’ve known many developers open their IDE and start hacking at random code. It’s more important to determine the issue than the cause at this stage. Is the site unavailable? Is a particular page or function failing? Is it limited to specific browsers?
Step 2: Test Resource Availability
Nine times out of ten, the problem will be caused by a connectivity issue at your end or the client’s. If you can’t access any other pages, it’s not surprising that your website has disappeared. That said, it’s not always obvious; certain IP ranges, countries or sections of the internet can become temporarily blocked.
Test your site from a variety of locations — a public proxy server will help identify whether it’s a local or global problem. If possible, examine the status of other sites running from the same server or web host.
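A short script can take some of the guesswork out of this check. The following is a minimal Python sketch rather than a finished monitoring tool: check_url and check_all are hypothetical helper names, and the URLs you pass in are your own. It only reports whether a single GET request succeeded from the machine it runs on, so it is worth running from more than one location.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Fetch one URL and return (ok, detail) instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers DNS failures, refused connections and timeouts.
        return False, f"unreachable: {exc}"


def check_all(urls, timeout=10.0):
    """Map each URL to its (ok, detail) result."""
    return {url: check_url(url, timeout) for url in urls}
```

Calling check_all with your home page, a deep page and another site on the same host gives a quick picture of whether the failure is limited to one page, one site or the whole server.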
One less obvious problem is disk space. If you’re running a busy site, server logs can rapidly use the available space even when your application’s storage requirements are low.
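If you have shell access, free space is quick to check. Here is a rough Python sketch using only the standard library. The function names and the 10% threshold are illustrative assumptions, and the location of your logs depends on your server configuration.

```python
import os
import shutil


def free_fraction(path="/"):
    """Return the fraction (0.0 to 1.0) of free space on the disk holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def is_low_on_space(path="/", threshold=0.10):
    """True when less than `threshold` of the disk is free (10% is an arbitrary default)."""
    return free_fraction(path) < threshold


def directory_size(path):
    """Total size in bytes of all files below path, e.g. a log directory."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Skip files that vanish or cannot be read while scanning.
                pass
    return total
```

Pointing directory_size at your log directory shows whether logs are the culprit before you start deleting anything.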
Remember that you might be using resources on other servers. This includes CDN-hosted files, database servers, or remotely-hosted APIs such as those for Google Maps, YouTube, Twitter, advertising services etc.
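For services that don’t speak HTTP, such as a database server, a plain TCP connection attempt is a reasonable first test. This sketch assumes you know the host and port of each dependency. port_open is a hypothetical helper, and a successful connection only proves that something is listening, not that the service is healthy.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable or the host name did not resolve.
        return False


# Example calls (host names are placeholders; 3306 and 5432 are the
# default ports for MySQL and PostgreSQL respectively):
#   port_open("db.example.com", 3306)
#   port_open("db.example.com", 5432)
```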
You should also check your server loads. A major traffic spike or Denial of Service attack will cause access problems.
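On Unix-like servers, the load average gives a quick first impression. The sketch below is an assumption-laden starting point: load_per_core is a hypothetical helper, it returns None where the operating system does not expose a load average (Windows, for example), and what counts as "too high" depends on your workload.

```python
import os


def load_per_core():
    """Return the 1-minute load average divided by the CPU count, or None."""
    try:
        one_minute, _five_minute, _fifteen_minute = os.getloadavg()
    except (AttributeError, OSError):
        # os.getloadavg is missing on some platforms and can fail on others.
        return None
    return one_minute / (os.cpu_count() or 1)
```

As a rough rule of thumb, a value that stays well above 1.0 suggests more work is queued than the processors can handle, which is a cue to look at traffic logs.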
Finally, is your domain registration valid and is the DNS server responding as it should?
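The DNS half of that question can be scripted. The following sketch asks the local resolver for a host name’s addresses. resolve is a hypothetical helper, and it says nothing about whether the domain registration itself has lapsed, which you would still need to confirm with your registrar.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for hostname ([] if none)."""
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be resolved from this machine.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({info[4][0] for info in results})
```

An empty list, or addresses that don’t match your server, points at DNS rather than your application. Because the answer comes from whichever resolver the machine uses, comparing results from two networks can reveal stale or inconsistent records.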
Step 3: Identify What Changed
Once you’ve ruled out connectivity, traffic, DNS and disk space, it’s time to determine what changed. 999 times out of 1,000 the problem will have been caused by an update.
You may not have touched the files but are you sure others haven’t? Check with everyone who has access but don’t necessarily believe them. Here’s a typical conversation you’ll encounter…
Client: My site’s not working. What are you going to do about it?
You: I’ll fix it. Have you made any changes recently?
Client: No. It was like that when I got here.
…five hours’ frantic investigation…
You: You changed X, didn’t you?
Client: X? Oh yes, I changed X. I did that when I was fiddling with Y and Z.
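Rather than relying on anyone’s memory, you can ask the file system. Here is a minimal Python sketch that lists files modified within the last few days. recently_modified is a hypothetical helper and the seven-day window is an arbitrary default. Bear in mind that modification times can be reset by deployments, restored backups or a determined intruder, so treat the result as a hint rather than proof.

```python
import os
import time


def recently_modified(root, days=7.0, now=None):
    """Return paths under root modified within `days`, newest first."""
    cutoff = (time.time() if now is None else now) - days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # The file disappeared or is unreadable; skip it.
                continue
            if mtime >= cutoff:
                hits.append((mtime, path))
    hits.sort(reverse=True)
    return [path for _mtime, path in hits]
```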
Your application may not be directly to blame. Has your web host updated the OS, language runtime, database software or file permissions? While vendors attempt to ensure PHP, Ruby, Python, MySQL, PostgreSQL, etc. remain backward compatible, features will almost certainly change or break between versions.
Step 4: Reject the Edge Cases
Although rare, you should look for signs of cracking. Popular packages such as WordPress, Joomla and osCommerce are obvious targets. However, changes are often subtle because the cracker wants to retain access. For example, you might discover that a file explorer add-on has been installed or phishing pages have appeared deep within the file structure.
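One way to spot subtle tampering is to compare file hashes against a known-good copy. The sketch below assumes you have such a baseline, for example a clean backup or a fresh checkout from version control. build_manifest and diff_manifests are hypothetical helpers, and the approach will also flag legitimate changes such as uploads and cache files.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in chunks so large files do not exhaust memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def diff_manifests(baseline, current):
    """Return (added, removed, changed) lists of relative paths."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        path for path in set(baseline) & set(current)
        if baseline[path] != current[path]
    )
    return added, removed, changed
```

Building one manifest from the clean copy and another from the live site, then passing both to diff_manifests, highlights unexpected additions such as a stray script deep in the file structure.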
Finally, you should never rule out hardware problems. Corrupt memory chips or disk sectors could be responsible for any number of bizarre issues. If possible, evaluate your application on a similar set-up or install a separate test version on the same server.
Step 5: Fix Your App
Once you have eliminated the impossible, whatever remains, however improbable, must be the truth. Perhaps your code isn’t as perfect as you thought…
Do you have any tips for diagnosing website or application problems? What was the most difficult issue you encountered?
Craig is a freelance UK web consultant who built his first page for IE2.0 in 1995. Since that time he's been advocating standards, accessibility, and best-practice HTML5 techniques. He's created enterprise specifications, websites and online applications for companies and organisations including the UK Parliament, the European Parliament, the Department of Energy & Climate Change, Microsoft, and more. He's written more than 1,000 articles for SitePoint and you can find him @craigbuckler.