By Craig Buckler

My Website’s Broken: 5 Steps to Determine What’s Wrong

We’ve all been there. Your wonderful website has been running successfully for months then — BAM — it disappears. Or, more often, certain features stop functioning. Despite your protests that nothing has changed, your client isn’t happy. Prepare yourself for a few frustrating hours of problem probing.

Step 1: Identify the Issue

This might sound obvious, but I’ve known many developers who open their IDE and start hacking at random code. At this stage, it’s more important to determine the issue than the cause. Is the site unavailable? Is a particular page or function failing? Is it limited to specific browsers?

Step 2: Test Resource Availability

Nine times out of ten, the problem will be caused by a connectivity issue at your end or the client’s. If you can’t access any other pages, it’s not surprising that your website has disappeared. That said, it’s not always obvious; certain IP ranges, countries or sections of the internet can become temporarily blocked.

Test your site from a variety of locations — a public proxy server will help identify whether it’s a local or global problem. If possible, examine the status of other sites running from the same server or web host.
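As a rough illustration of that kind of check, here is a minimal Python sketch that requests a URL and reports whether the server answered. The function name `check_url` and the default timeout are my own assumptions, and it only tests from the machine it runs on, so you would still need to run it from more than one location.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return (ok, detail) for a single HTTP GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # No usable answer: DNS failure, refused connection, timeout, etc.
        return False, f"unreachable ({err})"
```

Calling `check_url` for your own site and for a neighbouring site on the same host gives a quick hint about whether the fault is specific to your application.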

One less obvious problem is disk space. If you’re running a busy site, server logs can rapidly use the available space even when your application’s storage requirements are low.
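If you can run a script on the server, free space is easy to check. This is only a sketch: the name `disk_report` and the 10% warning threshold are assumptions, and you would point it at whichever volume holds your logs.

```python
import shutil


def disk_report(path="/", warn_below=0.10):
    """Summarise disk usage for the volume that contains path."""
    usage = shutil.disk_usage(path)
    # Guard against unusual filesystems that report a total of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gb": round(usage.total / 1e9, 1),
        "free_gb": round(usage.free / 1e9, 1),
        "free_fraction": free_fraction,
        # True when less than warn_below (10% by default) is free.
        "low": free_fraction < warn_below,
    }
```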

Remember that you might be using resources on other servers. This includes CDN-hosted files, database servers, or remotely-hosted APIs such as those for Google Maps, YouTube, Twitter, advertising services etc.
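For dependencies that are not plain web pages, such as a database server, a simple TCP connection test can show whether the service is reachable at all. The sketch below assumes you know each host and port; the names `port_open` and `check_dependencies` are hypothetical, and a successful connection does not prove the service is healthy.

```python
import socket


def port_open(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_dependencies(dependencies, timeout=5):
    """Map each dependency name to True/False reachability.

    dependencies is a dict such as {"database": ("db.example.com", 3306)}.
    """
    return {
        name: port_open(host, port, timeout)
        for name, (host, port) in dependencies.items()
    }
```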

You should also check your server loads. A major traffic spike or Denial of Service attack will cause access problems.
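On Unix-like servers, the load average is a reasonable first indicator. The following sketch divides it by the CPU count so the figures are comparable between machines; `load_per_cpu` is a hypothetical name, and a sustained value well above 1.0 is only a rule of thumb for an overloaded server, not a hard limit.

```python
import os


def load_per_cpu():
    """Return the 1, 5 and 15 minute load averages divided by CPU count.

    Returns None where load averages are unavailable (for example Windows).
    """
    try:
        averages = os.getloadavg()
    except (AttributeError, OSError):
        return None
    cpus = os.cpu_count() or 1
    return tuple(round(value / cpus, 2) for value in averages)
```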

Finally, is your domain registration valid and is the DNS server responding as it should?
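The DNS half of that question can be tested with a quick lookup. This sketch uses whatever resolver the local machine is configured with, so it shows what your own machine sees rather than what every visitor sees. It does not check the registration expiry date, which needs a WHOIS lookup. The function name `resolve` is an assumption.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses the local resolver gives for hostname."""
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be resolved.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address is the first item of sockaddr.
    return sorted({result[4][0] for result in results})
```

An empty list for your own domain, when other domains resolve normally, points toward a DNS or registration problem rather than the web server.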


Step 3: Identify What Changed

Once you’ve rejected connectivity, traffic, DNS and disk space, it’s time to determine what changed. 999 times out of 1,000 the problem will have been caused by an update.

You may not have touched the files but are you sure others haven’t? Check with everyone who has access but don’t necessarily believe them. Here’s a typical conversation you’ll encounter…

Client: My site’s not working. What are you going to do about it?
You: I’ll fix it. Have you made any changes recently?
Client: No. It was like that when I got here.
…five hours of frantic investigation…
You: You changed X, didn’t you?
Client: X? Oh yes, I changed X. I did that when I was fiddling with Y and Z.
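Rather than relying on anyone’s memory, you can ask the file system. The sketch below lists files under a directory that were modified in the last few days, newest first. The name `recently_modified` and the seven-day default are assumptions, and modification times can be misleading after a restore or a fresh deployment, so treat the output as a lead rather than proof.

```python
import os
import time


def recently_modified(root, days=7):
    """Return paths under root modified within the last `days`, newest first."""
    cutoff = time.time() - days * 86400
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if mtime >= cutoff:
                changed.append((mtime, path))
    return [path for _mtime, path in sorted(changed, reverse=True)]
```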

Your application may not be directly to blame. Has your web host updated the OS, language runtime, database software or file permissions? While vendors attempt to ensure PHP, Ruby, Python, MySQL, PostgreSQL, etc. remain backward compatible, features will almost certainly change or break between editions.
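One way to catch this kind of silent change is to record the environment while everything works and compare it later. The sketch below is an illustration only: it records the OS and the Python runtime because the example is written in Python, and you would add whatever matters to your stack (PHP or database versions, for instance). Both function names are hypothetical.

```python
import platform


def environment_snapshot():
    """Collect a few facts about the current environment."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "machine": platform.machine(),
    }


def diff_snapshots(old, new):
    """Return {key: (old_value, new_value)} for every key that differs."""
    keys = sorted(set(old) | set(new))
    return {
        key: (old.get(key), new.get(key))
        for key in keys
        if old.get(key) != new.get(key)
    }
```

Saving the snapshot as JSON alongside each deployment would let you run `diff_snapshots` against the current environment when something breaks.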

Step 4: Reject the Edge Cases

Although rare, you should look for signs of cracking. Software such as WordPress, Joomla and osCommerce is an obvious target. However, changes are often subtle because the cracker wants to retain access. For example, you might discover that a file explorer add-on has been installed or phishing pages have appeared deep within the file structure.
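If you have a known-good copy of the site, such as a backup or a clean checkout, comparing file hashes is one way to spot added or altered files. This is a minimal sketch that assumes both trees fit comfortably on disk and that files are small enough to read into memory. The names `hash_tree` and `compare_trees` are my own.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare_trees(known_good, live):
    """Compare two hash_tree() results and list the differences."""
    good_paths, live_paths = set(known_good), set(live)
    return {
        "added": sorted(live_paths - good_paths),
        "removed": sorted(good_paths - live_paths),
        "modified": sorted(
            path
            for path in good_paths & live_paths
            if known_good[path] != live[path]
        ),
    }
```

Unexpected entries under "added" are the ones to inspect first, since uploads and cache files will also appear there legitimately.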

Finally, you should never rule out hardware problems. Corrupt memory chips or disk sectors could be responsible for any number of bizarre issues. If possible, evaluate your application on a similar set-up or install a separate test version on the same server.

Step 5: Fix Your App

Once you have eliminated the impossible, whatever remains, however improbable, must be the truth. Perhaps your code isn’t as perfect as you thought…

Do you have any tips for diagnosing website or application problems? What was the most difficult issue you encountered?

  • Fabian Chesta

    You can also test whether it is down for everyone or just for you!

  • For me, 99% of the time the problem is caused by an update. So every time I run into an error, I check the error log and continue from there.

  • Sites with logins or other ‘complex’ processes are also extremely prone to user error. I ran an ISP helpdesk back in the days when that still meant answering *any* question about using the Internet. When reviewing three months’ worth of calls to my team, I found that around 90% were due to forgotten or mistyped passwords, about 7% (I think – it’s a while back!) were other forms of user error, and only 3% related to the network or users’ systems being faulty. That last 3% included malware issues, so I’d expect that to be higher now, but as far as I can tell, “user error” is still thriving. So I’d shift your ‘Step 3’ up to just before the disk space problems you mention (those are almost edge cases), and repeat Step 3 several times.

    When ‘not necessarily believing’ others, it helps to remember that much of memory is reconstructed retrospectively, so those reporting problems:
    a) Generally really *believe* what they’re saying – it is reality, from their perspective. Even if we know it can’t be true, it’s better to accept the description and seek a cause that could explain their perspective than to just dismiss it;
    b) May find the idea that they may have done something wrong so unbelievable, insulting and/or embarrassing that they selectively (yet often honestly) forget any detail that they suspect may imply this;
    c) May have been distracted from or uninterested in the task that went wrong, or have such a mental block about tech (or be so inexperienced with it) that their short-term memory simply didn’t register the bits *we* know would’ve been important. If so, too much ‘reminding’ from us can cause them to fill in the blank spots in their memory with reconstructed events.

    Being aware of those points can help to interpret what really happened – I’m sure it’s old-hat to experienced techies, but I’ve met a lot who just dismiss confusing user reports as though they were purely imaginary.

    Oh, and http://supportdetails.net is really useful, too. :-)
