Web Bots vs Humans: We’re Losing

Contributing Editor

51% of all web site traffic is non-human. If you’re reading this, you’re in the minority. Unless you’re a machine (check your ports — if they’re warm and moist, you’re probably OK).

The study was undertaken by Incapsula, a website security company. While we haven’t quite reached the point where Skynet becomes self-aware and dispatches Terminators to wipe us out, it does appear that the web would continue to operate happily without us.

Data was collected from a sample of 1,000 websites which use Incapsula services. The report concluded that web traffic is generated from:

  • 49%: people browsing the web
  • 20%: search engine bots indexing pages
  • 19%: spy-bots collecting competitive intelligence
  • 5%: website scrapers
  • 5%: automated hacking tools searching for vulnerabilities
  • 2%: automated comment spammers

It’s easy to be cynical about these reports, especially when they’re conducted by a security company which could enjoy commercial benefits from scare-mongering. That said, it does indicate a significant volume of hacking activity even if the report is somewhat skewed or exaggerated.

If you or your clients aren’t concerned about security, perhaps it’s time to evaluate that policy. If we assume Incapsula’s report is over-estimated by a factor of 300%, malicious activity will still account for one in ten website requests. That’s equivalent to the average number of all IE6, IE7, Firefox 3.x and Opera users combined. Nasty.
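
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes that "malicious" means every category other than people and search engine bots, and it reads "a factor of 300%" as dividing the reported figure by three. The names `SHARES`, `MALICIOUS`, `non_human_share` and `malicious_share` are made up for illustration and are not part of Incapsula's report.

```python
# Traffic shares (in percent) as listed in the report summary.
SHARES = {
    "people": 49,
    "search engine bots": 20,
    "spy-bots": 19,
    "scrapers": 5,
    "hacking tools": 5,
    "comment spammers": 2,
}

# Assumption: everything except people and search engine bots is malicious.
MALICIOUS = ("spy-bots", "scrapers", "hacking tools", "comment spammers")


def non_human_share(shares):
    """Total percentage of traffic not generated by people."""
    return sum(pct for name, pct in shares.items() if name != "people")


def malicious_share(shares, overestimate_factor=1.0):
    """Malicious percentage, scaled down by an assumed over-estimation factor."""
    reported = sum(shares[name] for name in MALICIOUS)
    return reported / overestimate_factor


if __name__ == "__main__":
    print(f"Non-human traffic: {non_human_share(SHARES)}%")
    print(f"Malicious as reported: {malicious_share(SHARES):.0f}%")
    print(f"Malicious if over-estimated 3x: {malicious_share(SHARES, 3.0):.1f}%")
```

With the listed figures, non-human traffic comes to 51% and the potentially malicious categories to 31%. Dividing by three leaves a little over 10%, which is where "one in ten" comes from.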

Humans may be losing the web war, but at least we can win a few battles.

Do you believe Incapsula’s report? How does it compare with your website statistics? Have you successfully defeated a major hacking attempt?


  • Tsukasa

    Who paid for this study? Do you trust Churchill’s statistical data? (“The first lesson that you must learn is that, when I call for statistics about the rate of infant mortality, what I want is proof that fewer babies died when I was Prime Minister than when anyone else was Prime Minister. That is a political statistic.”)

  • http://www.nichesense.com Anshul

    Surprised to see comment spammers at just 2% :) We deal with a lot of automated comment spam on our blog on a daily basis. We deploy plugins like Akismet and WP Lockup on all of our WordPress sites from the start to protect against spammers and potential hackers.

    • Chad Garrett

      A spam bot can do a Google search for WordPress-like phrases or URLs. Since a lot of websites aren’t WordPress blogs, they lower the average. Sure, comment spammer traffic as a percentage of WordPress site visits is probably higher. They are giving an overall Internet number.

    • http://www.kabeerkhan.com Kabeer Khab

      hmmmm…..is there any plugin available for facebook pages??

  • http://www.webspacecreations.com Matthew

    Yes, I suspect these numbers are pretty accurate. A lot of analytics packages just don’t provide the details for website owners to see this. Sites like Project Honey Pot will provide insights into your visitors’ overall behavior patterns. Determining the reputation of a visitor is going to need to be automated to deal with an increasing number of bots that are launching real-time threats.

  • Alan

    “If we assume Incapsula report is over-estimated by a factor of 300%”

    300 percent?! What evidence is there that there is any over-estimation whatsoever? If none, then these comments are uncalled for.

    • http://www.optimalworks.net/ Craig Buckler

      There’s no evidence whatsoever. We don’t have the data or know Incapsula’s categorization criteria. That’s my point: even if the report’s grossly inaccurate, there’s still plenty of malicious activity going on.

  • Matthew P

    I would put a lot more faith in this data if some other, more neutral studies could corroborate this data – these results are quite staggering for one thing, and when the data just happens to serve the interests of the publisher so neatly it’s a little too much to swallow.

    http://www.incapsula.com/the-incapsula-blog/blog-2012/114-what-google-doesnt-show-you-31-of-website-traffic-can-harm-your-business

    Looking at the original blog from Incapsula doesn’t make it more believable for me either… Something about it seems off, it’s kind of hard to describe. There’s not much in the way of actual data, just opinions, some flashy graphics and a few broad percentages with no explanation of where they came from or how we can be assured of their accuracy. I guess even if the results are accurate they’d still want to show them off like that… It just seems to me, if they went to all the trouble to accurately measure the data they’d want to put some effort into backing it up by going into more detail about their survey process and data-set. I can understand they can’t give it all away, but they haven’t given up anything – “This is what we found, just trust us” not very scientific or believable.

  • http://www.spinxwebdesign.com/ Website development

    These statistics are really helpful, but I was surprised to see “2%: automated comment spammers”. I think Incapsula should confirm this figure again.

    • http://cssbased.com cssbased

      Yes, Incapsula also should consider you, I see you spamming everywhere!

  • Mikl

    There’s an obvious problem with these figures:

    The report is based on data collected from Incapsula’s clients. Incapsula is a “website security company”. Doesn’t it seem likely that Incapsula’s clients are going to be businesses that have above-average problems with security (that being the reason they are using Incapsula’s services)?

    I’m not saying we are wrong to worry about these issues, but I do question the usefulness of these particular figures.

  • http://www.xoogu.com/ Dave

    Judging by the hits on wp-login.php recorded for my WordPress sites, I find these stats quite believable.

  • Alex

    Whoops, started checking my ports before I realized you were kidding..

  • StatBot

    I just surveyed 93.7% of all the sites in the internet and 37% say these statistics are at least 200%-500% incorrect either way, 14% said they believed the figures were accurate but don’t take account of global warming, while 23% couldn’t respond to the survey as there wasn’t a survey function to be called within their main script. The remainder were too busy either sending spam, stuck in a youtube loop or writing daft comments to worry about stats that only relate to one company and are in no way representative of the global picture.

  • http://www.datanlogic.com Posicionamiento Web Cali

    Surprising. 2% of visits is a lot.

  • http://www.design-streams.com Ann

    If I use a search engine, like Google, to find a website then that would involve an automated search bot. I only check a few of the websites that are returned as results of such a search. I use such search engines prior to 80% of my website visits. Perhaps the statistics are true.

    • http://www.vantagegaming.net/ Ben Vail

      That’s not how search engines work. They index sites periodically and keep their own database of what they find. It’s this database they search when you perform a search :)

      Are scrapers or “spy-bots” necessarily malicious?

  • http://deadlysyntax.com Jaap

    While I did giggle at the port inspection gag and I find the information interesting, what I don’t like about this article is the implied ‘war’ that we humans are supposedly engaged in against machines. I didn’t realise we were competing for page-view numbers. I fundamentally dislike the Californication of artificial intelligence and robots. Sure, Terminator was a cool movie, but those kinds of dramatisations put the wrong idea in people’s heads about the increasing intelligence of our machines.

    • http://www.optimalworks.net/ Craig Buckler

      Thanks Jaap. As you gathered, the article wasn’t particularly serious. What you also need to remember is that humans create and launch these bots for good reasons (search engine indexing) or bad (system cracking). We’re not at war with technology and SitePoint users would understand that — it’s a site for developers.

  • http://www.creativewebguru.com/ Teddi Deppner

    Ha ha! You had me at “warm and moist”!

    While I agree it would be nice to have more studies to compare and more objective organizations doing the research, I am not surprised by nor rejecting the data. Given the things I’ve seen and experiences I’ve had over the past 13+ years publishing websites and what I’ve seen with WordPress sites over the past several years, this seems completely in line with my experience.

    Thanks for raising awareness about this. Malicious activity seems to be on the rise, and people with websites need to start thinking about security, if they haven’t already.

  • Scott

    I believe it. If you listen to the Security Now podcast with Steve Gibson and Leo Laporte, they talk about this kind of stuff all the time. Steve calls it “Internet Background Radiation” or IBR. There are so many infected PCs out there that are controlled by malicious hackers (bot-herders); that’s how they perform DDoS attacks. I can only imagine all these infected computers would be scanning the web in their downtime, and if one PC is scanning thousands of websites, say, every 5 minutes, that trumps the 1 or 2 websites I might visit and read in the same 5 minutes by a huge margin. Google the Security Now podcast “Internet Warfare” where they go into detail on this subject.