The Rise of Web Bots and Fall in Human Traffic
By Craig Buckler
A couple of years ago I reported that 51% of all website traffic was non-human. The study, undertaken by Incapsula, has been updated. We have become the minority: bot traffic has reached 61.5%. I say “we”; there’s only a 38.5% chance you’re human.
The report data was gathered from 20,000 customers who use Incapsula’s security services. These are companies that are especially security-conscious or have been on the receiving end of nasty cyber attacks. They’re unlikely to represent the average website, but the relative growth in bot traffic should be applicable.
The distribution indicates:
- 38.5% is biological entities. Mostly humans, a few cats and assorted unclassified creatures.
- 31.0% is search engine and other indexing bots (a rise of 55%).
- 5.0% is content scrapers (no change). If you’re reading this anywhere other than SitePoint.com, you’re viewing a lazy copy of the original page. It won’t be as lovely an experience!
- 4.5% is hacking tools (down 10%). Typically, this is malware, website attacks, etc.
- 0.5% is spammer traffic (down 75%). That’s bots which post phishing or irritating content to blogs. Any negative comments below will certainly be from non-humans.
- 20.5% is other impersonators (up 8%). These are denial of service attacks and marketing intelligence gathering.
The overall conclusion: bot traffic has increased by 21% in 18 months. However, the majority of this growth has come from cuddly good bots who have our best interests at heart (or should that be processor?).
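If you want to check the headline figure, here’s a quick sketch. It assumes the 21% is the relative change between the two bot shares quoted above (51% then, 61.5% now) rather than a change in raw request counts; the report summary doesn’t spell that out, and `relative_change` is just an illustrative helper name.

```python
def relative_change(old_share, new_share):
    """Percentage change of new_share relative to old_share."""
    return (new_share - old_share) / old_share * 100


# Assumption: the shares are the 51% and 61.5% bot figures quoted in the article.
bot_growth = relative_change(51.0, 61.5)
# Human share is whatever the bots left behind: 49% then, 38.5% now.
human_change = relative_change(49.0, 38.5)

print(f"bots: {bot_growth:+.1f}%, humans: {human_change:+.1f}%")
# prints "bots: +20.6%, humans: -21.4%"
```

On that reading, the 21% rounds up from roughly 20.6%, and the human share has shrunk by about the same proportion.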
A degree of cynicism is healthy. Incapsula is a security company; a rise in scaremongering has a direct correlation with their bottom line. That said, many companies are particularly lax about security until it’s too late. No system is ever 100% secure, but the majority of victims are caught out by basic SQL injections or social engineering. Never underestimate the ingenuity of crackers … or the naivety of your boss.
Why Your Website Visitors are Falling
The rise of indexing bots is more interesting. We’re approaching a tipping point where the information you want won’t necessarily be obtained from the website where it originated. It’s already happening…
- If you need company contact details, you enter the name in a search engine and it appears along with a map and directions.
- If you want product information, you enter its name and can instantly view the specifications, prices and reviews.
- If you want to find the closest Indian restaurant, it magically appears on a map on your smartphone.
At no point did you visit the official company website. The data is scraped and repackaged for easier consumption on an alternative device such as a smartphone, watch or Google Glass.
This type of activity has been occurring for many years but it’s fairly simplistic: you can only search for one or two inter-related factors. The real challenge will be non-explicit joined-up data queries, e.g. “find a heating specialist who has worked for my neighbors” or “find all web design agencies in New York with a red logo”. The search engine or app could refine data to a handful of relevant results rather than thousands of website links. The rise in web bot indexing activity will inevitably intensify.
Of course, a business website will remain essential — but having one which can feed the bots is increasingly important. Direct human traffic to your website may even fall but bot-based sales leads will rise. If you’re not doing so already, it’s time to invest in machine-readable data exposure, e.g.
- structured data formats from Schema.org
- item-specific data feeds such as products and services
- discoverable, URL-based REST APIs
- RSS and sitemap feeds.
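As a rough illustration of the first and last items in that list, here’s a minimal sketch that builds a Schema.org `LocalBusiness` description as JSON-LD and a bare-bones sitemap. The function names, the business details and the example.com URLs are all made up for the example; a real site would pull them from its own data and would probably add more properties than this.

```python
import json
import xml.etree.ElementTree as ET


def local_business_jsonld(name, url, telephone, street, city, region, postal_code):
    """Return a <script> tag holding Schema.org LocalBusiness data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
    }
    # Escape "</" so the JSON can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


def sitemap_xml(urls):
    """Return a minimal sitemap document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical business and pages, for illustration only.
print(local_business_jsonld(
    "Example Curry House", "https://example.com/", "+1-555-0100",
    "1 Example Street", "New York", "NY", "10001",
))
print(sitemap_xml(["https://example.com/", "https://example.com/menu"]))
```

The JSON-LD snippet goes in the page’s HTML so an indexing bot can read the name, phone number and address without guessing at your layout, while the sitemap tells it which pages exist in the first place.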
The bots may be working for us, but they’re rapidly becoming our masters.