In recent years, malicious bots have become extremely sophisticated in their ability to masquerade as human visitors and evade conventional Web security measures. These bad bots will continue to pose ever-growing threats to websites, applications, and APIs, and will always remain in focus for webmasters looking to mitigate their impact on their organizations.
However, in certain circumstances, such as when good bot traffic approaches or even surpasses the limits of server and bandwidth capacity, even good bots can and do cause harm, just like bad bots. This is why a holistic approach that manages both good and bad bots is the most effective one: it allows webmasters and security chiefs to take specific actions on each type based on organizational needs and other factors.
Types of Good Bots
Good bots encompass a wide range of functions and capabilities. Let’s take a brief look at some of the major classes of good bots and what they do.
Search Engine Crawlers
Bots such as Googlebot, Bingbot, and Baidu Spider crawl web pages to index them for search engines such as Google, Bing, and Baidu. Website administrators can specify non-binding rules in their ‘robots.txt’ file for crawlers to follow, such as crawl rates and the pages or sections that they should not crawl.
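As a rough illustration of how a well-behaved crawler reads such rules, the sketch below parses a made-up robots.txt with Python's standard urllib.robotparser module. The file contents, the bot name SomeOtherBot, and the example.com URLs are assumptions for illustration only. Note that Crawl-delay is a non-standard directive that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to wait 10 seconds between
# requests and to stay out of two sections; Googlebot gets its own rule group.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /checkout/
Disallow: /internal/

User-agent: Googlebot
Disallow: /internal/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot matches its own group, which does not mention /checkout/.
print(parser.can_fetch("Googlebot", "https://example.com/checkout/cart"))     # True
# Any other bot falls back to the "*" group, which disallows /checkout/.
print(parser.can_fetch("SomeOtherBot", "https://example.com/checkout/cart"))  # False
# The crawl delay requested from bots that fall under the "*" group.
print(parser.crawl_delay("SomeOtherBot"))                                     # 10
```

Because the rules are non-binding, this only describes what a cooperative crawler would do. A bot that ignores robots.txt has to be handled by other means.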
Partner Bots
Partner bots provide essential services and information to websites and their customers. This category includes bots run by vendors that provide transactional/CRM/ERP services, geolocation, inventory and background checks, and other business-related services. Alexa, Slackbot, and IBM Watson are examples of this type of bot.
Social Network Bots
Social networks operate bots to give their clients’ websites visibility and to drive engagement on their platforms; some can even carry out chat conversations with users to provide information and services. Examples of this type include Facebook Bot, Pinterest Bot, and Snapchat.
Website Monitoring Bots
These bots, such as Pingdom, AlertBot, and StatusCake, are used to monitor the uptime and system health of websites by periodically checking and reporting on page load times, downtime duration, and so on.
Backlink Checker Bots
Backlink checkers analyze the inbound URLs on a website to provide marketers and SEO specialists with insights to help them optimize their content and campaigns. These include bots such as SEMRushBot, UAS Link Checker, and AhrefsBot.
Aggregator/Feed Fetcher Bots
These bots, such as Google Feedfetcher, Superfeedr, and Feedly, collate information from websites and provide users with customized news, alerts, and other desired content.
Breakdown of Overall Traffic
First, let’s look at the overall breakdown of traffic we observed in the second half (H2) of 2018 across four industries: Online Travel Agencies, E-Commerce, Classifieds, and Media & Publishing. Humans made up nearly 74% of traffic, good bots nearly 17%, and bad bots nearly 10%.
Crawler Traffic Distribution
While good bot statistics can vary significantly across time periods and industries, we observed that search engine crawlers make up about 55% of all good bot traffic, with other good bots making up the rest.
Google bots made up nearly 68% of search engine bots across our client base; Bing bots came in at nearly 26%, and bots from Yandex, Yahoo, and Baidu made up the rest.
Good Bot Distribution By Industry
An industry-wise breakdown of good bot traffic shows that Classifieds & Marketplaces businesses get the highest level of Aggregator bot traffic, followed by Online Travel Agencies (OTAs). Partner bots are mostly seen on E-Commerce sites and OTAs. E-Commerce businesses also attract the greatest numbers of Social Network bots, followed by Media & Publishing sites, while OTAs come in last, with a tiny 0.1% of their good bot traffic consisting of Social Network bots.
How To Manage Good Bots
It is often advisable to block good bots that are irrelevant or unnecessary for your business. Geographical restrictions allow webmasters to block every class of bot from certain countries. Let’s say, for example, that your business does not operate in Russia. Blocking bots from Russia, including search engine crawlers such as Yandex, makes sense if your SEO and market strategies do not include the Russian market.
Detecting Spoofed Search Engine Crawlers
Malicious entities often deploy bots that masquerade as crawlers such as Googlebot to evade basic Web security systems and carry out content scraping and competitive intelligence gathering. Measures such as performing a reverse DNS lookup, or comparing the behavior of suspected spoofed crawlers with that of real crawlers, can help counter such attack strategies. A dedicated Bot Management solution will generally feature a list of bad bot signatures from across its client base to secure websites and apps against spoofed crawlers and various other types of attacks.
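The reverse DNS check can be sketched roughly as follows, using only Python's standard socket module. The idea is to resolve the client IP to a hostname, confirm that the hostname belongs to a domain the search engine publishes for its crawlers, and then resolve that hostname forward to confirm it maps back to the same IP. The suffix list, the function names, and the stubbed resolvers in the usage section are assumptions for illustration; check each search engine's documentation for the domains it actually uses.

```python
import socket
from typing import Callable, Iterable, List

# Assumed hostname suffixes for genuine crawlers. Verify these against each
# search engine's own documentation before relying on them.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def reverse_lookup(ip: str) -> str:
    """Return the primary hostname (PTR record) for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to (IPv4 only)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    suffixes: Iterable[str] = TRUSTED_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        # Step 1: the PTR hostname must sit under a trusted crawler domain.
        if not hostname.endswith(tuple(suffixes)):
            return False
        # Step 2: the hostname must resolve back to the original IP;
        # otherwise the PTR record itself may be forged.
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False


# Usage with stubbed resolvers so the example runs without network access.
# The IPs and hostnames below are invented (documentation address ranges).
fake_ptr = {
    "192.0.2.10": "crawl-192-0-2-10.googlebot.com",
    "203.0.113.5": "host.example.net",
}
fake_a = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}

print(is_verified_crawler("192.0.2.10", reverse=fake_ptr.__getitem__,
                          forward=lambda h: fake_a.get(h, [])))  # True
print(is_verified_crawler("203.0.113.5", reverse=fake_ptr.__getitem__,
                          forward=lambda h: fake_a.get(h, [])))  # False
```

A check like this covers only the identity claim. Behavioral comparison and signature lists, as described above, remain separate layers.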
Blocking Good Bots Can Also Have a Negative Impact
It’s imperative for webmasters and security specialists to keep an eye on the types of bots being blocked. Non-specialized security systems or in-house anti-bot solutions may sometimes block good bots such as search engine crawlers and lead to a negative impact on SEO. Along the same lines, blocking partner bots or social network bots could also lead to undesirable results for your business and your customers. Make sure you whitelist the essential good bots so that they can function unhindered!
Adopt an Industry-Specific Approach
We recommend that webmasters, security experts, and marketers take an industry-specific approach when devising bot management strategies. Depending on your business objectives and other factors specific to your organization, it’s generally a good idea to prioritize business-critical good bots such as partner bots or search engine crawlers over other types of bots.
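One way to picture such a strategy is as a small policy table that maps each class of good bot to an action. The sketch below is purely illustrative: the category names, the three actions, and the choices made for a hypothetical e-commerce site are all assumptions, not recommendations drawn from the data above.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    ALLOW = "allow"
    RATE_LIMIT = "rate_limit"
    BLOCK = "block"


# Hypothetical policy for an e-commerce site: business-critical bots are
# allowed, lower-priority bots are throttled, irrelevant ones are blocked.
ECOMMERCE_POLICY: Dict[str, Action] = {
    "search_engine": Action.ALLOW,
    "partner": Action.ALLOW,
    "social_network": Action.ALLOW,
    "monitoring": Action.ALLOW,
    "aggregator": Action.RATE_LIMIT,
    "backlink_checker": Action.BLOCK,
}


def decide(category: str, verified: bool,
           policy: Dict[str, Action] = ECOMMERCE_POLICY) -> Action:
    """Pick an action for a bot that claims to belong to a category."""
    # Assumption: a bot whose identity could not be verified is not treated
    # as a good bot at all, whatever it claims to be.
    if not verified:
        return Action.BLOCK
    # Unknown categories are throttled rather than trusted outright.
    return policy.get(category, Action.RATE_LIMIT)


print(decide("search_engine", verified=True))   # Action.ALLOW
print(decide("search_engine", verified=False))  # Action.BLOCK
print(decide("aggregator", verified=True))      # Action.RATE_LIMIT
```

A site in another industry would fill in the table differently. A classifieds marketplace, for instance, might allow aggregators that a media site chooses to throttle.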
Use a Dedicated Bot Management Solution
A dedicated solution not only provides comprehensive insights into every type of bot traffic, but also gives marketers the analytics they need to optimize their strategies and forecast changing market trends.
Management of bots (both good and bad) cannot be an isolated activity in today’s complex and interconnected Web and app ecosystem. With the currently intensifying focus on bot management for websites, apps, and APIs, we see virtually every leading organization going beyond basic security systems and looking at bot management in a holistic way and as a keystone component in their overall security suite.
As originally published in Security Boulevard