Content scraping (also referred to as web scraping or data scraping) is the practice of lifting unique, original content from other websites and publishing it elsewhere. Because it is done without the consent of the original source or author, it is frequently illegal. Content scrapers typically copy the entire piece and pass it off as their own.
Content scraping takes a toll on websites that invest time, money and resources in creating original content, because scraped copies can erode their SEO rankings and web authority. According to Pi Datametrics, web scrapers can easily outrank you on Google [1].
Content typically targeted by illegal scrapers includes, but is not limited to:
Thought leadership articles and blogs
Comprehensive product reviews
Fresh news articles and op-ed pieces
Technical research publications
Fresh listings on classified directories, job portals and property websites
Financial information and research publications
Product catalog and pricing information on eCommerce websites
At a basic level, content scraping can be done by manually copying and pasting. More sophisticated techniques use bots that crawl websites and copy thousands of pages within seconds.
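To show why automated copying is so cheap, here is a minimal sketch of the extraction step a scraping bot performs once it has downloaded a page. It is an illustration only, not any particular scraper's code: it assumes the article text sits in `<p>` elements, and the names `ParagraphExtractor` and `extract_paragraphs` are hypothetical. It uses only Python's standard `html.parser` module and does no network access.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: collects the text inside <p> elements, ignoring the rest."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        # Only an explicit </p> closes a paragraph in this simplified sketch.
        if tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> still belongs to the open paragraph.
        if self._in_p:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> list[str]:
    """Return the plain text of every <p> element in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


page = "<html><body><nav>Menu</nav><p>First <b>original</b> paragraph.</p><p>Second one.</p></body></html>"
print(extract_paragraphs(page))
```

A bot only needs to wrap a function like this in a loop over fetched URLs to copy an entire site, which is why the volume described above is easy to reach.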
Content scraping is common among online publishers that rely on ad revenue to fund their websites. By crawling and copying high-quality, keyword-dense content from other websites, third-party scrapers can attract heavy traffic. Bloggers and media publishers are frequent targets because they supply a steady stream of fresh content.
Search engines such as Google, Bing and Yahoo do not yet have a comprehensive way to tell original content from scraped copies, especially when the copy is published very shortly after the original.
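As a rough illustration of how a publisher might check whether a page is a near-copy of its own article, the sketch below uses w-shingling with Jaccard similarity, a well-known near-duplicate detection technique. This is not a description of how Google, Bing or Yahoo actually work; the function names, the 3-word shingle size and the 0.8 threshold are assumptions chosen for the example.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_scraped(original: str, candidate: str, threshold: float = 0.8) -> bool:
    """Hypothetical check: flag candidate if its shingles largely match the original's.

    The 0.8 threshold is an illustrative assumption, not a known standard.
    """
    return jaccard(shingles(original), shingles(candidate)) >= threshold


article = "Content scraping takes a toll on websites that invest in original content."
print(looks_scraped(article, article))
print(looks_scraped(article, "Completely unrelated sentence about airline fares and schedules."))
```

Note that a high similarity score only says two pages match; it says nothing about which one was published first, which is exactly why a copy that appears very soon after the original is hard to attribute.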
Copying and publishing copyrighted content without permission can be punished under copyright law. However, even with copyright laws in place and clearly stated terms and conditions of use, original content remains at risk: the people behind scrapers are hard to identify and can quickly move the copied content to a different website. Loosely defined laws also do little to protect website content [2].
In 2003, American Airlines sued the online travel portal FareChase, alleging that it had illegally scraped the airline's pricing, seating and flight schedule information. The lawsuit ended a year later with an undisclosed settlement.
Scraping is a headache for business owners who produce rich content on the web. However, websites can defend against it with a robust, real-time bot prevention solution.
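One signal a bot prevention layer can use is request rate, since scraping bots fetch pages far faster than human readers. The sketch below is a deliberately simple sliding-window counter per client, assuming each client is identified by a string such as an IP address; the class name `RequestRateMonitor` and the default limit of 60 requests per 60 seconds are hypothetical. Commercial real-time solutions combine many more signals than this.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class RequestRateMonitor:
    """Hypothetical monitor: flags clients exceeding max_requests per sliding window."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # Per-client queue of request timestamps, oldest first.
        self._hits: dict[str, deque] = defaultdict(deque)

    def is_suspicious(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record one request from client_id; return True if it exceeds the limit."""
        now = self.clock() if now is None else now
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests


monitor = RequestRateMonitor(max_requests=5, window_seconds=1.0)
for i in range(6):
    # Six requests within half a second from one (documentation-range) IP address.
    print(monitor.is_suspicious("203.0.113.7", now=i * 0.1))
```

With a limit of five requests per second, the sixth rapid request is flagged. Rate alone has clear limits: scrapers that spread requests across many IP addresses slip under it, and legitimate search engine crawlers may trip it, so it is best treated as one input among several.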