Almost all businesses with a large online presence and listings are targets of systematic data theft, commonly known as scraping, web harvesting or web data extraction. During these attacks, scrapers systematically steal large amounts of information from the company’s web site, in clear breach of its terms and conditions, and use it, for example, to boost a competing business.
Data scraping, or data harvesting, is theft of property from websites and has been going on for as long as companies have been publishing data and images on the web. Today it is done on an industrial scale, as some entities believe that it is easier to steal data than to create it. There are dozens of commercially available packages that offer the tools and anonymity needed to scrape.
A system called ASSASSIN, developed by Sentor, can detect and block scraping and data theft around the clock, in real time. The ASSASSIN anti-scraping system is an expert system that analyses traffic and requests to websites in an unobtrusive way. By analysing usage and traffic patterns, it scores requests and concludes whether they are made by humans or web robots. It can detect scrapers that use anonymous proxy services or large numbers of open proxy servers to avoid detection. The system retains all forensic data, which gives the site owner the choice to apply blocks, warn off scrapers or prosecute perpetrators. The Yell.com white paper on the ASSASSIN case study is available at http://www.sentor.se/en/ASSASSIN_case_study.pdf
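
To make the idea of scoring requests by traffic pattern more concrete, here is a minimal Python sketch. It is not Sentor's method and does not reflect how ASSASSIN works internally. The function names, the three heuristics (request rate, evenly spaced requests, never revisiting a page) and all thresholds are assumptions chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical thresholds, picked for illustration only.
MAX_REQUESTS_PER_MINUTE = 60   # sustained rate above this looks automated
MIN_INTERVAL_SPREAD = 0.05     # seconds; near-identical gaps suggest a timer
MIN_PAGES_FOR_CRAWL_CHECK = 20


def score_client(timestamps, paths):
    """Return a suspicion score from 0 (human-like) to 3 (robot-like)."""
    if len(timestamps) < 2:
        return 0
    score = 0
    ordered = sorted(timestamps)

    # Heuristic 1: very high sustained request rate.
    duration = ordered[-1] - ordered[0]
    rate_per_minute = len(ordered) / max(duration, 1.0) * 60
    if rate_per_minute > MAX_REQUESTS_PER_MINUTE:
        score += 1

    # Heuristic 2: requests arrive at almost perfectly even intervals.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if max(gaps) - min(gaps) < MIN_INTERVAL_SPREAD:
        score += 1

    # Heuristic 3: many pages fetched and none of them ever revisited.
    if len(paths) >= MIN_PAGES_FOR_CRAWL_CHECK and len(set(paths)) == len(paths):
        score += 1

    return score


def score_log(entries):
    """Group (client_id, timestamp, path) entries by client and score each."""
    by_client = defaultdict(lambda: ([], []))
    for client_id, timestamp, path in entries:
        times, paths = by_client[client_id]
        times.append(timestamp)
        paths.append(path)
    return {cid: score_client(t, p) for cid, (t, p) in by_client.items()}


if __name__ == "__main__":
    # Synthetic example data, not real traffic.
    robot = [("10.0.0.1", i * 0.5, f"/listing/{i}") for i in range(40)]
    human = [
        ("10.0.0.2", 0, "/"),
        ("10.0.0.2", 12, "/search"),
        ("10.0.0.2", 47, "/listing/7"),
        ("10.0.0.2", 95, "/search"),
        ("10.0.0.2", 180, "/listing/9"),
    ]
    for client_id, score in score_log(robot + human).items():
        print(client_id, score)
```

The sketch groups requests by a single client identifier. As the text notes, scrapers may spread their requests over many proxy servers, so a real system would presumably need to correlate behaviour across addresses rather than rely on per-address counts alone.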