Data scraping is the action of extracting large amounts of data from public-facing websites. Usually, software bots are used to extract the data as manually scraping large amounts of data is tedious and time-consuming. News content, ticket prices and quotes for insurance are just a few examples of the types of data that might be extracted. Often price comparison websites use bots to obtain data from data owners’ websites to republish on their own websites. Some companies use data scraping bots to check their competitors’ prices and slightly undercut their competitors at every price point, in real time, to gain a competitive advantage.

There are a few tricks that companies can use to protect their content from being scraped. If data has been scraped already, there are several avenues of recourse against data scrapers. Whilst no legislation bans all scraping, unauthorised use of publicly available web content can amount to a breach of contract, infringement of intellectual property rights, and a criminal offence in some circumstances.

Terms and Conditions

The first line of defence should be to ensure that your company has terms of use that explicitly prohibit screen scraping, and that those terms are adequately incorporated into a contract with the website user. Ensure that the terms are drafted clearly, in plain English, without typographical mistakes. Format the terms with readability and intelligibility in mind.

A sticking point in cases is often whether the website terms that prohibit screen scraping form a part of the contract between the website provider and the scraper. In Interfoto Picture Library v Stiletto Visual Programmes Ltd, the judge ruled that if a term is contained in an unsigned document (for example, a ticket or a notice), the terms will only form part of the contract if reasonable steps are taken to bring them to the consumer’s notice before the contract is made. In Thornton v Shoe Lane Parking, the Court ruled that terms referred to on the back of a ticket from an automatic ticket machine at a car park entrance, and on display in the car park, were not incorporated into a contract between the person parking and the car park company.

The best way to display website terms and conditions is to present them in clickwrap format before the user enters the website, as clickwrap terms are clearly accepted by the website user (the user has to actively click ‘accept’ to view the website). Another option is to have a box pop up requiring acceptance of the terms and conditions only when the user moves on to more vital sections of the website. For example, a user could view the homepage without having to accept any T&Cs, but would have to click ‘accept’ on the terms before filling in their details for a quote.
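The second option, gating only the more vital sections behind acceptance, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `Session`, `TERMS_VERSION` and `GATED_PREFIXES`, and the example paths, are hypothetical and would need to be adapted to the website's own framework. The sketch also records which version of the terms was accepted and when, since evidence of acceptance may matter if incorporation of the terms is later disputed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical values for illustration only.
TERMS_VERSION = "2024-01"
GATED_PREFIXES = ("/quote", "/booking")


@dataclass
class Session:
    """Minimal stand-in for a per-visitor session record."""
    session_id: str
    accepted_version: Optional[str] = None
    accepted_at: Optional[datetime] = None


def requires_acceptance(path: str) -> bool:
    """Return True if the path sits in a section gated behind the terms."""
    return path.startswith(GATED_PREFIXES)


def record_acceptance(session: Session) -> None:
    """Store which version of the terms was accepted, and when (UTC)."""
    session.accepted_version = TERMS_VERSION
    session.accepted_at = datetime.now(timezone.utc)


def may_view(session: Session, path: str) -> bool:
    """Ungated pages are always viewable; gated pages only once the
    current version of the terms has been accepted in this session."""
    if not requires_acceptance(path):
        return True
    return session.accepted_version == TERMS_VERSION
```

In this sketch, a visitor can browse the homepage freely, but a request for a path under `/quote` is refused until `record_acceptance` has been called for that session. Comparing against `TERMS_VERSION` means that users are asked to accept again whenever the terms change.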

Clickwrap terms are a faff for customers and may ultimately drive genuine users away from your website, so you may choose instead to incorporate your terms through a clearly visible link (browse-wrap format). Courts have found that terms of use presented in this manner may not create an enforceable contract unless the website owner presents evidence that the user had actual or constructive knowledge of the terms. However, even where browse-wrap terms do not form part of a contract between the website user and the provider, some terms may be effective by notice, for example copyright or other IP licensing terms, and some liability terms.

If it comes to blows, a website owner can also argue that it is common practice to adopt terms of use and that web users should expect there to be some applicable terms (though the terms should still be clearly visible as per Scheps v Fine Art Logistic Ltd). Note that courts globally will generally hold a sophisticated user that builds a business using information from third-party sites to a higher standard than a non-business user when deciding whether to enforce website terms of use.

In Ryanair v PR Aviation, a price-comparison website scraped flight information from Ryanair’s website in breach of Ryanair’s website terms, and the Court of Justice of the European Union (CJEU) looked at whether the Database Directive applied. The CJEU held that the Directive did not apply to databases not protected by either copyright or the database right. However, as a side point, the Court noted that Ryanair could enforce its clickwrap terms and conditions against the screen scrapers. Despite Brexit, UK courts are expected to pay attention to CJEU decisions and trends going forward.

Remedies for a breach of contract claim can include an injunction preventing further use of the scraped material and/or payment of damages for losses caused by the use of the scraped data. Proving damages may be tricky and will depend on the facts of each case. Bots which scrape data for price comparison websites that direct traffic back to the original website arguably do not cause any damage, whereas bots used to check and undercut competitors’ prices in real time could cause financial losses.

Intellectual Property Licence

Scraped data may comprise copyrighted work, and accordingly, misappropriation can amount to infringement. Insurance quotes, flight details and similar materials might be protected under the sui generis database right. Scraping material and going on to copy, rent, lend, or communicate a substantial part of that material to the public without permission from the intellectual property owner amounts to infringement.

The sui generis database right protects the data stored in a database. It is an automatic, unregistered right that allows the owner to control specific uses of their database. The right arises where there has been a substantial investment in obtaining, verifying, or presenting the contents of a database. A database right is infringed when all or a substantial part of a database is extracted or re-utilised without the consent of the owner. “Substantial” in this context relates to quantity and/or quality. In addition, the repeated extraction and re-utilisation of insubstantial parts of a database may cumulatively amount to extraction of a substantial part. So, use of a web-harvester to repeatedly interrogate the same database may infringe database rights.
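The point about repeated extraction can be illustrated with a short Python sketch that tracks how much of a database each client has pulled over time, so that many small requests are seen in aggregate rather than one by one. This is an illustration under stated assumptions only: the class `ExtractionMonitor` and its parameters are hypothetical, and the 10% alert level is an arbitrary monitoring threshold, not a legal test of what counts as “substantial”.

```python
from collections import defaultdict
from typing import Iterable


class ExtractionMonitor:
    """Tracks the distinct records each client has retrieved over time."""

    def __init__(self, total_records: int, alert_fraction: float = 0.10):
        # alert_fraction is an arbitrary monitoring threshold,
        # not a legal definition of a "substantial part".
        self.total_records = total_records
        self.alert_fraction = alert_fraction
        self._seen = defaultdict(set)

    def record(self, client_id: str, record_ids: Iterable[int]) -> float:
        """Log one request and return the client's cumulative share."""
        self._seen[client_id].update(record_ids)
        return self.fraction(client_id)

    def fraction(self, client_id: str) -> float:
        """Share of the database this client has retrieved so far."""
        return len(self._seen.get(client_id, ())) / self.total_records

    def is_flagged(self, client_id: str) -> bool:
        """True once the cumulative share reaches the alert threshold."""
        return self.fraction(client_id) >= self.alert_fraction
```

Because records are stored in a set, re-requesting the same record does not inflate the total; only the cumulative number of distinct records counts. Logs of this kind may also help to evidence the scale of extraction if a claim is later brought.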

To bolster any claim against a web scraper, a website owner should ensure that their content is flagged as copyrighted in their terms and conditions. The terms should contain a licence to the IP for genuine customers which explicitly does not extend to screen scrapers.

Criminal Prosecution

The Computer Misuse Act 1990 (CMA) makes it a criminal offence to access a computer program or data without authorisation. The offence is wide-ranging and includes “hacking”. Though the CMA has yet to be tested in a screen scraping case, it may catch data scraping, as the website owner does not authorise the type of access made by a scraper. To be guilty of an offence under the CMA, a scraper must be aware that such access to the computer program or data is unauthorised. The scraper might argue that, by making the relevant data available to the public via its website, the owner has granted a licence for the public to access all the data. However, this argument is unlikely to succeed, as any such implied consent would not be deemed to extend to scraping. Website terms which explicitly define and prohibit screen scraping should be put in place to make clear that there is no implied licence for screen scrapers.

Criminal behaviour which falls foul of the CMA can lead to fines or even imprisonment. The innocent party can report the violation to the police and can even follow up by pursuing a private prosecution, which allows a private individual, or an entity that is not acting on behalf of the police or other prosecuting authority, to bring a criminal case to court.

Other options and concluding thoughts

Terms and conditions, restrictive licences and criminal prosecution are just three weapons available to companies looking for recourse against data scrapers. If personal data is extracted, there may also be claims under the GDPR. Some industries have industry-specific legislation about scraping; for example, open banking guidelines are published by the Open Banking Implementation Entity. Companies can also employ technical safeguards, such as those familiar tick boxes asking customers to confirm that they are not robots.
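As an illustration of a technical safeguard, the Python sketch below shows a simple sliding-window rate limiter. This is a different technique from the “I am not a robot” tick box mentioned above, though the two are often combined, for example by presenting the tick box only once a client exceeds the limit. The class name `SlidingWindowLimiter` and the limits used are assumptions for illustration, not a recommended configuration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per client in any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> request timestamps

    def allow(self, client_id: str, now: float) -> bool:
        """Return True if a request arriving at time `now` (in seconds)
        is within the limit; refused requests are not counted."""
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In practice the caller would pass a monotonic clock reading such as `time.monotonic()` as `now`, and would identify clients by session, account or IP address. A human browsing a quote form is unlikely to hit a sensible limit, whereas a bot interrogating the site in real time is likely to do so quickly.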

To better protect themselves, companies should carefully draft their website terms to restrict data scraping and provide a licence for genuine users to the exclusion of scrapers. Website owners should ensure their website terms of use are in clickwrap format if possible. This is a complex area of law; businesses trying to protect themselves from web harvesting or take legal action against the harvesters could benefit from specific legal advice on the topic.