On 24 August 2023, 12 international data protection and privacy regulators issued a joint statement (Statement) on their “global expectations of social media platforms and other sites to safeguard against unlawful data scraping”.

The Statement is a call to action for online platforms and websites, particularly social media companies (SMCs), to address the rise of unlawful data scraping. It sets out expected standards to ensure the protection of personal data and confirms that data protection rules apply to data scraping.

While no European data protection regulator is a signatory to the Statement, it is a significant regulatory development for such an array of non-European regulators to come together and issue this joint message. Coordinated statements of this kind are rare, and this one demonstrates that data scraping, including its impact on data protection rules, is being considered at an international level. This is undoubtedly an area of increased regulatory focus as global policymakers seek to regulate artificial intelligence (AI) technologies.

What is Data Scraping?

Data scraping is the use of software to automatically pull or “scrape” publicly available information from online sources, such as a website or an online platform. It is often invisible: it can occur without a website or platform knowing that data have been scraped, and it can be contrary to the terms of use and privacy notices of that website or platform.

From a commercial standpoint, data scraping can have beneficial uses, such as the automated collection of real-time information and conducting targeted research cost-effectively. Recently, data scraping has been in the spotlight as a source of training data for developing non-proprietary AI systems (e.g. large language models).

Implications of Data Scraping on Data Protection

Data scraping gives rise to data protection concerns when personal data are scraped and harvested without a legal basis or without the knowledge of the individuals to whom such data relate. Data scraping in this manner is likely a breach of data protection rules (such as the General Data Protection Regulation (GDPR)) and can lead to unlawful processing of personal data and related harms (e.g. unsolicited direct marketing, identity theft, profiling, monitoring and personal data breaches).

Statement: Call to Action

There are five key points in the Statement’s call to action directed at SMCs and other online businesses to safeguard against unlawful data scraping:

  1. Publicly accessible personal data is still subject to data protection and privacy laws in most jurisdictions: simply because information is available publicly does not necessarily mean that it is immune from the protections of privacy and data protection laws. Individuals and businesses that scrape data from websites or online platforms and the businesses that host such data must comply with data protection and privacy laws.
  2. SMCs and websites that host publicly accessible personal data have obligations to protect personal data on their platforms: the Statement addresses SMCs specifically as the entities responsible for protecting user personal data from unlawful data scraping. The Statement recommends the implementation of technical and organisational controls to mitigate risk and confirms that SMCs should consider legal action where data scraping is suspected or confirmed.
  3. Individuals can take steps to protect their personal data from data scraping: the Statement outlines the various steps individuals can take to safeguard their personal data. These include reading website privacy notices, limiting the information they post online and understanding/managing the privacy settings on their accounts.
  4. SMCs have a role in enabling users to engage with their services in a privacy-protective manner: the Statement provides that SMCs should proactively support users to make informed decisions about how they use an online platform and what personal data they share online. This should also involve increasing user awareness of privacy settings.
  5. Mass data scraping incidents can be reportable data breaches in many jurisdictions: quite possibly, the most onerous part of the Statement is that data scraping may be a reportable personal data breach. The Statement provides that SMCs and other businesses that allow personal data to be scraped should notify the relevant authority and individuals affected.

The Statement was shared directly with leading SMCs (e.g. YouTube, TikTok, LinkedIn, X, Facebook and Instagram) with a request for feedback by 24 September 2023. The requested feedback should demonstrate how the SMCs comply with the expectations in the Statement. Responses will be shared amongst the signatories and may be published.

Impact of the Statement on European Businesses

The European Data Protection Board (EDPB) and the Irish Data Protection Commission (DPC) were not signatories to the Statement and, as such, it has no legal impact on European businesses. However, data scraping has been on the radar of the DPC and EDPB for some time:

  • In November 2022, the DPC imposed a fine of €265 million against Meta Ireland for personal data breaches resulting from data scraping practices in breach of the GDPR. The DPC found that Meta had failed to implement appropriate measures to prevent data scraping of user data, which presented severe risks to users (e.g. fraud, spamming and impersonation by bad actors).
  • There are eagerly awaited guidelines from the EDPB about the interplay between the Artificial Intelligence Act (AI Act) and the GDPR, which may include direction on data scraping, given its significance in the training of AI systems.

What Should Businesses Do?

The Statement is a significant (albeit non-binding) development in the current landscape of global regulators seeking to regulate digital content and AI technologies with consistency and certainty on a cross-border basis. While the Statement clarifies that data protection rules apply to data scraping where personal data are concerned, it does not clarify where the line between lawful and unlawful data scraping falls.

The Statement outlines best practices for online platforms and websites to protect against unlawful data scraping; these include:

  1. Designing systems with privacy-by-default features by implementing measures such as CAPTCHAs, rate limiting and blocking any IP addresses with suspicious activity to help identify “bots” accessing a website and conducting data scraping;
  2. Monitoring how frequently users visit a platform/website and checking for access patterns that indicate data scraping;
  3. If considering the deployment of an AI system, conducting due diligence on the training data, and ensuring there is a lawful basis for the system’s data collection;
  4. Creating a team dedicated to identifying and implementing mechanisms to protect against data scraping;
  5. Where data scraping is suspected, sending cease and desist letters requiring the deletion of the information and taking any other legal action considered appropriate;
  6. Where a data scraping incident constitutes a personal data breach, notifying the relevant data protection authority and any affected individuals.
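
By way of illustration only, the rate limiting and IP blocking measures in point 1 above could be sketched along the following lines. This is a minimal example under stated assumptions, not a measure prescribed by the Statement: the class name ScrapeGuard, the thresholds and the "strikes" approach to blocking are all hypothetical choices made for this sketch, and a production system would typically sit alongside CAPTCHAs, logging and legal review.

```python
from collections import defaultdict, deque


class ScrapeGuard:
    """Hypothetical sliding-window rate limiter that blocks repeat offenders.

    All names and thresholds are assumptions made for this illustration.
    """

    def __init__(self, max_requests=5, window_seconds=10.0, strikes_to_block=3):
        self.max_requests = max_requests          # allowed requests per window
        self.window_seconds = window_seconds      # length of the sliding window
        self.strikes_to_block = strikes_to_block  # rejections before an IP is blocked
        self._hits = defaultdict(deque)   # ip -> timestamps of recent allowed requests
        self._strikes = defaultdict(int)  # ip -> count of rejected requests
        self.blocked = set()              # IPs refused outright

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` (seconds) is allowed."""
        if ip in self.blocked:
            return False
        hits = self._hits[ip]
        # Discard timestamps that have fallen outside the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Over the limit: record a strike and block after repeated strikes.
            self._strikes[ip] += 1
            if self._strikes[ip] >= self.strikes_to_block:
                self.blocked.add(ip)
            return False
        hits.append(now)
        return True


# Example: one address sends a rapid burst of requests.
guard = ScrapeGuard(max_requests=3, window_seconds=1.0, strikes_to_block=2)
results = [guard.allow("203.0.113.7", now=t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 5.0)]
print(results)        # first three requests pass, the rest are refused
print(guard.blocked)  # the address is blocked after its second rejection
```

Passing the timestamp in explicitly keeps the example deterministic. In practice, the thresholds would need to be tuned so that legitimate users are not blocked, and shared IP addresses (e.g. corporate networks) would need separate consideration.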

For now, and until further legislative developments are fully effective (like the AI Act and Digital Services Act), the GDPR continues to regulate data scraping where personal data are concerned in Europe. Nonetheless, the recommended best practices in the Statement are helpful and should be considered by all businesses operating websites or platforms where user personal data may be scraped.