The 2018 ICO investigation against Cambridge Analytica's practices on Facebook: An international wake-up call

It has been almost two weeks since Cambridge Analytica's practices hit the headlines, following the launch on March 17th of an official investigation into the British data analytics firm by the UK data protection authority (ICO). In short, the UK authorities allege that for several years Cambridge Analytica accessed and processed personal data from over 50 million profiles for the direct benefit of its clients without proper notice or authorization, and more specifically in breach of Facebook's privacy policy. While this information is far from new (it was actually brought to light in 2015 in the context of the UK Brexit campaign, and further investigated in 2016 prior to the US presidential election), this time it triggered indignation among the press and the public worldwide and, as a result and for the first time, reactions from data protection authorities as well as from Facebook and Cambridge Analytica themselves.

The UK investigation served as a wake-up call for all stakeholders at the international level. Article 29 Working Party Chairwoman Andrea Jelinek announced on March 21, 2018 that the Working Party had decided to investigate the recent revelations involving Facebook and Cambridge Analytica, and more precisely "very serious allegation[s] with far-reaching consequences for data protection rights of individuals and the democratic process," with the ICO taking the lead role. Data protection authorities in Canada, Australia and India have all opened investigations into the matter as well. In the US, the Federal Trade Commission launched a new investigation into Facebook's practices, in light of the company's prior commitments under consent decrees issued by the FTC in the past. In response, Facebook has reportedly sent officials to Capitol Hill to help smooth over the fallout, and CEO Mark Zuckerberg apologized to all users on his profile's wall. As for Cambridge Analytica, the company's board announced on March 20th the suspension of CEO Alexander Nix, and issued several official reports to respond to the ICO's allegations.

In this context, it is worth analysing the strategy deployed by Cambridge Analytica on behalf of its clients to gather and exploit a fine-grained database on citizens, and more precisely its reliance on Facebook to build this database (upstream) and use it (downstream) for voter micro-targeting purposes (notably on behalf of the 2016 US Republican campaign, one of Cambridge Analytica's biggest clients).

How Cambridge Analytica harvests personal data on Facebook using psychological profiling capabilities

According to its website, Cambridge Analytica holds a database of 220 million US adults with 4,000 to 5,000 data points on each (such as income, debt, hobbies, criminal history, purchase history, health concerns, gun ownership, and voting history, among others). Its website claims that “[their] core database contains information on demographics, consumer and lifestyle habits and political affiliation, as well as unique psychographic information on motivation and decision-making.” To better understand why this database differs from the many databases held by other commercial brokers, and ultimately why it was so critical to so many digital campaigns' strategies (without considering whether it was effective or not), it is worth analysing closely how this database was built.

Cambridge Analytica’s services rely essentially on psychometrics, a data-driven sub-branch of psychology. Psychometrics focuses on measuring psychological traits relating to one’s personality. At the heart of Cambridge Analytica’s approach to psychometrics is the so-called OCEAN model developed in the 1980s. OCEAN is an acronym which stands for openness (how open one is to new experiences), conscientiousness (how much of a perfectionist one is), extroversion (how social one is), agreeableness (how considerate and cooperative one is) and neuroticism (how easily one gets upset). These five traits make it possible to assess an individual’s personality once enough data has been gathered to answer each of the five questions.

In 2008, a team of researchers at the Psychometrics Center at Cambridge University developed an app to assess one’s personality based on the OCEAN model, and they made it available on Facebook. This app, called “MyPersonality”, enabled users to get a personality profile after filling out several personality quizzes. The researchers then combined each individual profile with the same user’s online data (such as Facebook “likes” -- which used to be public by default --, gender, age, residence, etc.). Based on this technique, Cambridge Analytica seeded Facebook with similar personality quizzes from 2010 onwards. The quizzes were taken by hundreds of thousands of Facebook users, mostly female and young, but still including enough male and older users.

It is interesting to note -- especially amid the current turmoil -- that the tests generally did not disclose the further use of the user’s score. Likewise, Cambridge Analytica’s Privacy Policy (as it was available on the company's website back in 2015 and 2016) does not elaborate on such surveys or on the further uses of the data collected, and in particular provides no clarity as to the potential secondary uses of such data. It only provides that the user’s information may be used to “gain insight to the behavior of the whole population”, without giving more details as to the information at stake, the range of potential recipients or their potential purposes in accessing this information.

As a result, Cambridge Analytica’s initial database soon grew substantial, enabling researchers to get detailed profiles on many Facebook users, but also to draw correlations between all publicly available information (including e.g., demographics, voter history, purchase history and online behavioral characteristics) and personality traits in order to further profile users who had not answered any personality quiz. For instance, a male user who liked the page of Madonna would be more likely to be gay than men who did not; similarly, a woman with a preference for goods made in the US would be more likely to be a Trump supporter than women without such a preference. Of course, these inferences are not always accurate, as they rely on proxies which can be challenged and have a high rate of error. Nevertheless, from a statistical perspective, the accuracy rate of such inferences is often considered high enough. In turn, such a database made it possible to use Facebook profile data not only to assess users’ personalities, but also to search for specific profiles.
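To make the inference step described above more concrete, here is a minimal sketch in Python. It is not Cambridge Analytica's actual method, which has not been published; it only illustrates the general idea of learning a correlation between "likes" and a trait score from quiz takers and then applying it to users who never took a quiz. The function names, page names and scores are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def page_trait_averages(quiz_takers):
    """Average trait score of the quiz takers who liked each page."""
    scores_by_page = defaultdict(list)
    for likes, score in quiz_takers:
        for page in likes:
            scores_by_page[page].append(score)
    return {page: mean(scores) for page, scores in scores_by_page.items()}


def infer_trait(likes, page_averages, default):
    """Estimate a score for a user who took no quiz, using known pages only."""
    known = [page_averages[p] for p in likes if p in page_averages]
    # Fall back to the population average when none of the pages is known.
    return mean(known) if known else default


# Invented toy data: (set of liked pages, openness score on a 0-1 scale).
quiz_takers = [
    ({"page_a", "page_b"}, 0.9),
    ({"page_a"}, 0.7),
    ({"page_c"}, 0.2),
    ({"page_b", "page_c"}, 0.4),
]

averages = page_trait_averages(quiz_takers)
population_mean = mean(score for _, score in quiz_takers)

# A user who never took the quiz but liked one known and one unknown page.
estimate = infer_trait({"page_a", "page_d"}, averages, population_mean)
print(round(estimate, 2))
```

The estimate is only a proxy: it says nothing certain about the individual, which is why the error rate of such inferences remains significant even when the aggregate accuracy looks acceptable.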

Beyond that, the way Cambridge Analytica offers ad targeting services is no different from that of any other ad company. Cambridge Analytica first buys data from a broad range of sources, from commercial data brokers (such as Acxiom or Experian) to public registries, including loyalty programs and club memberships. Based on the psychometric correlations already identified by Cambridge Analytica (using the OCEAN model and large numbers of personality quizzes), personality profiles are then generated for each individual in the database. Fears, needs and interests are associated with each profile to enable more specific and effective targeting.

Facebook's advertising features enable the processing of user data for voter micro-targeting based on psychological profiles

Many commentators have underlined that Cambridge Analytica's practices prejudice Facebook users' privacy rights in a particularly damaging manner since the firm specializes in voter data analytics for political campaigns, and more precisely in supporting voter micro-targeting.

While the use of micro-targeting techniques for political campaigning is far from new, micro-targeting, online and offline, has been increasingly used for political purposes over the last decade. For instance, the two Obama presidential campaigns relied heavily on micro-targeting. After integrating the major social media platforms’ data sets (among others) into its own comprehensive voter database (referred to as “Project Narwhal”, and mainly aimed at getting rid of the inherent constraints and limits of a silo-based approach to voter data), the Obama 2012 campaign relied on micro-targeting techniques to implement new methods of communication with potential voters, including by asking supporters to download a social networking app that sent targeted campaign messages to their Facebook friends (targeted sharing).

In addition to enabling the collection of psychometric data (and its further integration with other voter databases), Facebook offers many tools to help advertisers or campaigns target their audience precisely (whether or not this targeting is based on psychometric profiling). In particular, Facebook’s “custom audiences” tool makes it possible to reach a specific group of users, such as a group of potential supporters. Facebook allows advertisers to plug in data not only from their own databases, but also from data brokers such as Cambridge Analytica. In addition, Facebook’s “Lookalike Audiences” feature makes it possible to reach people whose profiles are similar to those in a known group. As explained by Facebook, a Lookalike Audience is “a way to reach new people who are likely to be interested in your business because they're similar to people who already are”. When an advertiser creates a Lookalike Audience, “[it] choose[s] a source audience … and we identify the common qualities of the people in it (ex: demographic information or interests). Then we find people who are similar to (or "look like") them in the country/countries [it] choose[s].” The buyer of the service chooses the source audience (which must include at least 100 users) and the size and country of the desired lookalike audience. Moreover, Facebook’s “Brand Lift” survey capabilities make it possible to measure the success of the ads. Advertisers can then optimize their campaigns based on the results of these surveys to ensure campaigns are reaching maximum performance.
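Facebook does not publish how it computes Lookalike Audiences, so the following Python sketch is only an illustration of the general idea quoted above: summarize the "common qualities" of a source audience and rank other users by how closely they resemble that summary. The similarity measure (cosine similarity to the source audience's average profile), the feature vectors and all names are assumptions made for this example, and the toy source audience is far smaller than the 100-user minimum mentioned above.

```python
import math


def centroid(vectors):
    """Component-wise average of the source audience's feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lookalike_audience(source, candidates, size):
    """Return the ids of the `size` candidates most similar to the source audience."""
    center = centroid(list(source.values()))
    ranked = sorted(
        candidates,
        key=lambda user_id: cosine(candidates[user_id], center),
        reverse=True,
    )
    return ranked[:size]


# Invented features: [interest in topic X, interest in topic Y, age scaled to 0-1].
source = {"s1": [1.0, 0.0, 0.3], "s2": [0.8, 0.2, 0.4]}
candidates = {
    "c1": [0.9, 0.1, 0.35],
    "c2": [0.0, 1.0, 0.8],
    "c3": [0.7, 0.3, 0.5],
}

print(lookalike_audience(source, candidates, size=2))
```

In this toy example the two candidates whose interests most resemble the source audience are selected, while the candidate with the opposite interest profile is left out. Adding psychometric scores to the feature vectors would extend the same mechanism from demographic to psychological targeting.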

While these functionalities have been widely used in political campaigns over the last decade, Cambridge Analytica enabled recent political campaigns, such as the Trump campaign, to use them in two new ways. First, the data that was fed into the Facebook ad platform included psychometric data (which, according to publicly available information, had not been tried in previous campaigns) to target voters and serve them tailored ads. These ads are called “dark posts”, that is, sponsored Facebook posts which can only be seen by users with very specific profiles. The possibility to micro-target voters based not only on demographic data but also on their psychological profiles was mainly used to direct specific messages and content to specific sub-groups of the population. For instance, at some points during the Trump campaign the messages displayed on its behalf differed for the most part only in microscopic details, so as to target recipients in the most effective manner in light of their psychological profiles. As another example, on one day in August 2016, the Trump campaign posted 100,000 ad variations on Facebook.

Second, these ad functionalities have reportedly been used recently for voter suppression purposes. In other words, in addition to targeting actual and potential campaign supporters to ensure that they actually vote, Cambridge Analytica's clients were able to use micro-targeting techniques based on demographic and psychological data to target potential supporters of their opponents and undecided voters on Facebook. As already underlined by some think tanks in 2010, the potential for voter suppression based on micro-targeting on social media is high: “[T]he Web Advertising and Behavioral Targeting techniques can be used to reveal different page views to different page viewers. For example, a viewer identified as a friendly voter could see correct information … while a voter identified as not being friendly could see a page with inaccurate or deceptive information.”

These practices have not been carried out secretly, and some of Cambridge Analytica's clients have actually been quite open about them. The Trump campaign staff disclosed these voter suppression operations to a team of Bloomberg Businessweek reporters who were given access to its digital campaign headquarters two weeks before election day. According to this investigation, many targeted ads were served to wavering left-wingers, African Americans and young women in the last weeks of the campaign to try to dissuade them from voting. At that time -- almost a year and a half ago -- these revelations triggered some turmoil in the US among civil rights groups and other organizations. However, Cambridge Analytica reacted by denying any efforts to discourage any Americans from casting their vote in the presidential election, underlining that “its efforts were solely directed towards increasing the number of votes in the election.” As for Facebook, the ad platform refused at the time to confirm or publicly release any of the "suppression ads" or any of its audience targeting parameters.