Back in February, in a San Jose courtroom, a bombshell was dropped that could have been erased from the public record.
The revelation came in a now-settled legal dispute over Google’s Gmail service. At the heart of the case was a shocking accusation: Google had been scanning Gmail messages, in violation of federal wiretap laws.
It has been an unfolding drama ever since, affecting the nearly 170 million Gmail users worldwide, including users of Apps for Education, yet the most important revelation was nearly kept secret from the public.
More than two dozen major media organizations fought to make sure that the case — and its wide-ranging implications for Internet users — received a full public airing. Last week, the judge in the case ruled that portions of the transcript of a February hearing that was held in open court could not be redacted retroactively, since that would be tantamount to closing the public courtroom.
Here’s what we found out.
It turns out that Google, which bases its business on collecting and analyzing huge reams of data for advertising purposes, has been scanning users’ emails in transit — that is, even before users have a chance to open or read them. This includes email messages that are deleted without being opened. That’s right, Google knows what’s in your email before you do.
Google tried, and failed, to redact information about this process from the court’s transcript of that critical February hearing. Its attempt – akin to putting toothpaste back into a tube – was a reversal of its previous position in the lawsuit, a pledge that there would be a “fully public airing of the issues raised by Plaintiffs’ motion for class certification.” Federal Judge Lucy H. Koh would have none of it: “Ex post facto sealing is even less appropriate here…”
Google’s move to censor portions of the transcript of this hearing that was open to the public mirrors the government’s recent attempt in Jewel v. NSA to scrub the transcript of a public hearing that was witnessed by a crowd of observers and the media. Fortunately, the First Amendment protects the public’s ability to learn what transpires in the courts.
The results are illuminating. The world now knows much more about how Google collects personal information from users of Gmail and Google Apps and, in this case, how it plugged a critical gap in its wide-ranging data mining operation to sweep up even more of users’ data.
Back in 2010, Google was facing a vexing problem. It was losing out on a treasure trove of personal information from millions of Gmail users who were slipping through its chief analytical tool, known as “Content OneBox.” Anytime they accessed their email through Outlook or on their iPhone, Google’s data machine wasn’t there to capture it all. So it needed a way to sidestep the problem, and fast.
Within a matter of months, the company shrewdly moved the Content OneBox from Gmail’s storage area to a position upstream in the “delivery pipeline,” meaning that it could now scan messages before they were delivered. As the plaintiffs explained at the February hearing:
“Google made a choice. They said, you know what, when people are accessing emails by an IPhone, we are not able to get their information. When people aren’t opening their emails or they are deleting them, we are not able to get their information. When people are using Google Apps accounts where ads are disabled, we are not able to get that information. When people are accessing Gmail through some other email provider, we are not able to get that information. So what they did is they took a device that was in existence already and operating just fine back in the storage area, and they moved it to the delivery pipeline.”
This has sweeping consequences, as Chris Hoofnagle, director of privacy programs at Berkeley’s Center for Law & Technology, has described:
“Hiding ads while analyzing data takes advantage of a key deficit users have around internet services: users only perceive profiling if they receive ads. The content one box infrastructure would allow Google to understand the meaning of all of our communications: the identities of the people with whom we collaborate, the compounds of drugs we are testing, the next big thing we are inventing, etc. Imagine the creative product of all of Berkeley combined, scanned by a single company’s ‘free’ email system.”
Google’s stated goal is to “organize the world’s information,” but it fought to keep secret how it has done so. Now we know.
This article was first published in USA Today. It is available at http://www.usatoday.com/story/opinion/2014/09/04/google-spying-advertising-privacy-column/15051241/.