Last week, two astronomers at Caltech predicted that a new planet is lurking at the edges of the solar system. They inferred the planet’s existence from computer simulations and mathematical calculations based on its gravitational effects on objects near it. Now, all we have to do is wait for someone to actually observe the so-called Planet Nine.

Fortunately, those of us contending with eDiscovery need no such calculations to deduce the existence of personally identifiable information (PII) in data sets. PII is everywhere; in fact, unlike Planet Nine, it often hides in plain sight. PII can be as mundane as information that is often publicly available, such as your name, e-mail address, street address, and phone number. However, it also includes more sensitive information, such as your bank account information, Social Security number, biometric identifiers like fingerprints, and health history (called “protected health information,” or “PHI”). It can even include data trails created by exercise trackers or other Internet of Things devices. In short, PII is broadly defined as any data that can be used to identify you or any information that can be linked to you.

PII requires special protection under a universe of federal and state laws. Although the United States does not have a single, comprehensive data protection law, federal laws shield specific types of data: for example, the Health Insurance Portability and Accountability Act, Gramm-Leach-Bliley Act, and Fair Credit Reporting Act govern healthcare, banking, and credit reporting, respectively. State laws are plentiful but inconsistent in their coverage; however, all but three states have laws governing data breaches.

These laws mandate that counsel and eDiscovery specialists protect PII from unnecessary disclosure during litigation or investigations, whether through production to opposing parties or the government, or through a security breach. One way to do so is to work with opposing counsel to narrow the scope of discovery and avoid the need to produce PII. Another is to create a targeted collection strategy that avoids PII to the extent possible. To do that, organizations should understand where the bulk of their PII resides: obvious sources are repositories of human resources, payroll, and benefits data.

However, a broader search for PII is required because people often use their e-mail to share this data, for both business and personal purposes. Parsing a voluminous data set for PII can be challenging, but today’s technology has simplified the process. Many eDiscovery platforms offer automatic data detection tools that can detect patterns, such as a series of numbers that forms an account or Social Security number. Other tools, such as automated redaction, pinpoint and redact user-provided terms, such as e-mail addresses. For documents that consist mostly of PII, inverse redaction tools hide all content except what reviewers have highlighted for retention.
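To make those three ideas concrete, here is a minimal Python sketch of pattern detection, term-based redaction, and inverse redaction. It is an illustration only, not how any particular eDiscovery platform works: the function names, the `[REDACTED]` marker, and the assumption that Social Security numbers appear in the hyphenated `123-45-6789` form are all hypothetical choices made for this example. A real tool would need to handle many more formats and would likely validate its matches.

```python
import re

# Assumption for this sketch: SSNs appear in the hyphenated 3-2-4 digit form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical marker that stands in for hidden content.
REDACTION_MARK = "[REDACTED]"


def find_ssn_candidates(text):
    """Pattern detection: return (start, end, match) for each SSN-shaped string."""
    return [(m.start(), m.end(), m.group()) for m in SSN_PATTERN.finditer(text)]


def redact_ssns(text):
    """Replace every SSN-shaped string with the redaction marker."""
    return SSN_PATTERN.sub(REDACTION_MARK, text)


def redact_terms(text, terms):
    """Automated redaction: hide each user-provided term, ignoring case."""
    for term in terms:
        # re.escape treats the term as literal text, not as a regex.
        text = re.sub(re.escape(term), REDACTION_MARK, text, flags=re.IGNORECASE)
    return text


def inverse_redact(text, keep_spans):
    """Inverse redaction: hide everything except the (start, end) spans to keep."""
    pieces = []
    cursor = 0  # end of the last kept span
    for start, end in sorted(keep_spans):
        if start > cursor:
            # Hidden text sits between the previous kept span and this one.
            pieces.append(REDACTION_MARK)
        start = max(start, cursor)  # avoid repeating text when spans overlap
        if start < end:
            pieces.append(text[start:end])
            cursor = end
    if cursor < len(text):
        # Hidden text remains after the last kept span.
        pieces.append(REDACTION_MARK)
    return "".join(pieces)
```

Pattern matching of this kind only flags candidates. A nine-digit string in that shape may be an invoice or part number, so flagged hits still merit human review before production.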

Once data is collected for eDiscovery, it often escapes the orbit of even the most well-meaning organization’s efforts to preserve the sanctity of its data. To prevent disclosing this data and having their case fall into a black hole of sanctions or privacy penalties, eDiscovery counsel and their third-party discovery experts must use best practices and technology to ensure this data remains visible only to the naked eyes of those entitled to see it.