When you last visited your favorite social media site or browsed the Internet using your search engine of choice, targeted advertisements and links probably appeared. You may have been shopping for a t-shirt on one site, only to find the same product advertised moments later on another site. This is a simple way of illustrating how algorithms work in Internet advertising or marketing. Algorithms are also increasingly aiding the way lawyers review information in an eDiscovery context.

An algorithm is a series of unambiguous steps required to solve a problem or perform a task. When you sort your e-mail, run spellcheck on a document, or sort a spreadsheet column by amount, you are applying a simple algorithm to data. In eDiscovery, more complex algorithms form the foundation of the analytics built into many review platforms, such as predictive coding, concept clustering, and deduplication. For example, with predictive coding, an algorithm studies lawyers’ coding decisions on a sample set of documents, analyzes each document’s content to determine what factors contributed to its responsiveness (or lack thereof), and then extrapolates the lawyers’ logic to the remaining documents in a collection. The more data the algorithm studies, and the more iterative the process is, the more accurate its results may become.
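The predictive coding workflow described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a simple naive Bayes word-count model; commercial review platforms may use quite different techniques, and the sample documents, labels, and function names (`tokenize`, `train`, `predict`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(coded_sample):
    """Count words per label from (text, label) pairs coded by reviewers."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of coded documents
    for text, label in coded_sample:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the label with the highest naive Bayes log-score for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the share of sample documents that carry this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in the sample are skipped
                # Add-one smoothing keeps unseen label/word pairs from zeroing out.
                score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical sample set already coded by lawyers.
coded_sample = [
    ("Pricing agreement with the distributor for the third quarter", "responsive"),
    ("Please review the attached distributor pricing contract", "responsive"),
    ("Lunch menu for the office party on Friday", "nonresponsive"),
    ("Reminder: parking garage closed on Friday", "nonresponsive"),
]

# Hypothetical documents that no lawyer has reviewed yet.
uncoded = [
    "Updated distributor pricing terms attached",
    "Office party moved to Thursday",
]

model = train(coded_sample)
predictions = [predict(model, text) for text in uncoded]
print(predictions)
```

In practice the sample set would be far larger, and lawyers would review a portion of the predictions and feed corrections back into the next training round, which is the iterative step mentioned above.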

Algorithms have proven effective in helping lawyers identify key information in isolated litigation or regulatory matters. These algorithms, however, particularly those embedded within review platforms, only scratch the surface of how analytics will aid lawyers in the future. In-platform algorithms have some significant limitations:

  • They are only used to analyze data related to a specific case or matter (rather than across an organization’s portfolio of legal or compliance matters);
  • They provide only reactive analysis; and,
  • They rely on “out of the box” algorithms and analyze only a limited number of possible document features (e.g., text only).

Organizations are searching for legal and compliance-focused algorithms that are highly customizable and not restricted by the limitations listed above. Much greater insight can be gained from analyzing larger datasets.

New analytics are being developed to allow organizations to aggregate document information and key data attributes (including attorney coding decisions) across all of an organization’s legal and compliance matters. This large dataset, when combined with attorneys’ legal expertise and a team of data scientists, supports much deeper historical analysis and stronger predictive ability. This repository of prior compliance and legal matters can be used to develop predictive algorithms that proactively search for potential violations or other forms of trouble hidden within an organization’s data.

For example, a pharmaceutical company seeking to avoid regulatory non-compliance (with its associated fines, recalls, audits and reputational harm) can develop a custom algorithm that searches for possible off-label marketing activities taking place within its field sales force or other parts of the company. Possible signs of illegal activity can then be remediated at an early stage. Additionally, specialized training and learning programs can be developed to target further issues. As the algorithm produces results for review, data scientists study the data and continuously refine the algorithm, leading to ever more accurate results.
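As a rough illustration of this kind of proactive screening, the sketch below pools attorney coding decisions from several prior matters, measures how often each term appeared in flagged documents, and uses those rates to score new messages. Everything here is an assumption made for illustration: the matter names, the sample messages, the `min_docs` and `threshold` values, and the simple averaging approach. A real custom algorithm would be designed and tuned by data scientists working with attorneys.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of distinct lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_flag_rates(matters, min_docs=2):
    """Pool coded documents from every matter and compute, for each term,
    the share of documents containing it that attorneys flagged."""
    seen = Counter()     # term -> documents containing the term
    flagged = Counter()  # term -> flagged documents containing the term
    for documents in matters.values():
        for text, was_flagged in documents:
            for term in tokenize(text):
                seen[term] += 1
                if was_flagged:
                    flagged[term] += 1
    # Keep only terms seen in at least min_docs documents.
    return {term: flagged[term] / seen[term] for term in seen if seen[term] >= min_docs}


def risk_score(rates, text):
    """Average flag rate of the known terms in the text (0.0 if none are known)."""
    known = [rates[term] for term in tokenize(text) if term in rates]
    return sum(known) / len(known) if known else 0.0


# Hypothetical coded documents from two prior matters: (text, flagged by attorney?).
prior_matters = {
    "matter_a": [
        ("Pitch the drug for unapproved pediatric use", True),
        ("Quarterly sales figures attached", False),
    ],
    "matter_b": [
        ("Doctors asked about unapproved pediatric dosing", True),
        ("Sales meeting moved to Monday", False),
    ],
}

# Hypothetical new messages from the field sales force.
new_messages = [
    "Mention the pediatric use even though it is unapproved",
    "Sales call scheduled for Tuesday",
]

rates = term_flag_rates(prior_matters)
threshold = 0.5  # assumed cutoff; a real deployment would tune this
for_review = [m for m in new_messages if risk_score(rates, m) >= threshold]
print(for_review)
```

Messages that clear the threshold would go to attorneys for review, and their decisions could be added to the pooled data before the rates are recomputed, mirroring the continuous refinement described above.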

Algorithms already serve many functions in eDiscovery. When customized to meet an organization’s needs and connected to larger data sources, they can also reduce risk by zeroing in on the most important documents for legal or compliance assessment. Without these algorithms, organizations would have to wade through virtual stacks of nonresponsive information and might risk overlooking disturbing patterns or trends.