The Electronic Discovery Reference Model (EDRM) [1] is an open source framework for the E-Discovery process [see endnote 1 for a link to a list and description of the nine stages]. The EDRM is a collection of loosely coupled steps that affect data. The Processing stage generally happens between the Collection and Review stages. Processing reduces "raw" data into an organized format suitable for review databases and is the most technical of the nine stages. Given that processing activity is technical, how involved should legal teams be in this phase of discovery?

Legal teams do not need to know the Processing workflow as well as a data analyst does. Even so, they benefit from a good understanding of the steps in the Processing phase, including how long it actually takes to complete a processing job (here’s a clue: processing is not simply “pushing a button”). Further, the oversight that a legal team contributes improves database quality. As a project manager, I shepherd data through the Processing phase and do not release data to data analysts for processing unless I have legal team approval.

Legal team members can make positive contributions through knowledge of four areas of the Processing effort:

  • General Processing vocabulary
  • Processing capabilities (internal or your vendors')
  • What to consider when scoping a Processing job request
  • How to spot Processing errors and issues

E-Discovery Lingua Franca: Learn the Language

One of the first things a new practitioner in any industry does is tackle its jargon. A new lawyer needs to become familiar with E-Discovery technical terms. These terms make up the language that your technical resources, opposing parties, and the industry use when managing data. A firm grasp of the terminology makes for more productive data discussions.

Development of your “data chops” happens organically with experience. If you want to be proactive about learning the lingo, visit the EDRM website. The EDRM project produces helpful glossaries, the largest of which is a 336-page compilation of smaller dictionaries.[2] Read this tome from front to back and you will be one of the few to have performed this feat of endurance. You can also download a PDF file for offline use.

Processing Capabilities: Know Available Resources

Your legal team may have an in-house E-Discovery technical group that handles processing. Likewise, you may use external service providers or a combination of both. Whatever the case, the capabilities of processing resources vary significantly. It helps to know what your processor can do and what tools they can bring to bear. Explore your service provider's website for information on their skillset. You may find capabilities listed that you would like to investigate further. For example, Kilpatrick Townsend & Stockton's LitSmart E-Discovery Team website contains a wealth of information about the services it offers, including this blog article!

Consult your processing contact to learn more about how their processing workflow operates. Ask questions to get detailed answers that marketing materials may not provide. Here is a sample of probing questions to get a conversation going:

  • Who is processing my data?
  • Will the same person always process my data?
  • How many data analysts do you use?
  • Are your analysts certified?
  • What is the average industry tenure of your analysts?
  • How do you handle after-hours requests?
  • How long does it take to process XX gigabytes of data?
  • What data formats can you not process?
  • How do you resolve processing errors?
  • What processing reports can you provide?

Best case, your processor has a quality processing workflow. They use well-trained people and sound procedures to give your data first-class treatment. Worst case, you find your review team's efforts hamstrung by data problems. The more familiar you are with Processing, the less likely you are to experience the latter.

Processing Requests: Ask for the Proper Settings

Your technical vocabulary is up to snuff. You are comfortable with your processing partner's capabilities and procedures. Now learn more about the processing engine's parameters. The settings used to process a data set can vary from job to job. Choose settings appropriate to the review goal you are trying to achieve. Here are a few areas that you can ask your processor to adjust on any given processing job.

  • Cull by file extension, file size, dates, or source folders
  • Remove or include files that contain specific keywords
  • Deduplicate by custodian, globally, or not at all
  • deNIST or not (deNIST = system file removal)
  • Replicate source folders or not
  • Expect foreign languages

While I will not go into the details in this article about why each of the above bullets is important, my point is that you can reduce processing errors by ensuring you are aware of the settings used in your jobs. If your case has an ESI Agreement, that document is a good reference from which to gather Processing settings. You can also check out my colleague Darcie Spruance’s blog on Processing Basics here.

Identifying Processing Issues Is an Art: You Are an Artist

Intuition is one of the best tools a reviewer can use to spot processing-related data issues. A reviewer's job requires a great deal of focus. They should expend effort analyzing documents rather than dealing with data problems. If a database is experiencing excessive viewer issues, it likely contains processing errors. Contact your project manager to investigate.

Failure to remove system files is the most common processing error I see. Un-viewable system files make for an unpleasant review experience. One recent example I recall involved a database of 1.3 million documents. A reviewer complained about viewer rendering issues. No other reviewers complained about similar issues in their databases at the time. I noticed that the processing engine had tagged 330,000 of the 1,300,000 documents as un-processable. Long story short, the vast majority of these documents did not belong in the review data set. They should have been hidden from view during processing. The reviewer did a great job of identifying an isolated workflow issue.
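The numbers in that example are worth a quick calculation. The short Python sketch below computes the share of documents the engine flagged and compares it against a threshold. The function names and the 5% threshold are my own assumptions for illustration, not an industry standard.

```python
def unprocessable_rate(flagged, total):
    """Return the fraction of documents flagged as un-processable."""
    if total <= 0:
        raise ValueError("total must be positive")
    return flagged / total


def needs_investigation(flagged, total, threshold=0.05):
    """Hypothetical rule of thumb: escalate when the rate exceeds the threshold."""
    return unprocessable_rate(flagged, total) > threshold


rate = unprocessable_rate(330_000, 1_300_000)
print(f"{rate:.1%}")  # 25.4%
print(needs_investigation(330_000, 1_300_000))  # True
```

Roughly one document in four was flagged, which is the kind of figure a processing report can surface before a reviewer ever opens the database.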

Here are a few other common processing problems to guard against:

  • Missing document attachments
  • Missing extracted text
  • Mismatched native files, images, or extracted text/OCR
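The first two problems in this list lend themselves to simple automated checks. Here is a minimal Python sketch, assuming each document is exported as a record with a document ID, its extracted text, and the IDs of its attachments. The field names (doc_id, extracted_text, attachment_ids) and the sample records are hypothetical; your review platform's export will use its own names.

```python
def find_issues(records):
    """Flag records with empty extracted text or attachments absent from the set."""
    known_ids = {r["doc_id"] for r in records}
    issues = []
    for r in records:
        text = r.get("extracted_text") or ""
        if not text.strip():
            issues.append((r["doc_id"], "missing extracted text"))
        for att_id in r.get("attachment_ids", []):
            if att_id not in known_ids:
                issues.append((r["doc_id"], f"missing attachment {att_id}"))
    return issues


# Made-up sample records for illustration.
records = [
    {"doc_id": "DOC-001", "extracted_text": "Quarterly report attached.",
     "attachment_ids": ["DOC-002"]},
    {"doc_id": "DOC-002", "extracted_text": "", "attachment_ids": []},
    {"doc_id": "DOC-003", "extracted_text": "See attached.",
     "attachment_ids": ["DOC-004"]},
]

issues = find_issues(records)
for doc_id, problem in issues:
    print(doc_id, problem)
```

A flag is a prompt to look, not proof of an error. Some files, such as images that have not been through OCR, may legitimately have no extracted text. Mismatched natives, images, and text are harder to detect automatically and usually call for spot-checking by a person.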

If your gut tells you that your database has a problem, chances are it does. When in doubt, ask your technical staff to investigate until satisfied.

Conclusion

Case teams do not need to be experts in document processing tool mechanics (that’s what WE are for!). However, to recap, we recommend that legal teams develop the following skills:

#1: A good understanding of the EDRM Processing framework.

#2: A sound high-level knowledge of processing tool capabilities.

#3: A facility for asking probing data-related questions.

#4: A knack for spotting problematic data.