Computer forensic investigations are commonplace in matters involving allegations of trade secret theft. Forensic experts and IT security teams frequently use technical buzzwords and jargon to describe the steps and components of an investigation. Below are some commonly used terms and definitions describing artifacts regularly examined during a forensic inspection, along with some conventional analyses that your forensic expert may provide as part of their report.

  • Computer forensics: The preservation, examination, analysis, and reporting of data from digital media, using methodologies and tools that produce results suitable for presentation in legal proceedings. Computer forensics is a specialized area of expertise with its own academic degree programs, professional certifications, and associated coursework.
  • Custodian: A user of a computer system and/or the owner of the data on a computer. The IT team may refer to employees and staff as “users,” while the forensic examiners and legal team may use the term “custodians.”
  • Forensic Image: A complete, bit-for-bit copy of an electronic storage medium, such as a hard drive, USB drive, or optical disc. Also referred to as a “physical image,” a forensic image contains all active (non-deleted) data as well as the drive’s unallocated space (or “free space”), which is where deleted files and their remnants can persist. Creating forensic images of the desktop and laptop systems used by key custodians is standard practice in most forensic investigations.
  • Logical Image: Unlike a forensic (physical) image, a “logical image” is an image file that contains only a selected set of files and folders from the storage media. A logical image can be created where a full physical image is not feasible, such as with large server disks.
  • Encryption: Hardware- or software-based encoding used to secure confidential information. Encryption can be enabled in various ways: it can protect entire hard drives, or only specific files or folders. Users can also set up encrypted or hidden partitions on their own systems. Working with your client to understand what encryption their systems use is key to any collection or preservation effort, because it directly affects how the collection is performed. Often, the IT/Security team must work closely with the data collection team to ensure that the collected data can be decrypted and reviewed.
  • Hash Value: Often called a “digital fingerprint,” a hash value is a string of alphanumeric characters produced by running data through a mathematical algorithm (common examples are MD5, SHA-1, and SHA-256). For practical purposes, a hash value or “hash” is unique to the media or file from which it was derived: changing even a single bit of the data produces a different hash. Hashes are used to authenticate evidence (an image whose hash matches that of the original media is an exact copy) and to identify known documents. For example, suppose you suspect that a key document was copied to a hard drive. A search for that document’s hash can determine whether the document in fact resides on the hard drive, or on several hard drives.
  • System Registry: On a Windows system, the registry is a hierarchical database that stores configuration settings and options for the operating system, hardware, and installed software. In an investigation, the registry can help establish key facts, such as when the operating system was installed, which files were recently accessed, which USB devices were recently connected, and which printers have been connected to the computer. Corresponding artifacts on Mac systems include property list (“.plist”) files, which store application and system settings; Preferences directories (folders containing configuration options); and log files, which record system and application events.
  • File table: A structure maintained by the file system on a storage volume that tracks the files and folders stored there. The file table is frequently analyzed to determine what files and folders exist, and when they were created and modified. The specific structure and behavior of a file table vary between file systems, and therefore between Windows, Mac, and Linux systems (Windows NTFS volumes, for example, use a Master File Table), but the purpose is the same.
  • Shortcut or Link file: On a Windows system, shortcut (“.lnk”) files are artifacts that are created when a user opens files, folders, and applications. They contain information about the file or application that was opened, and can help pinpoint the source of a file (for example, whether it resided on removable media) and when a file may have been moved or copied.
  • Internet History: The record of user activity in the web browsers used on a particular device. Analysis of the databases these browsers maintain can often show which websites were accessed and when. If the results reveal out-of-band communications (those outside the organization’s normal channels) or the use of file transfer or cloud storage services, they may provide new information to help direct the investigation.
  • De-duplication: The removal of exact duplicate documents or emails from a data set.
      • De-duplication reduces review time by ensuring that only one instance of any document is reviewed. Duplicates are common in almost any data set because we all send and receive departmental and group emails: if two or more recipients of the same email are among your custodians, that email will appear in the data set more than once.
      • De-duplication is performed by hashing (see “Hash Value”): a mathematical algorithm is applied to the data content of each document, not to its filename or file type. As a result, two or more identical documents with different names produce the same hash and are reviewed only once.
      • Global vs. custodian-level de-duplication: A global de-duplication removes duplicate documents across the entire data set, keeping just one “unique” copy for review. A custodian-level de-duplication removes duplicates only within a single custodian’s data, so the same document may still be reviewed once per custodian. If you have multiple custodians, a global de-duplication is normally recommended, because data processing technology can still record every custodian who possessed a document even after the duplicates are removed. If a “hot” document is identified, a reviewer can simply check which other custodians also had it.
      • Custodian hierarchy: When using global de-duplication, consider applying a custodian hierarchy: a ranking of custodians by their importance or relevance to the matter (for example, Tier 1 for key custodians and Tier 3 for peripheral ones) that determines whose copy of a duplicate document is kept, so that higher-ranked custodians have fewer duplicates removed. Without a hierarchy, the only retained copy of a key “hot” document may be assigned to a Tier 3 custodian, while the Tier 1 custodian’s copy is removed as a duplicate and the document does not appear in their data set. That outcome is unexpected and can cause issues or delays if review resources are concentrated on Tier 1 documents. Ranking custodians keeps more documents with Tier 1 custodians and avoids such surprises.
  • Timeline: A process in which artifacts from multiple sources are consolidated and analyzed to establish the chronological order of events on a given system. Timelines allow examiners to combine date and time information from disparate sources, such as the file table, system logs, and temporary files, into a single uniform view in which discrete activities can be organized sequentially.
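The Hash Value entry describes searching a drive for a known document. A minimal Python sketch of that search follows. It assumes SHA-256 as the algorithm and a plain folder of files as input; the function names `file_hash` and `find_known_document` and the demo files are hypothetical, and this illustrates the technique rather than how any particular forensic tool implements it.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file's contents.

    Only the bytes inside the file are hashed; the filename and extension
    play no part, so a renamed copy produces the same hash.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_document(known_hash, root, algorithm="sha256"):
    """Return every file under `root` whose hash matches `known_hash`."""
    target = known_hash.lower()
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and file_hash(path, algorithm) == target
    )


if __name__ == "__main__":
    # Hypothetical demo: one key document, a renamed copy, and an unrelated file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "key_document.docx").write_bytes(b"Confidential formula v2")
        (root / "backup").mkdir()
        (root / "backup" / "notes.txt").write_bytes(b"Confidential formula v2")
        (root / "unrelated.txt").write_bytes(b"Quarterly lunch menu")

        key_hash = file_hash(root / "key_document.docx")
        print("Key document SHA-256:", key_hash)
        for match in find_known_document(key_hash, root):
            print("Match:", match.relative_to(root))
```

Because only the file contents are hashed, the renamed copy in the demo is reported alongside the original, while the file with different contents is not.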
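The Internet History entry mentions analyzing the databases browsers maintain. As one illustration, Google Chrome stores history in an SQLite database whose `urls` table records each URL with a `last_visit_time` counted in microseconds since January 1, 1601 (UTC). The Python sketch below builds a simplified in-memory stand-in for that table with made-up rows and converts the timestamps into readable dates. The helper names are hypothetical, and the real schema contains more columns and tables than shown here.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Chrome-style history timestamps count microseconds since 1601-01-01 UTC.
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(microseconds):
    """Convert a Chrome-style timestamp to a timezone-aware UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def datetime_to_webkit(moment):
    """Convert a timezone-aware datetime to a Chrome-style timestamp."""
    return (moment - WEBKIT_EPOCH) // timedelta(microseconds=1)


def visited_sites(conn, since=None):
    """Return (url, title, visit_count, last_visit) rows, newest first."""
    query = "SELECT url, title, visit_count, last_visit_time FROM urls"
    params = ()
    if since is not None:
        query += " WHERE last_visit_time >= ?"
        params = (datetime_to_webkit(since),)
    query += " ORDER BY last_visit_time DESC"
    return [
        (url, title, count, webkit_to_datetime(stamp))
        for url, title, count, stamp in conn.execute(query, params)
    ]


if __name__ == "__main__":
    # Simplified, in-memory stand-in for Chrome's `urls` table (made-up rows).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, title TEXT, "
        "visit_count INTEGER, last_visit_time INTEGER)"
    )
    rows = [
        ("https://example.com/upload", "Upload files", 3,
         datetime(2023, 3, 1, 14, 30, tzinfo=timezone.utc)),
        ("https://example.org/", "Example home", 1,
         datetime(2023, 2, 1, 8, 0, tzinfo=timezone.utc)),
    ]
    conn.executemany(
        "INSERT INTO urls (url, title, visit_count, last_visit_time) "
        "VALUES (?, ?, ?, ?)",
        [(u, t, n, datetime_to_webkit(when)) for u, t, n, when in rows],
    )
    since = datetime(2023, 3, 1, tzinfo=timezone.utc)
    for url, title, count, last_visit in visited_sites(conn, since):
        print(f"{last_visit:%Y-%m-%d %H:%M} UTC  {url}  ({count} visits)")
```

Other browsers use different schemas and epochs, so the conversion step has to match the browser being examined.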
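The de-duplication entries (global versus custodian-level, and the custodian hierarchy) can be illustrated with a short Python sketch. It assumes each document’s content is available as bytes and is hashed with SHA-256; the `Document` type, function names, and custodian names are hypothetical, and commercial review platforms apply their own rules (for emails, for instance, they may hash selected fields rather than raw bytes).

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str
    content: bytes


def content_hash(doc):
    """Hash only the document's content, never its name or file type."""
    return hashlib.sha256(doc.content).hexdigest()


def custodian_dedup(documents):
    """Remove duplicates only within each custodian's own data."""
    seen, kept = set(), []
    for doc in documents:
        key = (doc.custodian, content_hash(doc))
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


def global_dedup(documents, hierarchy=()):
    """Keep one copy of each unique document across all custodians.

    `hierarchy` ranks custodians from most to least important (Tier 1 first).
    For each unique document, the copy held by the highest-ranked custodian
    is kept; unranked custodians rank last, and ties keep the first copy seen.
    Also returns, for each hash, the set of every custodian who held a copy.
    """
    rank = {name: position for position, name in enumerate(hierarchy)}
    lowest = float("inf")
    kept = {}
    possessed_by = defaultdict(set)
    for doc in documents:
        digest = content_hash(doc)
        possessed_by[digest].add(doc.custodian)
        current = kept.get(digest)
        if current is None or rank.get(doc.custodian, lowest) < rank.get(
            current.custodian, lowest
        ):
            kept[digest] = doc
    return list(kept.values()), dict(possessed_by)


if __name__ == "__main__":
    docs = [
        Document("D1", "Smith (Tier 3)", b"hot formula"),
        Document("D2", "Jones (Tier 1)", b"hot formula"),
        Document("D3", "Jones (Tier 1)", b"hot formula"),
        Document("D4", "Smith (Tier 3)", b"lunch menu"),
    ]
    print("Custodian-level:", [d.doc_id for d in custodian_dedup(docs)])
    no_tiers, _ = global_dedup(docs)
    print("Global, no hierarchy:", [(d.doc_id, d.custodian) for d in no_tiers])
    tiered, holders = global_dedup(docs, ["Jones (Tier 1)", "Smith (Tier 3)"])
    print("Global, with hierarchy:", [(d.doc_id, d.custodian) for d in tiered])
    print("Held by:", sorted(holders[content_hash(docs[0])]))
```

Calling `global_dedup` without a hierarchy reproduces the surprise described in the custodian hierarchy discussion: the first copy encountered (here, the Tier 3 custodian’s) is the one kept. With the hierarchy, the Tier 1 custodian’s copy is retained, and `possessed_by` still records that both custodians held the document.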
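At its core, the Timeline entry describes a merge and sort over timestamps gathered from different artifacts. The Python sketch below assumes the timestamps have already been extracted from each source and are timezone-aware; the `Event` type, `build_timeline` function, and sample events are hypothetical. Everything is converted to UTC before sorting because different artifacts may record times in local time or in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # must be timezone-aware
    source: str          # artifact the event came from, e.g. "registry"
    description: str


def build_timeline(*sources):
    """Merge event lists from several artifacts into one chronological list.

    Every timestamp is converted to UTC first so that artifacts recorded in
    different time zones are compared on the same clock.
    """
    merged = []
    for events in sources:
        for event in events:
            if event.timestamp.tzinfo is None:
                raise ValueError(f"naive timestamp from {event.source}")
            merged.append(
                Event(event.timestamp.astimezone(timezone.utc),
                      event.source, event.description)
            )
    # sorted() is stable, so simultaneous events keep their input order.
    return sorted(merged, key=lambda event: event.timestamp)


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-5))  # hypothetical local time zone
    file_table = [Event(datetime(2023, 3, 1, 9, 15, tzinfo=eastern),
                        "file table", "roadmap.xlsx modified")]
    registry = [Event(datetime(2023, 3, 1, 14, 5, tzinfo=timezone.utc),
                      "registry", "USB drive first connected")]
    browser = [Event(datetime(2023, 3, 1, 14, 30, tzinfo=timezone.utc),
                     "browser history", "cloud storage site visited")]
    for event in build_timeline(file_table, registry, browser):
        print(f"{event.timestamp:%Y-%m-%d %H:%M} UTC  "
              f"{event.source:<16} {event.description}")
```

In the demo, the file table entry recorded at 09:15 local time (14:15 UTC) falls between the USB connection and the website visit once all three events are placed on the same clock.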

Jonathan Karchmer