This post is for in-house lawyers and information and risk managers who find themselves facing a relatively sudden plunge into the mysterious lakes and oceans of big data. For the last few years I’ve been developing legal frameworks for organizations diving into big data.

These frameworks have not really made it into this blog before in the pragmatic way they live in my practice. The blog has tended instead to emphasize and predict large-scale transformations (for example, those needed in privacy programs and their regulation, and those now appearing on the horizon for information governance programs) or to make other relatively high-level observations. This post gets into the first stages of big data risk management in a much more concrete way, focusing on how you can use the big data initiative to define new trade secrets and new ways of protecting them, with implications for your contracts involving data and your eventual development of what I call “data asset protection plans.” It also addresses what should, but rarely does, happen before bringing in external data sources and streams.

  1. Internal Databases

Let us say your organization is starting to figure out how to get more value from its own databases. In most cases, it is important to recognize that patent and copyright law are likely to offer only limited protection, at least in the US (as opposed to Europe, where the Database Directive (96/9/EC) provides copyright-like protection to “authors” invested in the contents or presentation of their databases). Most of your efforts to protect the information must therefore focus on careful definition and protection of trade secrets and contractual rights associated with the raw and inferred data and databases.

So consider this approach: as your organization identifies the types of data and repositories that are of interest for the big data initiative, it can be viewed as essentially defining those types and repositories as trade secrets requiring special new protection. For trade secret protection under the Uniform Trade Secrets Act, you need to show that the information derives economic value from being kept secret and that you took reasonable measures to keep it secret; secrecy can be achieved through agreements, policy, training and infrastructure. Therefore:

  1. Everybody handling the types of data you anticipate using now or in the future (whether on the big data project or not) could sign confidentiality agreements that go beyond their general obligations to protect company assets;
  2. Contracts with analytics vendors must carefully protect ownership and use rights in the raw, usage and derivative data and provide for its clean destruction; this is critical to protecting the data from security and privacy standpoints and to preserving its trade secret status;
  3. Policies could be modified to focus on data asset protection from a trade secrets perspective, requiring secrecy and protection;
  4. The information security levels assigned to those types of data could be the levels accorded sensitive information (and note that 2014 is said to be a year in which big data tools receive much-needed enterprise security “hardening”);
  5. Particularly if the data sources contain personal information, a focus on trustworthy, transparent and accountable controls, decisions and decisionmakers regarding use, and on the ever-changing standard of reasonableness in anonymization, will only become more critical from a privacy standpoint and will bolster trade secret arguments as well; and
  6. Training programs could stress the designation of data as trade secrets and the importance of continued efforts to protect the data as trade secrets.
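
For teams that track these measures in a spreadsheet or a small script, the six items above can be sketched as a simple register. This is only an illustrative sketch: the class name, the field names, and the idea of recording each measure as a yes/no flag are assumptions made for the example, not part of any legal standard or established tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DataAssetProtection:
    """Hypothetical register entry for one data type (all names are illustrative)."""
    name: str
    confidentiality_agreements: bool = False  # item 1: agreements with data handlers
    vendor_contract_terms: bool = False       # item 2: ownership, use, destruction terms
    policy_coverage: bool = False             # item 3: policies address trade secrets
    sensitive_security_level: bool = False    # item 4: treated as sensitive information
    privacy_controls: bool = False            # item 5: accountable use and anonymization
    training: bool = False                    # item 6: training on trade secret status


def missing_measures(asset):
    """Return the names of the measures not yet in place for this data type."""
    return [
        f.name
        for f in fields(asset)
        if f.name != "name" and not getattr(asset, f.name)
    ]


clickstream = DataAssetProtection(
    "web clickstream logs",
    confidentiality_agreements=True,
    training=True,
)
print(missing_measures(clickstream))
# ['vendor_contract_terms', 'policy_coverage', 'sensitive_security_level', 'privacy_controls']
```

A register like this does not establish reasonable secrecy measures by itself, but it gives legal and IT a shared view of which data types still have gaps.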

  2. External Databases

Even if the initiative begins (as many do) with a focus on extracting value from data already owned and possessed by the organization, before you know it the organization will be incorporating new data types, such as machine-to-machine and social data, and other data streams from the outside. Legal needs to weigh in before that happens, for many reasons, including:

  1. Again, the ownership and use rights associated with the external data, and the ways in which they affect the ownership and use rights and trade secret status of derived data and inferences as well as internal data, are critical;
  2. If the external data is brought into the organization’s custody and control, as the big data storage/analytics tools encourage, and any of it might be subject to existing preservation obligations resulting from reasonably likely or pending litigation or investigations, the organization may be forced to expand its legal holds and begin to grow a “digital landfill” much larger than any it has seen in the past;
  3. The organization may have regulatory or other duties, such as privacy or information security obligations, to understand, manage and/or protect the information once it possesses and controls it; and
  4. Antitrust concerns should be examined in some cases.


  3. Trade Secrets and Defensible Disposal

As you identify all the data types you may want and need to protect as trade secrets, you can also use that knowledge to improve or jump-start a defensible disposal program for the other data stores, particularly the ones that come to appear worthless as you examine the new trade secrets. In the longer term, these insights and the new trade secrets will help your records and document management programs and database governance programs balance “data lakes” against defensible disposal. They support better-informed judgments about which information and data have ongoing value, and more defensible judgments about the useless data (or the data types whose cost or risk of harm exceeds their worth) that can and should be destroyed.
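
The triage described above can be sketched as a small decision rule. This is a minimal sketch under stated assumptions: the `DataStore` fields, the four outcome labels, and the ordering of the checks (preservation obligations first, then trade secret status, then cost or risk against worth) are hypothetical choices for illustration, and the value and risk figures stand in for judgment calls your organization would have to make.

```python
from dataclasses import dataclass


@dataclass
class DataStore:
    """Hypothetical inventory entry; every field is an illustrative assumption."""
    name: str
    is_trade_secret: bool           # identified as a trade secret in the big data review
    under_legal_hold: bool          # subject to a preservation obligation
    estimated_value: float          # rough ongoing worth, in any consistent unit
    estimated_cost_and_risk: float  # rough cost plus risk of harm, same unit


def triage(store):
    """Suggest a handling category for one data store."""
    if store.under_legal_hold:
        return "preserve"  # preservation obligations are checked before anything else
    if store.is_trade_secret:
        return "protect"   # apply the trade secret measures
    if store.estimated_cost_and_risk > store.estimated_value:
        return "dispose"   # cost or risk of harm exceeds worth
    return "retain"


old_logs = DataStore("2009 server logs", False, False, 1.0, 5.0)
print(triage(old_logs))  # dispose
```

The output is only a starting point for review: a store flagged "dispose" still needs sign-off under your records retention schedule before anything is destroyed.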