In this ninth article in our series on "Big Data & Issues & Opportunities" (see our previous article here), we examine the aspects related to copyright, database rights and trade secrets. More particularly, we determine to what extent such protection mechanisms can apply to (big) data.
Intellectual property is defined by the Oxford English Dictionary as "intangible property that is the result of creativity". Intellectual property rights are the rights that adhere to such creations and that grant the holder(s) thereof a monopoly on the use of that creation for a specified period and subject to certain exceptions. The underlying aim of granting such (temporary) monopoly, which – admittedly – entails a certain social cost, is to incentivise creators to share their creation with the public, and to achieve the social benefits of increased creative activity.
In light of these elements, it cannot be excluded that certain elements of the big data lifecycle, such as individual pieces of data or entire datasets, fall within the scope of protection of certain intellectual property rights. This article examines those intellectual property rights that may be relevant in a big data context, and will look into the particular application in a big data environment of (i) copyright; (ii) database rights; and (iii) trade secrets.
Copyright ensures protection of various types of works, awarding protection to individual data as long as they are original and can be expressed in a material, concrete form. The broad understanding of these protection requirements facilitates extending, in principle, protection to different types of works, including to data.
It is however worth examining some of the most important characteristics of the EU copyright system in order to determine whether it may apply to (big) data.
Minimal EU harmonisation
Although the copyright rules applicable in the Member States are similar, the threshold of protection, the exceptions, the practical implementation, and the enforcement proceedings and remedies differ substantially. It is therefore of utmost importance to take into consideration the national legal traditions, examining both the applicable national legislation and its interpretation by national courts.
The lack of full harmonisation of copyright protection at EU level is likely to have a chilling effect on EU-wide big data projects, since it requires a separate protection assessment for data originating from different Member States.
For a work to be protected by copyright, it must be original, meaning it is the author's own original creation and reflects his/her personality, where he/she has been able to express his/her creative freedom by making free and creative choices and thus stamping his/her personal touch onto the work. Generally speaking, the threshold for a work to be original is relatively low, especially in certain Member States.
This being said, although copyright protection has a broad scope, it nonetheless requires an intellectual human intervention and the consciousness of achieving a result. Therefore, raw data such as weather forecasts, stock quotations or sports scores would in principle be excluded from copyright protection.
Unfortunately, there is no unequivocal answer as to what types of data fall under such protection, and thus, the eligibility for protection needs to be examined on a case-by-case basis and in light of the particular rules and case-law in each country.
In the context of big data projects, it is crucial to understand to what extent the data used can be copyright protected. In all likelihood, most of the data collected and processed in a big data analytics context will not be considered original and will therefore not benefit from copyright protection. Having said that, it cannot be excluded that the individual data can gain originality once they are connected with other information or presented in an original way (by means of different possible forms of expression).
For a work to be protected, it must be fixed in some material (concrete) form. In a data context, 'fixation' would mean that the specific information needs to be saved in a tangible form. The form in which the data are saved can range from handwritten notes (files), through photographic documentation (image) or recorded testimonies (sound), to digitised archives (digital files), as long as it remains concrete and can be easily identified and described. Results that have not yet been produced (future data), or results that cannot yet be described (e.g. because there are no means yet to express them), cannot benefit from copyright protection for as long as they have not materialised.
This can present some difficulties in a big data context, given that big data tends to involve dynamic datasets and notably relies on cloud computing services.
Absence of registration
The legal framework for copyright does not provide for a registration system. Accordingly, the eligibility for protection (and its scope) can only be confirmed a posteriori by a court, leading to a lack of legal certainty in the meantime.
The copyright holder is granted several exclusive economic rights that allow controlling the protected work's use and facilitate enforcement in case a third party uses the work without authorisation. The rights of reproduction, communication to the public and distribution are indeed a useful toolkit which, balanced by the copyright exceptions, allows for an optimal protection of the right holder's interests. Copyright law therefore provides for a wide scope of measures securing the rights of the author in case of dissemination of his or her works and their use by third parties. The rules governing copyright protection aim at enabling further use of the works, while at the same time securing the legitimate interests of the author.
In a data environment, the most important hindrance resulting from copyright protection is the necessity to obtain authorisation from the copyright holder of each individual piece of data. In the context of big data projects, to the extent copyright applies, this would require identifying the authors of hundreds (if not hundreds of thousands) of works. In many cases, it might be difficult to identify or find the right holder and/or to establish whether he or she has authorised the use of the work. In practice, this means that time-consuming analyses need to be performed before the data gathered can be used.
Furthermore, as regards the possibility to acquire copyright in data, the exclusivity of this type of right constitutes a hindrance, since it does not allow acquiring copyright in the same data "in parallel". Copyright protection presupposes that a work has one author or several co-authors (meaning respectively sole or joint ownership of rights), but excludes the possibility that different entities acquire the same right independently under a different title (e.g. if the data were collected independently or on the basis of different sources). The latter may however often be the case in a big data context, in particular where parties independently collect the same or similar data, leading to the creation of convergent datasets.
In addition to the exclusive economic rights, authors are also granted so-called "moral rights", which are related to the idea that a work is not a mere staple commercial object, but also the expression of the personality of the author.
Moral rights are not harmonised across the EU but a common concept is included in the Berne Convention, which provides for minimum standards in this respect: the author has the right, even after the transfer of the economic rights, to claim authorship of the work and to object to derogatory actions (distortion, mutilation, or other modification) in relation to the work which would be harmful to the author's honour or reputation.
Looking from a transactional angle, moral rights of authors can also be seen as a hindrance. Since at least in some Member States there is no possibility to validly assign moral rights, additional measures need to be taken to guarantee that the acquirer of the economic rights is free to use and modify data protected by copyright, to the extent necessary for big data projects.
Finally, it is worth noting that on 14 September 2016, the Commission published several legislative proposals aiming to modernise the existing EU copyright rules. One of the core pillars of the reform is the Directive on copyright in the Digital Single Market (the “DSM Directive”). Political agreement was reached on 13 February 2019 by the European Parliament, the Council of the EU and the European Commission on the proposal for the DSM Directive. The DSM Directive does not aim to clarify the protection of data under copyright law, nor does it provide for new rules relating to the development and increased use of digital tools such as big data and the Internet of Things. It does however include a new, yet limited, exception for text and data mining aimed at enabling universities and research organisations to use automated techniques to analyse large sets of data for scientific purposes, including in the context of public-private partnerships. The DSM Directive also requires Member States to introduce into their national legislation an additional exception for text and data mining by users beyond the area of academic research. However, rightholders may expressly make reservations "in an appropriate manner, such as machine readable means for the content made publicly available online".
Apart from individual data, collections of data (databases) are another element important to consider when examining the protection of data, including in a big data context. When considering such protection, a distinction needs to be made between, on the one hand, the database’s contents (individual data), and, on the other hand, its structure and the investment made in its creation. We examine the latter elements below.
While the general rules governing the protection of databases are established at international level, EU law provides for a specific protection of databases which goes beyond other international legal instruments. In this respect, the EU institutions adopted the Database Directive with the objective of harmonising the protection of databases in all Member States.
Similarly to copyright, the level of protection ensured across the Member States, especially concerning the copyright on databases, is significantly different. This particularly hinders the possibility to manage pan-European projects, since it implies the necessity to examine multiple national legislations in order to have clearance on the possibility to use data, or secure the investment made in a database containing data originating from different territories.
The protection established by the Database Directive is dual, and supplements the possible protection granted to the data as such.
More specifically, databases, within the broad meaning of the Database Directive, are protected in the EU by (i) copyright, where such copyright protection echoes the one recognised in the international treaties; and (ii) a sui generis right. While copyright protects the (original) structure of the database, the sui generis right aims to cover the investment made in its creation. These two rights are independent, and can be applied separately. They will however apply cumulatively if the conditions for both regimes are simultaneously met.
The term of the sui generis protection is much shorter than that of the copyright protection. It is limited to 15 years as from the first of January of the year following the date of completion of the database. However, such protection may in practice be much longer. According to the Database Directive, any substantial change to the contents of the database, that could be considered to be a new investment, will cause the term of protection to run anew. In practice, should such protection be applied in a big data context, this could result in providing an indefinite protection, given that databases are usually dynamic, hence, leading in all likelihood to "substantial changes to the contents of the database".
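The term arithmetic described above can be made concrete with a short sketch. It is an illustration only, not a statement of how a court would compute the term: the function names are our own, the sketch considers only the date of completion (as in the description above), and it simply assumes that every date passed in as a "substantial change" would indeed qualify as a new substantial investment, which in reality is a legal assessment.

```python
from datetime import date

TERM_YEARS = 15  # term stated in the text above


def sui_generis_term_end(trigger: date) -> date:
    """Return the first day on which protection has lapsed.

    The term runs for 15 years counted from 1 January of the year
    following the trigger date (completion or substantial change).
    """
    return date(trigger.year + 1 + TERM_YEARS, 1, 1)


def current_term_end(completion: date, substantial_changes: list[date]) -> date:
    """Term end after taking substantial changes into account.

    Assumption (hypothetical simplification): each listed change counts
    as a new substantial investment and therefore restarts the term.
    """
    latest_trigger = max([completion, *substantial_changes])
    return sui_generis_term_end(latest_trigger)


# A database completed on 15 June 2010 is protected until the end of 2025.
print(sui_generis_term_end(date(2010, 6, 15)))  # 2026-01-01

# A substantial change on 1 March 2018 makes the term run anew.
print(current_term_end(date(2010, 6, 15), [date(2018, 3, 1)]))  # 2034-01-01
```

The second call shows why a dynamic big data database could, in practice, remain protected indefinitely: every qualifying change pushes the end date forward by another full term.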
Copyright protection of databases
Copyright protection is granted to databases which, as such, by reason of the selection or arrangement of their contents, constitute the "author's own intellectual creation". A database structure may be protected under copyright even if the elements contained therein are in the public domain or are otherwise not protected by copyright.
It also follows from the previous considerations that the originality criterion might be more difficult to fulfil in case of automatically created electronic databases that contain data selected by software, without the actual involvement of an author. In such situations it seems more likely to award copyright protection to the underlying software (algorithm written in a way allowing for selection of specific data/types of data), than to the database itself.
This is particularly relevant in a big data context. Indeed, the development of technology has enabled data analytics of unstructured data. Accordingly, while protection of datasets is particularly relevant, the protection of the database structure has become less relevant and more difficult to apply when confronted with new types of databases, unforeseen by the (over twenty-year-old) Database Directive.
Sui generis protection of databases
The second type of protection introduced by the Database Directive is the protection awarded on the basis of a sui generis right, rewarding the substantial investment of the database maker in creating the database. It was developed in order to prevent free-riding on somebody else's investment in creating the database and exists in parallel to the copyright protection on the structure of the database.
In order for a database to be protected by the sui generis right, a substantial investment must be made in the creation of the database. The case law of the Court of Justice of the European Union ("CJEU") has clarified that an investment in the creation of the data as such does not suffice to merit protection under the sui generis right. Such reasoning would entail that the sui generis right does not apply to machine-generated databases, as it could be argued that the data included in such databases are 'created' instead of 'obtained'. This could have a broader effect on the data economy, which relies on digitisation processes such as Internet of Things devices, big data, and artificial intelligence, as it becomes increasingly difficult to distinguish between the generation and the obtainment of data in the context of such processes.
That being said, there is no automatic exclusion from sui generis protection when the database's creation is linked to the exercise of a principal activity in which the person creating the database is also the one creating the materials that are processed in the database. It is however always the responsibility of that person to demonstrate a substantial investment (qualitative and/or quantitative) in the obtaining, verification or presentation of the content, independent from the resources used to create the content.
In any event, we foresee that it will become increasingly difficult to satisfy the sui generis right protection requirements in a data economy context, given that the processes of obtaining, verifying and/or presenting the data will happen more and more automatically, as they will be normally conducted using an algorithm. In many cases, it might be true that the investment in creating the raw material exceeds the investment made in segmenting and aligning that pre-existing raw material. In those cases, it might be more difficult to rely on the sui generis protection.
It is in our view regrettable that the Database Directive, which was drafted in the 1990s, does not accommodate technical evolution and thus everything that is possible with data and databases today. For instance, it is unclear how techniques such as the enrichment, partitioning, harmonisation or homogenisation of data would fit within the criteria of obtaining, verification or presentation of the database contents. Moreover, the criterion of 'verification' may become less and less pertinent, especially in a big data context which allows analytics of unstructured data.
Illustration in the transport sector: In 2010, the German Federal Court of Justice held in its Autobahnmaut decision that a highway company could claim a sui generis right in a database of machine-generated data about motorway use, i.e. toll data. The Court found that the company had made a substantial investment in the 'obtaining' of pre-existing data on cars using the motorway and in the processing of such data through software ('verifying' and 'presenting').
If the same reasoning is transposed to other databases in the transport sector, e.g. of data generated by sensors in cars, this could become problematic as certain companies (such as car maintenance services or secondary vehicle accessory providers) could be denied access to data vital to their services on the basis of a sui generis right.
Possibility to protect data under database rights
In view of the rules described above it seems that there is very limited to no possibility to secure individual data by means of database protection.
It is true that the sui generis protection forbids extraction of all or a substantial part of the database contents to another medium, thus also preventing the copying of the individual data collected in a database. However, once the database maker renders the contents of its database accessible to the public, it cannot prevent third parties from consulting that database. The public is therefore aware of these data (information), and may use them without necessarily having to copy the database contents. Also, the current legal regime seems difficult to reconcile with developments in technologies such as big data or data mining that do not necessarily require data to be reproduced in order to perform analytics or mining processes.
In consequence, the ownership of rights to a database does not confer the rights to the individual data as such. In this context, database protection (both by copyright and the sui generis protection) should rather be seen as a complementary measure to protection granted to individual data under other titles such as traditional copyright or trade secret protection.
Having said that, it is important to observe that employing specific technical measures to block access to the database’s content may ensure a de facto protection of individual data, preventing the possibility to subject them to data mining or other types of automatic filtering initiated by third parties.
While copyright and database rights provide measures enabling control over the diffusion and use of works (including data that fulfil the originality criterion) and databases, the objective of trade secret protection is to keep commercially valuable information confidential or secret. Protecting undisclosed know-how and business information enables its creator to transform the effort invested in generating this know-how and information into a competitive advantage.
In view of big data projects, trade secret protection may provide a safeguard as it allows for protection of individual pieces of information regardless of their originality. It also does not differentiate between the types of data that might be protected. Moreover, the protection is unlimited in time, as long as the information has not been disclosed.
EU legal framework
Similarly to databases, only general rules requiring protection of trade secrets have been embedded in international law. At EU level however, trade secret protection has been established by the adoption by the European Parliament and the Council of Directive 2016/943 on the protection of undisclosed know-how and business information (trade secrets) against their unlawful acquisition, use and disclosure ("Trade Secrets Directive"). The Directive aims to standardise the national laws of the Member States as regards the unlawful acquisition, disclosure and use of trade secrets.
The Directive harmonises the definition of trade secrets in accordance with existing internationally binding standards. It also defines the relevant forms of misappropriation and clarifies that reverse engineering and parallel innovation must be guaranteed (since trade secrets are not, strictly speaking, a form of exclusive intellectual property right).
Data protected as trade secrets
According to the definition provided in the Trade Secrets Directive, a ‘trade secret’ is a piece of information which meets all of the following requirements: (i) it is secret in the sense that it is not, as a body or in the precise configuration and assembly of its components, generally known among or readily accessible to persons within the circles that normally deal with the kind of information in question; (ii) it has commercial value because it is secret; and (iii) it has been subject to reasonable steps under the circumstances, by the person lawfully in control of the information, to keep it secret.
Trade secrets should be seen as complementary to intellectual property rights. They are heavily used in the creative process leading to innovation and the creation of intellectual property rights. Trade secrets are also used in relation to commercially valuable information for which there is no intellectual property rights protection, but for which investment and/or research are nevertheless required and which are important for innovation. Moreover, some may prefer to opt for a trade secret protection rather than an intellectual property right, as this may allow them to have an everlasting protection (as long as the conditions for trade secret protection remain fulfilled).
In a big data context, the protection established for trade secrets will extend to every piece of information (data), as long as it fulfils the protection requirements mentioned above. Some requirements are however difficult to fulfil, such as the need for the data to remain secret. It seems that at least in some jurisdictions it is possible to rely on confidentiality agreements to ensure that the requirement of secrecy of the data under the Trade Secrets Directive is maintained even after the data have been transferred. This is however yet to be confirmed by the courts. Also, it may be difficult to demonstrate that an individual piece of data has commercial value because it is secret. Many data will be considered valuable only if they are part of a bigger dataset.
Trade secrets rights
As such, a trade secret holder has no private or exclusive rights to its use. Trade secrets are thus different from intellectual property rights, which are safeguarded through an exclusive right that is legally enforceable. This is notably confirmed in Recital 16 of the Trade Secrets Directive which states that "in the interest of innovation and to foster competition, the provisions of this Directive should not create any exclusive right to know-how or information protected as trade secrets". This entails that the independent discovery of the same know-how or information remains possible.
Even where trade secret protection applies, the holder of a trade secret cannot prevent competitors from copying and using the same solutions: reverse engineering (i.e. the process of discovering the technological principles of a device, object or system through analysis of its structure, function and operation) is entirely lawful. Trade secrets are only legally protected in instances where someone has obtained the confidential information by illegitimate means (e.g. through spying, theft or bribery).
It follows that once the dataset is published, or disclosed in any other way, the protection can no longer be claimed. This is particularly relevant in a big data context, as data used for big data analytics, and made publicly available, will not qualify as trade secrets. Therefore, when considering outsourcing big data analytics, any company should carefully assess whether its datasets comprise trade secrets that are valuable to the company and which cannot be disclosed for that reason.
It follows that it cannot be excluded that different actors in the big data analytics lifecycle will try to claim intellectual property rights or protection under trade secrets in (parts of) the datasets intended to be used. They may therefore try to exercise the exclusive rights linked to the intellectual property right concerned or keep the information secret. Any unreasonable exercise of rights may stifle data sharing and thus innovation through big data, including in the transport sector. This is however mainly due to the inherent nature and purpose of intellectual property rights and trade secrets protection, which may at the same time provide an incentive for stakeholders to engage in data sharing for big data purposes.