In the Internet era, the digitalization of information has made works easy to reuse and reproduce, which underscores the importance of copyright protection. The United Nations white paper "Big Data for Development: Challenges and Opportunities" predicted that global big data market revenue would reach US$50 billion in 2017, and statistics from the China Market Intelligence Center projected that the Chinese big data market would reach a commercial scale of RMB46.34 billion by 2018. As the big data market has grown, related disputes have emerged one after another and have drawn wide attention. Notably, in June 2017, the National Copyright Administration of the P.R.C. published the Notice of Regulating the Electronic Works Registration Certificate, a measure that replaces paper work registration certificates with electronic ones.

The Zhongguancun Tribunal of the Beijing Haidian District People's Court ("BHDPC") and the Mediation Center of the China Internet Society jointly published the Investigation and Research Report on Current Status and Prospect of Big Data and Judicial Protection for IP Rights, which shows that 23% of judicial cases related to big data involved copyright. In the Internet era, every website accumulates a massive amount of data, mainly statistical data gathered by network platforms in the course of providing services, including data on individual users and personal information that may be used in public analysis. This information can also easily be captured by other websites.

Big data infringement cases keep occurring among the major network platforms. One of the most typical involves other websites reproducing news from an app without consent. In the process, the related user comments are carried over as well, and these comments accumulate with each further reproduction. Sohu is a network service provider whose main business is sales information. To give users first-hand product information, Sohu takes photos, conducts on-site inspections, and observes the surroundings in order to create a virtual presentation of the product. Sohu also collects comments from users who are interested in buying the presented products. If another website copies this information from sohu.com, however, the accompanying information and user comments are taken along with it.

Such cases raise several disputed questions. Who actually owns the rights to this original data, including the user comments? How should a suit be filed in case of infringement? How should the subject matter of a lawsuit be chosen so as to protect one's rights and interests? Because the current legal system is incomplete, judicial practice presently offers no definite guidance on these questions.

Big data processing consists of four major stages:

  1. Collection and preprocessing: collected data cannot be used directly and must first be organized to a certain degree before further application;
  2. Storage and management of the data;
  3. Processing and analysis of the stored data;
  4. Application of the data once it has been processed and analyzed.
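The four stages above can be illustrated with a minimal Python sketch. It is a hypothetical toy pipeline, not a description of any platform's actual system: the sample comments and the function names (`collect_and_preprocess`, `store`, `analyze`, `apply_results`) are invented for illustration, and an in-memory SQLite database merely stands in for the database or cloud storage of the second stage.

```python
import sqlite3
from collections import Counter

# Hypothetical raw records, e.g. user comments gathered from a product page.
RAW = ["  Great product ", "", "great product", "Too expensive  ", None]


def collect_and_preprocess(raw):
    """Stage 1: drop empty entries and normalize text before further use."""
    return [r.strip().lower() for r in raw if r and r.strip()]


def store(records):
    """Stage 2: load the cleaned records into a database (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (text TEXT)")
    conn.executemany("INSERT INTO comments (text) VALUES (?)", [(r,) for r in records])
    conn.commit()
    return conn


def analyze(conn):
    """Stage 3: process the stored data, here by counting identical comments."""
    rows = conn.execute("SELECT text FROM comments").fetchall()
    return Counter(text for (text,) in rows)


def apply_results(counts):
    """Stage 4: turn the analysis into an application output (a one-line summary)."""
    text, n = counts.most_common(1)[0]
    return f"Most frequent comment: '{text}' ({n} times)"


if __name__ == "__main__":
    cleaned = collect_and_preprocess(RAW)
    print(apply_results(analyze(store(cleaned))))
    # prints: Most frequent comment: 'great product' (2 times)
```

In this sketch the output of the second stage is an organized database, and the fourth stage is embodied in software, which mirrors the two objects of copyright protection discussed in the following paragraphs.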

In the second stage, the collected and organized data is built into a database or stored in the cloud, and such a compilation can be protected by copyright. In the fourth stage, the results are embodied mainly in software, so they can also be protected by software copyright.

According to the Chief Judge of the Zhongguancun Tribunal of BHDPC, copyright protection applies to big data in two principal ways: protection of database copyright and protection of software developed for big data applications. The former protects compilations derived from big data. Its focus is the originality they exhibit, namely the originality in the selection, arrangement, system, and architecture of the database, rather than the content of the data itself.

The latter concerns big data software, which is easily plagiarized: its source code may be copied directly or extensively, or maliciously altered by a third party who then disguises the software in order to commit infringement. In such disputes, the infringer may raise a technology neutrality defense. However, infringement is defined differently under technology neutrality and under content neutrality. At present, in disputes over big data software, there is no settled view on whether the conduct infringes content or technology. This is one of the toughest legal issues at the present stage, and judicial practice is still exploring and studying it.

Big data can be protected in multiple ways. Because of its unique nature, however, granting appropriate rights and exclusivity as a form of protection must be balanced against the openness and mobility inherent in big data itself. These problems therefore remain to be resolved through future judicial practice.