In respect of technological advancement, the law often comes into play merely as an external restriction. That is, lawyers are asked whether a given technology is consistent with existing legal regulations or to evaluate its foreseeable liability risks. As a legal researcher, my interest is the exact opposite: how do new technologies influence our legal framework, concepts, and doctrinal constructions? This contribution shows how Artificial Intelligence (AI) challenges the traditional understanding of the right to data protection and presents an outline of an alternative conception, one that better deals with emerging AI technologies.
I. Traditional Concept of the Right to Data Protection
In the early stages of its data protection jurisprudence, the German Federal Constitutional Court took a leading role in establishing the right to data protection, not only in Germany, but also in the European context.Footnote 1 In the beginning, it linked the ‘right to informational self-determination’ to a kind of property rights conception of personal data.Footnote 2 The Court explained that every individual has a ‘right to determine himself, when and in which boundaries personal data is disseminated’Footnote 3 – just as an owner has the right to determine herself when she allows someone to use her property.Footnote 4 This idea, which is already illusory in the analog world, has often been ridiculed as naive in our contemporary, technologically interconnected and socially networked reality, in which a vast spectrum of personal data is disseminated and exchanged at all levels almost all of the time.Footnote 5 Data simply does not possess the kind of exclusivity to justify parallels with property ownership.Footnote 6 The German Constitutional Court seems to have recognized this. And while the Court has not explicitly revoked the property-like formula, it has made decreasing use of it, and in more recent decisions, has not referred to it at all.Footnote 7
Even if everyone can agree that the right to data protection is, in substance, not akin to a property interest in one’s personal data, the right to data protection is formally handled as if it were a property right. In the same way that any non-consensual use of one’s property by someone else is regarded as a property rights infringement, any non-consensual use – gathering, storage, processing, and transmission – of personal data is viewed as an infringement of the right to data protection. This formal conception of data protection is still prevalent not only in the German context; the European Court of Justice (ECJ) also perceives the right to data protection under Article 8 of the Charter of Fundamental Rights of the European Union (CFR) in much the same way. In one of its latest decisions, the ECJ confirmed that data retention as such constitutes an infringement irrespective of substantive inconveniences for the persons concerned:
It should be made clear, in that regard, that the retention of traffic and location data constitutes, in itself, … an interference with the fundamental rights to respect for private life and the protection of personal data, enshrined in Articles 7 and 8 of the Charter, irrespective of whether the information in question relating to private life is sensitive or whether the persons concerned have been inconvenienced in any way on account of that interference.Footnote 8
According to the traditional perspective, each and every processing of personal data infringes the respective right – just as the use of physical property would be an infringement of the property right.Footnote 9 For instance, if my name, license plate, or phone number is registered, this counts as an infringement; if they are stored in a database, this counts as another infringement; and if they are combined with other personal data, such as location data, this counts as yet another infringement.Footnote 10 Even though the right to data protection is not regarded as a property right, its formal structure still corresponds with that of a property right.
This conceptual approach is a mixed blessing. On the one hand, it provides a very analytical approach to the data processing in question. On the other hand, the idea of millions of fundamental rights infringements occurring in split seconds as CPUs process personal data seems a rather exaggerated way of conceptualizing the actual problems at hand. Nevertheless, modern forms of data collection are still conceptualized in this way, including automated license plate recognition, whereby an initial infringement occurs when scanners collect license plate information and a further infringement occurs when this information is checked against stolen car databases,Footnote 11 and so on.
II. The Intransparency Challenge of AI
AI technology is driven by self-learning mechanisms.Footnote 12 These self-learning mechanisms can adapt their programmed algorithms in reaction to the data input.Footnote 13 Importantly, while the algorithms may initially be transparent to their designers,Footnote 14 after the system has cycled through hundreds, thousands, or even millions of recursive, self-programming iterations, even the system programmers will no longer know which type of data was processed in which way, which inferences were drawn from which data correlations, and how certain data have been weighted.Footnote 15
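To make this opacity tangible, the following minimal sketch – my own illustration in Python, using purely synthetic data and a deliberately tiny network rather than any real AI system – shows what remains after the self-adjusting updates have run their course: a set of numeric weights from which neither the individual records nor the way they were weighted can be read off.

```python
# A minimal sketch (my own illustration, not taken from the chapter): synthetic
# "personal data" records train a deliberately tiny two-layer network. After
# thousands of self-adjusting updates, the only trace of the data is an opaque
# set of numeric weights.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic records: 1,000 individuals, 5 numeric attributes, 1 binary label.
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)

# Tiny network: 5 inputs -> 8 hidden units -> 1 output probability.
W1 = rng.normal(scale=0.1, size=(5, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):                      # recursive, self-adjusting updates
    h = np.tanh(X @ W1)                       # hidden-layer activations
    p = sigmoid(h @ W2).ravel()               # predicted probabilities
    delta = ((p - y) / len(X))[:, None]       # cross-entropy gradient at the output
    dW2 = h.T @ delta
    dW1 = X.T @ ((delta @ W2.T) * (1.0 - h ** 2))
    W1 -= 0.5 * dW1                           # each update slightly reshapes the model
    W2 -= 0.5 * dW2

# What remains are matrices of numbers: they encode correlations drawn from the
# records, but neither the records themselves nor a readable account of how any
# individual datum was weighted can be recovered from them by inspection.
print(W1.round(2))
print(W2.round(2))
```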
The self-adaptive ‘behavior’ of at least certain types of AI technologies leads to a lack of transparency. This phenomenon is often referred to as the black box issue of AI technologies.Footnote 16 Why is this a problem for the traditional approach to evaluating data protection?
The analytical approach is based on the justification of each and every processing of personal data. In AI systems, however, we do not know which individual personal data have been used and how many times they have been processed and cross-analyzed with what types of other data.Footnote 17 It is thus impossible to apply the analytical approach to determine whether, how many, and what kind of infringements of a right so conceived have occurred; AI’s lack of transparency seems to rule this out. This lack of transparency therefore creates problems for the traditional understanding and treatment of the right to data protection.Footnote 18 These issues are mirrored in the transparency requirements of the General Data Protection Regulation, which rests very much on the traditional conception of the fundamental right to data protection.Footnote 19
III. The Alternative Model: A No-Right Thesis
The alternative conceptualization of the right to data protection that I would like to suggest consists of two parts.Footnote 20 The first part sounds radical, revisionary, and destructive; the second part resolves the tension thus created with a proposal that is doctrinally mundane but shifts the perspective on data protection rights substantially. Among other advantages, the proposed shift in perspective could render the right to data protection more suitable for handling issues arising from AI.
The first part is a no-right thesis. It contends that there is no fundamental right to data protection. That is, the right to data protection is not a right of its own standing. This explains why the ongoing quest for a viable candidate as the proper object of the right to data protection has been futile.Footnote 21 Article 8 CFR, which seems to guarantee the right to data protection as an independent fundamental right, rests on the misunderstanding that the fundamental rights developments in various jurisdictions, namely also in the jurisprudence of the German Federal Constitutional Court, have created a new, substantive fundamental right with personal data as its object. There is no such new, substantive fundamental right. This, however, does not mean that there is no fundamental rights protection against the collection, storage, processing, and dissemination of personal data. Yet data protection does not take the form of a new fundamental right – property-like or otherwise.
The second part of the thesis reconstructs the ‘right’ by shifting the focus to already existing fundamental rights. Data protection is provided by all of the existing fundamental rights, which can all be affected by the collection, storage, processing, and dissemination of personal data.Footnote 22 In his instructive article ‘A Taxonomy of Privacy’, Daniel Solove developed a whole taxonomy of possible harms that can be caused by data collection.Footnote 23 They include the loss of life and liberty, infringements on property interests and the freedom of expression, violations of privacy, and denials of due process guarantees. It is easy to see how the dissemination of personal finance information can lead to the loss of property. He cites tragic cases, which have even led to a loss of life, such as when a stalker was handed the address of his victim by public authorities ‒ data he used to locate and kill her.Footnote 24 Solove’s list suggests that the essence of data protection cannot be pinned down to merely a single liberty or equality interest but instead potentially involves every fundamental right. Understood correctly, the right to data protection consists in the protection that all fundamental rights afford to all the liberty and equality interests that might be affected by the collection, storage, processing, and dissemination of personal data.
The way in which fundamental rights protect against the misuse of personal data relies on doctrinally expanding the concept of rights infringement. Fundamental rights usually protect against actual infringements. For example, the state encroaches upon your right of personal freedom if you are incarcerated, your right to freedom of assembly is infringed when your meeting is prohibited or dispersed by the police, and your freedom of expression is violated when you are prevented from expressing your political views. Usually, however, fundamental rights do not protect against the purely abstract danger that the police might incarcerate you, might disperse your assembly, or might censor your views. You cannot go to the courts claiming that certain police behavioral patterns increase the danger that they might violate your right to assembly. The courts would generally say that you have to wait until they either already do so or are in the concrete process of doing so. In some cases, your fundamental rights might already protect you when there is a concrete danger that such an infringement is about to take place, so that you do not have to suffer a violation of your rights before you can invoke them.Footnote 25 These cases, however, are exceptions.
The right to data protection works differently. What is unique about data protection is its generally preemptive character. It already protects against the abstract dangers involved in the collection, storage, and processing of personal data.Footnote 26 Data protection preemptively protects against violations of liberty or equality interests that are potentially connected to using personal data.Footnote 27 In themselves, the collection, aggregation, and processing of data do no harm.Footnote 28 This has often been expressed in conjunction with the idea that data needs to become information in certain contexts before it gains relevance.Footnote 29 It is only the use of data in certain contexts that might involve a violation of liberty or equality interests. The collection of personal data on political or religious convictions of citizens by the state is generally prohibited, for example, because of the potential that it could be misused to discriminate against political or religious groups. Data protection demands a justification for the collection of personal data, even if such misuse is only an abstract danger.Footnote 30 It does not require concrete evidence that such misuse took place, or even that such misuse is about to take place. The right to data protection systematically enhances every other fundamental right already in place to protect against the abstract dangers that accompany collecting and processing personal data.Footnote 31
A closer look at the court practice regarding the right to data protection reveals that, despite appearances, courts do not treat the right to data protection as a right on its own but instead associate it with different fundamental rights, depending on the context and the interest affected.Footnote 32 Even at the birth of the right to data protection in Germany, in the famous “Volkszählungs-Urteil” (census decision), the examples the court gave to underline the necessity for a new fundamental right to ‘informational self-determination’ included a panoply of fundamental rights, such as the right to assembly.Footnote 33 In an unusual process of constitutional migration, the court pointed to the ‘chilling effects’ that the collection of data on assembly participation could have for bearers of that right,Footnote 34 as first discussed by the US Supreme Court.Footnote 35 The German Federal Constitutional Court thus drew on an idea developed by the US Supreme Court to create a data protection right that the latter never accepted. Be that as it may, even in its constitutional birth certificate, data protection is not put forth as a right on its own but associated with various substantive fundamental rights, such as the right to assembly.
Further evidence of the idea that personal data is not the object of a substantive stand-alone right is provided by the fact that data protection does not seem to stand by itself, even in a jurisdiction in which it is explicitly guaranteed. Article 8 CFR explicitly guarantees a right to data protection. In the jurisprudence of the Court of Justice of the European Union, however, it is always cited in conjunction with another right.Footnote 36 The right to data protection needs another right in order to provide for a substantive interest – usually the right to privacy,Footnote 37 but sometimes also other rights, such as free speech.Footnote 38 Thus, even when data protection is codified as an explicit, independent fundamental right, as it is in the Charter, it is nevertheless regarded as an accessory to other more substantive fundamental rights.Footnote 39 This is odd if the right to data protection is taken at face value as a substantive right on its own but only natural if taken as a general enhancement of other fundamental rights.
IV. The Implications for the Legal Perspective on AI
If the right to data protection consists in a general enhancement of, potentially, every fundamental right in order to already confront the abstract dangers to the liberty and equality interests they protect, it becomes clear how personal data processing systems must be evaluated. They have to be evaluated against the background of the question: to what extent does a certain form of data collection and processing system pose an abstract danger for the exercise of what type of fundamental right? Looking at data collection issues in this way has important implications – including for the legal evaluation of AI technologies.
1. Refocusing on Substantive Liberty and Equality Interests
First, the alternative conception allows us to rid ourselves of a formalistic and hollow understanding of data protection. It helps us to refocus on the substantive issues at stake. For many people, the purely formal idea that some type of right is always infringed when a piece of personal information has been processed, meaning that they have to sign a consent agreement or click a button, has become a stale formality in the context of data protection regulation. The connection to the actual issues raised by data processing has been lost. For example, during my time as vice dean of our law faculty, I attempted to obtain the addresses of our faculty alumni from the university’s alumni network. The request was denied because it would constitute an infringement of the data protection right of the alumni. The alumni network did not have the written consent of its members to justify this infringement. As absurd as this might seem, this line of argument is the only correct one for the traditional, formal approach to data protection. Addresses are personal data and any transfer of this personal data is an infringement of the formal right to data protection, which has to be justified either by consent or by a specific statute – both of which were lacking. This is, however, a purely formal perspective. Our alumni would probably be surprised to know that the faculty at which they studied for years, which handed them their law degrees, and which paved the road to their legal careers does not know that it is their alma mater. There is no risk involved for any of their fundamental rights when the faculty receives their address information from the alumni network of the very same university. An approach that discards the idea that there is a formal right to data protection, but asks which substantive fundamental rights positions are at stake, can resubstantialize the right to data protection. This also holds for AI systems: the question would not be what type of data is processed when and how but instead what kind of substantive fundamental rights position is endangered by the AI system.
2. The Threshold of Everyday Digital Life Risks
Second, refocusing on the abstract danger for concrete, substantive interests protected by fundamental rights allows for a discussion of thresholds. Even in the analog world, the law does not react to each and every risk that is associated with modern society. Not every abstract risk exceeds the threshold of a fundamental rights infringement. There are general life risks that are legally moot. In extreme weather, even healthy trees in the city park carry the abstract risk that they might topple and cause considerable damage to property or even to life and limb. Courts, however, have consistently held that this abstract danger does not justify public security measures or civil claims aimed at felling healthy trees.Footnote 40 They consider it part of the everyday life risks that we all have to live with if we stroll in public parks or use public paths.
The threshold for everyday life risks holds in the analog world and should hold in the digital world, too. In our digital society, we have to come to grips with a – probably dynamic – threshold of everyday digital life risks that do not constitute a fundamental rights infringement, even though personal data have been stored or processed. On one of my last visits to my physician, I was asked to sign a form that would allow his assistants to use my name, which is stored in their digital patient records, in order to call me from the waiting room when the doctor is ready to see me. The form cited the relevant articles of the then newly applicable General Data Protection Regulation of the European Union (Articles 6(1)(a) and 9(2)(a)). There might be occasions where there is some risk involved in letting other patients know my name. If the physician in question were an oncologist, it might lead to people spreading the rumor that I have a terminal illness. This might find its way to my employer at a time when my contract is up for an extension. So, there can indeed be some risk involved. We have, however, always accepted this risk – even in a purely analog world – as one that comes with visiting a physician, just as we have accepted the risk of healthy trees being uprooted by a storm and damaging our houses or cars, or even injuring us. As we have accepted everyday life risks in the analog world, we have to accept everyday digital life risks in the digital world.
For AI technologies, this could mean that they can be designed and implemented in such a way that they remain below the everyday digital life risk threshold. When an AI system uses anonymized personal data, there is always a risk that the data will be deanonymized. If sufficient safeguards against deanonymization are installed in the system, however, they may lower the risk to such a degree that it does not surpass the level of our everyday digital life risk. This may be the case if the AI system uses data aggregation for planning purposes or resource management, which do not threaten substantive individual rights positions. An example of a non-AI application is the German Corona-Warn-App, which is designed in such a way as to avoid centralized storage of personal data and thus poses almost no risk of abuse.
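What such a safeguard could look like in practice is illustrated by the following minimal sketch – my own example, not drawn from any deployed system – in which aggregate counts for planning purposes are released only for groups large enough that re-identifying individuals from the published figures becomes unrealistic; the threshold value is an assumption chosen purely for illustration.

```python
# A minimal sketch of one possible safeguard (illustrative only): aggregate
# records for planning purposes and suppress any group so small that individuals
# could realistically be re-identified from the published counts.
from collections import Counter

K_THRESHOLD = 10  # hypothetical minimum group size before a count is released

def planning_counts(records, key, k=K_THRESHOLD):
    """Return per-group counts, withholding groups smaller than k."""
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n >= k}

# Synthetic example: district-level counts for resource planning.
records = [{"district": "A"}] * 120 + [{"district": "B"}] * 45 + [{"district": "C"}] * 3
print(planning_counts(records, "district"))  # {'A': 120, 'B': 45} – 'C' is suppressed
```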
3. A Systemic Perspective
Third, the alternative approach implies a more systemic perspective on data collection and data processing measures. It allows us to step back from the idea that each and every instance of personal data processing constitutes an infringement of a fundamental right. If data protection is understood as protection against abstract dangers, then we do not have to look at the individual instances of data processing. Instead, we can concentrate on the data processing system and its context in order to evaluate the abstract danger it poses.
Unlike the traditional approach, focusing on abstract dangers for substantive fundamental rights that are connected with AI technologies does not require the full transparency of the AI system. The alternative approach does not require exact knowledge of when and how what kind of data is processed. What it needs, however, is a risk analysis and an evaluation of the risk reduction, management, correction, and compensation measures attuned to the specific context of use.Footnote 41 It requires regulation on how false positives and negatives are managed in the interaction between AI and human decision makers. At the time of our conference, the New York Times reported on the first AI-based arrest generated by a false positive of facial recognition software.Footnote 42 As discussed in the report, to rely solely on AI-based facial recognition software for arrests seems unacceptable given the failure rate of such systems. Legal regulation has to counterbalance the risks stemming from AI by forcing the police to corroborate AI results with additional evidence. A fundamental rights analysis of the facial recognition software should include an evaluation not only of the technology alone but also of the entire sociotechnological arrangement in the light of habeas corpus rights and the abstract dangers for the right to personal liberty that come with it. The actual cases, however, are not about some formal right to data protection but about substantive rights, such as the right to liberty or the right against racial discrimination, and the dangers AI technologies pose for these rights.
For AI technologies, the differences between the traditional approach and the suggested approach regarding the right to data protection are similar to differences in the scientific approach to, and the description of, the systems as such. Whereas traditionally the approach to, and the description of, computational systems has been very much dominated by computer science, there is a developing trend to approach AI systems – especially because of their lack of informational transparency – with a more holistic, interdisciplinary methodology. AI systems are studied in their deployment context with behavioral methodologies, which focus not so much on the inner informational workings of the systems as on their output and their effects in a concrete environment.Footnote 43 The traditional approach tends toward a more technical, informational analysis of AI systems, which is significantly hampered by the black box phenomenon. The shift to the substantive rights perspective would lean toward a more behavioral approach to AI. The law would not have to delve into the computational intricacies of when and how what type of personal data is processed. It could take a step back and assess how an AI system ‘behaves’ in the concrete sociotechnological setting it is employed in and what type of risks it generates for which substantive fundamental rights.
V. Conclusion
From a doctrinal, fundamental rights perspective, AI could have a negative and a positive implication. The negative implication pertains to the traditional conceptualization of data protection as an independent fundamental right on its own. The traditional formal model, which focuses on each and every processing of personal data as a fundamental rights infringement, could be on a collision course with AI’s technological development. AI systems do not provide the kind of transparency that would be necessary to stay true to the traditional approach. The positive implication pertains to the alternative model I have been suggesting for some time. The difficulties AI may pose for the traditional conceptualization of the right to data protection could generate some wind beneath the wings of the alternative conception, which seems better equipped to handle AI’s black box challenge with its more systemic and behavioral approach. The alternative model might seem quite revisionary, but it holds the promise of redirecting data protection toward the substantive fundamental rights issues at stake – also, but not only, with respect to AI technologies.
I. Introduction
Artificial Intelligence (AI) as an area of research within the field of computer science concerns itself with the functioning of autonomous systems and, as such, not only affects almost all areas of modern life in the age of digitisation but has also – and for good reasons – become a focal point within both academic and political discourse.Footnote 1 AI scenarios are mainly driven and determined by the availability and evaluation of data. In other words, AI goes hand in hand with what may be referred to as an enormous ‘appetite for data’. Thus, the accumulation of relevant (personal or non-personal) data regularly constitutes a key factor for AI-related issues. The collected personal data may then be used to create (personality) profiles as well as to make predictions and recommendations with regard to individualised services and offers. In addition, non-personal data may be used for the analysis and maintenance of products. The applications and business models based on the collection of data are employed in both the private and public sector. The current and potential fields of application for AI are as diverse and numerous as the reactions thereto, ranging from optimism to serious concerns – oftentimes referring to a potential ‘reign of the machines’. However, there is a general consensus regarding the fact that the development and use of AI technologies will have significant impact on the state, society, and economy. For instance, the use of such applications may greatly influence the protection of personal rights and privacy, because the development of AI technologies regularly requires the collection of personal data and the processing thereof. This chapter will focus on and examine provisions concerning the handling of personal data as set out in the European Union’s General Data Protection Regulation (GDPR)Footnote 2 which entered into force on 24 May 2016 and has been applicable since 25 May 2018.
The prerequisites and applications of AI on the one hand and the regulatory requirements stipulated by the GDPR on the other give rise to a number of complicated, multi-sided tensions and conflicts. While the development of AI is highly dependent on the access to large amounts of data (i.e. big data), this access is subject to substantial limitations imposed by the data protection law regime. These restrictions mainly apply to scenarios concerning personal (instead of non-personal) data and primarily stem from the GDPR’s preventive prohibition subject to authorisationFootnote 3 and its general principles relating to the processing of personal data.Footnote 4 One of the most fundamental problems which arises in connection with big data is referred to as ‘small privacy’. This term alludes to the inherent conflict between two objectives pursued by data protection law: the comprehensive protection of privacy and personal rights on the one hand and the facilitation of an effective and competitive data economy on the other. The tension arising from this conflict is further illustrated by Article 1 GDPR, according to which the Regulation contains provisions to promote both the protection of natural persons with regard to the processing of personal data and the free movement of such data. An instrument intended to facilitate an appropriate balance between the protection of personal data and seemingly contradictory economic interests may be seen in the users’ data sovereignty.Footnote 5
At this point, it should be noted that the GDPR does not (or, if at all, only marginally) address the implications of AI for data protection law. Thus, in order to be applied to individual cases and to specific issues arising in connection with AI, the general provisions of the GDPR need to be construed. This may oftentimes lead to substantial legal uncertainties, especially when considering the vague wording, unclear exemptions, and considerable administrative discretion provided by the GDPR. The aforementioned uncertainties may not only impede innovation but may also give rise to a number of issues concerning the (legal) accountability for AI, for instance, in connection with the so-called black-box-phenomenonFootnote 6 regularly encountered when dealing with self-learning AI systems (i.e. deep or machine learning).
II. AI and Principles Relating to the Processing of Data
The development and use of AI may potentially conflict with almost all principles concerning the processing of data as enshrined in the GDPR. In fact, the paradigms of data processing in an AI-context are very difficult, if not impossible, to reconcile with the traditional principles of data protection. The complex and multi-layered legal issues resulting from this contradiction are first and foremost attributable to the fact that AI scenarios were not (sufficiently) taken into account during the drafting of the GDPR. This raises the question of whether and to what extent AI scenarios can be adequately addressed and dealt with under the existing legal regime by utilising the available technical framework and by interpreting the relevant provisions accordingly. Where the utilisation of such measures and, consequently, the application of the law and the complianceFootnote 7 with the principles of data protection is not possible, it has to be assessed whether there are any other options to adapt or to amend the existing legal framework.Footnote 8
The aforementioned data protection issues have their roots in the general principles of data protection. Hence, in order to fully comprehend the (binding) provisions that a ‘controller’ in the sense of the GDPR must observe when processing data, it is necessary to take a closer look at these principles. This is especially important considering the very prominent role of the legal framework in nearly all AI scenarios. The addressee of the principles relating to the processing of personal data laid down in Article 5(1) GDPR is responsible for the adherence thereto and must, as required by the principle of accountability, be able to provide evidence for its compliance therewith.Footnote 9 The obligations set out in Article 5(1) GDPR range from the lawfulness, fairness, and transparency of data processing and the adherence to and compatibility with privileged purposes (purpose limitation) to the principles of data minimisation, accuracy, storage limitation, and integrity and confidentiality.Footnote 10 Beyond the scope of the present analysis in this chapter lie questions concerning conflicts of law and the lawfulness of data transfers to non-EU Member States, although these constellations are likely to become increasingly important in legal practice, especially in light of the growing importance of so-called cloud-solutionsFootnote 11.
1. Transparency
In accordance with Article 5(1)(a) alt. 3 GDPR, personal data must be ‘processed […] in a transparent manner in relation to the data subject’. These transparency requirements are of particular importance for matters relating to AI. As set out in Recital 39 of the GDPR, the principle of transparency
requires that any information and communication relating to the processing of those personal data be easily accessible and easy to understand, and that clear and plain language be used. That principle concerns, in particular, information to the data subjects on the identity of the controller and the purposes of the processing and further information to ensure fair and transparent processing in respect of the natural persons concerned and their right to obtain confirmation and communication of personal data concerning them which are being processed.Footnote 12
These transparency requirements are specified by the provisions contained in Articles 12-15 GDPR, which stipulate the controllers’ obligation to provide information and to grant access to personal data. They are further accompanied by the obligation to implement appropriate technical and organisational measures.Footnote 13 Moreover, Article 12(1) sentence 1 GDPR requires the controller to provide the data subject with any information and communication ‘in a concise, transparent, intelligible and easily accessible form, using clear and plain language’. Especially with regard to issues relating to AI, the implementation of these requirements is very likely to present responsible controllers with a very complex and onerous task.
In an AI scenario, it will often be difficult to state and substantiate the specific purposes for any given data analysis in advance. Controllers may also face enormous difficulty when tasked with presenting the effects that such an analysis could have on the individual data subject in a sufficiently transparent manner. In fact, the very nature of self-learning AI, which operates with unknown (or even inexplicable) variables, seems to resist any attempt to provide transparent information.Footnote 14 In addition, the aforementioned ‘black-box-phenomenon’ may occur if, for instance, the so-called hidden layersFootnote 15 of artificial neural networks restrict or even prevent the traceability of the respective software processes. Thus, on the one hand, it may be difficult to break down the complex AI analyses and data collection processes into ‘concise, transparent, intelligible and easily accessible’ terms that the affected data subject can understand. On the other hand, the lack of transparency is an inherent feature and characteristic of self-learning, autonomous AI technologies.Footnote 16 Furthermore, these restrictions on transparency also come into play when considering potential justifications for the processing of data. This is particularly relevant where the justification is based on the data subject’s consent, as this (also) requires an informed indication of the subject’s agreement.Footnote 17
However, according to the principles of the GDPR, even controllers who use AI systems and thus carry out extensive analyses of huge amounts of data of different origins should have the (realistic) possibility of processing data in a manner which allows them to adequately inform the subject about the nature and origin of the processed data. Further difficulties are likely to arise in situations where personal data are generated in the course of analyses or as a result of combinations of originally non-personal data. Because, in this case, the legally relevant collection of data is to be found in the analysis, it is difficult, if not impossible, to pinpoint the data’s initial origin and source. In such constellations, it should thus be assumed that the responsible controller is permitted to provide merely general information, for instance by naming the source of the data stock or the systems utilised to process the data in addition to the means used for their collection. In this context, it also has to be emphasised that the obligation to inform the data subject as set out in Article 14(5)(b) GDPR may be waived if the provision of such information would be disproportionately onerous. The applicability of this waiver must be determined by balancing the controller’s efforts required for the provision of information against the data subject’s right and interest in being informed. The outcome of this (case-by-case) balancing process in big-data situations – not only in the context of AI – will largely depend on the effects that the data analysis and processing have on the subject’s fundamental rights, as well as on the nature and degree of risks that arise in connection therewith. For the purposes of such an assessment, the principle of transparency should extend beyond the actual data processing procedures to include the underlying technical systematics and the decision-making systems employed by the (responsible) controller.
2. Automated Decisions/Right to Explanation
Article 22 GDPR is intended to protect the individual from being made subject to decisions based solely on an automated assessment and evaluation of the subject’s personal profile, because this would risk degrading the individual to a mere object of computer-assisted programs. Against this background, the GDPR imposes additional obligations to provide information in situations where the responsible controller utilises automated decision-making procedures in Articles 13(2)(f), 14(2)(g), and 15(1)(h) GDPR. Pursuant to these provisions, the controller has to provide ‘meaningful information about the logic involved’ in the data processing. This obligation may be called into questionFootnote 18 when considering the aforementioned difficulties that controllers may face when tasked with providing information about complex and potentially inexplicable (autonomous) AI processes and the results based thereon. In these scenarios, the controller should merely have to provide (and the subject should merely be entitled to) general information on the functioning of the specific AI technology, whereas a right to a substantiated explanation should be rejected. In accordance with Article 35(3)(a) GDPR, an evaluation of personal data which is based on automated processing requires a data protection impact assessment. It should also be emphasised that the use of AI as such is not restricted as of today. Instead, the restrictions apply solely to decision-making processes based on the use of AI.
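What ‘general information about the logic involved’ might amount to in practice, even for an opaque system, can be illustrated by the following sketch – my own example rather than anything prescribed by the GDPR or settled in the literature – in which a crude permutation test indicates which input features most strongly influence the output of a stand-in black-box model; all names and data are hypothetical.

```python
# A minimal sketch (illustrative only) of deriving *general* information about
# an otherwise opaque model: permute one input feature at a time and observe
# how strongly the model's output shifts. The scoring function below merely
# stands in for an arbitrary black-box AI system.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                          # synthetic applicant data
feature_names = ["income", "age", "tenure", "noise"]   # hypothetical features

def black_box_score(data):
    """Stand-in for an opaque, trained model."""
    return 1.0 / (1.0 + np.exp(-(1.5 * data[:, 0] + 0.5 * data[:, 2])))

baseline = black_box_score(X)
for i, name in enumerate(feature_names):
    shuffled = X.copy()
    shuffled[:, i] = rng.permutation(shuffled[:, i])   # break one feature's link
    impact = np.mean(np.abs(black_box_score(shuffled) - baseline))
    print(f"{name:>7}: average score shift {impact:.3f}")
```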
3. Purpose Limitation/Change of Purpose
Pursuant to the principle of purpose limitation as set out in Article 5(1)(b) GDPR, the purposes for processing and collection of (personal) data must be specified and made available to the data subject in a concise and intelligible way.Footnote 19 This principle also applies to any further processing of data. The requirement of a pre-defined purpose limitation generally opposes the basic concept of AI, according to which AI should develop independently (or possibly within a certain pre-defined framework) and should be used for purposes not defined in advance.Footnote 20 Against this backdrop, the prescription of purpose limitations threatens to impede the (unhindered) development and potentials of AI technologies.Footnote 21 Thus, the limitation of legitimate purposes of data processing may lead to a considerable restriction of technological AI potentials.Footnote 22 In situations in which AI can (and frequently even should) lead to unforeseen and possibly unforeseeable applications and results, it can, therefore, be very challenging to find an appropriate equilibrium between the principle of purpose limitation and the innovation of AI technologies. In many AI scenarios, it is virtually impossible to predict what the algorithm will learn. Furthermore, the purpose in the sense of Article 5(1)(b) GDPR may change in the course of the (autonomous) development of self-learning AI, especially as the relevant objectives of the data processing may not be known at the time of data collection. Moreover, it is reasonable to be concerned that data protection law may distort the results (freely) generated by AI tools if such technologies are granted only restricted (or no) access to certain data sources. There is, thus, a notable risk of conflicts between the interests and objectives of the individual on the one hand and public welfare on the other. In order to avoid such conflicts, it is crucial to explicitly list the application and use of AI as one of the purposes for the collection of data. Data controllers should, therefore, seek to identify, document, and specify the purposes of future data processing at an early stage. Where these measures are not taken, the requirements for a permissible change of purpose follow from Article 6(4) GDPR.
Article 6(4) GDPR, which addresses purpose changes, lists a number of criteria for the evaluation of the compatibility of such changes in situations where the data processing is carried out for purposes other than the ones for which the data has been originally collected. This creates a direct link to the principle of purpose limitation as laid down in Article 5(1)(b) GDPR. It should further be emphasised that the compatibility of a change of purpose with the original purpose does not affect the cumulative prerequisites for the lawfulness of the processing in question. Because Article 6(4) GDPR itself does not constitute a legal basis for the processing of data for other purposes, recourse must be taken to Article 6(1) subpara. (1) GDPR, which requires the existence of a legal justification also for other purposes. In consequence, the controller is responsible for ensuring that the data processing for the new purpose is compatible with the original purpose and based upon a legal justification in the sense of Article 6(1) subpara. (1) GDPR. In many cases, relevant personal data will not have been collected for the purposes of training or applying AI technology.Footnote 23 In addition, controllers may sometimes have the hope or expectation of subsequently using the collected data for other purposes, for instance in exploratory data analyses. If one were to pursue a more restrictive line of interpretation regarding the change of purposes by applying the standard of Article 6(4) GDPR, it would be impossible to use AI with a sufficient degree of legal certainty. Situations in which data is generated in different contexts and subsequently combined or used for (new) purposes are particularly prone to conflict.Footnote 24 In fact, this scenario demonstrates the far-reaching implications of and issues arising in connection with the principle of purpose limitation and AI scenarios: if the purpose for the processing of data cannot (yet) be determined, the assessment of its necessity becomes largely meaningless. Where the purpose limitation remains vague and unspecified, substantial effects of this limitation remain unlikely.
4. Data Minimisation/Storage Limitation
Pursuant to the principle of data minimisation,Footnote 25 personal data must be adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed. The principle of data minimisation is specified by the requirement of storage limitation (as will be elaborated in the following) and the provisions concerning data protection through the implementation of technical measures and data protection ‘by design and by default’.Footnote 26 Similarly to the principles described above, the principle of data minimisation oftentimes directly contradicts the general concept of AI technologies which is based on and requires the collection of large amounts of data.Footnote 27 Given the very nature of AI applications, it is exceedingly difficult to make any kind of prediction regarding the type and amount of data necessary in constellations which have yet to be determined by the application itself. In addition, the notion of precautionary protection of fundamental rights by way of data avoidance openly conflicts with the high demand for data in any given AI scenario.Footnote 28
The principle of storage limitationFootnote 29 prescribes that where personal data is stored, the identification of the data subject is only permissible for as long as this is necessary for the processing purposes. This principle also poses considerable difficulties in AI constellations, because the deletion or restriction of personal data after the fulfilment of their purpose can significantly impede both the development and use of AI technologies. According to Recital 39 sentences 8 and 10 of the GDPR, the period for which personal data is stored must be limited to a strict minimum. The controller should further establish time limits for the data’s erasure or their periodic review. Correspondingly, Article 17 GDPR contains the data subject’s right to demand the immediate erasure of any data concerning him or her under certain conditions.Footnote 30
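By way of illustration, a storage-limitation routine might look like the following sketch – a purely hypothetical example in which the retention period, record layout, and field names are my own assumptions rather than values prescribed by the GDPR.

```python
# A minimal sketch of a storage-limitation routine (illustrative only; the
# retention period and record layout are assumptions, not GDPR-mandated values).
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=180)  # hypothetical, purpose-specific time limit

def purge_expired(records, now=None):
    """Drop records whose retention period has elapsed."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["collected_at"] <= RETENTION]

records = [
    {"id": 1, "collected_at": datetime.now(timezone.utc) - timedelta(days=30)},
    {"id": 2, "collected_at": datetime.now(timezone.utc) - timedelta(days=400)},
]
print([r["id"] for r in purge_expired(records)])  # [1] – record 2 is dropped
```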
5. Accuracy/Integrity and Confidentiality
Another principle of data protection law which may be affected in AI scenarios is the principle of accuracy as set out in Article 5(1)(d) GDPR. This principle is intended to ensure that the collected (personal) data accurately depicts reality so that the affected data subjects will not suffer any disadvantages resulting from the use of inaccurate data. In situations in which the procedures and systems used for the processing of data present themselves as a ‘black box’ to both data subject and controller, it can be very difficult to detect inaccurate information and to restore its accuracy.Footnote 31 However, situations concerning the accuracy of data require a distinction between data input and output; as the latter is a result of data-processing analyses and processes – also and in particular in situations involving AI – it will regularly constitute a (mere) prognosis.
Pursuant to Article 5(1)(f) GDPR, personal data must be processed in a manner that ensures their appropriate security. The controller is thereby required to take adequate measures to ensure the data’s protection against unauthorised or unlawful processing and against accidental loss, destruction, or damage.
6. Lawfulness/Fairness
The lawfulness of data processingFootnote 32 requires a legal basis authorising the processing of data, as the normative concept of data protection law envisages a prohibition subject to authorisation. In order to be deemed lawful in the sense of Article 5(1)(a) GDPR, the processing must fulfil at least one of the prerequisites enumerated in Article 6(1) GDPR. In this context, Article 6(1) subpara. 1(b) GDPR permits the processing of data if it is necessary for the performance of a contract which the data subject is party to or for the implementation of pre-contractual measures. However, in scenarios involving AI, such pre-contractual constellations will typically not arise. Similarly, AI scenarios are very unlikely to fall within the scope of any of the other authorisations listed in Article 6(1) subpara. (1) GDPR, which include the existence of a legal obligation, the protection of vital interests, or the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.Footnote 33
In contrast, the authorisations set out in Article 6(1) subpara. (1)(a) and (f) GDPR, which are based on the data subjects’ consentFootnote 34 and the balancing of interests,Footnote 35 are of great practical importance for the development and use of AI applications. Especially in connection with authorisations relying on consent, attention must be paid to the data subject’s right to withdraw his or her consentFootnote 36 and to provisions regulating the processing of special categories of personal data.Footnote 37
a. Consent
The most prominent justification for the processing of data is the subject’s consent.Footnote 38 The requirements for consent can be derived from a conjunction of the provisions stipulated in Article 4(11), Article 6(1) subpara. 1(a), Article 7, and Article 8 GDPR as well as from the general principles of data protection law. The processing of data is only lawful to the extent that consent has been given, meaning that the data subject must give his or her consent for one or more specific purpose(s).Footnote 39 Thus, the scope of the justification is determined by the extent of consent. It should also be pointed out that abstract purposes such as ‘advertisement’ or ‘IT-security’ are insufficient.Footnote 40 This will also apply in the context of AI. Furthermore, Article 4(11) defines consent as the ‘freely given’ and ‘informed’ indication of the subject’s declaration of intent. The requirement of an ‘informed’ decision corresponds directly with the previously elaborated principle of transparencyFootnote 41 which is also laid down in Article 7(2) GDPR. In an AI scenario, this requirement gives rise to further tension between the controllers’ obligation to provide adequate information on the one hand and the information’s comprehensibility for the average data subject on the other.
The requirement of ‘specific’ and ‘informed’ consent may also pose significant challenges where the controller neither knows nor is able to foresee how and for which purposes the personal data will be processed by self-learning and autonomous AI systems. In principle, the practicability of a consent-based justification may be called into question, particularly when considering the voluntary element of such consent in situations lacking any viable alternatives or scenarios of market dominance. In this regard, it may be said that the requirements of a justification based on consent are more fictional than practicable, especially in view of the ubiquity of data-related consent agreements: ‘no one has ever read a privacy notice who was not paid to do so.’Footnote 42
b. Withdrawal of Consent
In addition to the fulfilment of the requirements for a consent-based justification, the technical and legal implementation of the withdrawal of consent as set out in Article 7(3) sentence 1 GDPR is also highly problematic. According to this provision, the data subject has the right to withdraw his or her consent at any time and without having to adhere to any formal requirements. After the consent has been effectively withdrawn, the justification for the processing of data in the sense of Article 6(1) subpara. 1(a) GDPR ceases to exist. In consequence, any further processing of data will only be lawful if another ground for justification applies as a substitute.Footnote 43 Furthermore, a distinction must be made between the right to withdraw in the aforementioned sense, the right to object to unconsented processing of data as regulated by Article 21 GDPR, and, finally, a generally permissible time limitation of consent. As a consequence of the withdrawal of consent, the controller is required to erase the relevant personal data. In cases involving the use of AI, especially scenarios in which certain data is used to train an AI application, it is doubtful whether (and if so, to what extent) the imposition of an obligation to delete is even practicable.Footnote 44
c. Balancing of Interests
The justification based on a balancing of interests allows the processing of personal data where, cumulatively, (i) the controller or a third party pursues a legitimate interest, (ii) the processing is necessary to safeguard that legitimate interest, and (iii) that interest is not overridden by the interests or fundamental rights and freedoms of the data subject who requires the protection of his or her data. The vague wording of this provision is likely to give rise to complications, which do not apply only in the context of AI. For instance, the GDPR does not provide any specific points of reference regarding the general admissibility of and the specific requirements for the processing of data in connection with the balancing of interests within the meaning of Article 6(1) subpara. 1(f) GDPR.
Thus, the task of specifying the requirements of the abovementioned balancing process is mostly assigned to academic discourse, courts, and public authorities. However, such an interpretation of the GDPR must, in any case, comply with and adhere to the objective of a consistent standard of (data) protection throughout the EU.Footnote 45 It is, therefore, subject to the requirement of a harmonised interpretation of the law which, in turn, is intended to guarantee equal data processing conditions for all market participants in the EU.Footnote 46 In addition, by establishing codes of conduct designed to contribute to the appropriate application of the GDPR, Member States are encouraged to provide legal certainty by stating which (industry-specific) interests can be classified as legitimate in the sense of Article 6(1) GDPR. Finally, the European Data Protection Board may, pursuant to Article 70(1)(e) GDPR, further ensure the consistent application of the Regulation’s provisions by issuing guidelines, recommendations, and best practices, particularly regarding the practical implementation of the aforementioned balancing process.
d. Special Categories of Personal Data
Article 9 GDPR establishes a separate regulatory regime for special categories of personal data – including, for instance, genetic and biometric data and data concerning health – and prohibits the processing of these types of data unless one of the exemptions listed in Article 9(2) GDPR applies. In accordance with Article 22(4) GDPR, automated decisions, including profiling, must not be based on sensitive data unless these exemptions apply. Furthermore, the processing of large amounts of sensitive data, as referred to in Article 35(3)(b) GDPR, requires an obligatory data protection impact assessment. Overall, the use and application of AI impose new challenges for the protection of sensitive data. The accumulation of personal data in conjunction with improved methods of analysis and (re-)combination will certainly increase the likelihood of cases affecting potentially sensitive data within the meaning of Article 9 and Recital 51 of the GDPR. Consequently, an increasing amount of data may fall under the prohibition of Article 9(1) GDPR. It is, therefore, necessary to closely follow new trends and developments in the technical field, including but not limited to AI, in order to correctly determine the scope of application of Article 9 GDPR. These findings leave controllers with considerable (legal) uncertainties regarding their obligations.
In light of the new possibilities for a fast and effective AI-based evaluation of increasingly large amounts of data (i.e. big data), the question arises whether metadata, source data, or any other types of information which, by themselves, generally do not allow the average observer to draw any conclusions as to the categories mentioned in Article 9(1) GDPR, nevertheless fall under this provision. If so, one may consider adding the application of AI technology to the list of potential exemptions under Article 9(2) GDPR. In this context, however, regard must be paid to the principle of purpose limitation as previously mentioned.
7. Intermediate Conclusion
Given its rather broad, oftentimes undefined and vague legal terminology, the GDPR, in many respects, allows for a flexible application of the law. However, this flexibility goes hand in hand with various (legal) uncertainties. These uncertainties are further perpetuated by the GDPR’s notable and worrisome lack of reference to and regulation of AI-specific constellations. As shown above, these constellations are particularly prone to come into conflict with the general principles of data protection as set out in Article 5(1) GDPR and as specified and reiterated in a number of other provisions. In this context, the principles of data minimisation and storage limitation are particularly problematic. Other conflicts, especially involving the GDPR’s principles of purpose limitation and transparency, may arise when considering the rather complex and ambiguous purposes and structures for the processing of data as well as the open-ended explorative analyses frequently observed in AI scenarios. This particularly applies to subsequent changes of purpose.Footnote 47 It must also be emphasised that the requirement of transparency serves as a regulatory instrument to ensure the lawfulness of data processing and to detect tendencies of dominanceFootnote 48 or, rather, the abuse thereof. However, legal uncertainties entail considerable risks and burdens for controllers implementing AI technologies, risks that are amplified and intensified by the GDPR’s new and much stricter sanctions regime.Footnote 49 Finally, it has to be pointed out that these conflicts by no means only apply to known concerns of data protection law, but rather constitute the starting point for new fundamental questions in this field.
III. Compliance Strategies (de lege lata)
Based on these findings, it is necessary to examine potential strategies to comply with the provisions of the GDPR and to establish a workable and resilient framework that is capable of fostering the future development and application of AI technologies under the law as it stands. It should also be emphasised that the enactment of the GDPR has fundamentally increased the requirements for compliance with data protection law. This development was further accompanied by substantially higher sanctions for the infringement of data protection law.Footnote 50 In addition to potential sanctions, any infringement of data protection law may also give rise to private damage claims pursuant to Article 82 GDPR which cover both material and non-material damage suffered by the data subject. The legally compliant implementation of AI may further be impeded by the interplay and collision of different or conflicting data protection guarantees. Such guarantees can, for instance, be based on data protection law itself, on other personal rights, or on economic and public interests and objectives. In an AI context, this will become particularly relevant in connection with the balancing of interests required by Article 6(1) subpara. 1(f) GDPR.
Article 25 GDPR contains the decisive normative starting point for data protection compliance, in other words, the requirement that data protection-friendly technical designs and default settings must be used. However, the rather vague wording of this provision (again) calls for an interpretation as well as a specification of its content. The obligation of the responsible controller to implement appropriate technical and organisational measures is essential in terms of data protection compliance. Overall, the GDPR pursues a risk-based approach.Footnote 51 From a technical and organisational point of view, it is, thus, necessary to ask how the protection of personal data can be achieved by way of a data protection management system and other measures, for instance through anonymisation and pseudonymisation. The starting point of these considerations is the connection between the data in question and an individual (personal reference), which is decisive for whether the substantive scope of application of the GDPR is opened at all.Footnote 52
1. Personal Reference
The existence of such a personal reference is a necessary prerequisite for the application of the GDPR. From a factual point of view, as set out by Article 2(1) GDPR, the GDPR applies in cases of a ‘wholly or partially automated processing of personal data and for non-automated processing of personal data stored or to be stored in a file system’. Therefore, it must be asked whether, in a given case and under specific circumstances, personal data is being processed.Footnote 53 According to the legal definition stipulated in Article 4(1) GDPR, personal data is
any information relating to an identified or identifiable natural person […]; an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
According to the pertinent case law of the European Court of Justice (ECJ), it is sufficient for the responsible data controller to have legal means at his or her disposal to have the additional data held by a third party made available (so-called absolute personal reference); this can also encompass indirect routes via state authorities.Footnote 54
The application of the GDPR – and thus the application of its strict regulatory regime – could be avoided by way of, for instance, the data’s anonymisation. Article 3(2) of Regulation No. 2018/1807 concerning the free movement of non-personal data states that, in the event of personal and non-personal dataFootnote 55 being inseparable, both sets of rules (regarding personal and non-personal data) must, in principle, be applied. However, in many cases, it will not be easy to determine with any (legal) certainty whether and to what extent data records may also contain personal data. Hence, in order to remain on the ‘safe side’ regarding compliance with the current data protection regime, controllers may feel the need to always (also) adhere to the provisions and requirements of the GDPR, even in cases where its application may be unnecessary. This approach may result in considerable (and needless) expenditures in terms of personnel, material, and financial resources.
a. Anonymisation
In contrast to personal information in the aforementioned sense, the GDPR does not apply to anonymous information because such information is, by its nature, the very opposite of personal. Recital 26 of the GDPR states: ‘The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.’
The Regulation, therefore, does not address the processing of such anonymous data, including data used for statistical or research purposes. It follows from the aforementioned Recital that, when it comes to the identifiability of an individual person, the technological capabilities and developments available at the time of the processing must be taken into account. However, as regards the technical specifications of the actual anonymisation process, the GDPR, with good reason, does not stipulate a specific procedure to follow. This lack of a prescribed procedure benefits innovation and the development of new technologies and reflects the principle of technological neutrality.Footnote 56 The relevant time of evaluation is always the time of the processing in question.
The mere objection that almost all anonymised data may be restored by means of advanced pattern-recognition techniques does not change this assessment, because such an objection is far too broad and, thus, certainly falls short of the mark.Footnote 57 Nevertheless, it should also be noted that, with respect to location data, anonymisation is considered virtually impossible. Thus, Article 9 GDPR bears particular significance when it comes to the inclusion of location data in the relevant applications. In any case, the issue of de-anonymisation, for which especially the available data stock, background knowledge, and specific evaluation purposes have to be considered, remains highly problematic. According to Recital 26 of the GDPR, in order to identify means likely to be used for the identification of an individual, all objective factors, such as costs, time, available technologies, and technological developments, should be considered. In this context, increasingly advanced big data analysis techniques tend to enable an ever further-reaching re-identification of persons in a constantly growing data pool. In addition, changes in the underlying technological framework and its conditions may (over time) result in the ‘erosion’ of a previous anonymisation and subsequently uncover or expose a personal reference of the respective data. Naturally, the resulting (legal) uncertainties may pose a considerable risk and problem to users and other affected parties, especially with respect to issues of practical manageability and incentives. In this context, the opinion on Anonymisation Techniques of the Article 29 Data Protection Working Party (particularly relating to the robustness of randomisation and generalisation) may give helpful indications, but will certainly not be the solution to all potential issues arising in this connection.Footnote 58 Thus, the findings and developments mentioned earlier give rise to well-founded doubts as to whether the comprehensive anonymisation of data can be successfully achieved under the current framework conditions (e.g. technological progress and available data volumes).
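Purely for illustration, the following sketch reduces the generalisation technique discussed in the Article 29 Working Party opinion to a toy example: quasi-identifiers (here, hypothetical birth dates and postcodes) are coarsened and the resulting groups are checked against a k-anonymity-style threshold. All field names, values, and the threshold are assumptions made for the example; real anonymisation assessments are considerably more involved.

```python
# Minimal sketch (not the Working Party's method itself): generalising quasi-identifiers
# and checking group sizes, in the spirit of k-anonymity. All data is illustrative.
from collections import Counter

records = [
    {"birth_date": "1984-03-12", "postcode": "10115", "diagnosis": "A"},
    {"birth_date": "1984-07-01", "postcode": "10117", "diagnosis": "B"},
    {"birth_date": "1991-11-23", "postcode": "20095", "diagnosis": "A"},
    {"birth_date": "1991-02-05", "postcode": "20099", "diagnosis": "C"},
]

def generalise(record):
    """Coarsen quasi-identifiers: keep only the birth year and the postcode prefix."""
    return (record["birth_date"][:4], record["postcode"][:2])

def satisfies_k_anonymity(records, k=2):
    """True if every combination of generalised quasi-identifiers occurs at least k times."""
    counts = Counter(generalise(r) for r in records)
    return all(n >= k for n in counts.values())

print(satisfies_k_anonymity(records, k=2))  # True for this toy data set
```

Even a check of this kind offers no guarantee against re-identification once additional background knowledge or further data sets become available, which is precisely the concern raised above.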
b. Pseudonymisation
According to the legal definition provided in Article 4(5) GDPR, pseudonymisation means the processing of personal data in such a way that personal data cannot (any longer) be assigned to a specific data subject without the use of additional information. Although the GDPR does not expressly permit or privilege the processing of personal data in the event of a pseudonymisation, there are a number of substantial incentives to carry out such a pseudonymisation: in the case of a pseudonymisation, the balancing of interests within the meaning of Article 6(1) subpara. 1(f) GDPR is more likely to sway in favour of the processor. Furthermore, in the case of data protection violations pursuant to Article 34(3)(a) GDPR, the obligation to notify the data subject does not apply in cases of encryption as a sub-category of pseudonymisation. In addition, the procedure may decrease the need for further technical and organisational protection and may, in the event of a previously mentioned change of purpose, be included as a factor in the balancing process as required by Article 6(4)(e) GDPR. Pseudonymisation, therefore, has the potential to withdraw the processing of certain data from the scope of the GDPR and to avoid the application of the Regulation’s strict requirements.
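The basic idea of pseudonymisation within the meaning of Article 4(5) GDPR can be illustrated by the following minimal sketch: a direct identifier is replaced by a keyed hash, and the key, as the ‘additional information’, is kept separately under technical and organisational measures. The key, field names, and record structure are purely illustrative assumptions.

```python
# Minimal sketch of pseudonymisation: a keyed hash (HMAC-SHA256) replaces the direct
# identifier; the secret key is the "additional information" to be stored separately.
import hmac
import hashlib

SECRET_KEY = b"kept-separately-under-technical-and-organisational-measures"  # illustrative

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed pseudonym."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"name": "Jane Doe", "diagnosis": "A"}
pseudonymised_record = {"pseudonym": pseudonymise(record["name"]), "diagnosis": record["diagnosis"]}
print(pseudonymised_record)
# Without access to SECRET_KEY, the pseudonym cannot be attributed to the data subject;
# with the key, the same input always maps to the same pseudonym, preserving linkability.
```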
c. Synthetic Data
Another possibility to avoid a personal reference and, thus, the application of the GDPR is the production and use of synthesised data, which constitute a mere virtual representation of the original set of data. The legal classification of synthetic data is directly linked to the existence or producibility of a personal reference. As a result, in the absence of a personal reference, synthetic data can be equated with anonymous data. In this context and in connection with all related questions, the decisive issue is, again, the possibility of a re-identification of the data subject(s). Another potential problem that must be taken into account relates to possible repercussions for the data subjects of the (underlying) original data set from which the synthetic data were generated. For instance, processing operations subject to the provisions of the GDPR may arise where sensitive characteristics become predictable from a combination of multiple data sets.
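By way of illustration only, the following deliberately simplified sketch generates synthetic values by sampling from a distribution fitted to an original data set; real synthetic-data generators are far more sophisticated, and all variable names and the distributional assumption are merely illustrative.

```python
# Deliberately simplified sketch: synthetic values drawn from a normal distribution
# fitted to the original data. Real generators model joint distributions, not marginals.
import random
import statistics

original_ages = [34, 41, 29, 52, 47, 38, 31, 45]  # illustrative original data

mu = statistics.mean(original_ages)
sigma = statistics.stdev(original_ages)

def synthetic_ages(n: int) -> list[int]:
    """Draw n synthetic age values mimicking aggregate properties of the original data."""
    return [max(0, round(random.gauss(mu, sigma))) for _ in range(n)]

print(synthetic_ages(5))
```

Whether a personal reference can nevertheless be re-established, for instance for outliers or through combination with other data sets, must still be assessed in each individual case, as noted above.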
2. Responsibility
The question of who is responsible for compliance with the requirements of data protection and to whom data subjects can turn in order to exercise their rights is of great importance.Footnote 59 Article 4(7) GDPR defines the data controller as ‘the natural or legal person, public authority, agency or other body which alone or jointly with others determines the purposes and means of the processing of personal data’. In practice, an essential (distinguishing) characteristic of a data controller within the sense of the GDPR is, thus, the authority to make a decision about the purposes and means of data processing. In a number of recent rulings, the ECJ has further elaborated the criterion of responsibility by specifying the nature and extent of a controller’s decision concerning the purpose and means of processing personal data: Facebook Fanpage,Footnote 60 Jehovah’s Witnesses,Footnote 61 and Fashion ID/Facebook Like Button.Footnote 62 According to the (previous) case law of the ECJ, those involved in the processing of data do not necessarily have to bear an equal amount of responsibility. Instead, the criterion of responsibility is met if the participants engage in the data processing at different stages and to varying extents, provided that each participant pursues its own purposes for the processing.
3. Privacy by Default/Privacy by Design
Article 25 GDPR contains provisions concerning the protection of data by way of (technology) design and data protection-friendly default settings.Footnote 63 The first paragraph of the provision stipulates the principles for privacy by design, that is, the obligation to design technology in a manner that facilitates and enables effective data protection (in particular to safeguard the implementation of data-protection principles such as data minimisation). In its scope, the provision is limited to an enumeration of various criteria to be taken into account by the controller with regard to the determination of appropriate measures and their respective durations. The provision does not further specify any concrete measures to be taken by the responsible controller – with the exception of pseudonymisation as discussed earlier. In addition, Article 25(2) GDPR sets out the principle of privacy by default, in other words, the controller’s obligation to select data protection-friendly default settings to ensure that only data required for the specific purpose are processed. Finally, in order to demonstrate compliance with the requirements of Article 25(1) and (2) GDPR, Article 25(3) allows the use of an approved certification procedure in accordance with Article 42 GDPR. The challenges previously described typically arise in cases where the GDPR’s transparency requirements coincide with complex AI issues, which – by themselves – already present difficulties for the parties concerned. Against this background, certification procedures, data protection seals, and test marks in the sense of Article 42 GDPR could represent valuable instruments on the way to data protection compliance.
4. Data Protection Impact Assessment
The data protection impact assessment pursuant to Article 35 GDPR addresses data processing operations that pose a particularly high risk to the rights and freedoms of natural persons. The provision requires the controller to carry out a preventive review of the potential consequences of any processing operations likely to result in such a high risk and to subsequently select and implement appropriate risk-minimising remedial measures. The obligation to carry out a data protection impact assessment serves the purpose of ensuring the protection of personal data and, thus, compliance with the provisions of the Regulation.Footnote 64 At the same time, Article 35(4) and (5) GDPR provide for the responsible supervisory authority to establish and make public a list of the kinds of processing operations which require such an impact assessmentFootnote 65 and of those operations which do not require an assessment.Footnote 66 These lists are intended to ensure legal certainty for and equal treatment of responsible controllers and to facilitate transparency for all parties concerned. By including in such a list processing operations for which a data protection impact assessment must be carried out,Footnote 67 the supervisory authority can also positively establish an obligation to carry out a data protection impact assessment.
Furthermore, it should be noted that Article 35(1) GDPR explicitly requires a data protection impact assessment to be conducted ‘in particular’ where ‘new technologies’ are used. Naturally, this provision is of particular relevance in cases where large amounts of data are processed using ‘new’ AI systems and technologies and might indicate that the use of such applications may automatically trigger the need for a comprehensive and onerous impact assessment. The GDPR does not explicitly provide any examples of technologies or areas of technology which qualify as ‘new’. However, a new technology is likely to pose a high risk to the rights and freedoms of natural persons if it enables the execution of large-scale processing operations which allow the processing of large quantities of personal data at a regional, national, or supranational level and which may involve data relating to a large number of individuals and data of a particularly sensitive nature. Thus, developments such as Smart Car, Smart Health, Big Data, and Tracking procedures as well as new security and monitoring technologies are likely to fall within the scope of Article 35(1) GDPR, hence requiring controllers using and offering such technologies to conduct a data protection impact assessment in accordance with Article 35 GDPR.Footnote 68 In the context of AI systems, it remains highly doubtful whether such an obligation would even be feasible, given the fact that self-learning programs develop continuously and more or less unpredictably.
5. Self-Regulation
In order to specify, construe, and interpret the large number of indeterminate legal provisions of the GDPR, it is also necessary to give consideration to elements of self-regulation or co-regulation.Footnote 69 Article 40 GDPR gives associations and other bodies the possibility to draw up, amend, or extend codes of conduct which clarify the application of the GDPR. Thus, pertinent codes of conduct can be developed (e.g. by means of best practices) which can subsequently be approved by the responsible supervisory authoritiesFootnote 70 or given general binding force by the EU Commission. In addition, certification procedures, data protection seals, and test marksFootnote 71 could also serve as valuable instruments when it comes to compliance with data protection law.
On the basis of Article 42 GDPR, data controllers may also voluntarily seek certification of their data processing operations by the responsible supervisory authority or an accredited body within the meaning of Article 43 GDPR. Recital 100 of the GDPR emphasises that the associated certification procedures, data protection seals, and marks are intended to increase transparency and improve compliance with the GDPR’s requirements.
IV. Legal Policy Perspectives (de lege ferenda)
In view of the earlier points, it becomes evident that the use and implementation of AI-based technologies necessitate a thorough review of the current data protection framework. Such a review may indicate the need for the modification, amendment, or development of the GDPR’s current regime. From a legal policy perspective, legislative initiatives should be the main point of focus.
1. Substantive-Law Considerations
From a substantive-law point of view, one key element of the GDPR that merits closer examination is the personal reference as a prerequisite for the GDPR’s application. This is not least due to the structural narrowness of the personal reference in its current definition as well as its frequent lack of adequate relevance. Presently, the personal reference as a connecting criterion only insufficiently reflects the existing multiple rationalities of data processing constellations and lacks the capability to take into account the specific characteristics of each case-by-case context. In fact, the one-size-fits-all approach of the GDPR does not appropriately distinguish between different risk situations, which means that – due to the ubiquitous relevance of personal data – there exists the risk of an excessive application of the law. Among other things, this certainly applies to issues relating to the use of AI as presently discussed. With this in mind, it is both necessary and important to create sector-specific regulations for AI constellations, for instance regarding the permissibility of data processing and the specific requirements thereof. Furthermore, the ubiquity of data processing operations in the present age of digitalisation frequently calls into question the general concept of data protection in its current state. It is, therefore, necessary to (at least partially) move away from the current approach, in other words the prohibition subject to permission, in favour of a more general clause. Such a provision should differentiate between data protection requirements according to the specific risks that particular situations are likely to pose. Such a stringent risk-based approach would have the advantage of facilitating the weighing and balancing of the interests of all affected parties as well as appropriately taking into account their respective purposes for protection. In addition, the readjustment of the objectives that the GDPR serves to protect may help to realise an adequate protection of an individual’s personality and privacy rights whilst also incentivising the development and use of AI applications. In this context, the overarching objective should always be to reassess the balance of interests pursued by data subjects, responsible processors, third parties, and the public welfare in general.
Another issue that ought to be addressed relates to the granting of access to data and the corresponding rights of usage. This further encompasses questions as to the law of obligations in a data law context, data ownership, and data economics. Finally, due consideration should be given to whether the existing legal framework should be supplemented by specific provisions governing the use of AI. These provisions should not least be capable of overcoming the currently existing tensions resulting from the bi-dimensional, two-person relationship between controller and data subject. This could necessitate an amendment of data protection law with regard to AI in order to move away from an approach based solely on the individual and to appropriately take into account the challenges that may arise in connection with the quantity, heterogeneity, inter-connectivity, and dynamism of the data involved. Such an amendment should be accompanied by more systematic protective measures. A valuable contribution could be made here by technical design and standardisation requirements. In addition, all of these measures must be safeguarded and supported by adequate and effective supervisory and judicial protection.
2. Conflicts between Data Protection Jurisdictions
Due to the cross-border ubiquity of data (processing) and the outstanding importance of AI-related issues, efforts must be made to achieve a higher degree of legal harmonisation. Ideally, such a development could result in the establishment of an overarching supra- or transnational legal framework, containing an independent regulatory regime suited to the characteristics of AI. Such a regime would also have to take into account the challenges resulting from the interplay of multi-level legal systems as well as the conflicts arising between different data protection legal regimes. For instance, conflicts may arise when the harm-based approach of US data protection law, which focuses on effects and impairments, collides with the Chinese system, which allows for far-reaching data processing and surveillance (e.g. the Social Credit System), and with the GDPR approach, which is based on a preventive prohibition subject to permission. Assuming that a worldwide harmonisation of the law is hardly a realistic option in the foreseeable future, it is important to aim for an appropriate balance within one’s ‘own’ data (protection) regime.
3. Private Power
In connection with the transnationalisation of the legal framework for data protection and the conflicts between different regulatory regimes, regard must also be paid to the influence exerted by increasingly powerful private (market) players. This, naturally, raises questions as to the appropriate treatment and, potentially, the adequate containment of private power, the latter of which stems from considerations regarding the prevention of a concentration of power and the sanctioning of the abuse of a dominant market position. However, the GDPR itself does not directly stipulate any specific protective measures governing the containment of private power. Legal instruments capable of addressing the aforementioned issues must, therefore, be found outside the body of data protection law. For this purpose, recourse is frequently taken to (EU or national) competition law, because it expressly governs questions relating to the abuse of market power by private undertakings and, in addition, provides a reliable system and regulatory framework to address such issues. In this regard, the German Federal Cartel Office (FCO) served as a pioneer when it initiated proceedings against Facebook for the alleged abuse of a dominant market position through the use of general terms and conditions contrary to data protection law, specifically the merging of user data from various sources.Footnote 72 In any case, the role and power of private actors as an influential force in the field of data protection should certainly not be underestimated. In fact, by establishing new technological standards and, thereby, elevating their processing paradigms and business models to a de facto legal power, they have the potential to act as substitute legislators.
V. Summary and Outlook
There is an inherent conflict of objectives between the maximisation of data protection and the necessity to make use of (large quantities of) data, which transcends the realms of AI-related constellations. On the one hand, the availability and usability of personal data bears considerable potential for innovation. On the other hand, the possibilities and limitations of data processing for the development and use of AI are (above all) determined by the requirements of the GDPR. In consequence, the permissibility of the processing of personal data must be assessed in accordance with the powers to collect, store, and process data granted by the GDPR. Data protection law thereby imposes strict limits on the processing of personal data in the absence of a justification or of sufficient information provided to the data subject. These limitations have a particular bearing on issues relating to AI, as it is frequently impossible to make a comprehensive ex ante determination of the scope of the processing operations conducted by a self-learning, autonomous system. This is not least due to the fact that such systems may only gain new information and possibilities for application – potentially relating to special categories of personal data – after the processing operation has already started. In addition, the processing of such large amounts of personal data is oftentimes likely to result in a significant interference with the data subjects’ fundamental rights. All of these considerations certainly give rise to doubts as to whether a complete anonymisation of data is even a viable possibility under the given framework conditions (i.e. technological progress and available data volumes).
In order to combat these shortcomings of the current data protection framework, the establishment of a separate legal basis governing the permissibility of processing operations using AI-based applications should be considered. Such a separate provision would have to be designed in a predictable, sector-specific manner and would need to adhere to the principle of reasonableness, thus also ensuring the adequate protection of fundamental rights and the rule of law. The GDPR’s de lege lata approach to the processing of personal data, in other words the comprehensive prohibition subject to permission, leaves controllers – as previously elaborated – in a state of considerable legal uncertainty. As of now, controllers are left with no choice but to seek the users’ consent (whereby the requirements of informing the data subject and of obtaining their voluntary agreement are applied strictly) and/or to balance the interests involved on a case-by-case basis. These input limits not only burden controllers immensely, but are also likely to ultimately limit output significantly, especially in an AI context. In fact, the main principles of the applicable data protection law (i.e. the principles of transparency, limitation, reservation of permission, and purpose limitation) appear to be in direct conflict with the functioning and underlying mechanisms of AI applications which were, evidently, not considered during the drafting of the GDPR’s legal regime. In practice, this is especially problematic considering that the GDPR has significantly increased the sanctions imposed for violations of data protection law.
Boundaries are dissolving in multiple dimensions, above all at the levels of technology and law, of territories, and of dimensions of protection: on the one hand, these developments may promote innovation, but at the same time they threaten to erode the structures of efficient law enforcement. The previously mentioned tensions between the GDPR and the basic concepts underlying AI also raise the fundamental question of whether traditional data protection principles in the age of digitalisation, especially with regard to AI, Big Data, the Internet of Things, social media, blockchain, and other applications, are in need of a review. Among other things, the instrument of consent as a justification in AI constellations, which are typically characterised by unpredictability and limited explainability, must be called into question. In any case, the legal tools for the protection of privacy need to be readjusted in the context of AI. This also and especially applies to the data protection law regime. Against this background, legislative options for action at the national, EU, and international levels should be examined. In this context, the protection of legal interests through technology design will be just as important as interdisciplinary cooperation and research.
Overall, (legal) data policy is a central industrial policy challenge that needs to be addressed – not only for AI constellations. Legal uncertainties may cause strategies of evasion and circumvention, which in turn can trigger locational disadvantages, enforcement deficits, bureaucratic burdens, and an erosion of legal compliance. Thus, AI-specific readjustments of data protection law should – where necessary – prevent imminent disadvantages in terms of location and competition and ensure that technology and law remain open to innovation and development. New approaches to the interaction between data protection law and AI should be examined, and existing frameworks should be retained and, where appropriate, further developed. By these means, a modern law of data and information use may be established which does not amount to a ‘technology restriction law’ but rather gives rise to new development opportunities. The legal questions raised and addressed in this article concern not only isolated technical issues but also the social and economic order, social and individual life, research, and science. In this sense, the existing legal framework (the European approach) should be further developed to make it an attractive alternative to the approaches taken in the US and China, while the current model of individual protection should be maintained, distinguishing it from the other data protection regimes. With the ongoing GDPR evaluation, now is an opportune time for such an initiative. However, such an initiative requires the cooperation of all actors (users and developers, data protection authorities and bodies, policy and legislation, science, and civil society) in order to reconcile data protection with the openness of technology and law to necessary developments.
I. Introduction
COVID-19 is reshaping history with its unprecedented contagiousness. The epidemic swept the whole world throughout 2020 and beyond. In the case of South Korea (hereafter Korea), the first confirmed case of COVID-19 was reported on 20 January 2020.Footnote 1 During the initial phase after the first reported case, the Korean government hesitated to introduce compulsory quarantine for travelers from high-risk countries.Footnote 2 It put Korea on a different trajectory compared to other countries which imposed aggressive measures including immigration quarantine from the beginning.Footnote 3 The number of confirmed infections increased significantly in a short span of time and, by the end of February 2020, the nation was witnessing an outbreak that was threatening to spiral out of control. Korea appeared to be on the way to becoming the next ‘COVID-19 hotspot’ after China.Footnote 4 Confronting an increasing number of cases of COVID-19, Korea had to weigh among various options for Non-Pharmaceutical Interventions (NPIs). Korea did not take extreme measures such as shelter-in-home and complete lockdowns. Instead, it employed a series of relatively mild measures, including a social distancing order that imposed restrictions on public gatherings and on operating businesses, set at different levels in accordance with the seriousness of the epidemic.Footnote 5 A differentiated measure that Korea took was an aggressive contact tracing scheme, which served a complementary role to social distancing.
Adopting an effective contact tracing strategy requires, as a pre-requisite, a lawful and technically feasible capability to collect and process relevant personal data including geolocation data. Doing so was possible in Korea because it had already introduced a legal framework for technology-based contact tracing after its bruising encounter with the Middle East Respiratory Syndrome (MERS) in 2015. Based on its previous experience with MERS and the legislative measures and mandates adopted in the course of the MERS outbreak, Korea was well equipped to respond to COVID-19 by swiftly mounting aggressive contact tracing and other data processing schemes when COVID-19 materialised as a significant threat to the public health of its citizens. Thus, the nation’s technological infrastructure was mobilized to provide support for epidemiological investigations. The contact tracing scheme, along with a sufficient supply of test kits (such as PCR [polymerase chain reaction] kits for real-time testing) and of personal protective equipment (such as respirators), was perhaps a key contributing factor to Korea’s initial success in flattening the curve of infections and deaths, when it had to confront two major outbreaks that occurred around March and August 2020, respectively. Toward the end of 2020, Korea began facing a new round of difficulties in dealing with a third outbreak, and it again actively implemented a contact tracing scheme. As of 00:00, 1 September 2021, the accumulated number of confirmed cases was recorded at 253,445 (0.49% of the total population), including 2,292 total deaths.Footnote 6 Figure 18.1 shows the trend of newly confirmed cases.
While the statutory framework introduced after the MERS outbreak provided the necessary means to launch a technology-based response to COVID-19, new challenges arose in the process. In particular, there was an obvious but challenging need to protect the privacy of those infected and of those who were deemed to have been in close contact while, at the same time, maintaining the effectiveness of the responses. This chapter provides an overview of how Korea harnessed the power of technology to confront COVID-19 and discusses some of the issues related to the governance of data and technology that were raised during Korea’s experiences.
This chapter is organized as follows: Section II provides an overview of the legal framework which enabled an extensive use of the technology-based contact tracing scheme; Section III explains the structure of the information system that Korea set up and implemented in response to COVID-19; Section IV details the actual use of data for implementing the legal scheme and relevant privacy controversies; Section V further discusses data governance and trust issues; and, finally, Section VI concludes.
II. Legal Frameworks Enabling Extensive Use of Technology‑Based Contact Tracing
1. Consent Principle under Data Protection Laws
A major hurdle in implementing the pandemic-triggered contact tracing scheme in Korea was the country’s stringent data protection regime. Major pillars of the legal regime include the Personal Information Protection Act (PIPA),Footnote 7 the Act on Protection and Use of Location Information (LIA),Footnote 8 and the Communications Secrecy Protection Act (CSPA).Footnote 9 As a means to guarantee the constitutional right to privacy and the right to self-control of personal data, these laws require prior consent from the data subject or a court warrant prior to the collection and processing of personal data, including geolocation data and communications records. Arguably, the consent principle of the Korean law is largely modeled after what can be found in the European Union’s (EU’s) privacy regime including the General Data Protection Regulation (GDPR). However, Korea’s data protection laws tend to be more stringent than the EU’s, for instance, by requiring formalities such as the notification of mandatory items when obtaining consent. Certain statutory features of the Korean data protection laws on data collection are as follows.
First, the PIPA is the primary law governing data protection. Under the PIPA, the data subject must, before giving consent to collection, be given notice including the following: (i) the purpose of collection and use, (ii) the items of data collected, (iii) retention and use period, and (iv) (unless data is collected online) the data subject’s right to refuse consent and disadvantages, if any, from the refusal.Footnote 10 The data subject must, before giving consent to disclosure, be given notice of the recipient and similar items as above.Footnote 11 A recent amendment to the PIPA which took place in 2020 allows exceptions to the purpose limitation principle within the scope reasonably related to the purpose for which the personal data is initially collected.Footnote 12 The 2020 amendment of the PIPA also grants an exemption to the consent requirement when the processing of pseudonymized personal data is carried out for statistical, scientific research, or archiving purposes.Footnote 13 However, these built-in exceptions are not broad enough to cover the processing of personal data for the centralized contact tracing scheme.
Second, the LIA is a special law that governs the processing of geolocation data such as GPS (global positioning system) data and cell ID. This type of data is usually collected by mobile carriers or mobile operating system operators and is shared with mobile app developers. Under the LIA, a data subject of geolocation data must be given appropriate notice in the standard forms before giving consent to the collection, use, or disclosure of personal geolocation data.Footnote 14
Third, the CSPA governs when and how courts or law enforcers can request communications records including base station data or IP (internet protocol) addresses from carriers or online service providers.Footnote 15 Under the CSPA, law enforcers can request data concerning a specific base station (the base station close to the location where the mobile phone user at issue made calls) from mobile carriers in order to deter crime, to detect or detain suspects, or to collect or preserve evidence.Footnote 16 Doing so is, however, permitted only when other alternatives would not work. This provision reflects the reasoning of a constitutional case of 2018. In this case, the Constitutional Court of Korea held that a prosecutor’s collection of the identities of mobile subscribers that accessed a single base station infringed the constitutional right to self-control of personal data and the freedom of communications and that doing so is thus unconstitutional.Footnote 17
However, the previous MERS outbreak had shown the need to put in place an effective contact tracing scheme when needed. This prompted an amendment of the Contagious Disease Prevention and Control Act (CDPCA)Footnote 18 so as to override the consent requirements under Korean data protection law in the event of an outbreak. There is already a provision in the PIPA which exempts the application of the consent and other statutory requirements for the temporary processing of personal data when there is an emergency need for public safety and security, including public health.Footnote 19 The amendment of the CDPCA gave more concrete legal authority for implementing a contact tracing scheme during an outbreak of a contagious disease. After the onset of COVID-19, the Korean legislature further amended the CDPCA several times in order to better cope with situations that had not been anticipated prior to the outbreak.
2. Legal Basis for Centralized Contact Tracing
For manual contact tracing by epidemiological investigators, interviews play a crucial role. Conducting interviews obviously takes time, and accuracy can sometimes become an issue. As such, manual contact tracing has limitations in terms of the timely detection and quarantine of those suspected of being infected. Efforts were made in many parts of the world to make up for these limitations, and several automated contact tracing models have been devised. Most of the newly devised models rely on location or proximity data, typically gathered through smart phones. Each of these models has its own advantages and disadvantages, as discussed below.
Depending on the provenance of the relevant data, these models can be divided into centralized models and decentralized models. There can also be a hybrid model. Among the different types of automated contact tracing models, a majority of developed countries appear to have chosen decentralized ‘privacy-preserving’ proximity tracing models. These typically rely on proximity data exchanged via Bluetooth Low Energy technology rather than on geolocation data. By design, these models grant data subjects the right to avoid tracking by not downloading or activating the mobile apps. Soon after early efforts were made to develop and deploy a contact tracing model in the EU, the European Data Protection Board (EDPB) issued guidelines dated 21 April 2020. According to the EDPB guidelines, COVID-19 tracing apps would have to be based on the use of proximity data instead of geolocation data.Footnote 20
For the decentralized approach, there are two subtypes: a fully decentralized approach and a partially decentralized approach. A fully decentralized approach works as follows. Through the operation of a mobile app, (i) smart phones exchange ephemeral IDs of individuals nearby via Bluetooth Low Energy (‘Bluetooth Handshakes’); (ii) those individuals who are subsequently confirmed positive send their ephemeral IDs to a database in the server; and (iii) each app continues to download the database from the server and alerts if its owner has been in close proximity to one of those who are tested positive.Footnote 21 Apple-Google’s Exposure Notification (AGEN) scheme is a well-known case of the decentralized approach.Footnote 22 AGEN has reportedly been embedded in the majority of European COVID-19 apps, including Austria’s Stopp Corona, Germany’s Corona-Warn-App, Italy’s Immuni, Estonia’s HOIA, the UK’s NHS COVID-19, Protect Scotland, and StopCOVID NI (for Northern Ireland).Footnote 23 Japan also adopted AGEN in its contact tracing scheme called COCOA.
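For illustration, the following sketch reduces the matching step of a fully decentralised model to its core: ephemeral IDs observed locally via Bluetooth handshakes are compared against the IDs downloaded from the server’s database of confirmed cases. This is a schematic illustration of the general idea rather than the actual AGEN protocol; in real schemes the identifiers are derived cryptographically and rotate frequently, and all names and values used here are assumptions.

```python
# Schematic sketch of the on-device matching step in a fully decentralised model.
# Ephemeral IDs are placeholders; real protocols generate and rotate them cryptographically.

locally_observed_ids = {"a1f3", "9c2e", "77b0", "d41d"}   # collected via Bluetooth handshakes

def download_positive_ids() -> set[str]:
    """Stand-in for downloading the server's database of IDs reported by confirmed cases."""
    return {"9c2e", "ffee"}

def check_exposure(observed: set[str]) -> bool:
    """Alert if any locally observed ephemeral ID matches an ID reported by a confirmed case."""
    return bool(observed & download_positive_ids())

if check_exposure(locally_observed_ids):
    print("Possible exposure: proximity to a confirmed case was recorded.")
```

The matching thus takes place on the device itself; the server never learns who met whom, which is what makes this family of models ‘privacy-preserving’ by design.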
On the other hand, a main differentiating feature of the partially decentralized approach is that, in addition to being equipped with the functions of the fully decentralized app, a partially decentralized app would send ephemeral IDs collected from other smart phones to the server database so that it becomes possible to conduct contact tracing, risk analysis, and message transmission, utilizing the data accumulated at the server database.Footnote 24 Its examples include the Pan-European Privacy-Preserving Proximity Tracing (PEPP-PT) (specifically, the ROBERT protocol) and BlueTrace.Footnote 25 The PEPP-PT scheme was embedded in France’s StopCovid and TousAntiCovid, and the BlueTrace approach was embedded in Singapore’s TraceTogether and Australia’s COVIDSafe.
Unlike these approaches, Korea has taken a centralized network-based contact tracing approach, which utilizes geolocation data collected from mobile carriers and other types of data that facilitate tracking of individuals. This approach does not allow its citizens to opt out of the contact tracing scheme. Only a few other jurisdictions, including IsraelFootnote 26,Footnote 27 and China,Footnote 28 appear to have taken this approach. In Korea, government agencies are granted a broad authority to process personal data during a pandemic for epidemiological purposes. Under the current provisions of the CDPCA, the Korea Disease Control and Prevention Agency (KDCA)Footnote 29 and municipal or/and local governments can, at the outbreak of an infectious disease, collect, profile, and share several categories of data that pertain to individuals who test positive or individuals who are suspected of being infected.Footnote 30 The data that can be collected include geolocation data; personal identification information; medical and prescription records (including the Drug Utilization Review [DUR]); immigration records; card transaction data for credit, debit, and prepaid cards; transit pass records for public transportation; and closed-circuit television (CCTV) footage.Footnote 31 In this context, ‘individuals who are suspected to be infected’ mean those who have been in close proximity to confirmed individuals, those who entered the country from a high risk region, or those who have been exposed to pathogens and other risk elements.Footnote 32 These individuals can be required to quarantine.Footnote 33 The CDPCA explicitly stipulates that the request of geolocation data under this law overrides the otherwise-applicable consent requirements under the LIA and CSPA.Footnote 34
The KDCA can share the foregoing data with (i) central, municipal, or local governments, (ii) national health insurance agencies, and (iii) healthcare professionals and their associations.Footnote 35 The KDCA must also transfer a part of the data, including immigration records, card transaction data, transit pass records, and CCTV footage, to national health insurance information systems and other designated systems.Footnote 36
Despite this legal mandate and authority, however, in practice, the scope and breadth of the data processed for contact tracing purposes and the recipients of the shared data have been much narrower, as explained in Subsections 3 and 4.
3. Legal Basis for QR Code Tracking
The amendment to the CDPCA of 4 March 2020Footnote 37 authorized the KDCA, the Ministry of Health and Welfare, and municipal and/or local governments to issue decrees to citizens ‘to keep the list of administrators, managers, and visitors at the venues or facilities having the risk of spreading infectious diseases’.Footnote 38 This new provision enabled the KDCA to deploy an electronic visitor list system by utilizing QR (quick response) codes.
4. Legal Basis for the Disclosure of the Routes of Confirmed Cases
Under the CDPCA, at the outbreak of a serious infectious disease, the KDCA and municipal and/or local governments must promptly make the following information publicly available on the Internet or through a press release: the path and means of transportation of confirmed cases; the medical institutions that treated the cases; and the status of relevant close contacts.Footnote 39 Anybody can file an appeal if the disclosed information is incorrect or if they otherwise wish to raise an objection. Upon such an appeal, the KDCA or municipal and/or local governments must, where deemed necessary, immediately take remedial measures such as making a correction.Footnote 40
This provision allowing for the public disclosure of information is an important exception to the principles set forth under Korea’s data protection laws. It was introduced in the CDPCA in 2015 following the MERS outbreak. At the time, epidemiologists first requested the government to disclose the information about the hospitals that treated confirmed cases and also about the close contacts in order to protect healthcare professionals from the risk of infection.Footnote 41 Public opinion also urged the government to ensure transparency by disclosing the whereabouts of confirmed cases.Footnote 42 In response, the government disclosed the list of the hospitals that treated confirmed cases on 5 June 2015, breaking the non-disclosure principle for the first time. A bill for the foregoing provision was submitted on the same day and was passed by the legislature on 6 July 2015.Footnote 43 The bill was passed within a very short period of time and, as such, there was insufficient time to consider and debate privacy concerns and other important implications that would arise from the amendment. Following the outbreak of COVID-19 in 2020, this provision was immediately triggered, raising considerable privacy concerns, as explained in Sub-section V 2 below.
5. Legal Basis for Quarantine Monitoring
The amendment to the CDPCA of 4 March 2020Footnote 44 authorized the KDCA and municipal and/or local governments to check the citizens for symptoms of infectious diseases and to collect geolocation data through wired or mobile communication devices.Footnote 45 This new provision enabled the KDCA to track GPS data to monitor those quarantined at home.
Prior to this amendment, the quarantine monitoring app had already been in use. During this period, in order to comply with the consent requirements for the collection and use of personal geolocation data under the LIA, the monitoring app required the person installing it to click on a consent button before the installation process started. Because installing the monitoring app and providing the requisite consent allowed one to avoid the inconvenience of being manually monitored by the quarantine authorities or of facing the possibility of being denied entry into the country, most individuals who were subject to quarantine appear to have chosen to use the app. It was not entirely clear whether such involuntary agreement to download and activate the app constituted valid consent under the LIA, and the foregoing amendment to the CDPCA clarified the ambiguity by explicitly allowing the collection of geolocation data for quarantine monitoring purposes.
III. Role of Technology in Korea’s Response to COVID-19
A variety of technological means were employed in the process of coping with the pandemic in Korea. Among these, the most important means would include the tools to gather and utilize geolocation data for the purposes of engaging in contact tracing and other tracking activities. The following describes how technological tools were deployed.
1. Use of Smart City Technology for Contact Tracing
Based on the mandate and authority under the CDPCA, the Korean government launched the COVID-19 Epidemic Investigation Support System (EISS) on 26 March 2020.Footnote 46 By swiftly remodeling the EISS from the existing smart city data hub system developed by several municipal governments, Korea could save time during the early days of the pandemic. Prior to the outbreak of COVID-19, in accordance with the Smart City Act,Footnote 47 the Korean central and municipal and/or local governments had been developing and implementing smart city hubs; several ‘smart cities’ have been designated as test beds for innovation in an effort to foster the research and development in areas related to sharing-economy platforms, AI services, Internet-of-Things technologies, renewable energy, and other innovative businesses. In relative terms, compared to a situation in which systems developed for security service agencies are redeveloped and used for contact tracing purposes, the use of a smart city system might have the advantage of heightened transparency and auditability.
The EISS collects requisite data pertaining to confirmed cases and those who are suspected to have been in contact. Data that can be collected includes base station data from mobile carriers and credit card transaction data from credit card companies. In order to obtain the data, clearances must be obtained from the police (for base station data) and from the Credit Finance Association (CREFIA) (for credit card transaction data). After clearances are obtained, transfer of the data to epidemiological investigators takes place on a near real-time basis.Footnote 48 Equipped with base station data and credit card transaction data, epidemiological investigators can effectively track many of the confirmed cases and their close contacts, as Korea is reported to have the highest penetration rates in the world for mobile phones and smart phones, at 100% and 95% respectively as of 2019 (Figure 18.2).Footnote 49
In addition to the EISS, epidemiological investigators at municipal or local governments can, upon request, be given access to the DUR by the KDCA. Under ‘normal’ circumstances, a main use of the DUR would be to give useful information about various drugs to the general public and to those engaged in the pharmaceutical supply chain. In the context of COVID-19, the DUR could further be used for obtaining requisite tracing data.
2. Use of QR Codes for Tracking Visitors to High-Risk Premises
On 10 June 2020, shortly after the 2020 amendment to the CDPCA came into force, Korea further launched a QR code-based electronic visitors’ log system to track visitors to certain designated types of high-risk premises such as restaurants, fitness centers, karaoke bars, and nightclubs. This system was deployed with the help of two large Internet platform companies, Naver and Kakao, and of mobile carriers through an app called Pass (Figure 18.3).
With this system in place, for instance, a visitor to a restaurant must get an ephemeral QR code pattern from a website or mobile app provided by the Internet platform companies or mobile carriers, and have the pattern scanned using an infrared dongle device maintained by the restaurant, typically at the entrance.Footnote 50 That way, QR code-based electronic visitor lists are generated and maintained for these premises (KI-Pass). Maintaining this tracking system could, however, raise concerns over privacy or surveillance. In order to address these concerns, identifying information about the visitors is kept separately from the information about individual business premises. More details about this bifurcated system are provided in Sub-section IV 2.
3. Public Disclosure of the Routes of Confirmed Cases
Routes of confirmed cases are disclosed on the websites of the relevant municipal and/or local governments, in a text or tabular form. No enhanced technology is used for the disclosure. The disclosed information is also sent to mobile phones held by nearby residents as an emergency alert message in order to alert them of the possible exposure and risks.
4. Use of GPS Tracking Technology and Geographic Information System (GIS) for Quarantine Monitoring
The CDPCA also grants authorization for quarantine measures to government agencies. Thus, a 14-day quarantine requirement was introduced for (1) individuals who are deemed to have been in close proximity to confirmed casesFootnote 51 and (2) individuals who arrive from certain high-risk foreign countries.Footnote 52 To monitor compliance, those who are under quarantine are required to install and run a mobile app called the ‘Self-Quarantine Safety Protection App’ developed by the Ministry of the Interior and Safety. The app enables officials at competent local governments to track GPS data from smart devices held by those quarantined on a real-time basis, through the GIS, in order to check and confirm whether they have remained in their places of quarantine. Also, quarantined individuals are expected to use the app to report symptoms, if any, twice a day (Figure 18.4).
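For illustration, the following sketch shows the kind of geofence check that a quarantine-monitoring app of this type might perform, namely whether a reported GPS position lies within a permitted radius of the registered quarantine location. The radius, the coordinates, and the function names are assumptions made for the example and are not details of the actual ‘Self-Quarantine Safety Protection App’.

```python
# Illustrative geofence check: is the reported GPS position within a permitted radius
# of the registered quarantine location? All values below are hypothetical.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    r = 6371000  # Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

QUARANTINE_LOCATION = (37.5665, 126.9780)  # example coordinates in Seoul

def within_quarantine(current_lat, current_lon, radius_m=100):
    """True if the reported position lies within the permitted radius."""
    return haversine_m(*QUARANTINE_LOCATION, current_lat, current_lon) <= radius_m

print(within_quarantine(37.5667, 126.9782))  # True: a few dozen metres away
print(within_quarantine(37.6000, 126.9780))  # False: several kilometres away
```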
IV. Flow of Data
In a nutshell, developing and deploying a tracing system is about gathering and analyzing data. While using the collected data for epidemiological purposes could be justified on the basis of public policy reasons, legitimate concerns over surveillance and privacy could be raised at the same time. As such, it is imperative to consider provenance and governance of various types of data. A starting point for doing this would be to analyze the flow of data, to which we now turn.
1. Centralized Contact Tracing
Personal data including geolocation data of an individual flows within the EISS in the following steps: (i) the KDCA or municipal and/or local governments make a request; (ii) the police and/or the CREFIA give clearances to the transfer of mobile base station data and/or credit card transaction data, respectively; (iii) mobile carriers and/or credit card companies provide data as requested; (iv) epidemiological investigators review and analyze data pertaining to confirmed cases; (v) the investigators verify and obtain further information through interviews with confirmed cases; (vi) the investigators further conduct epidemiological network analysis and identify epidemiological links regarding the spread of COVID-19; and (vii) the KDCA and municipal and/or local governments receive relevant data and implement necessary measures such as quarantine or the disinfection or shutdown of premises where confirmed individuals visited.Footnote 53
In the whole process, mobile base station data plays a crucial role for tracing purposes. Mobile base station data contains the names and phone numbers of the individuals who were near a specific base station. Exact location data were not collected, although collecting such data would have been technically feasible through triangulation. However, as mobile base stations are installed at intervals of 50 to 100 meters in the downtown areas of a densely populated city such as Seoul, base station data can be considered precise enough for the purpose of identifying those who stayed near a confirmed case. At the same time, because the geographic coverage of a base station could be rather broad, there could be an issue of over-inclusion, with implications for privacy.
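To make the over-inclusion point concrete, the following back-of-the-envelope sketch compares the number of people a single base-station query might sweep in against the number within epidemiologically relevant proximity. The cell radius, contact radius, and population density used here are purely illustrative assumptions, not figures drawn from the Korean system.

```python
# Back-of-the-envelope sketch of the over-inclusion concern (illustrative only;
# the radii and the density below are hypothetical assumptions, not reported figures).
import math

cell_radius_m = 100        # assumed coverage radius of a downtown base station
contact_radius_m = 2       # assumed epidemiologically relevant proximity
density_per_km2 = 16_000   # assumed daytime population density (people per km^2)

def people_in_circle(radius_m: float, density_per_km2: float) -> float:
    """Expected number of people inside a circle of the given radius."""
    area_km2 = math.pi * (radius_m / 1000) ** 2
    return area_km2 * density_per_km2

swept_in = people_in_circle(cell_radius_m, density_per_km2)
relevant = people_in_circle(contact_radius_m, density_per_km2)

print(f"people covered by one base-station query: ~{swept_in:.0f}")
print(f"people within close-contact range:        ~{relevant:.2f}")
print(f"over-inclusion factor:                    ~{swept_in / max(relevant, 1e-9):.0f}x")
```

Even under such rough assumptions, the query captures orders of magnitude more people than those who were actually within close-contact range, which is the essence of the over-inclusion concern.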
2. QR Code Tracking
An outbreak in May 2020 was investigated and found to have an epidemiological relationship to a night club located in the Itaewon district in Seoul. When this outbreak became serious, efforts were made to locate the individuals affected and to conduct interviews so that further preventative measures could be deployed. However, only 41.0% of individuals (2,032 out of 4,961) could be contacted by epidemiological investigators over the phone.Footnote 54 This was mainly because the visitor list was hand-written by the visitors themselves and because sexual minorities who visited the club wrote down false identities and/or phone numbers for fear of being forced to reveal their sexual orientation. This inability to contact a larger number of visitors to a particular premise reinforced the view that paper visitor lists should be substituted, where possible, with electronic visitor lists, so that the accuracy of the information contained in them could be better assured.
This hastened the development of a QR code-based electronic visitor list system, which was deployed on 10 June 2020.Footnote 55 When a visitor has his or her QR code scanned by an infrared dongle device installed at a business premise, the manager of the premise does not collect any personal data other than the code itself. Under the system deployed in Korea, visitor identification information is held only by the issuers of QR codes, unless a need arises to confirm an identity for epidemiological purposes. Specifically, visitor identification information is held by one of the three private entities that issue QR codes: the Internet platform companies Kakao and Naver, and the mobile carriers that jointly developed the app named Pass. Data directly related to business premises are held by the Social Security Information Service (SSIS). That is, the SSIS collects the following data: the name of the business premise, the time of entry, and encrypted QR codes. The SSIS does not hold any personally identifiable data in this context.Footnote 56 That way, the relevant data are kept separately and a bifurcated system is maintained. When a report is made that a visitor to a business premise has been confirmed positive, the bifurcated datasets are combined on a need basis in order to retrieve the relevant contact information, which is transmitted to the EISS. The transmitted information is then used by the KDCA and municipal and/or local governments for epidemiological investigations. The data generated by QR-code scanning is automatically erased after four weeks.Footnote 57
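The bifurcated flow just described can be summarized in a minimal sketch. The data structures, the token format, and the retrieval logic below are illustrative assumptions made for exposition only; they do not reflect the actual KI-Pass implementation, and the placeholder token generation stands in for proper encryption.

```python
# Minimal sketch of a bifurcated, KI-Pass-style visitor log (illustrative assumptions only).
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=4)  # scan logs are erased after four weeks

@dataclass
class ScanRecord:               # held by the SSIS: no directly identifying data
    premise: str
    scanned_at: datetime
    token: str                  # opaque (encrypted) QR token

issuer_directory: dict[str, str] = {}   # held by the QR issuer: token -> contact number
ssis_log: list[ScanRecord] = []         # held by the SSIS

def issue_qr(phone: str, now: datetime) -> str:
    """QR issuer creates an ephemeral token linked to the visitor's contact info."""
    token = f"tok-{phone[-4:]}-{int(now.timestamp())}"   # placeholder, not real crypto
    issuer_directory[token] = phone
    return token

def scan(token: str, premise: str, now: datetime) -> None:
    """Business premise scans the code; only the token, premise, and time are logged."""
    ssis_log.append(ScanRecord(premise, now, token))

def purge(now: datetime) -> None:
    """SSIS deletes scan records older than the retention period."""
    ssis_log[:] = [r for r in ssis_log if now - r.scanned_at <= RETENTION]

def contacts_for(premise: str, start: datetime, end: datetime) -> list[str]:
    """Only when a confirmed case is reported: join the two datasets to retrieve contacts."""
    tokens = [r.token for r in ssis_log
              if r.premise == premise and start <= r.scanned_at <= end]
    return [issuer_directory[t] for t in tokens if t in issuer_directory]

# Example walk-through (times and identifiers are arbitrary):
now = datetime(2020, 6, 15, 19, 0)
token = issue_qr("010-1234-5678", now)
scan(token, "Restaurant A", now)
print(contacts_for("Restaurant A", now - timedelta(hours=2), now + timedelta(hours=2)))
```

In this sketch, neither dataset alone links identity to visit history: the issuer directory holds only token-to-contact mappings, while the SSIS log holds only premise, time, and token; the two are joined only when an epidemiological need arises, and the scan log is purged after the retention period.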
3. Public Disclosure of the Routes of Confirmed Cases
As explained above, municipal and/or local governments receive geolocation data and card transaction data from the EISS and disclose part of the data to the general public. At an earlier stage of the COVID-19 outbreak, very detailed routes of confirmed cases were disclosed to the public. These disclosures did not include the names or other personally identifiable information of the confirmed individuals. What was revealed typically included a pseudonym or part of the full name of the infected individual, as well as sex and age. In addition, occupation and/or area of residence was often disclosed. Although directly identifiable personal information was not disclosed, simple investigation and profiling would sometimes enable re-identification or reveal personal details. Certain individuals indeed became subject to public ridicule after their identities were revealed. Debates on privacy followed, and the KDCA revised its guidelines on the public disclosure of contact tracing information. As a result, municipal and/or local governments now disclose much more concise information focusing on locations and premises rather than on an individual's itinerary. Also, disclosed information is deleted fourteen days after disclosure. An example is shown in Table 18.1.
□ Status of Case No. ****
- Source of Infection: Presumably infected from a family member
- Confirmed positive on 11 January.
□ Status of Case No. ****
- Source of Infection: Presumably infected from a family member
- Confirmed positive on 11 January.
□ Status of Case No. ****
- Source of Infection: Presumably infected from a confirmed case at the same company in a different region
- Confirmed positive on 11 January.
□ Status of Case No. ****
- Source of Infection: Under investigation
- Confirmed positive on 11 January.
□ Status of Case No. ****
- Source of Infection: Under investigation
- Confirmed positive on 11 January.
□ Status of Case No. ****
- Source of Infection: Under investigation
- Confirmed positive on 11 January.
※ Measures
- Will transfer confirmed cases to the government-designated hospitals
- Will disinfect the residence and neighboring areas of confirmed cases
- Investigating visited places and close contacts
4. Quarantine Monitoring
The self-quarantine app collects GPS data from mobile devices and shares it with the GIS, so that an official at the local government can monitor the location of a quarantined individual on a real-time basis.
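At its core, the monitoring described above is a geofence check: each reported location is compared against the registered place of quarantine. The sketch below illustrates that logic under stated assumptions; the distance function and the 100-meter threshold are hypothetical choices for illustration rather than a description of the actual Self-Quarantine Safety Protection App or the GIS.

```python
# Illustrative geofence check for quarantine monitoring (assumed threshold, not the real system).
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two GPS coordinates."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_compliant(reported: tuple[float, float],
                 quarantine_site: tuple[float, float],
                 radius_m: float = 100.0) -> bool:
    """True if the reported location lies within the allowed radius of the quarantine site."""
    return haversine_m(*reported, *quarantine_site) <= radius_m

# A report roughly a kilometer away from the registered address falls outside the
# assumed radius and would be flagged to the monitoring official.
print(is_compliant((37.5700, 126.9900), (37.5665, 126.9780)))  # False
```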
V. Data Flow and Data Governance
Collecting relevant data on a near real-time basis is crucial in order to contain the spread of COVID-19. At the same time, data collection immediately raises privacy concerns. As such, a delicate balance must be struck between conducting effective epidemiological investigations and protecting the privacy of individuals. Delineating the precise flow and provenance of the collected data indicates how this delicate balance can be struck and maintained. The following subsections explain what transpired in Korea in this respect.
1. Centralized Contact Tracing (including QR Code Tracking)
An early response is critical to contain the spread of highly infectious diseases such as COVID-19. In turn, the effectiveness of such a response relies on the prompt collection and sharing of accurate data about confirmed cases and close contacts. Manual epidemiological tracing has serious limitations. It takes time for human investigators to conduct manual tracing, causing delays. Also, manual tracing is vulnerable to faulty memory or deception on the part of interviewees, resulting in inaccurate epidemiological reports.
In response to the rapid spread of COVID-19, Korea chose to integrate such human efforts with a technology-driven system of data processing. For example, the prompt compilation of geolocation data has been a crucial enabling factor in Korea’s contact tracing strategy. The EISS, which makes use of smart city technology, allowed public health authorities to allocate valuable resources efficiently. With the assistance of technology, epidemiological investigators were able to conduct tracing in a more effective and efficient manner.
At the same time, questions were raised as to whether the centralized contact tracing model adopted in Korea was overly intrusive, or even harmful to fundamental freedoms constituting the very cornerstones of a democratic society. The collection of data has sometimes been equated with mass surveillance, raising privacy concerns as well. This line of criticism would have clear merit if alternative tracing systems showed the same or an even higher level of efficacy while collecting less granular and less detailed personal data.
The problem, however, is that, while a decentralized system such as the Bluetooth-based approach is in general better at protecting privacy, it has its own shortcomings that are yet to be solved. First, a tracing app needs to attain a certain penetration rate; in other words, the proportion of active users of the mobile app among the whole population should be sufficiently high for a tracing system to function properly. In order to achieve so-called digital herd immunity, this penetration rate should be fairly high – sometimes set at 60 to 75%.Footnote 59 To date, most countries have failed to achieve this level of penetration, due to, among other things, low smartphone penetration rates. Second, Bluetooth-based proximity tracing may not work effectively in crowded areas, which are in fact prone to explosive outbreaks of infectious diseases such as COVID-19. Third, decentralized models generally do not allow for human-in-the-loop verification and tend to show excessively high false-positive rates.Footnote 60 Fourth, iOS does not allow third-party apps running in the background to broadcast Bluetooth signals properly, unless the AGEN system is deployed.Footnote 61 Fifth, under a fully decentralized approach, there would be no informational benefits to public health authorities because relevant information simply does not flow to them. While this could be beneficial in maintaining the privacy interests of citizens, precious opportunities for gaining epidemiological data would be lost at the same time. Lastly, and perhaps most fundamentally, the decentralized approach has to rely on good-faith cooperation by confirmed individuals. That is, the approach does not work unless confirmed individuals make voluntary reports and, as such, it exhibits a problem similar to that of manual tracing.
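The penetration point can be made concrete with a toy calculation: if app adoption in the population is p and the two parties to a contact adopt independently, only about p² of contacts can even be recorded by a decentralized app. The adoption figures below are arbitrary illustrations; the 60 to 75% thresholds mentioned above come from the cited sources, not from this sketch.

```python
# Toy calculation: share of contacts potentially observable by a decentralized tracing app,
# assuming app adoption is independent between the two parties to a contact.
for adoption in (0.20, 0.40, 0.60, 0.75, 0.90):
    covered = adoption ** 2
    print(f"adoption {adoption:.0%} -> contacts observable by the app: ~{covered:.0%}")
```

On this simple view, even 60% adoption leaves roughly two thirds of contacts invisible to the app, which helps explain why the required penetration rates are set so high.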
This is not to say that a centralized model would always be preferable. While a decentralized approach may not lead to digital herd immunity, it could nonetheless play a complementary role in containing the spread of COVID-19, particularly in densely populated areas such as city centers and university campuses. Thus, as a general matter, a centralized approach and a decentralized approach each have their own strengths and limitations. The main strengths of a centralized approach include immediate availability regardless of penetration levels, effective response to mass infection, the absence of compatibility concerns and, most importantly, a substantial contribution to epidemiological investigations.
In the case of Korea, there is no denying that contact tracing and other tracking mechanisms were a crucial component of the whole apparatus dealing with the daunting challenges caused by COVID-19. Overall, Korean society at large complied with the requirements imposed by these tracking mechanisms without raising serious privacy concerns. Going one step further, there are various possible explanations as to why Korean citizens in general complied with the measures adopted by the government.
In terms of data protection, the PIPA was enacted in 2011, and earlier statutes also contained various elements of data protection. Separately, the Constitutional Court of Korea declared in 2005 that the right to data protection is a constitutional right. As such, Korean citizens are in general well aware of the value of data protection in modern society. In adopting technology-based contact tracing mechanisms and complying with the associated requirements, Korean society as a whole can be said to have made a wide-ranging value judgement about privacy, public health, and other social and legal values. Among other things, citizens exhibited a striking willingness to cooperate with the authorities in their efforts to collect epidemiological data, including geolocation data, which can be traced back to their previous experience with the MERS outbreak. Utilizing new technologies for epidemiological purposes was perhaps not much of an additional concern because, in relative terms, many Koreans are at ease with adapting to new technological environments.
This does not mean, however, that the data collection was without controversy. On the contrary, several activist groups joined forces and filed a constitutional petition seeking the Constitutional Court of Korea’s decision on the constitutionality of the contact tracing mechanisms.Footnote 62 More specifically, the petition challenges the constitutionality of the CDPCA provisions which enabled contact tracing in the first place.Footnote 63 It also argues that the government’s collection of mobile base station data based on these provisions is unconstitutional, pointing in particular to the collection of data about the visitors to a night club in the Itaewon area during an outbreak, on the ground that doing so violates, among other things, the constitutional right to self-determination of personal data.Footnote 64 The petition relies on the Court’s 2018 decision, which held unconstitutional the collection, in the course of a criminal investigation, of the identities of mobile subscribers who had accessed a particular base station.Footnote 65 Regardless of the outcome of this case, the scope of geolocation data collection might need to be adjusted to balance epidemiological benefits with privacy.
In the case of QR codes, the bifurcated approach perhaps helps mitigate security risks and privacy concerns by separating personally identifiable data from visitor logs and by combining them only when necessary for an epidemiological investigation. Also, while both a paper visitor log and a QR code-based electronic log are usually available at business premises, the general public appears to prefer the QR code-based electronic visitor log system. Part of the reason would be the trustworthiness of the QR-code system: while there is little concern about data breaches in the case of a QR code-based electronic visitor log, a paper visitor list could be vulnerable to illegal leakage by employees of business premises or by subsequent visitors.Footnote 66
Regarding the contact tracing mechanism in general, there could be a concern over the possibility of ‘function creep.’ The concern is that, while conducting contact tracing under the current extraordinary circumstances of COVID-19 could be justified, after the pandemic is over the government may be tempted to use this mechanism for surveillance purposes. In the case of Korea, there are two built-in safeguards against this. First, data collection for epidemiological purposes is under the sole purview of the KDCA, and the relevant databases are maintained by the KDCA as well. This means that, even if the government were tempted to divert the system to different purposes, doing so would be a cumbersome procedure simply because the system is maintained and held by a single public health agency with a narrow public health mandate. Second, the KDCA’s authority for the current data collection is, for the most part, derived from statutory provisions contained in the CDPCA and not from the PIPA, a general data protection statute. After the pandemic is over, the KDCA or any other government agency would require a separate statutory basis in order to collect data.
Compared to the Korean government’s active role in utilizing technology to cope with the COVID-19 pandemic, public-private collaboration based on the sharing of public data and the use of open APIs (application programming interfaces) in Korea has somewhat lagged. There have been recent cases of meaningful contributions from the private sector, however. An example would be a collaborative dataset sourced from public disclosures, which has been actively used for visualization purposes and also for machine learning training purposes.Footnote 67
2. Public Disclosure of the Routes of Confirmed Cases
Unlike contact tracing itself, which was generally accepted as a necessary trade-off between privacy and public health in facing the pandemic, the public disclosure of the routes of confirmed cases quickly became controversial due to privacy concerns. Such public disclosures were, in fact, another policy response drawn from the experience of the MERS outbreak. That is, during the MERS outbreak there was great demand for transparency, and some argued that the lack of transparency impeded an effective response. However, with the onset of the COVID-19 outbreak, the pendulum swung in the other direction. Not just the detailed nature but also the uneven scope and granularity of disclosures among the KDCA and the numerous municipal and local authorities caused confusion, in particular during the initial phase. Concerns were not limited to the invasion of privacy. Private businesses, such as restaurants and shops, that were identified as part of the routes often experienced an abrupt loss of business.
These concerns were encapsulated in the recommendation issued by the National Human Rights Commission (NHRC) on 9 March 2020.Footnote 68 The NHRC expressed concerns about unwanted and excessive privacy invasion as well as secondary damage such as public disdain or stigma, citing a recent survey showing that the public was even more fearful of the privacy invasion and stigma stemming from an infection than of the associated health risk itself.Footnote 69 The NHRC noted that excessive public disclosure could also undermine public health efforts by dissuading those suspected of infection from voluntarily reporting their circumstances and/or getting tested, for fear of privacy intrusions.Footnote 70 The NHRC further recommended that route disclosures be made in an aggregate manner focusing on the locales at issue, rather than disclosing the times and places of visits at an individual level and possibly revealing personal itineraries.Footnote 71
In response to the NHRC’s recommendations, the KDCA issued its first guidelines regarding public disclosures to municipal and local governments on 14 March 2020, which limited the scope and detail of the information to be made publicly available. Specifically, the KDCA (i) limited the period of route disclosure from one day prior to the first occurrence of symptoms to the date of isolation, (ii) limited the scope of visited places and means of transportation to those spatially and temporally proximate enough to raise concerns of contagion, considering symptoms, duration of a visit, status of contacts, timing, and whether facial masks were worn, and (iii) banned the disclosure of home addresses and names of workplaces. On 12 April 2020, the KDCA further revised the guidelines. Under the revised guidelines, (i) information on routes should be taken down 14 days after the confirmed case’s last contact with another individual, (ii) information on ‘completion of disinfection’ should be disclosed for relevant places along the disclosed routes, and (iii) the period of route disclosure should start from two days prior to the first occurrence of symptoms.Footnote 72 One complication from public disclosures of information is that, once a disclosure is made, the disclosed information is rapidly further disseminated via various social media outlets by individual users. Thus, data protection agencies have been actively sending out takedown notices to online service providers to ensure that such content is taken down following the 14-day period.
In May 2020, a spate of confirmed cases arose at a nightlife district in Itaewon, Seoul, that is frequented by persons with a specific sexual orientation. While public health authorities mounted a campaign urging prompt testing for those who could be at risk, it was apparent that the fear of being forced to reveal one’s sexual orientation, or of being socially ostracized, was a significant deterrent. In response, the Seoul Metropolitan Government introduced anonymous testing from 11 May 2020, under which individuals were asked only for their phone numbers. The anonymous testing scheme was expanded nationwide on 13 May 2020.
After witnessing these debates, the KDCA issued further revised guidelines dated 30 June 2020. The latest guidelines provide that municipal and/or local governments should disclose the area, the type of premises visited, the trade names and addresses of those premises, the date and time of exposure, and the disinfection status, and that disclosures should not be made for each individual and his or her timeline but instead in the format of ‘lists of locations visited.’ The guidelines further stipulate that information regarding the visited places should not be disclosed if all close contacts have been identified.Footnote 73
Subsequently, an amendment to the CDPCA dated 29 September 2020 included, among other things, a provision that excludes from the scope of public disclosure the ‘sex, age, and other information unrelated to the prevention of contagious disease as stipulated in the Presidential Decree.’Footnote 74 The current Presidential Decree for the CDPCA lists the name and detailed address as examples of such ‘other information unrelated to the prevention of contagious disease.’Footnote 75
The above shows an ongoing process of trial and error in search of a more refined approach that better balances the imperatives emanating from public health concerns during a pandemic with privacy and other social values. The urgency of the situation perhaps made it imperative to implement swift measures for gathering information. While swift measures may be inevitable, it is also important to review their legitimacy and efficacy on an ongoing basis and to revise them if needed. For instance, compared to the disclosure of precise routes profiled for each confirmed case, the disclosure of aggregated route information has proven sufficient to achieve the intended public health policy goals. As demonstrated in the Itaewon case, a less privacy-intrusive alternative can also assist infection control efforts by encouraging voluntary reporting and testing.
Regarding the disclosure of the names and addresses of business premises, assuming that disinfection can effectively address contagion risks, the only benefit would be to alert other visitors and to encourage them to self-report and get tested. Therefore, if all visitors are in fact identifiable through contact tracing, the public disclosure of the type of business and the broader area of the location, rather than the name of the specific business premise, would be sufficient for purposes of public health. In fact, revisions to the KDCA guidelines were made to reflect practical lessons learned throughout 2020, and they provide for the deletion of data that is unnecessary or no longer needed.
3. Quarantine Monitoring
Human surveillance of quarantined persons is often costly, ineffective, and in many cases inevitably intrusive. Quarantine monitoring through GPS tracking has generally been regarded as a more effective and less intrusive substitute for human surveillance. As such, no serious privacy concerns have been raised about quarantine monitoring.
4. Data Governance
On the regulatory front, the outbreak of COVID-19 has highlighted the need for Korea’s privacy and data protection authorities to be ever more vigilant during public emergencies. In February 2020, Korea undertook a major reform of its privacy and data protection laws, which came into effect on 5 August 2020. As a result of the amendments, Korea’s data protection authority has been consolidated and vested in the Personal Information Protection Commission (PIPC). This reform is expected to allow the PIPC to play a more proactive role in balancing the rights of data subjects with public health goals and to provide clearer guidance as to what to disclose, and how to de-identify information, when making public disclosures.
On a broader level, in terms of the flow and provenance of data, two general directions can be distinguished. One direction runs from the general population to public health authorities. Data gathered and shared in this direction is mainly used to carry out contact tracing, to conduct epidemiological analyses, and to devise and implement public health measures. At the same time, data flows in the other direction as well, from the government and public health authorities to the general public. What is carried out in this context is mostly the public disclosure of data about confirmed cases. This presumably helps enhance transparency and alerts citizens so that they can prepare.
Regarding both directions of data flow, there are tensions between public health purposes and privacy interests: gathering and disseminating detailed information would in general be helpful in containing the spread of COVID-19, while at the same time it could be detrimental to the protection of the privacy of citizens. The details of the tensions, however, differ between the two directions. When data flows from the general public to public health authorities, a major concern is the possibility of surveillance. Seen from a public policy perspective, attention would thus need to be paid to whether and how such a concern over surveillance could be assuaged. Putting in place systematic and procedural safeguards could be helpful. On the other hand, when data flows from public health authorities to the general public, mostly in the form of public disclosures of data about confirmed cases, concerns could be raised about the privacy of citizens. A privacy concern in this context could arise from the possibility of the revelation of unwanted or embarrassing personal details. The risk could be elevated if a public officer has an added motivation to gain media attention by leaking a ‘headline-grabbing’ news item. In that regard, attention may need to be paid to what data is made available to public sector officers.
VI. Looking Ahead
As the COVID-19 outbreak continues its course, new societal challenges, as well as existing ones exacerbated by the pandemic such as the digital divide, are gathering more attention in Korea and elsewhere. Heightened concerns of ostracization or stigma directed at minority groups, the vulnerability of health and other essential workers who face constant exposure to infection, and the plight of children from underprivileged families who are ill-equipped for remote learning are but a few examples. The Itaewon case, discussed earlier, has demonstrated the need for the authorities to be prepared to promptly address concerns of prejudice against minority groups in Korean society. The same should be said regarding the acute health and economic disadvantages faced by the underprivileged during a pandemic. Yet the societal challenges of the post-COVID-19 era, with its trend towards remote work, education, and economic activity, will likely call for more long-term and fundamental solutions.
In this regard, the active use and application of AI and data analytics, as well as a robust ethical review concerning their governance, are expected to be critical in achieving the social reforms required to cope with the challenges of the present and the coming future. A prerequisite for doing so would be to compile a ‘data map’ so that the flow and provenance of data can be systematically understood. With such an understanding, further discussions could then be had regarding appropriate levels of granularity for data disclosures and different levels of access control and other safeguards, depending on specific needs or policy goals. Korea’s experience in dealing with COVID-19 can provide a valuable lesson in this context.