Privacy in Data Science: The art of balancing innovation and risk

Saturday, January 28, 2023 - 16:11

Information protection and privacy are receiving increasingly serious attention, with the aim of ensuring that every individual's personal data is used responsibly and ethically, without compromising security or confidentiality.

Organizations and institutions have progressively adopted strict information protection mechanisms, with well-established rules intended to prevent (or minimize) the misuse of personal information in a wide range of contexts. At the same time, across the most diverse environments, growing technological advances drive a marked increase in competitiveness, motivating organizations to invest heavily in intelligent decision-support systems that yield competitive advantages through descriptive, predictive and prescriptive analysis techniques driven by large volumes of data. If, on the one hand, the adoption of well-defined data protection policies brings a wide range of advantages for individuals, on the other hand it also places some limitations on the development of systems capable of generalizing efficiently from historical data. Typically, systems that extract knowledge from data require the collection and processing of variables that are important in the context of the application in question. It is therefore quite common for the correct modeling of a phenomenon of interest to require information that is sensitive for the user but of paramount importance for a statistical or machine learning model.

In light of this reality, one of the fundamental challenges in data science arises: promoting a use and management of information that, on the one hand, does not compromise any individual and protects them from violations of their privacy while, on the other, guaranteeing the collection, storage and processing of the data necessary to develop accurate models that bring significant improvements to decision-making processes. Although it seems natural that the protection of individual interests should prevail in this equation, there are more and more cases of non-compliance with legal and ethical requirements, in which technological development is pursued to the detriment of information protection and security. In health care, for example, using and manipulating variables that characterize different individuals can facilitate the development of knowledge extraction models capable of capturing existing patterns in a given population. However, the ethical and social implications of this development must prevail over the interest in using information that is sensitive, however beneficial it may be from a statistical and innovation point of view.

In practice, it becomes increasingly important to foster a critical and ethically responsible mindset that allows technological and scientific advancement in favor of the common good to coexist with the fundamental principles of transparency, protection of individual rights and the privacy of individuals in any society.

João Nuno Costa Gonçalves

PhD in Industrial and Systems Engineering, Master's in Systems Engineering and degree in Mathematics from the University of Minho.
Full-time Invited Assistant Professor at the Portuguese Catholic University (UCP-Braga) in the scientific area of Data Science and member of the coordinating committee of the Degree in Applied Data Science at UCP.