Insights
The Main Issues in Data Ethics
By
Fred C. Veldhuis
Jul 22, 2022
The accessibility and transformational capabilities of data bring great benefits. On the flip side, they have also brought new compliance risks. Data ethics provides important guidelines for using data the safe way.
The importance of data and technology has increased significantly in recent years. According to the McKinsey Global Institute, the “global volume of data doubles” almost every three years due to the increase in digital platforms across the world (The age of analytics: Competing in a data-driven world, 2016).
With this huge amount of data available, the risk of using it inappropriately has grown. Following ethical methods when performing data analyses is therefore of significant importance. Ethics in this case refers to the moral principles that govern how an activity is conducted (BBC - Ethics - Introduction to ethics: Ethics: a general introduction, 2014).
In data science, it is essential that the data used for analyses is collected ethically and used appropriately in building models. Predictive algorithms are among the most widely used tools for decision-making in almost every field where data science is applied. Their use often raises ethical issues, including privacy, fairness in using data respectfully, producing a shared benefit, governance of data accuracy, and transparency (Arvanitakis, 2018).
One of the largest ethical issues in the field of data science is privacy and security (Chulu, 2018). Privacy is the ability to control the collection and usage of one's personal information. Immense amounts of data are being collected in almost all companies and fields. However, this potential usage of ‘big data’ often puts users' privacy in danger. As we gather more information, it becomes more complicated to protect the privacy of that information. Although it is vital that organizations always protect the privacy of the users whose data they can access, there are still cases where data privacy is violated in pursuit of an organization's targets. This has caused a major increase in public concern about the privacy and protection of personal data used for predictive algorithms.
A recent example of how data privacy and regulations can be breached is a case at the Dutch Tax Administration. The Dutch Tax Administration was fined €2.75 million by the Dutch Data Protection Authority (DPA). The fine was imposed because for many years the Tax Administration processed data on the (dual) nationality of childcare benefit applicants in an unlawful, discriminatory and therefore improper manner. This constituted serious violations of the General Data Protection Regulation (GDPR), the law governing privacy (Autoriteit Persoonsgegevens, 2021).
In its press release, the Dutch Data Protection Authority wrote:
“The Tax Administration should have deleted the data on dual nationality of Dutch nationals back in January 2014. Dual nationality of Dutch nationals should not play a role in the assessment of childcare benefit applications. Nonetheless, the Tax Administration retained and used this data. In May 2018, some 1.4 million people were still registered as dual nationals in its systems. The Tax Administration also processed the nationality data of childcare benefit applicants for the purpose of combating organized fraud, even though this data was not necessary for this purpose. Lastly, it used applicants’ nationality (Dutch/not Dutch) as an indicator in a system that automatically designated certain applications as risky. The data was not necessary for this purpose either. It is unlawful, and therefore prohibited, to use nationality data to assess applications, combat fraud and determine risk.”
In a world in which digitalization is rapidly advancing, it is becoming all the more crucial to protect individuals' personal data in order to protect other fundamental rights, such as the rights to safety, property and health. This case is a clear example of how unlawful processing by means of an algorithm can lead to a violation of the right to equality and non-discrimination. Digital applications have become indispensable, and they enable us to swiftly process and conveniently combine huge volumes of information. But when it goes wrong, it really goes wrong.
A fundamental reason for biased predictive algorithms is the social bias present in society (Chulu, 2018). Machine learning is one of the most widely used methods for making predictions on large datasets. To arrive at an effective model, the algorithm must be trained and tested on large amounts of data. The biases and unfairness in society are reflected in the data on which the algorithm is trained, and the resulting model inherits the same bias. A lack of diversity in the training data is an underlying cause of biased algorithms. Increasing diversity in the technology sector and making algorithms more transparent makes it possible to decrease the amount of bias in the algorithm.
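As a rough illustration of how a lack of diversity or an uneven outcome in training data might be spotted, the sketch below computes each group's share of the records and the rate at which each group receives a positive label. The field names `group` and `flagged` and the toy records are hypothetical, made up for this example; they are not taken from any real dataset, and a real audit would need far more careful metrics.

```python
from collections import Counter, defaultdict


def group_shares(records, group_field):
    """Return each group's share of the dataset (values sum to 1)."""
    counts = Counter(r[group_field] for r in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def positive_rates(records, group_field, outcome_field):
    """Return, per group, the fraction of records with a truthy outcome."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[group_field]] += 1
        if r[outcome_field]:
            positives[r[group_field]] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical toy training data: group A has six records, group B only two.
training_data = (
    [{"group": "A", "flagged": f} for f in (True, False, False, False, False, False)]
    + [{"group": "B", "flagged": f} for f in (True, False)]
)

shares = group_shares(training_data, "group")
rates = positive_rates(training_data, "group", "flagged")

for group in sorted(shares):
    print(f"group {group}: share={shares[group]:.2f}, flagged rate={rates[group]:.2f}")
```

In this toy data, group B makes up only a quarter of the records yet is flagged in one of two cases, against one of six for group A. A model trained on such data could be expected to reproduce that imbalance, which is the kind of signal that would prompt a closer look before training.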
One way to address ethical concerns regarding privacy is to conceal personal identities from the companies that analyze the data (Stewart, 2020). An alternative approach to privacy protection is to use a pseudonymized dataset, in which artificial identifiers replace personal ones before the data is used in algorithms.
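A minimal sketch of pseudonymization, assuming a keyed hash (HMAC-SHA256 from the Python standard library) is an acceptable way to derive artificial identifiers. The key, the field names (`name`, `email`, `benefit_amount`) and the sample records are hypothetical. Note that pseudonymized data can still be linked back to a person by whoever holds the key, so it is not the same as anonymization.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable artificial identifier derived from a personal identifier."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def pseudonymize_records(records, id_field="name", drop_fields=("email",)):
    """Replace the identifying field with a pseudonym and drop other direct identifiers."""
    result = []
    for record in records:
        cleaned = {
            k: v for k, v in record.items()
            if k != id_field and k not in drop_fields
        }
        cleaned["pseudonym"] = pseudonymize(record[id_field])
        result.append(cleaned)
    return result


# Hypothetical sample records, made up for this example.
records = [
    {"name": "Alice Jansen", "email": "alice@example.com", "benefit_amount": 1200},
    {"name": "Bob de Vries", "email": "bob@example.com", "benefit_amount": 950},
    {"name": "Alice Jansen", "email": "alice@example.com", "benefit_amount": 300},
]

pseudonymized = pseudonymize_records(records)
for row in pseudonymized:
    print(row)
```

Because the same input always maps to the same pseudonym, records belonging to one person can still be grouped for analysis without the analyst seeing the name or e-mail address.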
Whilst data science is now being used almost everywhere in the development of companies, it is important that ethics are maintained at every level. As the world's volume of data grows, so does the responsibility to maintain privacy and fairness when employing this data. There are multiple ways to address ethical and privacy concerns, including concealing personal identities and working with pseudonymized datasets. Furthermore, introducing diversity in tech sectors and demanding transparency of data help to reduce algorithmic bias.
The age of analytics: Competing in a data-driven world. McKinsey & Company. (2016). Retrieved 28 September 2020, from https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world.
BBC - Ethics - Introduction to ethics: Ethics: a general introduction. Bbc.co.uk. (2014). Retrieved 1 October 2020, from http://www.bbc.co.uk/ethics/introduction/intro_1.shtml#:~:text=Ethics%20is%20concerned%20with%20what,%2C%20habit%2C%20character%20or%20disposition.
Arvanitakis, J. (2018). What are tech companies doing about ethical use of data? Not much. The Conversation. Retrieved 1 October 2020, from https://theconversation.com/what-are-tech-companies-doing-about-ethical-use-of-data-not-much-104845.
Chulu, H. (2018). Let us end algorithmic discrimination. Medium. Retrieved 28 September 2020, from https://medium.com/techfestival-2018/let-us-end-algorithmic-discrimination-98421b1334a3.
Stewart, M. (2020). Data Privacy in the Age of Big Data. Medium. Retrieved 2 October 2020, from https://towardsdatascience.com/data-privacy-in-the-age-of-big-data-c28405e15508.
Nicklin, A. (2018). Applying Ethics to Algorithms. Medium. Retrieved 27 September 2020, from https://towardsdatascience.com/applying-ethics-to-algorithms-3703b0f9dcf4.
Jones, H. (2018). AI, Transparency and its Tug of War with Privacy. Medium. Retrieved 30 September 2020, from https://towardsdatascience.com/ai-transparency-and-its-tug-of-war-with-privacy-5b94c1d262ad.
Heilweil, R. (2020). Why algorithms can be racist and sexist. Vox. Retrieved 9 October 2020, from https://www.vox.com/recode/2020/2/18/21121286/algorithms-bias-discrimination-facial-recognition-transparency.
Schlenker, L. (2019). The Ethics of Data Science*. Medium. Retrieved 4 October 2020, from https://towardsdatascience.com/the-ethics-of-data-science-e3b1828affa2.
Dutch Data Protection Authority (Autoriteit Persoonsgegevens). (2021). Tax Administration fined for discriminatory and unlawful data processing [Press release]. Retrieved 8 December 2021, from https://autoriteitpersoonsgegevens.nl/en/news/tax-administration-fined-discriminatory-and-unlawful-data-processing