On Bias Detection: A Critical Analysis of Article 10(5) of the EU AI Act

Keketso Kgomosotho

Abstract

This short research paper assesses the effectiveness of the EU Artificial Intelligence Act (“AI Act” or “the Act”) in addressing algorithmic discrimination, focusing on Article 10(5)’s exception to the GDPR’s prohibition on processing special categories of personal data. The analysis reveals that while the Act is a significant step towards AI regulation, this exception, intended to facilitate bias detection and correction in high-risk AI systems, poses substantial risks to privacy, even with the prescribed safeguards. The Act’s reliance on existing legal frameworks and its current safeguards may prove insufficient due to the complex and evolving nature of AI-driven discrimination. The study concludes by emphasising the need for a more comprehensive and adaptive legal approach that can effectively mitigate the risks of algorithmic discrimination while upholding fundamental rights.

Keywords: Artificial Intelligence Act; special categories of personal data; GDPR; EU AI Act; algorithmic discrimination; bias in AI; prohibited grounds; Article 10(5).

A.   Introduction

During the first half of 2024, the European Union (“EU”) adopted its landmark Artificial Intelligence Act (“AI Act” or “the Act”), becoming the first regional legal system to establish a legal framework for the governance of AI.[1] The Act aims to provide a comprehensive legal governance framework for AI systems in the Union to support innovation and promote the uptake of human-centric and trustworthy AI, while also ensuring a high level of protection of health, safety, and fundamental rights against the harmful effects of AI systems in the Union.[2] To that end, the Act identifies AI bias and discrimination as a significant risk to fundamental rights in the context of AI systems and addresses it through a multifaceted approach comprising multiple, mutually reinforcing measures throughout the Act, aimed at minimising the risk of bias and discrimination, either directly or indirectly. The preamble of the Act consistently emphasises the principle of non-discrimination and acknowledges the associated risks in the context of AI.

This contribution assesses the Act’s effectiveness in addressing the complex and evolving challenges of algorithmic discrimination, focusing on the exception to the GDPR left open at Article 10(5) of the Act, which permits the processing of special categories of personal data by Providers of AI systems. Specifically, I will consider the content and implications of the new exception to Article 9 of the GDPR and provide an analytical assessment of its legal implications and effectiveness in responding to the challenge of algorithmic discrimination. Before turning to Article 10(5), however, we must first understand the nature of the challenge to which the Act responds with this exception.

B.    Algorithmic Discrimination

Algorithmic discrimination refers to discriminatory outcomes (or disparate impacts, in US terminology) produced by data processing and analytics in data-driven and algorithmic systems against certain individuals or groups defined by socially sensitive, and therefore legally protected, characteristics or grounds.[3] In simpler terms, it is discrimination produced by algorithms. The AI Act does not define algorithmic or AI discrimination; however, from a reading of the Act, the concern is clearly discriminatory outcomes against groups or individuals defined by prohibited grounds under relevant Union non-discrimination law, resulting from system biases and arising during the use of AI systems in decision-making contexts.[4] I will refer to this simply as “algorithmic discrimination.”[5] Algorithmic discrimination manifests primarily in the form of indirect discrimination,[6] understood legally as the result of a neutral rule, criterion or practice that affects a group or individual defined by a ‘protected ground’ in a significantly more negative way in comparison to others in a similar situation.[7] This can occur in all contexts where data-analytical or algorithmic decision-making systems are applied. In fact, there is overwhelming reported evidence of discriminatory outcomes produced by algorithms in contexts where they are used to make or inform decisions about humans.[8]
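To make the legal concept concrete, the following is a minimal sketch, with purely hypothetical numbers and group labels, of how such a disparity is often quantified in practice: comparing selection rates across groups, in the spirit of the US “four-fifths” rule of thumb for disparate impact.

```python
# Minimal sketch: quantifying indirect discrimination as a selection-rate gap.
# The numbers and group labels are hypothetical, for illustration only.

decisions = {
    # group label -> (number of positive decisions, number of applicants)
    "group_a": (45, 100),   # e.g. applicants sharing a protected characteristic
    "group_b": (80, 100),   # comparator group in a similar situation
}

selection_rates = {g: pos / total for g, (pos, total) in decisions.items()}
ratio = selection_rates["group_a"] / selection_rates["group_b"]

print(f"Selection rates: {selection_rates}")
print(f"Disparate impact ratio: {ratio:.2f}")
# Under the US 'four-fifths' rule of thumb, a ratio below 0.8 signals potential
# adverse (disparate) impact; EU indirect-discrimination analysis asks a similar
# question about a 'particular disadvantage' suffered by the protected group.
print("Potential adverse impact" if ratio < 0.8 else "No prima facie disparity")
```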

For example, in the United States and several EU member States, predictive algorithms are employed in the criminal justice system to support law enforcement and judicial officers in policing and decision making. One of the best-known and most instructive examples is COMPAS, which is used in several United States jurisdictions to support judicial decisions. It processes historical data from local law enforcement records and makes predictions regarding the likelihood of recidivism at the pre-trial stage or to support sentencing decisions.[9] Indeed, an investigation by ProPublica found that Black and Latino defendants were more likely to be incorrectly categorised as high-risk by COMPAS than white defendants.[10]

Other examples include algorithmic bias in healthcare,[11] where AI diagnostic support systems have been found to return lower accuracy results for Black and female patients than for white and male patients.[12] LinkedIn,[13] Google,[14] and Amazon’s[15] AI-powered job advertising systems were found to display high-paying positions to men more often than to women or women-identifying persons.[16] Further, a number of investigations have concluded that several popular large language and generative AI models, such as those of Midjourney,[17] OpenAI, Google Gemini and Meta, produce biased results against members of protected groups, such as women and Black people. Racial discrimination has also been recorded in AI systems used for predictive policing and face recognition,[18] and in the AI systems used for the distribution of government social services.[19] There is a consensus that AI systems are biased and discriminatory,[20] and an emerging consensus that the data, which is invariably laced with human social biases, is largely responsible for system biases.[21]

The Act identifies three sources of the risk of discrimination in AI systems: bias in the data used to train the AI system;[22] the design of the algorithm itself;[23] and the way the AI system is deployed and used, including the context.[24] The challenge is that this bias and discrimination is often difficult, if not impossible, to detect, owing to the internal operational complexity of these systems and, in part, to the (GDPR) legal restrictions placed on the processing of personal data. At Article 10(5), the Act responds by leaving open an exception for Providers of AI systems to process special categories of personal data, under specific conditions and safeguards. According to Article 9(1) of the GDPR, special categories of personal data are those “revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.”[25] However, as we explore below, this may not be so easy a fix.

C.   The “Proxy Problem” and its challenge to prohibited grounds & special categories

As part of its strategy of achieving consistency with, and avoiding duplication within, the EU regulatory landscape,[26] the Act invokes pre-existing EU frameworks that prohibit discrimination. Those prohibitions, under existing law, include Article 14 of the European Convention on Human Rights,[27] Article 21 of the Charter of Fundamental Rights of the European Union,[28] Article 2 of Directive 2000/43/EC on racial or ethnic origin,[29] Article 1 of Directive 2000/78/EC on equal treatment in employment and occupation,[30] Article 4 of Directive 2006/54/EC,[31] Article 4 of Directive 2004/113/EC,[32] and Articles 1 and 2 of the Directive Proposal (COM(2008)462).[33]

Each of these prohibitions sets out a framework for the prohibition of discrimination on the basis of selected prohibited grounds, establishing a uniform minimum level of protection within the State, the EU and the European continent respectively. This legal approach of prohibiting discrimination on the basis of selected prohibited grounds is consistent across different legal instruments and systems, with only the specific prohibited grounds varying depending on the instrument. These grounds often include sex, race, colour, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation.[34] I call this legal approach to non-discrimination the “prohibited grounds approach.” The approach is deeply ingrained in the constitutive structure of the international legal order, deriving from the foundational principles of equality and non-discrimination, both of which sit at the very heart of the modern global legal order.[35] While historically successful in the context of human decision-makers, research demonstrates that this legal approach to discrimination, the “prohibited grounds approach,” disintegrates, proving ineffective in the face of algorithmic discrimination.

In practice, the approach requires that a presumably human decision-maker refrain from making a decision based on any of the prohibited grounds. In the context of algorithmic or AI decision systems, the legal norm requires that the algorithm’s access to certain data points on sensitive characteristics (such as a person’s sexual orientation, sex or gender) is restricted or denied. The problem, however, is that this goes against the very nature and purpose for which algorithms exist. The nature of predictive AI algorithms is to find connections between input data and target variables, regardless of those connections’ normative or legal character.[36] To the algorithm, it matters only that there is a predictive connection, for example between gender and future income; the potential explanations for that connection (e.g., patriarchy, heterosexism, the gender pay gap or a history of exclusion) do not matter at all.[37]

These sensitive grounds and personal characteristics, including those protected at Article 9 of the GDPR, have high predictive or probative value for humans’ future behaviour, because past and ongoing human discrimination has created unequal starting points for different groups, and different social structures have cemented this unequal status quo. Restricting access to highly predictive data points therefore makes algorithms statistically less accurate in predicting future human behaviour or outcomes.[38] To compensate for this and maintain statistical accuracy, algorithms rely instead on a “proxy”, an indirect indicator of the prohibited ground or restricted sensitive data, thereby promoting the very outcomes the law seeks to avoid. In the discourse, this is called the “proxy problem.”[39] For example, instead of making a prediction based on “gender”, access to which is restricted under the “prohibited grounds approach,” the algorithm will instead rely on a proxy or indirect indicator of gender, such as the applicant’s purchasing patterns, membership of a particular group, their name, communication style, or occupation, from which gender can be inferred. Thus, algorithms will always bypass a legal restriction where the restricted data is predictive of future outcomes.
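The proxy problem can be illustrated with a short, self-contained sketch on synthetic data (the feature names, coefficients and numbers are hypothetical): a model that is denied the protected attribute still reproduces the group disparity, because it recovers that attribute from a correlated proxy.

```python
# Sketch of the "proxy problem": a model denied the protected attribute
# still reproduces the disparity through a correlated proxy feature.
# Synthetic data; all names and numbers are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

gender = rng.integers(0, 2, n)              # protected attribute (0/1)
proxy = gender + rng.normal(0, 0.3, n)      # e.g. occupation or purchasing pattern, correlated with gender
noise = rng.normal(0, 1, n)                 # unrelated feature

# Historical outcome is itself biased against the gender == 0 group
outcome = (0.8 * gender + 0.2 * noise + rng.normal(0, 0.3, n) > 0.5).astype(int)

# "Fairness through unawareness": train without the protected attribute
X_without_gender = np.column_stack([proxy, noise])
model = LogisticRegression().fit(X_without_gender, outcome)
pred = model.predict(X_without_gender)

for g in (0, 1):
    print(f"Predicted positive rate for gender={g}: {pred[gender == g].mean():.2f}")
# Even though gender was withheld, the prediction gap between the groups
# persists, because the model recovers gender from the proxy feature.
```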

AI has no capacity to know ethics, principles or norms in the way conscious human minds do. As noted by Deck et al., “legal concepts relying on flexible ex-post standards and human intuition are in tension with the mathematical need for precision and ex-ante standardization.”[40] Unlike human decision-makers, AI does not depart from any theory or hypothesis about what types of characteristics may prove useful for predicting the target variable. Rather, it uses “brute force” to learn from scratch which attributes or behaviours predict the outcome of interest.[41] As a data-driven system, AI can understand hard, individual technical parameters, not principles and norms such as equality and non-discrimination.[42] In a bid to enable Providers of high-risk AI systems to detect discriminatory outcomes and mitigate their effects, the Act leaves open an exception at Article 10(5) as a strategy to respond to this dynamic and evolving challenge. I turn to this in the remaining sections.

D.    Article 10(5) – A new exception to the GDPR

Article 10(5) of the Act leaves open a marked exception to Article 9(1) of the GDPR, which prohibits the processing of special categories of personal data. It provides that:

“To the extent that it is strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems in accordance with paragraph (2), points (f) and (g) of this Article, the providers of such systems may exceptionally process special categories of personal data, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons.”

As noted earlier, the special categories of personal data are those “revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.”[43] The implications of such an exception will be far-reaching. In essence, it provides a limited derogation for Providers of high-risk AI systems from the GDPR’s stringent prohibition on processing special categories of personal data, based on the principle of data minimisation. In practice, Providers of these systems can, under specific conditions and safeguards, process sensitive data on prohibited grounds such as race, ethnicity, or health status, solely for the purpose of identifying and mitigating biases within their AI systems. Guadino calls this new exception a “paradigm shift” in the governance of special categories of personal data, remarking optimistically that “[the] paradigm shift is an expression of a fundamental optimism that our social reality can be improved in a sustainable manner through properly regulated AI.”[44]

E.    Logic behind the exception

Primarily, the AI Act introduces this exception because system biases are not, in certain circumstances, detectable using only non-sensitive data.[45] Examining special category data can help reveal these biases.[46] By allowing the limited and controlled processing of such data, the AI Act aims, in theory, to enable developers to uncover and address these hidden biases, making them easier to mitigate. Processing special categories of personal data can provide a more nuanced understanding of how AI systems impact different groups. Indeed, existing research supports the view that permitting algorithms to access sensitive personal data, under strict and exceptional conditions, can be instrumental in identifying and mitigating the systemic biases which lead to discriminatory outcomes.[47] A notable example is the Gender Shades project, which exposed significant biases in automated facial analysis algorithms and datasets by analysing their performance across different demographic groups.[48]
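The logic of the exception can be illustrated with a minimal, hypothetical sketch of a Gender Shades-style disaggregated evaluation: per-group error rates can only be computed if the evaluator has access to the sensitive attribute.

```python
# Sketch of a disaggregated (Gender Shades-style) evaluation: per-group error
# rates require access to the sensitive attribute. Data here is hypothetical.
import pandas as pd

results = pd.DataFrame({
    "true_label":  [1, 0, 1, 1, 0, 1, 0, 1],
    "predicted":   [1, 0, 0, 1, 0, 0, 1, 1],
    # the special-category attribute needed to disaggregate performance
    "demographic": ["A", "A", "B", "B", "A", "B", "B", "A"],
})

results["error"] = (results["true_label"] != results["predicted"]).astype(int)
per_group_error = results.groupby("demographic")["error"].mean()
print(per_group_error)
# A large gap between groups is the kind of hidden bias that Article 10(5) is
# meant to make detectable; without the "demographic" column the gap simply
# cannot be measured.
```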

In addition, several organisations have also developed tools and frameworks that leverage sensitive personal data, with appropriate safeguards, to detect and mitigate biases in AI systems. IBM’s AI Fairness 360 toolkit,[49] Google’s What-If Tool,[50] and Microsoft’s Fairlearn are prime examples of such initiatives.[51] These tools provide developers with the means to assess and improve the fairness of their AI models by analysing their behaviour across different groups and identifying potential sources of bias.
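As a hedged illustration of the kind of workflow these tools support (assuming recent versions of Fairlearn and scikit-learn, with synthetic data standing in for a real dataset), a provider might first disaggregate a metric by the sensitive attribute and then retrain under a fairness constraint; both steps require access to that attribute.

```python
# Hedged sketch of a Fairlearn-style assess-then-mitigate workflow.
# Assumes fairlearn and scikit-learn are installed; data is synthetic and
# purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.metrics import MetricFrame, selection_rate
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(1)
n = 5_000
sensitive = rng.integers(0, 2, n)                       # special-category attribute
X = np.column_stack([sensitive + rng.normal(0, 0.5, n), rng.normal(0, 1, n)])
y = (0.7 * sensitive + rng.normal(0, 0.5, n) > 0.5).astype(int)

# 1. Assess: disaggregate a metric by the sensitive attribute
baseline = LogisticRegression().fit(X, y)
mf = MetricFrame(metrics=selection_rate, y_true=y,
                 y_pred=baseline.predict(X), sensitive_features=sensitive)
print("Selection rate by group (baseline):\n", mf.by_group)

# 2. Mitigate: retrain under a demographic-parity constraint, which itself
#    requires the sensitive attribute at training time
mitigator = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=sensitive)
mf_mitigated = MetricFrame(metrics=selection_rate, y_true=y,
                           y_pred=mitigator.predict(X),
                           sensitive_features=sensitive)
print("Selection rate by group (mitigated):\n", mf_mitigated.by_group)
```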

F. The exception raises significant risks to privacy

There is a serious risk that this exception will, in practice, function to undermine and regress the effectiveness of the GDPR’s data protection principles; it raises serious concerns about the potential for misuse and unintended consequences. In an attempt to mitigate this risk, the Act makes the exception subject to specific safeguards outlined at Article 10(5)(a)-(f). Further, this processing of special categories of personal data must also be in accordance with the obligations for data governance and management practices outlined at paragraph 2 of the Article, specifically points (f) and (g).[52] However, even with these current safeguards, the concerns remain.

As regards the risk of misuse, there are compelling incentives for Providers of AI systems to exploit this exception. Simply put, the ability to process special category data will, as a logical necessity, provide a competitive edge. By processing sensitive, more predictive data, providers can develop more accurate and nuanced AI models, potentially leading to better performance and increased market share.[53] Processing existing special category data under the exception will also be a cost-effective way to improve AI models without investing in additional data collection and labelling efforts, which can be expensive for Providers.[54] The risks to privacy include the use of this special personal data for commercially exploitative purposes, such as targeted advertising, discriminatory pricing practices, or influencing individual behaviour or decisions. In more exceptional circumstances, the exception may incentivise providers to classify their AI systems as high-risk even if the risks associated with their systems are not objectively high, or to intentionally engineer conditions of bias that manufacture the necessity required to gain access to this exception and process special categories of individuals’ data.

As regards unintended consequences, AI systems have demonstrated an exponential capacity for pattern recognition and association, which allows data to be combined in ways capable of reliably (re)identifying, classifying and clustering individuals based on seemingly unconnected, non-personal pieces of data.[55] This capacity for sophisticated profiling, even when utilising data that falls outside the purview of ‘special categories’ under the GDPR, is already posing a grave threat to privacy. The core GDPR principles of data minimisation[56] and purpose limitation[57] are particularly vulnerable here. AI’s capacity for repurposing and combining data in unforeseen ways can easily circumvent the original purpose of data collection and processing, leading to a bypassing of these safeguards. In essence, the Article 10(5) exception, while aiming to combat bias, risks opening a Pandora’s box of privacy risks.
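A simple, hypothetical sketch illustrates why combining seemingly innocuous attributes threatens re-identification: counting how many records share each combination of quasi-identifiers shows how quickly individuals become unique, and therefore linkable to outside data sources.

```python
# Sketch of why combining "non-special" attributes enables re-identification:
# count how many records share each combination of quasi-identifiers.
# The dataset is tiny and hypothetical, purely for illustration.
import pandas as pd

df = pd.DataFrame({
    "postcode":   ["1010", "1010", "1020", "1020", "1030"],
    "birth_year": [1990, 1990, 1985, 1985, 1978],
    "occupation": ["nurse", "teacher", "nurse", "nurse", "lawyer"],
})

group_sizes = df.groupby(["postcode", "birth_year", "occupation"]).size()
unique_rows = (group_sizes == 1).sum()
print(group_sizes)
print(f"{unique_rows} of {len(df)} records are unique on these quasi-identifiers "
      "and therefore re-identifiable once linked to an outside dataset.")
```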

G. Insufficient safeguards in the Act

Because of the significant risks involved, the Act makes this exception “subject to appropriate safeguards for the fundamental rights and freedoms of natural persons,”[58] including requirements of necessity; technical limitations on the re-use of personal data (purpose limitation); privacy-preserving measures and cybersecurity; strict controls and documentation of access to the data; confidentiality; prohibitions on third-party access, transmission and transfer; and requirements to delete the data once the bias has been corrected or once the data retention period has expired (data minimisation).[59] This is in addition to the provisions set out in Regulation (EU) 2016/679, Directive (EU) 2016/680 and Regulation (EU) 2018/1725. Finally, the Act requires the keeping of records of processing activities, including “the reasons why the processing of special categories of personal data was strictly necessary to detect and correct biases, and why that objective could not be achieved by processing other data.”[60]

Based on what we know about AI’s basic engagement with data, I propose that these safeguards are insufficient to respond effectively to the foreseeable risks of algorithmic processing of special categories of personal data; more specific, detailed and evolving safeguards are required. Moreover, research illustrates that safeguards of this kind are already proving insufficient to protect the privacy of personal data in the context of algorithmic data processing and decision-making.[61]

As regards the first safeguard, necessity, the Act assumes that the promoted alternative data types (synthetic or anonymised data) can adequately replicate the complexities of real-world data, which is often not the case.[62] Moreover, necessity is subjective and open to interpretation, yet the Act does not provide any guidance or criteria for making this determination, leaving room for potential misuse or abuse of this provision by providers. A more stringent and clearly defined necessity standard, coupled with robust oversight mechanisms, is required given the risks of AI processing of personal data.

Regarding technical limitations, the effectiveness of this safeguard hinges on the robustness of the technical measures implemented and the ability to enforce these limitations. The dynamic nature of AI, with models constantly evolving and being integrated into new systems, poses a challenge in ensuring that the data remains confined to its intended use.[63] Additionally, the AI Act does not explicitly define what constitutes “technical limitations,” leaving room for interpretation and potential loopholes that could be exploited.

The Act mandates the use of “state-of-the-art security and privacy-preserving measures,” including pseudonymisation, to protect special categories of personal data. Studies are already demonstrating that these measures are ineffective in the face of AI capabilities.[64] Pseudonymisation refers to the process of substituting original identifiers with fictitious identifiers, commonly known as pseudonyms. While these measures aim to enhance privacy, their effectiveness can be questioned in the context of rapidly evolving AI technologies.[65] As Boudolf notes, for instance, “researchers recently succeeded in reconstructing both pixelized and blurred faces by making use of neural networks.” In other words, AI-powered neural networks can be used to reverse-engineer anonymised data and potentially re-identify individuals.
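For illustration, a minimal sketch of pseudonymisation using only the Python standard library (the key and record are hypothetical): the direct identifier is replaced with a keyed pseudonym, but the data remains personal data, and re-identification remains possible for anyone holding the key or sufficient auxiliary information.

```python
# Minimal sketch of pseudonymisation: replace a direct identifier with a
# keyed pseudonym. The key and record are hypothetical; in practice the key
# must be stored separately under strict access controls.
import hmac
import hashlib

SECRET_KEY = b"stored-separately-by-the-controller"   # illustrative only

def pseudonymise(identifier: str) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "diagnosis": "condition X"}
pseudonymised = {"subject_id": pseudonymise(record["name"]),
                 "diagnosis": record["diagnosis"]}
print(pseudonymised)
# The mapping is reversible for whoever holds the key and a lookup table, so
# the data remains personal data under the GDPR; and, as the text notes,
# auxiliary information or AI-driven inference may still re-identify the
# person behind the pseudonym.
```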

Moreover, the term “state-of-the-art” is inherently fluid because, as the history of AI reminds us, what is considered cutting-edge today quickly becomes outdated. This brings into question the long-term viability of these measures in safeguarding sensitive data against the emerging threats that arise with more sophisticated techniques. Pseudonymisation, while a valuable privacy-enhancing technique, does not completely anonymise data. As AI-powered de-anonymisation techniques become more sophisticated, the risk of re-identifying individuals from pseudonymised data will also increase.

Further, the nature of AI models, especially machine learning models that continuously learn and adapt, raises questions about the feasibility of complete data deletion. As Chourasia and Shah note, “records in a database become interdependent,” and the deleted data’s influence remains subliminally in the remaining data due to this interdependence.[66] Notably, the Act does not specify the duration of the retention period, leaving it open to interpretation by AI providers; this can lead to situations where data is retained for longer than necessary, increasing the risk of unauthorised access, further processing or other forms of misuse. In the circumstances, a more stringent approach is warranted.

Furthermore, these safeguards primarily focus on technical measures, overlooking the human element in data breaches, coincidentally also the most vulnerable element in cyber security. The Act does not address the potential for insider threats or the need for ongoing training and awareness programs on the handling of sensitive personal data in ML models and on evolving security risks. The effectiveness of “strict access controls and documentation” hinges on the consistent and rigorous implementation of these protocols by all authorised personnel. However, human error, negligence, or even malicious intent can easily undermine these safeguards.

Finally, in its structure, the Act relies on national authorities and enforcement agencies, which may lack the resources and expertise to effectively monitor and audit providers’ compliance with the safeguards of this exception, especially given the rapid pace of AI development and the complexity of the technology. Where these fail, the risk of abuse of this exception is profound. Moreover, as the technology evolves, new techniques may emerge that could circumvent the safeguards or make them less effective.

H. An incomplete response to the Proxy Problem

Nonetheless, even if this exception somehow works perfectly as intended, a fundamental challenge remains. While well intended to address bias in AI systems, the exception falls short of effectively tackling the proxy problem, the root cause of indirect discrimination in AI systems. Again, this is where an algorithm relies on an indirect indicator of a special category such as race. The exception fails to account for the subtle yet pervasive ways in which indirect indicators can stand in for and disguise special categories. Even with Article 10(5) in place, AI systems will still perpetuate bias and discrimination through proxies or indirect indicators of prohibited grounds or special categories of personal data.

To effectively address indirect discrimination, in my current view, the AI Act would need to go beyond its current focus on the detection and correction of explicit prohibited grounds. It would need to incorporate a more nuanced understanding of the proxy problem. This might, for instance, involve the Act requiring Providers to assess not only the direct use of protected attributes or sensitive data points, but also the potential for indirect discrimination through correlated or proxy variables. This may also include ongoing evaluation, regular audits and impact assessments, expanding the scope to the detection and mitigation of discrimination through proxies. This would necessitate a more comprehensive analysis of the data used to train and operate AI systems.
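One way such a proxy assessment could work in practice is sketched below (a hedged illustration on synthetic data, not a prescribed method): if the protected attribute can be accurately predicted from the remaining features, those features jointly act as a proxy for it and warrant closer review.

```python
# Hedged sketch of a possible proxy audit: test whether the protected
# attribute is recoverable from the other features. Data, feature names and
# threshold are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 5_000
protected = rng.integers(0, 2, n)
features = np.column_stack([
    protected + rng.normal(0, 0.4, n),   # a strong proxy (e.g. occupation code)
    rng.normal(0, 1, n),                 # an unrelated feature
])

# AUC near 0.5 -> features carry little information about the protected
# attribute; AUC near 1.0 -> the attribute is effectively encoded by proxies.
auc = cross_val_score(LogisticRegression(), features, protected,
                      cv=5, scoring="roc_auc").mean()
print(f"Protected attribute predictable from other features, AUC = {auc:.2f}")
if auc > 0.7:     # illustrative threshold, not a legal standard
    print("Potential proxy encoding detected; deeper review warranted.")
```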

However, even this is an admittedly imperfect solution because, as AI systems evolve and become more sophisticated and complex, they become more adept at identifying and relying on an ever broader array of proxies or indirect indicators of prohibited grounds. These proxies grow increasingly subtle, less obvious and less intuitive to the human mind, often appearing disconnected from the individual. At the very least, this calls for a deeper reflection on the current prohibited grounds approach and its continued effectiveness in the context of algorithmic discrimination. More research is required in this area.

I.  Conclusion

The AI Act’s Article 10(5) exception, while a novel attempt to address algorithmic discrimination, exposes significant gaps in the Act’s approach and safeguards. The exception presents significant risks that remain insufficiently mitigated by the safeguards put in place by the Act. Even if stringently applied, these safeguards remain insufficient to fully mitigate the risk of misuse, unintended consequences and the erosion of constitutive GDPR principles over time, as AI systems grow more sophisticated and adept at evading the largely static safeguards.

Further, even with the exception in place, AI systems will still perpetuate bias and discrimination through proxies or indirect indicators of prohibited grounds or special categories of personal data. This calls for a deeper reflection on the current prohibited grounds approach and its continued effectiveness in the context of algorithmic discrimination.

In closing, I find that the complex and evolving nature of AI necessitates a comprehensive and adaptable legal framework. This includes an effective response to the proxy problem, clearer operational guidance for Providers, dynamic and evolving technical safeguards that keep pace with AI advancements, and robust, well-resourced oversight mechanisms coupled with AI literacy, specifically on how data processing works with proxy data to produce bias. The AI Act marks a significant step in AI regulation; its effectiveness in combating algorithmic discrimination, however, hinges on addressing these critical shortcomings through ongoing efforts to refine and strengthen its provisions.

References

[1] EU Artificial Intelligence Act (2024), available at https://www.europarl.europa.eu/news/en/press-room/20240308IPR19015/artificial-intelligence-act-meps-adopt-landmark-law

[2] The EU AI Act.

[3] Gabbrielle M Johnson (2021). ‘Algorithmic Bias: On the Implicit Biases of Social Technology’, Synthese, 198, p. 9941; John W Patty and Elizabeth Maggie Penn (2023). ‘Algorithmic Fairness and Statistical Discrimination’, Philosophy Compass, Vol 18, Issue 1.

[4] See Recitals 31, 3, 28, 56, 54, and Article 5 and 10 of the EU AI Act.

[5] This is also the phrase employed in discourse to refer to discrimination produced by AI systems. See for example Johnson (2021); Ponce (2023), Computer Law & Security Review, Vol 48, Article 105766.

[6] Article 2(2)(b) of the Racial Equality Directive 2000/43/EC; European Commission (2022), Indirect Discrimination under Directives 2000/43 and 2000/78, Publications Office, available at https://data.europa.eu/doi/10.2838/93469 accessed 19 October 2023.

[7] Handbook on European Non-Discrimination Law (2018); Schabas (2017), The European Convention on Human Rights: A Commentary.

[8] Geiger et al. (2021), Quantitative Science Studies, 2, p. 795;  Knight (2023), available at https://www.wired.com/story/ai-chatbots-can-guess-your-personal-information/ accessed 18 October 2023.

[9] Gravett (2021), South African Journal of Criminal Justice, Vol 34, No 1, available at https://journals.co.za/doi/pdf/10.47348/SACJ/v34/i1a2

[10] Dieterich et al. (2016), COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity, available at http://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf

[11] Obermeyer et al. (2019), Science, 366, p. 447.

[12] Norori et al. (2021), Patterns, 2; IBM Data and AI Team (2023), Shedding Light on AI Bias with Real World Examples, IBM Blog, 16 October 2023, available at https://admin01.prod.blogs.cis.ibm.net/blog/shedding-light-on-ai-bias-with-real-world-examples/ accessed 5 May 2024.

[13] MIT Technology Review (2021), LinkedIn’s Job-Matching AI Was Biased. The Company’s Solution? More AI.

[14] The Washington Post (2015), Google’s Algorithm Shows Prestigious Job Ads to Men, but Not to Women. Here’s Why That Should Worry You. 

[15] ACLU (2018). ‘Why Amazon’s Automated Hiring Tool Discriminated Against Women’, American Civil Liberties Union, 12 October 2018.

[16] O’Donnell (2024), LLMs Become More Covertly Racist with Human Intervention, MIT Technology Review, 11 March 2024; Hanna et al. (2023), Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT-1; Omiye et al. (2023), Large Language Models Propagate Race-Based Medicine.

[17] The Conversation (2024), Ageism, Sexism, Classism and More: 7 Examples of Bias in AI-Generated Images

[18] Raji and Buolamwini (2019), Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society.

[19] Knight (2024), Inside a Misfiring Government Data Machine, Wired.

[20] Varona and Suárez (2022), Discrimination, Bias, Fairness, and Trustworthy AI, 12 Applied Sciences; Heinrichs (2022), Discrimination in the Age of Artificial Intelligence; Johnston (2021); Ntoutsi et al. (2020), Bias in Data-Driven Artificial Intelligence Systems—An Introductory Survey.

[21] Ibid.

[22] This is evidenced by a heightened focus on data quality (see Article 10 generally). See also Recital 67, which highlights that biases can be inherent in underlying data sets, and emphasizes the importance of high-quality data sets for training AI systems, particularly to avoid perpetuating and amplifying existing discrimination.

[23] See for example Article 9 of the Act which requires a risk management system for high-risk AI systems, which includes identifying and mitigating risks related to the AI system’s design. While Article 15 requires that high-risk AI systems be designed and developed in a way that ensures their accuracy, robustness, and cybersecurity.

[24] To that end, Article 10(4) requires that data sets take into account the characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting within which the high-risk AI system is intended to be used. Accordingly, Article 14 requires appropriate human oversight of high-risk AI systems. Further, Recital 13 introduces the concept of “reasonably foreseeable misuse,” in recognition of the fact that the way an AI system is deployed and used can also lead to discriminatory outcomes. Recital 86 also emphasises the need for deployers to understand the context of use and identify potential risks not foreseen in the development phase.

[25] See Article 9 of GDPR.

[26] At para 10 of the Preamble to the Act, it makes clear that it “does not seek to affect the application of existing Union law governing the processing of personal data.” See also paras 45, 48, 67 and 70 of the Preamble.

[27] European Convention on Human Rights, which provides that “[t]he enjoyment of the rights and freedoms set forth in this Convention shall be secured without discrimination on any ground such as sex, race, colour, language, religion, political or other opinion, national or social origin, association with a national minority, property, birth or other status.”

[28] Charter of Fundamental Rights of the European Union 2012, which provides that “[a]ny discrimination based on any ground such as sex, race, colour, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation shall be prohibited.”

[29] Council Directive 2000/43/EC of 29 June 2000 implementing the principle of equal treatment between persons irrespective of racial or ethnic origin, which provides that “the principle of equal treatment shall mean that there shall be no direct or indirect discrimination based on racial or ethnic origin.”

[30] Council Directive 2000/78/EC of 27 November 2000 establishing a general framework for equal treatment in employment and occupation, which “lay[s] down a general framework for combating discrimination on the grounds of religion or belief, disability, age or sexual orientation as regards employment and occupation.”

[31] Directive 2006/54/EC of the European Parliament and of the Council of 5 July 2006 on the implementation of the principle of equal opportunities and equal treatment of men and women in matters of employment and occupation (recast), which provides that “For the same work or for work to which equal value is attributed, direct and indirect discrimination on grounds of sex with regard to all aspects and conditions of remuneration shall be eliminated.”

[32] Council Directive 2004/113/EC of 13 December 2004 implementing the principle of equal treatment between men and women in the access to and supply of goods and services, which provides that “the principle of equal treatment between men and women shall mean that (a) there shall be no direct discrimination based on sex, including less favourable treatment of women for reasons of pregnancy and maternity; (b) there shall be no indirect discrimination based on sex.”

[33] Council Directive on implementing the principle of equal treatment between persons irrespective of religion or belief, disability, age or sexual orientation {SEC(2008) 2180}, which provides a framework for “combating discrimination on the grounds of religion or belief, disability, age, or sexual orientation”.

[34] See Article 21 of Charter of Fundamental Rights of the European Union; Article 14 of the European Convention on Human Rights. See also Directive 2000/43/EC; Directive 2000/78/EC; Directive 2006/54/EC; Directive 2004/113/EC; and Directive Proposal (COM(2008)462).

[35] UNESCO Recommendation on the Ethics of Artificial Intelligence (2021); Doebbler (2007), The Principle of Non-Discrimination in International Law; Ponce (2023), Computer Law & Security Review, Vol 48.

[36] Ibid; Datta et al. (2017), Proxy Non-Discrimination in Data-Driven Systems; Johnson (2021), Synthese, 198, p. 9941.

[37] Johnson, (2021).

[38] Johnson, (2021) discussing the trade off with accuracy in algorithmic decisions and predictions.

[39] Ibid.

[40] Deck et al. (2024), Implications of the AI Act for Non-Discrimination Law and Algorithmic Fairness.

[41] Coglianese and Lehr (2019), Transparency and Algorithmic Governance, 71 Admin. L. Rev., p. 1, 15 (“The algorithm itself tries many possible combinations of variables, figuring out how to put them together to optimize the objective function.”).

[42] An Explanation of the Relationship between Artificial Intelligence and Human Beings from the Perspective of Consciousness (2023).

[43] See Article 9 of GDPR.

[44] Feiler et al. (2024), EU AI Act: Diversity and Inclusion Prevails over Data Protection, Lexology.

[45] Seyma Yucer and others, ‘Racial Bias within Face Recognition: A Survey’ (arXiv, 1 May 2023); IBM Data and AI Team, ‘Shedding Light on AI Bias with Real World Examples’ (IBM Blog, 16 October 2023).

[46] Artzt and Dung (2022), Artificial Intelligence and Data Protection: How to Reconcile Both Areas from the European Law Perspective; Paterson and McDonagh (2018), Data Protection in an Era of Big Data: The Challenges Posed by Big Personal Data.

[47] van Bekkum and Zuiderveen Borgesius (2023), Using Sensitive Data to Prevent Discrimination by Artificial Intelligence: Does the GDPR Need a New Exception?

[48] ‘Press Kit’ (MIT Media Lab).

[49] IBM AI Fairness 360 is an open-source toolkit with algorithms for detecting and mitigating bias in machine learning models.

[50] Google What-If Tool is a visualization tool to explore how machine learning models behave with different data inputs and help identify biases, available at https://pair-code.github.io/what-if-tool/

[51] Microsoft’s Fairlearn is an open-source toolkit to assess and improve the fairness of AI systems, available at https://fairlearn.org/

[52] While the GDPR provides the general framework for data protection, Article 10(5) acts as a lex specialis for the processing of special categories of personal data in the context of AI. This means that the specific conditions and safeguards of Article 10(5) take precedence over the general provisions of the GDPR in this particular context.

[53] Moira Paterson and Maeve McDonagh, ‘Data Protection in an Era of Big Data: The Challenges Posed by Big Personal Data’ (2018).

[54] AI is currently very expensive to train, especially models that would be categorised as high risk under the Act. For instance, see The Cost of Training AI Could Soon Become Too Much to Bear | Fortune; AI Is Currently Too Expensive to Take Most of Our Jobs, Finds MIT Researchers (24 January 2024).

[55] Daniel J Solove, ‘Artificial Intelligence and Privacy’ (1 February 2024); Staab et al. (2023), Beyond Memorization: Violating Privacy via Inference with Large Language Models.

[56] Data minimisation mandates that only the minimum necessary data for a specific purpose be collected and processed.

[57] The purpose limitation principle dictates that data should only be used for the purpose for which it was collected.

[58] Article 10(5)(a)-(f).

[59] Article 10(5)(a)-(f).

[60] Article 10(5)(a)-(f): “(a) the bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data; (b) the special categories of personal data are subject to technical limitations on the re-use of the personal data, and state of the art security and privacy-preserving measures, including pseudonymisation; (c) the special categories of personal data are subject to measures to ensure that the personal data processed are secured, protected, subject to suitable safeguards, including strict controls and documentation of the access, to avoid misuse and ensure that only authorised persons with appropriate confidentiality obligations have access to those personal data; (d) the personal data in the special categories of personal data are not to be transmitted, transferred or otherwise accessed by other parties; (e) the personal data in the special categories of personal data are deleted once the bias has been corrected or the personal data has reached the end of its retention period, whichever comes first; (f) the records of processing activities pursuant to Regulations (EU) 2016/679 and (EU) 2018/1725 and Directive (EU) 2016/680 include the reasons why the processing of special categories of personal data was strictly necessary to detect and correct biases, and why that objective could not be achieved by processing other data.”

[61] Solove (2024), Artificial Intelligence and Privacy; Staab et al. (2023), Beyond Memorization: Violating Privacy via Inference with Large Language Models.

[62] James et al. (2021), Synthetic Data Use: Exploring Use Cases to Optimise Data Utility, 1 Discover Artificial Intelligence 15, highlighting re-identification methods used by malevolent adversaries to re-identify people uniquely from privacy-preserved data; Majeed and Lee (2021), Anonymization Techniques for Privacy Preserving Data Publishing: A Comprehensive Survey.

[63] Mühlhoff and Ruschemeier (2024), Updating Purpose Limitation for AI: A Normative Approach from Law and Philosophy.

[64] Paal (2022), Artificial Intelligence as a Challenge for Data Protection Law: And Vice Versa in Oliver Mueller and others (eds), The Cambridge Handbook of Responsible Artificial Intelligence: Interdisciplinary Perspectives; Paterson and McDonagh (2022); Artzt and Dung (2022), Artificial Intelligence and Data Protection: How to Reconcile Both Areas from the European Law Perspective.

[65] Varanda et al. (2021), Log Pseudonymization: Privacy Maintenance in Practice.

[66] Izzo et al. (2021), Approximate Data Deletion from Machine Learning Models; Chourasia and Shah (2023), Forget Unlearning: Towards True Data-Deletion in Machine Learning.

Bibliography

  • Ageism, Sexism, Classism and More: 7 Examples of Bias in AI-Generated Images’ – Link accessed 5 May 2024.
  • Google’s Algorithm Shows Prestigious Job Ads to Men, but Not to Women. Here’s Why That Should Worry You. – The Washington Post – Link accessed 5 May 2024.
  • LinkedIn’s Job-Matching AI Was Biased. The Company’s Solution? More AI.’ (MIT Technology Review) – Link accessed 5 May 2024.
  • Press Kit, Gender Shades (MIT Media Lab) https://www.media.mit.edu/projects/gender-shades/press-kit/ accessed 20 May 2024.
  • Why Amazon’s Automated Hiring Tool Discriminated Against Women | ACLU’(American Civil Liberties Union, 12 October 2018) – Link accessed 19 October 2023.
  • Abdul Majeed and Sungchang Lee (2021). Anonymization Techniques for Privacy Preserving Data Publishing: A Comprehensive Survey, 9 IEEE Access, p. 8512.
  • ACLU (2018). ‘Why Amazon’s Automated Hiring Tool Discriminated Against Women’, American Civil Liberties Union, 12 October 2018, available at – Link accessed 19 October 2023.
  • An Explanation of the Relationship between Artificial Intelligence and Human Beings from the Perspective of Consciousness (2023), available at https://journals.sagepub.com/doi/epub/10.1177/20966083211056376 accessed 19 October 2023.
  • Anupam Datta, Michael Carl Tschantz, and Anupam Datta (2017). Proxy Non-Discrimination in Data-Driven Systems, available at http://arxiv.org/abs/1707.08120 accessed 19 October 2023.
  • Artur Varanda and others (2021). Log Pseudonymization: Privacy Maintenance in Practice, 63 Journal of Information Security and Applications, Article 103021.
  • Baker McKenzie-Dr Lukas Feiler and others (2024). EU AI Act: Diversity and Inclusion Prevails over Data Protection, Lexology, available at – Link accessed 29 June 2024.
  • Cary Coglianese & David Lehr (2019). ‘Transparency and Algorithmic Governance’, 71 Administrative Law Review, p. 1.
  • Curtis FJ Doebbler (2007). The Principle of Non-Discrimination in International Law.
  • Daniel J Solove (2024). Artificial Intelligence and Privacy. Available at https://papers.ssrn.com/abstract=4713111 accessed 7 March 2024.
  • Diginomica (2024). AI Is Currently Too Expensive to Take Most of Our Jobs, Finds MIT Researchers. 24 January 2024. Available at https://diginomica.com/ai-currently-too-expensive-take-most-our-jobs-finds-mit-researchers.
  • European Union Agency for Fundamental Rights and Council of Europe (2018). Handbook on European Non-Discrimination Law. Publications Office of the European Union.
  • Fortune (2024). The Cost of Training AI Could Soon Become Too Much to Bear. Available at https://fortune.com/2024/04/04/ai-training-costs-how-much-is-too-much-openai-gpt-anthropic-microsoft/.
  • Gabbrielle M Johnson (2021). ‘Algorithmic Bias: On the Implicit Biases of Social Technology’, Synthese, 198, p. 9941.
  • Hanna JJ, Wakene AD, Lehmann CU, Medford RJ (2023). Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by ChatGPT-1. medRxiv [Preprint], 28 August 2023. doi: 10.1101/2023.08.28.23294730. PMID: 37693388; PMCID: PMC10491360.
  • Heinrichs, B. (2022). Discrimination in the Age of Artificial Intelligence. 37 AI & Soc, 143–154. Available at https://doi.org/10.1007/s00146-021-01192-2.
  • IBM AI Fairness (n.d.). Available at https://aif360.res.ibm.com
  • IBM Data and AI Team (2023). Shedding Light on AI Bias with Real World Examples, IBM Blog, 16 October 2023, available at Link accessed 5 May 2024.
  • Inioluwa Deborah Raji and Joy Buolamwini (2019). ‘Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products’, Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, Association for Computing Machinery, available at https://doi.org/10.1145/3306618.3314244.
  • James O’Donnell (2024). LLMs Become More Covertly Racist with Human Intervention, MIT Technology Review, 11 March 2024. Available at Link.
  • Angwin, J. Larson, S. Mattu, L. Kirchner, “Machine Bias,” ProPublica (23 May 2016); www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
  • John W Patty and Elizabeth Maggie Penn (2023). ‘Algorithmic Fairness and Statistical Discrimination’, Philosophy Compass, Vol 18, Issue 1.
  • L Arnold, ‘How the European Union’s AI Act Provides Insufficient Protection Against Police Discrimination’ – Link accessed 29 June 2024.
  • Luca Deck, et al. (2024). Implications of the AI Act for Non-Discrimination Law and Algorithmic Fairness, available at http://arxiv.org/abs/2403.20089 accessed 29 June 2024.
  • Marvin van Bekkum and Frederik Zuiderveen Borgesius (2023). ‘Using Sensitive Data to Prevent Discrimination by Artificial Intelligence: Does the GDPR Need a New Exception?’, 48 Computer Law & Security Review, Article 105770.
  • Matthias Artzt and TV Dung (2022). Artificial Intelligence and Data Protection: How to Reconcile Both Areas from the European Law Perspective, Vietnamese Journal of Legal Sciences.
  • Microsoft’s Fairlearn (n.d.). Available at https://fairlearn.org/
  • MIT Technology Review (2021). ‘LinkedIn’s Job-Matching AI Was Biased. The Company’s Solution? More AI.’, available at Link
  • Moira Paterson and Maeve McDonagh (2018). ‘Data Protection in an Era of Big Data: The Challenges Posed by Big Personal Data’, 44 Monash University Law Review, p. 1.
  • Natalia Norori and others, ‘Addressing Bias in Big Data and AI for Health Care: A Call for Open Science’ (2021) 2 Patterns 100347.
  • Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M.-E., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., et al. (2020). Bias in Data-Driven Artificial Intelligence Systems—An Introductory Survey. Available at https://doi.org/10.1002/widm.1356.
  • Omiye, J.A., Lester, J.C., Spichak, S., et al. (2023). Large Language Models Propagate Race-Based Medicine. npj Digit. Med., 6, Article 195. Available at https://doi.org/10.1038/s41746-023-00939-z.
  • Paal (2022). Artificial Intelligence as a Challenge for Data Protection Law: And Vice Versa in Oliver Mueller and others (eds), The Cambridge Handbook of Responsible Artificial Intelligence: Interdisciplinary Perspectives. Cambridge University Press. Available at Link accessed 30 June 2024.
  • Paula Pedigoni Ponce, ‘Direct and Indirect Discrimination Applied to Algorithmic Systems: Reflections to Brazil’ (2023) 48 Computer Law & Security Review 105766.
  • R Stuart Geiger and others, ‘“Garbage in, Garbage out” Revisited: What Do Machine Learning Application Papers Report about Human-Labeled Training Data?’ (2021) 2 Quantitative Science Studies 795.
  • Rainer Mühlhoff and Hannah Ruschemeier (2024). Updating Purpose Limitation for AI: A Normative Approach from Law and Philosophy. Available at https://papers.ssrn.com/abstract=4711621
  • Rishav Chourasia and Neil Shah (2023). Forget Unlearning: Towards True Data-Deletion in Machine Learning, Proceedings of the 40th International Conference on Machine Learning (PMLR 2023). Available at https://proceedings.mlr.press/v202/chourasia23a.html accessed 30 June 2024.
  • Robin Staab, Mark Vero, Mislav Balunovic, Martin Vechev (2023). Beyond Memorization: Violating Privacy via Inference with Large Language Models. Department of Computer Science, ETH Zurich. arXiv:2310.07298v1 [cs.AI], 11 October 2023.
  • Seyma Yucer, et al. (2023). Racial Bias within Face Recognition: A Survey, available at http://arxiv.org/abs/2305.00817 accessed 29 June 2024.
  • Solove, Daniel J., and Woodrow Hartzog. “The FTC and the new common law of privacy.” (2014) Columbia Law Review 114, no. 3: 583-676.
  • Stefanie James and others (2021). Synthetic Data Use: Exploring Use Cases to Optimise Data Utility, 1 Discover Artificial Intelligence, p. 15.
  • Varona, D., & Suárez, J.L. (2022). Discrimination, Bias, Fairness, and Trustworthy AI. 12 Applied Sciences, Article 5826. Available at https://doi.org/10.3390/app12125826.
  • Will Knight, ‘AI Chatbots Can Guess Your Personal Information from What You Type’ Wired https://www.wired.com/story/ai-chatbots-can-guess-your-personal-information/ accessed 18 October 2023.
  • Will Knight, ‘Inside a Misfiring Government Data Machine’ Wired https://www.wired.com/story/algorithmic-bias-government/ accessed 5 May 2024.
  • Willem Gravett, ‘Sentenced by an Algorithm – Bias and Lack of Accuracy in Risk-Assessment Software in the United States Criminal Justice System’ (2021) South African Journal of Criminal Justice 34(1): 24-47. Available at https://journals.co.za/doi/pdf/10.47348/SACJ/v34/i1a2
  • William A Schabas, The European Convention on Human Rights: A Commentary (Oxford University Press, Oxford, 2017).
  • William Dieterich et al., COMPAS Risk Scales: Demonstrating Accuracy, Equity, and Predictive Parity (July 8, 2016) – Link
  • Zachary Izzo and others (2021). Approximate Data Deletion from Machine Learning Models, Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (PMLR 2021). Available at https://proceedings.mlr.press/v130/izzo21a.html accessed 30 June 2024.
  • Ziad Obermeyer and others, ‘Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations’ (2019) 366 Science 447.