Systemic bias in data models is a human rights issue


A data-driven political economy can be a dehumanizing experience, as societal models built on data are not immune to errors, biases, or discriminatory conclusions about human interaction. These biases and misleading conclusions can have drastic real-life consequences for individuals and vulnerable communities. That is why scholars use terms such as “data colonialism” to describe how a technologically driven economic logic increasingly views human traditions, norms, and values as an obstacle to doing business rather than treating them with the respect they deserve.

A key criticism that many human rights groups level at tech companies is the latter’s lack of stakeholder engagement with rightsholders affected by data errors and cultural blind spots. This lack of engagement has far-reaching repercussions, including systemic bias in data models. Such bias is difficult to address and overcome because machine learning algorithms are often trained on historic data sets that replicate racist and gender-related bias in society. This process reinforces historic biases, resulting in discrimination against vulnerable groups that have been systematically underrepresented or evaluated to their disadvantage.

Modelling based on non-representative data and skewed societal beliefs

The gender data gap affects a plethora of contexts, including government policy, medical research, technology, workplaces, urban planning, and the media. Because data sets are typically based on male features, there are significant blind spots regarding female representation. At work, for instance, occupational health and safety measures are often designed on the basis of data geared towards men, neglecting women’s physiology and the protective measures it calls for. In a similar fashion, machine learning algorithms are trained to classify people into clear-cut (binary) categories, such as “male/female,” but it is questionable how the rights of people whose bodies or identities do not fit these fixed categories can be respected when they are literally not seen by the technology at all.
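
To make this concrete, the following is a minimal sketch, built around a hypothetical schema, of what such a rigid binary category means in practice: records that do not fit are not merely misclassified, they cannot be represented in the system at all.

```python
# A minimal sketch (hypothetical schema) of how a fixed binary category makes
# anyone outside it invisible to the system: their record cannot be represented,
# so they are never "seen" by downstream processing.
from enum import Enum

class Gender(Enum):          # the only values this data model allows
    MALE = "male"
    FEMALE = "female"

def ingest(record: dict) -> dict:
    """Validate an incoming record against the fixed schema."""
    record["gender"] = Gender(record["gender"])   # raises ValueError otherwise
    return record

ingest({"name": "A", "gender": "female"})          # accepted
try:
    ingest({"name": "B", "gender": "non-binary"})  # rejected by the schema
except ValueError:
    print("record dropped: identity not representable in the data model")
```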

A classic example: if a company uses personal data of job candidates for recruiting purposes, it might acquire that data from data brokers without the data subjects having provided consent. Based on this data, an algorithm might conclude that a female candidate or a candidate of color is less qualified for a position. This can be due to algorithmic bias, which occurs, for instance, when an algorithm has been trained on biased data from a tech company that hired mostly male, white staff in the past. Learning from this historic training data, the algorithm concludes that being male and white is the profile of a suitable candidate, resulting in discrimination against female candidates and candidates of color. The burden of proof lies with the job candidate, who has very little chance of obtaining the information needed to prove that the hiring data model was biased.
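
The sketch below illustrates this mechanism with entirely synthetic data and a simple, generic model (not any company’s actual system): trained on deliberately biased “historic” hiring decisions, it scores an equally skilled female candidate lower than a male one.

```python
# A minimal sketch of how a model trained on historically biased hiring
# decisions reproduces that bias. All data is synthetic and the setup is
# hypothetical; it illustrates the pattern, not any real recruiting tool.
import random

from sklearn.linear_model import LogisticRegression

random.seed(0)

def make_historic_records(n=2000):
    """Simulate past hiring decisions in which equally skilled candidates
    were hired at different rates depending on gender."""
    X, y = [], []
    for _ in range(n):
        is_male = random.random() < 0.7          # mostly male applicant pool
        skill = random.gauss(0, 1)               # skill is independent of gender
        hire_prob = 0.6 if is_male else 0.2      # biased historic decisions
        hired = random.random() < hire_prob + 0.1 * skill
        X.append([int(is_male), skill])
        y.append(int(hired))
    return X, y

X, y = make_historic_records()
model = LogisticRegression().fit(X, y)

# Score two candidates with identical skill, differing only in the gender feature.
male_candidate, female_candidate = [1, 0.5], [0, 0.5]
print("P(hire | male)   =", round(model.predict_proba([male_candidate])[0][1], 2))
print("P(hire | female) =", round(model.predict_proba([female_candidate])[0][1], 2))
# The model assigns a markedly lower score to the female candidate even though
# skill is identical: the historic bias has been learned as a "pattern".
```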

Data universalism isn’t the cure

The assumed default position when rolling out data-driven business models often reflects a “data universalism” that neglects the fact that human rights are embedded in a wide variety of social, cultural, economic, and political contexts. Ignoring the importance of context can lead to cascading negative effects on affected individuals, particularly in the global South. Data models cannot be “cured” of this problem by merely eradicating “statistical” issues around bias without taking the context of their use into account.

For instance, while platform-based work can create new opportunities in the global South and widen participation for some actors, it can also reinforce existing socio-cultural hierarchies, such as caste hierarchies in India. Algorithmic governance may thus enshrine precarity for informal workers unless there is a situated reckoning with the unique historical and labor needs of global South geographies, rather than a blind adoption of “universal” (that is, Western) AI futures.

Adding to this, a universalistic treatment of potential victims through the categorization of abuses is problematic. Often, allegations of abuse can be filed only in a format that fits the computerized response systems propagated by tech companies. This practice can be quite misleading in the context of sexual misconduct, where such a “constructed system of defining and sorting” ends up “fall[ing] short in capturing a range of users’ experience of sexual misconduct,” as Margie Cheesman and Kate Sim argue.

Context-specific models are needed

Scrutinizing data models from a context-specific human rights perspective will play an increasingly vital role in setting boundaries on surveillance capitalism and in enabling a digitally mediated life worth living for billions of individuals and their communities. In practice, this means that human rights lawyers, policymakers, social scientists, computer scientists, and engineers need to work together to critically question and challenge the blind spots of AI in data-driven business: whether the data is representative of the whole population, whether privacy is respected throughout the technological lifecycle, whether results are explainable, whether individuals are discriminated against by proxy (for example, via group membership), and whether affected individuals have a right to contest decisions. Key demands on how to overcome these inherent biases can be found in the Feminist Data Manifest-No and the Toronto Declaration, among many other declarations. These can serve as conversation starters when it comes to adapting human rights due diligence processes to data-driven business models.
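
By way of illustration, the sketch below shows one simple, purely statistical probe such a team might run on hypothetical screening outcomes: comparing selection rates across groups (a disparate-impact style ratio). It is only one of many possible checks and does not, on its own, capture the contextual dimensions discussed above.

```python
# A minimal sketch, using hypothetical outcome records, of a group-wise
# selection-rate comparison. It is an illustration of one possible probe,
# not a complete bias audit.
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group_label, was_selected) tuples."""
    selected, total = defaultdict(int), defaultdict(int)
    for group, was_selected in records:
        total[group] += 1
        selected[group] += int(was_selected)
    return {g: selected[g] / total[g] for g in total}

# Hypothetical outcomes of an automated screening step.
records = [("group_a", True)] * 48 + [("group_a", False)] * 52 \
        + [("group_b", True)] * 22 + [("group_b", False)] * 78

rates = selection_rates(records)
ratio = min(rates.values()) / max(rates.values())
print(rates)            # {'group_a': 0.48, 'group_b': 0.22}
print(round(ratio, 2))  # 0.46 -- well below the 0.8 ("four-fifths") benchmark,
                        # a signal that the outcomes warrant human review.
```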

Further suggestions on how to use human rights due diligence to address human rights impacts in data-driven business can be found in our study on business and human rights in the data economy. The three key recommendations are:

  1. Business needs a life-cycle approach that captures emerging and systemic human rights problems and identifies, addresses, and eradicates systematic distortions that negatively impact human rights in datafied environments. “Data universalism” needs to be overcome, and concepts need to be developed with local embeddedness in mind, based on robust human rights due diligence.
  2. Civil society needs to develop new methods for holding companies accountable for “digital” human rights violations. This point is closely connected to the public policy debate about the state duty to protect human rights, and hence also digital rights.
  3. Policymakers should take digital rights into account in legislative proposals on human rights due diligence for business and revisit whether existing protections still cover emerging digital issues. Legislators should strengthen digital rights in the coming years and strategically link them with other legislative debates on human rights due diligence.

As we outlined above, we need to ask questions that address the broader political, economic, and cultural implications of technology for human rights, not merely its technical aspects.